Security Through the Looking Glass

Heterodox and heretical perspectives on security practices and the security industry as a whole.

This is the second entry in a series. If you haven't read the previous one, now might be a good time. If you're all caught up, then let's get into it.

Viable systems exist within an environment. They adapt to the environment, but they may also change the environment as they adapt and grow. Beavers evolved to build dams, and in doing so they changed the ecosystems in which they evolved. A successful company will change the market. A successful non-profit will change their domain. But even a stable company is only one entity within the environment. Other companies, NGOs, consumers, and the public sector all exist within this same environment. Regulations, technological advancement, and customer tastes all shift the environment.

Within this dynamic environment, what are you doing to track changes that impact your viability? How are you connecting environmental changes, such as regulation, to internal constraints, such as those that drive compliance enforcement and verification?

A security program will need to focus on a specific subset of this question. How do you monitor threat actors? The evolution of threat actors and their attack profiles must inform your risk model and prioritization. As defenses evolve, threat actors adapt and their attack patterns change. How do you collect data? How do you process it into intelligence that's actionable by people on your team?

Similarly, a security program will need a mechanism to track external changes to its attack surface in the form of supply chain risk management. What mechanisms do you have in place to track vulnerabilities in third party libraries, applications, and appliances that you use? How do you make sure that information (and specifically that information, not a flood of false positives) gets to people who can act on it?

We will come back to threat actors and vulnerability management in other articles.

Keeping track of your constraints as they evolve is only half of the constraint management equation. The other half is algedonic signaling. How do you know when constraints are violated?

In the body, an algedonic signal is a basic “pleasure or pain” signal, like those we started with. When your body is hungry, you feel it. When you get hungry enough, it hurts. The signal isn't detailed. It doesn't provide much information. It's simple and generally easy to investigate. Many signals, such as hunger, are intuitive. We learn to understand them immediately and without investigation.

All constraints should be tied to “algedonic” signals that trigger when they are violated. But are they? If you've recorded your constraints (as described in the previous article) then you can perform a gap analysis.

For each constraint you have identified, write down the signals that fire when that constraint is violated. For any gaps you identify, figure out a plan to close them. Are there constraints you have listed that you would feel comfortable not being alerted about if they were violated? Those are not really constraints (at least not at your level) and they should be removed from your constraint list.
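The gap analysis can be as simple as a table, but a small script makes the idea concrete. What follows is only a sketch: the `Constraint` class, the `find_signal_gaps` function, and the example constraints and signals are all made up for illustration, and your own list will look different.

```python
from dataclasses import dataclass, field


@dataclass
class Constraint:
    """A constraint and the signals that fire when it is violated (hypothetical model)."""
    name: str
    signals: list[str] = field(default_factory=list)


def find_signal_gaps(constraints: list[Constraint]) -> list[str]:
    """Return the names of constraints that have no signal attached."""
    return [c.name for c in constraints if not c.signals]


# Illustrative entries only; substitute the constraints you recorded.
constraints = [
    Constraint("spend stays below income", ["monthly budget alert"]),
    Constraint("services comply with applicable regulations"),
    Constraint("no critical vulnerabilities in production", ["scanner page"]),
]

gaps = find_signal_gaps(constraints)
for name in gaps:
    print(f"No signal for constraint: {name}")
```

Each name that comes back is either a gap to close or a constraint that doesn't belong on your list.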

Anyone who has worked in tech long enough has seen some variation of alarm fatigue. Perhaps it has even been you who has had an email filtering rule that dumps alerts into a folder that is periodically cleared without any messages ever being read. Many security engineers will have had the experience of pointing to a vulnerability found by some automated scanner or other, a vulnerability which has since been verified by a security engineer, and asking, “why was this marked as a false positive?” Perhaps you've observed someone frantically clicking through a series of dire warnings, recognizing that the warnings intended to stop them have simply trained them to click fast without thinking. Too many signals can overwhelm a system. Notice that I'm not specifying spurious signals. Accuracy is also a problem, but sheer volume of accurate signals can be enough.

Then what is the mechanism by which signals are reduced? Common frameworks can address common vulnerabilities. Many security programs will look for security violations: insecure software being used, vulnerabilities in software being developed, and so on. There are infinitely many ways to implement software insecurely. There may well be as many ways to implement software securely. Sorting between these sets requires tremendous knowledge and has a high probability of failure. By restricting the definition of “secure” to compliance with a specific set of vetted patterns, software, hardware, configurations, and the like, the verification work is reduced.

Security Engineering can work on identifying common problems, addressing them with vetted patterns, and ensuring deployed patterns are updated as needed. This internal optimization work helps make the external signals around supply chain risk into something manageable. These restrictions are enforced not by complete refusal to use software, hardware, or configurations not explicitly allowed, but by adding friction, in terms of additional auditing or monitoring requirements, for teams trying to use them. Critical deviations will still be possible, at a cost, and most users will choose the path of least resistance.
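To make the “friction, not refusal” idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `VETTED` catalog, the component names and versions, and the two extra requirements are invented, not taken from any real policy.

```python
# Hypothetical catalog of vetted components and the versions that were reviewed.
VETTED = {
    "tls-terminator": {"2.1", "2.2"},
    "auth-gateway": {"5.0"},
}

# Hypothetical friction applied to anything outside the catalog.
EXTRA_REQUIREMENTS = ["additional security audit", "enhanced monitoring"]


def review_requirements(component: str, version: str) -> list[str]:
    """Vetted choices pass with no extra work; everything else is allowed, at a cost."""
    if version in VETTED.get(component, set()):
        return []
    return list(EXTRA_REQUIREMENTS)


print(review_requirements("auth-gateway", "5.0"))
print(review_requirements("homegrown-crypto", "0.1"))
```

Notice that nothing is ever rejected outright. The unvetted choice simply costs more, which is what nudges most teams toward the vetted path.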

Compliance work may revolve around understanding what regulations apply to software, what assets regulatory bodies need, and making sure those assets can be delivered. But it can be easy to miss that compliance can reduce regulatory risk through early intervention. By providing guidance that allows software to be designed in a way that avoids falling under regulation, or by concentrating regulatory risk in a smaller set of components, complexity, and the risk of signal overwhelm, can be reduced.

While cybernetics foregrounds the importance of complexity management, encapsulated as Ashby's Law of Requisite Variety, security program leadership can struggle to see the forest for the trees. In the frantic struggle to fight all the fires, it can be difficult to even notice the larger patterns. This is exactly where cybernetics can help us most.

The VSM defines a set of metasystemic subsystems that manage any system. In the first article we talked about the system itself as an entity surviving within and adapting to an environment. We connected that to the Identity subsystem, and how invariants inform constraints through identity.

In this article, thus far, we have talked more about that dynamic environment and how we need an Awareness subsystem to track environmental changes. We've also talked about managing complexity through synergy using an Optimization subsystem. This leaves two more metasystemic components to cover: Internal Audit and Learning, and Harmonization. Before we talk about those, let's jump down the stack one more time. What are these subsystems learning about, auditing, and harmonizing? All of these metasystemic functions exist to support the Operational Units. Operational Units “do all the work that keeps the system alive.” In the body these are the muscles and organs that breathe air, find food, digest food, and remove waste.

While the brain, the metasystem, directs the activity, it is the rest of the body that realizes that activity. It is important to keep this fact in mind as we try to understand the role of security in an organization. With this in mind, let's return to where we were.

The Optimization subsystem can passively coordinate optimization opportunities that are identified by subsystems, but it can also actively collect data. Internal Audit and Learning is a subsystem of Optimization that performs audits and looks for anomalies. Security folks may well be taking notice, recognizing some of our own work in this subsystem. But it's a little more complex than that. Let's talk about the Harmonization subsystem.

The function of the Harmonization subsystem is, essentially, to keep operating units from interfering with each other. If Operational Units are sharing compute resources, for example, Harmonization would include things like permissions and quotas. The VSM says that operational units should have maximal autonomy, but each Operational Unit must be prevented from threatening the viability of the system itself.

Now we, security, can really see ourselves in our familiar role with our familiar problem. Each operational unit may be trying to create a product, ship software or hardware, or to provide a service, but in doing so they can increase the attack surface. Promotion driven development may push team leads to compete to get software out faster and cheaper, incentivizing them to cut corners wherever they can. They ask if they can reduce the scope of a review, or push now and fix later. They may even push features that change the threat model but are, somehow, also “too minor for a security review.”

They each individually pursue their own objectives, and in doing so threaten the security of the organization as a whole. But when vulnerabilities are found, and perhaps even exploited, within a team's code, the company as a whole bears the burden. Isn't it interesting that security tends to behave similarly to police when, in actuality, our job is that of commons management?

Customer trust and internal funds are parts of the internal commons threatened through a company's collective attack surface. The job of security is to coordinate teams across the company to ensure attack surface is minimized. Security works to protect customer trust from reputational threats, and funds from regulatory action and illegitimate compute usage.

Companies mature enough to understand that “policy enforcement” needs to have teeth often become too big for that same enforcement role to maintain visibility. Companies that haven't reached that level of maturity simply don't give “policy enforcement” the tools to actually “enforce” anything. “Policy enforcement” creates an adversarial relationship between security and operational units that makes everyone's job harder. By recognizing security as a support role, rather than an authority that demands obedience, we can focus on building the relationships we need to protect the organization that we all share.

The economist Elinor Ostrom worked extensively on commons management, but a full treatment of the rules she developed is outside the scope of this work. That said, we will briefly touch on five rules for commons management that will help simplify our work:

  1. Commons must have clearly defined boundaries.
  2. Commons must be monitored.
  3. Those who abuse the commons must be sanctioned.
  4. Commons management must have authority.
  5. Commons management works better when the specific commons exists within a larger commons framework.

We have already touched on both the “authority” and “abuse” elements, though we'll need to come back to those later. In the next section we'll align all of these into a process that's driven by a new threat modeling methodology. This new methodology will pull from the information we established in the first section, and the program we build around it will revisit many of the concepts we've introduced in this section.

If you are unable to get yourself food or water for a long enough period, you will die. We all know this. Our bodies tell us this with hunger and thirst. After a while we start to hurt, a headache or aching stomach. We will be compelled to do something about it, perhaps anything, to survive. You may even feel a bit of anxiety now even just thinking about it.

Some people override these signals consciously. They fast for religious reasons. They go on hunger strikes to resist oppression in the only way they can, or raise awareness about a problem that isn't getting attention. What self-control, one might say, imagining Kafka's Hunger Artist sitting stoically in his cage as he wastes away. (I mean, no one would say that, but I'm not going to miss such a good opportunity to throw in a Kafka reference.)

Control, including both the signals the body sends to the mind and the ability of the mind to override them and dictate the body, is precisely the subject of the study of cybernetics. But cybernetics, of course, isn't specific to biological systems. It abstracts concepts into models that can be applied to plants, animals, machines, social systems, and many other things. One such model is the Viable System Model (VSM).

The VSM describes the abstract control components that a system needs in order to “remain viable.” The “viable” part of that is defined by a specific set of invariants. For biological things that means “have enough water,” “have the right nutrients,” “have enough food,” etc, while for businesses that generally means something like “have employees” and “remain solvent.”

All viable systems exist within environments and viability can only be described within the context of these environments. Their existence within an environment necessarily changes that environment, and changes to the environment may influence their viability.

The VSM, as a model, helps us think about the mechanisms which must exist, and the ways in which they must exist, within a system in order to maintain viability. Positively it tells us how to make a system viable or check that a system is viable. But that necessarily means it also happens to be useful for modeling.

So let's take these concepts and begin to think about how we can leverage the power of cybernetics to design or improve a cybersecurity program.

We need to start at the heart of the matter: viability. Whatever you're trying to protect either is a system that must remain viable, or exists within a system that must remain viable. Things that threaten that viability are called “threats,” where a “security threat” is a subset of those threats such that the threat may be intentionally realized by an actor. Now, by “remain viable,” we mean “must maintain a set of invariants.”

A company, as we have discussed, needs (at least) employees to work and money to pay them (if no others). These are the minimal viability invariants of a company. A department within a company may have constraints, perhaps based on metric targets or goals: must maintain 10% customer growth, must achieve positive cash flow for product sales by the end of the year, etc. Subsystems, teams, subteams, individuals, software, will all derive their own constraints (explicitly or implicitly) from these root invariants.

A system's invariants are enforced by the parent system (directly or through inheritance, say, from the laws of physics). Constraints are derived by the system or parent system in order to protect the system against violating the invariant within a specific environmental configuration. As the environment changes, constraints will also need to change; invariants must remain true as a property of the system across all possible environments.

So then, what are those invariants for the highest level system for which you are responsible? List them.

A system does specific things, generally in a specific way, in order to avoid violating these invariants. This defines the system identity. A store sells things within a specific market niche. A manufacturer makes stuff, from specific materials, with specific tools, for a specific market niche. I am a security engineer. I am writing about security. I may be able to pivot between security engineering work and technical writing, but if I try to be a baker tomorrow, and a landscaper next Tuesday, and a carpenter the following Friday, I will quite likely threaten my income. Without income, I will struggle to maintain the housing and food I need to maintain the constraints of temperature and caloric intake that keep me viable. As difficult as it would be for me, as an individual, to rapidly pivot, it would be all the more so for the shoe store to become a butcher or the cabinet factory to pivot to pharmaceuticals.

Identity informs constraints. What then is the identity of the system for which you are responsible? Are you responsible for a subsystem, such as a department or team, where your identity is derived from a parent system? If yes, how so? Write these things down.

What are the constraints that keep you from violating these invariants? List them.

These constraints may be things like “service and labor spend must be lower than customer provided income,” “services must comply with all applicable regulations to avoid fines,” or “the business must operate in a way consistent with the ethical framework of the community in order to retain employees and reduce retaliatory risk.”

Constraints will be derived from invariants, through various levels of system identity. If you are mapping constraints for a subsystem, you should be able to contextualize these constraints at least within the context of the next level of parent system above yours. A team should understand the department it's working in; a shop should understand regional goals.

You have been asked to record several pieces of information, such as invariants, identity, and derived constraints. This specific information will be useful in the future. But while invariants won't change, and Identity should rarely or never change, constraints are a product of invariants interfacing with a dynamic environment through identity.

This introduces us to the highest level cybernetic metasystemic function, and the dynamic environment within which a viable system must exist. This will set us up to talk in the next essay about the remaining metasystemic functions that allow our system to adapt to this dynamic environment, internally regulate, and manage complexity. In the third essay, we will revisit what you have written down here and use it to think differently about threat modeling.

For many organizations today, threat modeling lies somewhere between theater and magical thinking. Design remains uninformed by threat models, and threat models become dead documents that are written once and thrown away. Assets that should be related to threat models, such as incident response plans, remain disconnected. What should be a tightly-knit living web of knowledge about the risk lifecycle of software becomes an archipelago of dead and disconnected data.

This dead chaos breeds complexity, complexity that can be absorbed by a living knowledge system and a process that builds and leverages that knowledge. By putting the “cyber” back in cybersecurity, we will be able to build a security program that is actually able to consume the complexity we are experiencing today, and not choke. Keep your notes from this section. You will need them for the section on threat modeling.

A train has derailed near a populated area. Multiple people are reporting eye and throat irritation. One person, an elderly man working near the site, has been hospitalized with respiratory complications. What do you do?

Table-top exercises (TTX) are common in public disaster response. They help management teams check their plans. They verify that inter-agency communication and coordination works as expected. They teach team leaders to ask the right questions, and to respond dynamically under pressure. Like practicing forms in martial arts, they train a sort of muscle memory that takes over when it matters.

Disasters will happen. That's a simple fact of reality. While systems should be built to prevent them, it is also critical to prepare to respond to them. By training, organizations decrease response time. Decreasing response time saves lives.

Perhaps some of this sounds familiar. I remember log4shell. I was called into a room with other security leaders. People brainstormed, identified gaps, took tasks, and got to work. Commercial scanners lagged behind, so we designed and built our own. We ran it, manually checked things flagged as “false positives,” and drove every instance into the ground. There were a lot of late nights, a lot of grumpy engineers paged in to patch their systems. It was hard work, stressful at the beginning, but, in the end, it was good. We quickly developed a flow and coordinated closely as an ad-hoc team. We found the bugs, fixed them, and came out the other side with better tools and better procedures.

Large scale security events like log4shell, text4shell, spring4shell, and others are unlikely to become less common. LLMs allow code to be generated more quickly and with less knowledge. Meanwhile, these same LLMs either miss vulnerabilities or are prohibitively expensive to run on incoming code. Core open source infrastructure has been inundated with pull requests, while closed source software is infeasible to use for many projects with no way of knowing if quality is better.

LLMs themselves have been integrated into all sorts of projects. Meanwhile, it remains ultimately impossible to secure LLMs. Our attack surface continues to multiply, while tools for managing this complexity are still evolving. The ever-relevant Gramsci quote haunts us. Indeed, “now is the time of monsters.”

If you use software in 2026, especially if that software is Internet facing, you need to be ready to respond to a large scale event. But CVSS 10 0-day events are uncommon. How do you prepare for something that requires such a high level of coordination and precision, that demands a rapid response, and that doesn't happen very often? More importantly, how do you prepare to respond outside of the crucible of the actual event? Take a page from disaster response: practice.

Our response to log4shell was solid and quick. The room was full of the best engineers and managers, people with a lot of institutional knowledge and a lot of talent. Leaders followed everything closely and always asked the right questions. But we were still inventing things, figuring stuff out, sometimes stepping on each other's toes. After log4shell, we wrote a runbook for such large scale events. Then we tested it, several times, with table-top exercises, until we were confident it could be executed successfully.

Every disaster preparedness exercise starts with a runbook. You can't really test a plan unless you have a plan, can you? Do you have a runbook? Let's talk briefly about what that looks like.

A large scale security event runbook needs to answer several questions:

  1. How do you know there's an event? Do you have a vendor who will alert you? Do you subscribe to a newsletter? Is there an intelligence team? What are you doing to make sure you're on top of security news?
  2. How do you know the event is an emergency for you? Do you have a feed of already-vetted intelligence? Does your security team verify news? How will they do that?
  3. Who do you call in? This should be a list of roles, but those roles should be associated with pagers or phone numbers. Phone numbers should have back-up phone numbers. Do you have the right people? You will need to have people in the room who can answer more questions, or can find the right people to answer those questions.
  4. How do you stop making the problem worse? You are going to keep making code, or at least using it. How do you make sure you aren't deploying vulnerable stuff right now? If you can't stop pushing out more vulnerable code, then you're digging in sand: any progress you do manage to make will only be eaten away as you push more stuff out.
  5. How do you find out what is vulnerable now? Do you have a commercial scanner? Does it have updated signatures? Can you wait for them? Can you shut potentially vulnerable services down while you wait, or are they critical? Do you need to write your own scanner? Do you have the skills in-house to do that? Do you have a vendor you can call?
  6. How do you find out what has been compromised? If you patch fast enough, you may be able to avoid being hit before adversaries can scramble their resources. But don't count on that. Who do you call to figure out what the signatures of a compromise might look like? Do you have an incident response team? Do you have a vendor you work with?
  7. What do you do when you find a vulnerable system? What do you do when you find a compromised system? Some of these questions can fall off to other runbooks. Your vulnerability management team should already have a generic runbook for dealing with vulnerable systems. Your incident response team should have incident response runbooks. Do they work? You can test them while you're testing this one.
  8. How do you inform customers? Do you have a media team? Do you have an existing template for large scale events? What is important to tell customers? What regulatory requirements do you have for disclosure based on your business?
  9. How do you know when you're done? Fires will come and go. You can't stay in emergency mode all the time, or you'll burn yourself and your people out. But if you go home too early you might miss something critical. What is the indicator that it's time to transition everything back to normal operation? When can normal runbooks take over from large scale event runbooks? Does someone decide? Is there a specific threshold?
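If it helps to see the shape of this, the questions above can be tracked as structured data, so that unanswered questions and missing backup contacts surface on their own. This is a rough sketch under my own assumptions: `RunbookItem`, `action_items`, and every answer and contact below are placeholders, not a recommended schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunbookItem:
    """One runbook question and whatever has been decided so far (hypothetical model)."""
    question: str
    answer: Optional[str] = None           # None means nobody has answered it yet
    primary_contact: Optional[str] = None
    backup_contact: Optional[str] = None


def action_items(runbook: list[RunbookItem]) -> list[str]:
    """List what still has to be fixed before the runbook is worth testing."""
    items = []
    for entry in runbook:
        if entry.answer is None:
            items.append(f"Answer: {entry.question}")
        if entry.primary_contact and not entry.backup_contact:
            items.append(f"Add a backup contact for: {entry.question}")
    return items


# Placeholder content; the contacts here are roles, not real people.
runbook = [
    RunbookItem("How do you know there's an event?", "Vendor alert feed",
                "intel on-call", "intel lead"),
    RunbookItem("Who do you call in?", "Role list in the paging tool",
                "security duty manager"),
    RunbookItem("How do you know when you're done?"),
]

todo = action_items(runbook)
for item in todo:
    print(item)
```

An empty list doesn't mean the runbook is good. It only means the runbook is complete enough to be worth testing.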

The runbook doesn't need to be perfect. It won't be. It will never be. It should not be expected to be. That's the whole point of this exercise.

After reading all of this you may not be ready to go to the next step. That's fine. You've already learned something about your capacity to respond, and that's what's important. Record those lessons, turn them into action items, drive each one to closure. Even if you are ready, this will become a pattern. Each time you test, you learn something. You come out with information that you need to turn into action items. You need to make sure those items get closed. Each cycle will repeat this same pattern.

Do you have a good start? Great. Let's test it.

You'll need to choose a facilitator. In the table-top RPG world, the facilitator who leads the scenario is called a “Game Master” or “GM.” Your GM is going to come up with a scenario, keep track of progress against the scenario, and generally run the exercise. There's a mix of procedure and creativity involved in both gaming and these types of exercises, so we're going to stick with the terminology of “GM” for our facilitator. We're also going to use the terms “player” and “game” to describe team members and the exercise.

The GM will be tasked with coming up with a scenario. This should draw from past experiences, if you have them, or external accounts of incident response. If you have an organizational threat model, you will want to include this in crafting the scenario. What kind of event would most impact your business? What kinds of disruptions would be the most harmful? What subsystems are the most critical?

It may be tempting to use LLMs to help generate scenarios, and this may well work. But the research that goes into generating scenarios also helps build knowledge. That knowledge can be useful in developing a more robust runbook, and in responding more dynamically to disruption.

Fortunately, there are also other resources. CISA has a set of tabletop exercise packages focused on specific industries. Some vendors may create tabletop exercises on request, or may have already existing packages they can support you with. The Backdoors and Breaches card deck can also help provide inspiration.

Once you have a scenario, schedule a game day. You can make it as realistic as you want. Getting important people in a room to talk through everything will give you a high level check. But the more realistic the exercise, the more that can go wrong, and the more opportunities you have to learn from that. The scenario will include information about how people find out. Did someone read about the vulnerability on HackerNews? How long does it take from contacting the security help desk to having security leadership in a room?

Once the GM has gathered together the appropriate security roles, as designated by the runbook, the GM will then provide appropriate information. Security will ask questions to get as much information as possible, and execute on the runbook from there.

The game is turn based. Each round starts with the GM giving some information about the current situation and relevant situational changes. Depending on the flow of the scenario, the GM may not reveal all relevant information unless specific actions are carried out. For example, a GM may choose to designate a system as “compromised” during a turn but would only reveal that information after incident response starts looking for signatures.
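For GMs who like to keep their notes in something more structured than a notebook, the reveal-on-action mechanic can be sketched in a few lines. The `Scenario` class and the example facts and actions are invented for illustration. A real scenario would be richer, and a GM would match player actions by judgment rather than by exact string.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scenario:
    """GM-side state: what players know, and what they have yet to uncover."""
    public: list[str] = field(default_factory=list)
    hidden: dict[str, str] = field(default_factory=dict)  # action -> fact it reveals

    def take_action(self, action: str) -> Optional[str]:
        """Reveal the fact tied to this action, if any, and make it public."""
        fact = self.hidden.pop(action, None)
        if fact is not None:
            self.public.append(fact)
        return fact


game = Scenario(
    public=["A critical vulnerability was announced this morning"],
    hidden={"search for compromise signatures": "The build server is compromised"},
)

first = game.take_action("patch the web tier")                 # reveals nothing
second = game.take_action("search for compromise signatures")  # reveals the compromise
print(first, second, game.public, sep="\n")
```

Anything still in `hidden` at the end of the session is what the GM reveals before the retrospective.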

The GM may play as all external actors. This includes adversaries, security vendors, and external researchers. Alternatively, a “Red Team” may be designated to play as the adversaries against the defending “Blue Team.” When played this way, the exercise begins to look more like a military tabletop exercise. Military models can also prove useful here.

The OODA Loop is a decision-making model developed by Air Force Colonel John Boyd to help fighter pilots make clear decisions under extreme stress. We can also use this model to help train quick and clear decision-making. Since this article focuses on TTX for security, we're only going to briefly touch on the subject here.

During each phase, players can talk through each step of the OODA Loop:

  1. Observe: What data do we have? How are we getting our data? Is our data already refined into intelligence, or do we still need to refine it? Keep track of what information is missing so you can collect that later.
  2. Orient: What does this data tell us? What do we know, or think we know? How does this new data challenge or confirm that? What does this data mean for us, in the context of this situation? Is anything we're observing now the result of a previous action we've taken, or is it the result of external actors? What does the runbook say? Does any of the information we have trigger a runbook action, or do we need to figure things out? Is anything actionable, or do we need to learn more before we can choose other actions?
  3. Decide: Choose an action from the set of actions. If the data doesn't simply trigger the next runbook step, does your next action help you understand the situation more? What belief does your next action imply? How will you know if that action was correct or incorrect? Can you form a hypothesis? What observations would challenge your hypothesis? What observations would confirm it? Are those mutually exclusive, or are there additional observations or actions you must make to clarify things?
  4. Act: Finish your turn by carrying out your action or actions (individually or collectively). Perhaps take a moment to write down notes, like your observations, your hypothesis, and whether you think your previous hypothesis was confirmed or refuted. You can review these all later to refine your thinking.
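The notes from each turn are easier to review later if they follow the same shape every time. Here is one possible shape, offered as a sketch only: the `TurnNote` fields and the `unresolved` helper are my own naming, and the example entries are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnNote:
    """One player's notes for a single turn, one field per OODA step."""
    turn: int
    observations: list[str]                 # Observe: the data we have this turn
    orientation: str                        # Orient: what we think the data means
    hypothesis: str                         # Decide: the belief our action implies
    action: str                             # Act: what we chose to do
    hypothesis_held: Optional[bool] = None  # filled in on a later turn


def unresolved(notes: list[TurnNote]) -> list[int]:
    """Turns whose hypothesis has not yet been confirmed or refuted."""
    return [n.turn for n in notes if n.hypothesis_held is None]


notes = [
    TurnNote(1, ["scanner flags 40 hosts"], "exposure is limited to one region",
             "only the web tier is affected", "patch the web tier",
             hypothesis_held=False),
    TurnNote(2, ["batch hosts also flagged"], "exposure is wider than we thought",
             "a shared base image is the source", "rebuild the base image"),
]

print(unresolved(notes))
```

In the retrospective, the refuted hypotheses tend to be more instructive than the confirmed ones.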

End your session after you've either reached the terminal point of your runbook or you've reached a problem with your runbook so bad you had to stop. (Don't worry, this doesn't mean you've done a bad job. It means you've learned something important before it was a problem.)

When you're done, the GM can reveal any hidden information not yet revealed. Plan a retrospective to identify areas of improvement. Turn these notes into action items, and plan to re-run after you've completed those items.

Run this game multiple times until you're confident in the results. You may still learn things each time. You may still come out with action items. But there comes a point where the cost of practice outweighs the value of the lessons you will learn. Where this price point lies depends on your business and the risk tolerance of your industry.

You can get a bit more value out of repeated exercises by making them more realistic (as described earlier), or by adding in a “chaos monkey” element. The “chaos monkey” may remove a person (simulating, for example, a personal emergency), or report that tooling has been broken. In this variant, an incident responder may find expected logs missing, or a critical employee may be unavailable. In the same way that the Chaos Monkey tool helps you build a more resilient architecture, so does this element help your team become more resilient in the face of challenges.
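A GM who would rather not pick the disruption by hand can let a random draw do it. This is a minimal sketch: `inject_chaos` and the role and tool names are assumptions for illustration, and the seed only exists so that a session can be replayed.

```python
import random


def inject_chaos(players: list[str], tools: list[str], rng: random.Random) -> str:
    """Pick one disruption: a player becomes unavailable or a tool breaks."""
    options = [f"{p} is unavailable" for p in players]
    options += [f"{t} is broken" for t in tools]
    if not options:
        raise ValueError("nothing to disrupt")
    return rng.choice(options)


# Hypothetical roles and tooling; a fixed seed makes the draw repeatable.
rng = random.Random(2026)
event = inject_chaos(["incident lead", "scanner engineer"], ["log search"], rng)
print(event)
```

One disruption per round is usually plenty. The point is to exercise the runbook's fallbacks, not to make the game unwinnable.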

Large scale runbooks can be tested quarterly, yearly, or every two years, depending on staff turnover and technology changes. Runbooks and skills both begin to rot the moment they are not used.

But why should we stop at large scale events?

Every engineering team should know what to do if their software is compromised. They should know where to look for logs to help incident responders. They should know which regulatory agencies they need to contact, and the time limits for legal compliance. These timelines may be surprising. Companies running in India, for example, have 72 hours to report a breach before risking penalties.

Scaled down versions of these table-top exercises can be run to verify team runbooks. And the GM's role scales well. One GM can facilitate multiple sessions at the same time, with different teams learning from the exercise together, learning from each other's good ideas and failures. Shared retrospectives can provide opportunities to foster innovation by cross-pollinating between teams.

Rare events can be chaotic, and time can be lost in that chaos. Lost time can be lost data, and lost data can be both lost revenue and fines. By training for rare and unexpected events, it becomes possible to minimize their impact. While a data breach can lose customer trust, a competent response to an unexpected event can also build trust.

How prepared are you for the unexpected? Are you ready to find out?