Table Top Security Exercises

A train has derailed near a populated area. Multiple people are reporting eye and throat irritation. One person, an elderly man working near the site, has been hospitalized with respiratory complications. What do you do?

Table top exercises (TTX) are common in public disaster response. They help management teams check their plans. They verify that inter-agency communication and coordination works as expected. They teach team leaders to ask the right questions, and to respond dynamically under pressure. Like practicing forms in martial arts, they train a sort of muscle memory that takes over when it matters.

Disasters will happen. That's a simple fact of reality. While systems should be built to prevent them, it is also critical to prepare to respond to them. By training, organizations decrease response time. Decreasing response time saves lives.

Perhaps some of this sounds familiar. I remember log4shell. I was called into a room with other security leaders. People brainstormed, identified gaps, took tasks, and got to work. Commercial scanners lagged behind, so we designed and built our own. We ran it, manually checked things flagged as “false positives,” and drove every instance into the ground. There were a lot of late nights, a lot of grumpy engineers paged in to patch their systems. It was hard work, stressful at the beginning, but, in the end, it was good. We quickly developed a flow and coordinated closely as an ad-hoc team. We found the bugs, fixed them, and came out the other side with better tools and better procedures.

Large scale security events like log4shell, text4shell, spring4shell, and others are unlikely to become less common. LLMs allow code to be generated more quickly and with less knowledge. Meanwhile, these same LLMs either miss vulnerabilities or are prohibitively expensive to run on incoming code. Core open source infrastructure has been inundated with pull requests, while closed source software is infeasible for many projects and offers no way of knowing whether its quality is any better.

LLMs themselves have been integrated into all sorts of projects. Meanwhile, it remains ultimately impossible to secure LLMs. Our attack surface continues to multiply, while tools for managing this complexity are still evolving. The ever-relevant Gramsci quote haunts us. Indeed, “now is the time of monsters.”

If you use software in 2026, especially if that software is Internet facing, you need to be ready to respond to a large scale event. But CVSS 10 0-day events are uncommon. How do you prepare for something that requires such a high level of coordination and precision, that demands a rapid response, and that doesn't happen very often? More importantly, how do you prepare to respond outside of the crucible of the actual event? Take a page from disaster response: practice.

Our response to log4shell was solid and quick. The room was full of the best engineers and managers, people with a lot of institutional knowledge and a lot of talent. Leaders followed everything closely and always asked the right questions. But we were still inventing things, figuring stuff out, sometimes stepping on each other's toes. After log4shell, we wrote a runbook for such large scale events. Then we tested it, several times, with table top exercises, until we were confident it could be executed successfully.

Every disaster preparedness exercise starts with a runbook. You can't really test a plan unless you have a plan, can you? Do you have a runbook? Let's talk briefly about what that looks like.

A large scale security event runbook needs to answer several questions:

  1. How do you know there's an event? Do you have a vendor who will alert you? Do you subscribe to a newsletter? Is there an intelligence team? What are you doing to make sure you're on top of security news?
  2. How do you know the event is an emergency for you? Do you have a feed of already-vetted intelligence? Does your security team verify news? How will they do that?
  3. Who do you call in? This should be a list of roles, but those roles should be associated with pagers or phone numbers. Phone numbers should have back-up phone numbers. Do you have the right people? You will need to have people in the room who can answer more questions, or can find the right people to answer those questions.
  4. How do you stop making the problem worse? You are going to keep making code, or at least using it. How do you make sure you aren't deploying vulnerable stuff right now? If you can't stop pushing out more vulnerable code, then you're digging in sand: any progress you do manage to make will only be eaten away as you push more stuff out.
  5. How do you find out what is vulnerable now? Do you have a commercial scanner? Does it have updated signatures? Can you wait for them? Can you shut potentially vulnerable services down while you wait, or are they critical? Do you need to write your own scanner? Do you have the skills in-house to do that? Do you have a vendor you can call?
  6. How do you find out what has been compromised? If you patch fast enough, you may be able to avoid being hit before adversaries can scramble their resources. But don't count on that. Who do you call to figure out what the signatures of a compromise might look like? Do you have an incident response team? Do you have a vendor you work with?
  7. What do you do when you find a vulnerable system? What do you do when you find a compromised system? Some of these questions can fall off to other runbooks. Your vulnerability management team should already have a generic runbook for dealing with vulnerable systems. Your incident response team should have incident response runbooks. Do they work? You can test them while you're testing this one.
  8. How do you inform customers? Do you have a media team? Do you have an existing template for large scale events? What is important to tell customers? What regulatory requirements do you have for disclosure based on your business?
  9. How do you know when you're done? Fires will come and go. You can't stay in emergency mode all the time, or you'll burn yourself and your people out. But if you go home too early you might miss something critical. What is the indicator that it's time to transition everything back to normal operation? When can normal runbooks take over from large scale event runbooks? Does someone decide? Is there a specific threshold?
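
One way to see where a draft runbook stands is to treat the nine questions above as a checklist and let a script point out the holes. This is only a minimal sketch in Python: the question labels, the `Role` and `Runbook` classes, and the `find_gaps` function are all names I made up for illustration, not part of any standard or tool.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """A role to call in, with a primary and a backup contact (assumed fields)."""
    name: str
    primary_contact: str = ""
    backup_contact: str = ""


@dataclass
class Runbook:
    """Answers keyed by question label; a missing or blank answer is a gap."""
    answers: dict[str, str] = field(default_factory=dict)
    roles: list[Role] = field(default_factory=list)


# Hypothetical short labels for the nine questions in the list above.
QUESTIONS = [
    "detect_event",
    "confirm_emergency",
    "call_in",
    "stop_making_it_worse",
    "find_vulnerable",
    "find_compromised",
    "handle_findings",
    "inform_customers",
    "stand_down",
]


def find_gaps(runbook: Runbook) -> list[str]:
    """List unanswered questions and roles that lack a primary or backup contact."""
    gaps = []
    for question in QUESTIONS:
        if not runbook.answers.get(question, "").strip():
            gaps.append(f"no answer for: {question}")
    for role in runbook.roles:
        if not role.primary_contact:
            gaps.append(f"{role.name}: no primary contact")
        if not role.backup_contact:
            gaps.append(f"{role.name}: no backup contact")
    return gaps


# Example draft: two questions answered, one role without a backup number.
draft = Runbook(
    answers={"detect_event": "vendor alert feed", "call_in": "see roles"},
    roles=[Role("incident commander", primary_contact="pager-1")],
)
gaps = find_gaps(draft)
for gap in gaps:
    print(gap)
```

Each line of output is a candidate action item. An empty list does not mean the runbook works, only that every question has some answer written down.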

The runbook doesn't need to be perfect. It won't be. It will never be. It should not be expected to be. That's the whole point of this exercise.

After reading all of this you may not be ready to go to the next step. That's fine. You've already learned something about your capacity to respond, and that's what's important. Record those lessons, turn them into action items, drive each one to closure. Even if you are ready, this will become a pattern. Each time you test, you learn something. You come out with information that you need to turn into action items. You need to make sure those items get closed. Each cycle will repeat this same pattern.

Do you have a good start? Great. Let's test it.

You'll need to choose a facilitator. In the table top RPG world, the facilitator who leads the scenario is called a “Game Master” or “GM.” Your GM is going to come up with a scenario, keep track of progress against the scenario, and generally run the exercise. There's a mix of procedure and creativity involved in both gaming and these types of exercises, so we're going to stick with the terminology of “GM” for our facilitator. We're also going to use the terms “player” and “game” to describe team members and the exercise.

The GM will be tasked with coming up with a scenario. This should draw from past experiences, if you have them, or external accounts of incident response. If you have an organizational threat model, you will want to include this in crafting the scenario. What kind of event would most impact your business? What kinds of disruptions would be the most harmful? What subsystems are the most critical?

It may be tempting to use LLMs to help generate scenarios, and this may well work. But the research that goes into generating scenarios also helps build knowledge. That knowledge can be useful in developing a more robust runbook, and in responding more dynamically to disruption.

Once you have a scenario, schedule a game day. You can make it as realistic as you want. Getting important people in a room to talk through everything will give you a high level check. But the more realistic the exercise, the more that can go wrong, and the more opportunities you have to learn from that. The scenario will include information about how people find out. Did someone read about the vulnerability on HackerNews? How long does it take from contacting the security help desk to having security leadership in a room?

Once the GM has the appropriate security roles, as designated by the runbook, the GM will then provide appropriate information. Security will ask questions to get as much information as possible, and execute on the runbook from there.

The game is turn based. Each round starts with the GM giving some information about the current situation and relevant situational changes. Depending on the flow of the scenario, the GM may not reveal all relevant information unless specific actions are carried out. For example, a GM may choose to designate a system as “compromised” during a turn but would only reveal that information after incident response starts looking for signatures.
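
For GMs who like to keep their notes in code, the hidden-information mechanic can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `HiddenFact` and `Scenario` classes, the action labels, and the example facts are invented, and a spreadsheet or index cards would do the same job.

```python
from dataclasses import dataclass


@dataclass
class HiddenFact:
    """Something the GM knows but reveals only after a matching player action."""
    description: str
    revealed_by: str          # hypothetical action label that uncovers the fact
    active_from_round: int    # round in which the fact becomes true
    revealed: bool = False


class Scenario:
    """Tracks the current round and which hidden facts players have uncovered."""

    def __init__(self, facts):
        self.facts = list(facts)
        self.round = 0

    def start_round(self):
        self.round += 1

    def take_action(self, action):
        """Return the facts newly revealed by this action in the current round."""
        newly_revealed = []
        for fact in self.facts:
            is_active = fact.active_from_round <= self.round
            if not fact.revealed and is_active and fact.revealed_by == action:
                fact.revealed = True
                newly_revealed.append(fact.description)
        return newly_revealed

    def unrevealed(self):
        """Facts still hidden, for the GM to disclose once the session ends."""
        return [fact.description for fact in self.facts if not fact.revealed]


scenario = Scenario([
    HiddenFact("billing-api shows signs of compromise", "hunt_for_signatures", 2),
    HiddenFact("commercial scanner signatures are out of date", "check_scanner", 1),
])

scenario.start_round()
first_try = scenario.take_action("hunt_for_signatures")   # nothing to find yet
scenario.start_round()
second_try = scenario.take_action("hunt_for_signatures")  # compromise is now active
```

In this sketch, looking for signatures in round one finds nothing because the compromise has not happened yet, while the same action in round two uncovers it. The scanner problem stays hidden until someone thinks to check.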

The GM may play as all external actors. This includes adversaries, security vendors, and external researchers. Alternatively, a “Red Team” may be designated to play as the adversaries against the defending “Blue Team.” When played this way, the exercise begins to look more like a military tabletop exercise. Military models can also prove useful here.

The OODA Loop is a decision-making model developed by Air Force Colonel John Boyd to help fighter pilots make clear decisions under extreme stress. We can also use this model to help train quick and clear decision making. Since this article focuses on TTX for security, we're only going to briefly touch on the subject here.

During each phase, players can talk through each step of the OODA Loop:

  1. Observe: What data do we have? How are we getting our data? Is our data already refined into intelligence, or do we still need to refine it? Keep track of what information is missing so you can collect that later.
  2. Orient: What does this data tell us? What do we know, or think we know? How does this new data challenge or confirm that? What does this data mean for us, in the context of this situation? Is anything we're observing now the result of a previous action we've taken, or is it the result of external actors? What does the runbook say? Does any of the information we have trigger a runbook action, or do we need to figure things out? Is anything actionable, or do we need to learn more before we can choose other actions?
  3. Decide: Choose an action from the set of actions. If the data doesn't simply trigger the next runbook step, does your next action help you understand the situation more? What belief does your next action imply? How will you know if that action was correct or incorrect? Can you form a hypothesis? What observations would challenge your hypothesis? What observations would confirm it? Are those mutually exclusive, or are there additional observations or actions you must make to clarify things?
  4. Act: Finish your turn by carrying out your chosen action or actions (individually or collectively). Perhaps take a moment to write down notes, like your observations, your hypothesis, and whether you think your previous hypothesis was confirmed or refuted. You can review these all later to refine your thinking.
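
If you want those per-turn notes in a form that is easy to review later, a small record type is enough. The sketch below is one possible shape in Python, not a prescribed format: `TurnNote`, its fields, and `summarize` are names I chose for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TurnNote:
    """One player's (or the team's) notes for a single turn of the game."""
    round_number: int
    observations: list[str]
    hypothesis: str
    chosen_actions: list[str]
    missing_information: list[str] = field(default_factory=list)
    # True = confirmed, False = refuted, None = cannot tell yet.
    previous_hypothesis_confirmed: Optional[bool] = None


def summarize(notes: list[TurnNote]) -> dict:
    """Count how earlier hypotheses fared and gather information still missing."""
    outcomes = {"confirmed": 0, "refuted": 0, "undetermined": 0}
    open_questions = []
    for note in notes:
        if note.previous_hypothesis_confirmed is True:
            outcomes["confirmed"] += 1
        elif note.previous_hypothesis_confirmed is False:
            outcomes["refuted"] += 1
        else:
            outcomes["undetermined"] += 1
        open_questions.extend(note.missing_information)
    return {"outcomes": outcomes, "open_questions": open_questions}


notes = [
    TurnNote(
        round_number=1,
        observations=["vendor advisory names a logging library"],
        hypothesis="only internet-facing services are exposed",
        chosen_actions=["inventory services that use the library"],
        missing_information=["which internal services use the library"],
    ),
    TurnNote(
        round_number=2,
        observations=["an internal batch job also uses the library"],
        hypothesis="every service using the library is exposed",
        chosen_actions=["scan all environments"],
        previous_hypothesis_confirmed=False,
    ),
]
summary = summarize(notes)
```

A run of refuted hypotheses is not a failure. It usually points at an assumption baked into the runbook that deserves a second look in the retrospective.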

End your session after you've either reached the terminal point of your runbook or you've reached a problem with your runbook so bad you had to stop. (Don't worry, this doesn't mean you've done a bad job. It means you've learned something important before it was a problem.)

When you're done, the GM can reveal any hidden information not yet revealed. Plan a retrospective to identify areas of improvement. Turn these notes into action items, and plan to re-run after you've completed those items.

Run this game multiple times until you're confident in the results. You may still learn things each time. You may still come out with action items. But there comes a point where the cost of practice outweighs the value of the lessons you will learn. Where this price point lies depends on your business and the risk tolerance of your industry.

You can get a bit more value out of repeated exercises by making them more realistic (as described earlier), or by adding in a “chaos monkey” element. The “chaos monkey” may remove a person (simulating, for example, a personal emergency), or report that tooling has been broken. In this variant, an incident responder may find expected logs missing or a critical employee may be unavailable. In the same way that the Chaos Monkey tool helps you build a more resilient architecture, so does this element help your team become more resilient in the face of challenges.
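
A GM who would rather not choose the disruption personally can let a seeded random number generator do it, which also makes the same session replayable later. This is a rough Python sketch under my own assumptions: `pick_disruption`, the two disruption kinds, and the example names are hypothetical, and both lists are assumed to be non-empty.

```python
import random


def pick_disruption(players, tools, rng):
    """Pick one disruption for the round: an absent player or a broken tool.

    Assumes both lists are non-empty. Returns a (kind, target) tuple.
    """
    kind = rng.choice(["player_unavailable", "tool_broken"])
    if kind == "player_unavailable":
        return kind, rng.choice(players)
    return kind, rng.choice(tools)


players = ["incident commander", "scanner engineer", "communications lead"]
tools = ["log search", "commercial scanner", "paging system"]

# A fixed seed lets the GM replay the same sequence of disruptions later.
rng = random.Random(2026)
kind, target = pick_disruption(players, tools, rng)
print(f"This round: {kind} -> {target}")
```

Keeping the seed in the GM's notes means a re-run after the action items are closed can face exactly the same disruptions, which makes the two sessions easier to compare.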

Large scale runbooks can be tested quarterly, yearly, or every two years, depending on staff turnover and technology changes. Runbooks and skills both begin to rot the moment they are not used.

But why should we stop at large scale events?

Every engineering team should know what to do if their software is compromised. They should know where to look for logs to help incident responders. They should know which regulatory agencies they need to contact, and the time limits for legal compliance. These timelines may be surprising. Companies operating in India, for example, have 72 hours to report a breach before risking penalties.

Scaled down versions of these table top exercises can be run to verify team runbooks. And the GM's role scales well. One GM can facilitate multiple sessions at the same time, with different teams working through the exercise together and learning from each other's good ideas and failures. Shared retrospectives can provide opportunities to foster innovation by cross-pollinating between teams.

Rare events can be chaotic, and time can be lost in that chaos. Lost time can be lost data, and lost data can be both lost revenue and fines. By training for rare and unexpected events, it becomes possible to minimize their impact. While a data breach can lose customer trust, a competent response to an unexpected event can also build trust.

How prepared are you for the unexpected? Are you ready to find out?