

# Process
<a name="process"></a>

 Developing thorough and clearly defined incident response processes is key to a successful and scalable incident response program. When a security event occurs, clear steps and workflows will help you to respond in a timely manner. You might already have existing incident response processes. Regardless of your current state, it’s important to update, iterate, and test your incident response processes regularly. 

# Develop and test an incident response plan
<a name="develop-and-test-incident-response-plan"></a>

 The first document to develop for incident response is the *incident response plan*. The incident response plan is designed to be the foundation for your incident response program and strategy. An incident response plan is a high-level document that typically includes these sections: 
+ **An incident response team overview** – Outlines the goals and functions of the incident response team 
+ **Roles and responsibilities** – Lists the incident response stakeholders and details their roles when an incident occurs 
+ **A communication plan** – Details contact information and how you will communicate during an incident 

   It’s a best practice to have out-of-band communication as a backup for incident communication. An example of an application that provides a secure out-of-band communications channel is [AWS Wickr](https://aws.amazon.com/wickr/).
+ **Phases of incident response and actions to take** – Enumerates the phases of incident response (for example, detect, analyze, contain, eradicate, and recover), including high-level actions to take within those phases
+ **Incident severity and prioritization definitions** – Details how to classify the severity of an incident, how to prioritize the incident, and then how the severity definitions affect escalation procedures

 While these sections are common throughout companies of different sizes and industries, each organization’s incident response plan is unique. You will need to build an incident response plan that works best for your organization. 
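
As an illustration of how severity definitions can drive escalation, the following is a minimal sketch and not a recommended matrix. The severity levels, triage questions, role names, and response targets are all hypothetical placeholders; your incident response plan should define its own.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class EscalationRule:
    notify: tuple[str, ...]   # roles to notify (hypothetical role names)
    response_minutes: int     # target time to first response (assumed values)


# Hypothetical escalation matrix; replace with your organization's definitions.
ESCALATION = {
    Severity.LOW: EscalationRule(("soc-analyst",), 480),
    Severity.MEDIUM: EscalationRule(("soc-analyst", "soc-lead"), 120),
    Severity.HIGH: EscalationRule(("soc-lead", "incident-commander"), 30),
    Severity.CRITICAL: EscalationRule(("incident-commander", "ciso", "legal"), 15),
}


def classify(production_impact: bool, data_exposed: bool, active_threat: bool) -> Severity:
    """Map three yes/no triage answers to a severity level (illustrative rules only)."""
    if data_exposed and active_threat:
        return Severity.CRITICAL
    if data_exposed or (production_impact and active_threat):
        return Severity.HIGH
    if production_impact or active_threat:
        return Severity.MEDIUM
    return Severity.LOW


severity = classify(production_impact=True, data_exposed=False, active_threat=True)
rule = ESCALATION[severity]
print(severity.name, rule.notify, rule.response_minutes)
```

Keeping the classification rules and the escalation matrix as separate structures makes it easier to revise one without touching the other when the plan is updated.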

# Document and centralize architecture diagrams
<a name="document-and-centralize-architecture-diagrams"></a>

 To respond quickly and accurately to a security event, you need to understand how your systems and networks are architected. Understanding these internal patterns is important not only for incident response, but also for verifying that applications are architected consistently and according to best practices. Keep this documentation up to date as new architecture patterns are adopted. You should develop documentation and internal repositories that detail items such as: 
+ **AWS account structure** - You need to know: 
  +  How many AWS accounts do you have? 
  +  How are those AWS accounts organized? 
  +  Who are the business owners of the AWS accounts? 
  +  Do you use Service Control Policies (SCPs)? If so, what organizational guardrails are implemented by using SCPs? 
  +  Do you limit the Regions and services that can be used? 
  +  What differences are there between business units and environments (dev/test/prod)? 
+ **AWS service patterns** 
  +  What AWS services do you use? 
  +  What are the most widely used AWS services? 
+ **Architecture patterns** 
  +  What cloud architectures do you use? 
+ **AWS authentication patterns** 
  +  How do your developers typically authenticate to AWS? 
  +  Do you use IAM roles or users (or both)? Is your authentication to AWS connected to an identity provider (IdP)? 
  +  How do you map an IAM role or user to an employee or system? 
  +  How does access get revoked when someone is no longer authorized? 
+ **AWS authorization patterns** 
  +  What IAM policies do your developers use? 
  +  Do you use resource-based policies? 
+ **Logging and monitoring** 
  +  What logging sources do you use and where are they stored? 
  +  Do you aggregate AWS CloudTrail logs? If so, where are they stored? 
  +  How do you query CloudTrail logs? 
  +  Do you have Amazon GuardDuty enabled? 
  +  How do you access GuardDuty findings (for example, console, ticketing system, SIEM)? 
  +  Are findings or events aggregated in a SIEM? 
  +  Are tickets automatically created? 
  +  What tooling is in place to analyze logs for an investigation? 
+ **Network topology** 
  +  How are devices, endpoints, and connections on your network physically or logically arranged? 
  +  How does your network connect with AWS? 
  +  How is network traffic filtered between environments? 
+ **External infrastructure** 
  +  How are externally-facing applications deployed? 
  +  What AWS resources are publicly accessible? 
  +  What AWS accounts contain infrastructure that is externally facing? 
  +  What DDoS or external filtering is there? 

 Documenting internal technical diagrams and processes eases the incident response analyst’s job, helping them quickly obtain the institutional knowledge needed to respond to a security event. Thorough documentation of internal technical processes not only simplifies security investigations, but also allows for rationalization and evaluation of the processes. 
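
One way to keep the account structure answers queryable is to store them as structured records rather than free text. The sketch below assumes a hand-maintained inventory; the `AccountRecord` fields, account IDs, OU paths, and owner addresses are hypothetical examples and do not come from any AWS API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccountRecord:
    account_id: str
    name: str
    ou_path: str                    # where the account sits in your organization
    environment: str                # for example: dev, test, prod
    business_owner: Optional[str]   # None when no owner has been recorded
    allowed_regions: tuple[str, ...] = ()


def accounts_missing_owner(inventory):
    """Return the IDs of accounts that have no documented business owner."""
    return [a.account_id for a in inventory if not a.business_owner]


def accounts_per_environment(inventory):
    """Count accounts in each environment."""
    return Counter(a.environment for a in inventory)


# Hypothetical inventory entries for illustration only.
INVENTORY = [
    AccountRecord("111122223333", "payments-prod", "Root/Workloads/Prod", "prod",
                  "payments-team@example.com", ("us-east-1",)),
    AccountRecord("444455556666", "payments-dev", "Root/Workloads/Dev", "dev",
                  None, ("us-east-1", "us-west-2")),
    AccountRecord("777788889999", "analytics-dev", "Root/Workloads/Dev", "dev",
                  "data-team@example.com", ("us-east-1",)),
]

print(accounts_missing_owner(INVENTORY))
print(dict(accounts_per_environment(INVENTORY)))
```

A check like `accounts_missing_owner` can be run whenever the inventory changes, so that gaps in ownership are found before an incident rather than during one.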

# Develop incident response playbooks
<a name="develop-incident-response-playbooks"></a>

 A key part of preparing your incident response processes is developing playbooks. Incident response playbooks provide prescriptive guidance and steps to follow when a security event occurs. Having clear structure and steps simplifies the response and reduces the likelihood of human error. 

# What to create playbooks for
<a name="what-to-create-playbooks-for"></a>

 Playbooks should be created for incident scenarios such as: 
+  **Expected incidents** – Playbooks should be created for incidents you anticipate. This includes threats like denial of service (DoS), ransomware, and credential compromise. 
+  **Known security findings or alerts** – Playbooks should be created for your known security findings and alerts, such as GuardDuty findings. You might receive a GuardDuty finding and think, “Now what?” To prevent mishandling of a GuardDuty finding or ignoring the finding, create a playbook for each potential GuardDuty finding. Some remediation details and guidance can be found in the [GuardDuty documentation](https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_remediate.html). It’s worth noting that GuardDuty is not enabled by default and does incur a cost. More details on GuardDuty can be found in Appendix A: Cloud capability definitions - [Visibility and alerting](visibility-and-alerting.md). 
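
To make sure no finding is left without a next step, you can keep a lookup from finding type to playbook. The following is a minimal sketch that assumes finding types follow the `ThreatPurpose:ResourceType/ThreatFamilyName` pattern; the playbook paths and the fallback playbook are hypothetical, and the finding types you map should be checked against the current GuardDuty documentation.

```python
# Hypothetical mapping from GuardDuty finding types (or type prefixes) to playbooks.
PLAYBOOKS = {
    "UnauthorizedAccess:IAMUser/InstanceCredentialExfiltration.OutsideAWS":
        "playbooks/credential-compromise.md",
    "CryptoCurrency:EC2": "playbooks/cryptomining.md",
    "Recon": "playbooks/reconnaissance.md",
}

# Assumed catch-all so that an unmapped finding still gets triaged.
DEFAULT_PLAYBOOK = "playbooks/generic-triage.md"


def playbook_for(finding_type: str) -> str:
    """Pick the most specific playbook registered for a finding type."""
    candidates = [
        finding_type,                     # exact finding type
        finding_type.split("/", 1)[0],    # "ThreatPurpose:ResourceType"
        finding_type.split(":", 1)[0],    # "ThreatPurpose"
    ]
    for key in candidates:
        if key in PLAYBOOKS:
            return PLAYBOOKS[key]
    return DEFAULT_PLAYBOOK


print(playbook_for("CryptoCurrency:EC2/BitcoinTool.B!DNS"))
print(playbook_for("Recon:EC2/PortProbeUnprotectedPort"))
print(playbook_for("Trojan:EC2/DropPoint"))
```

Matching from most specific to least specific lets you start with broad playbooks per threat purpose and add dedicated ones for individual finding types over time.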

# What to include in playbooks
<a name="what-to-include-in-playbooks"></a>

 Playbooks should contain technical steps for a security analyst to complete in order to adequately investigate and respond to a potential security incident. 

 Items to include in a playbook include: 
+  **Playbook overview** – What risk or incident scenario does this playbook address? What is the goal of the playbook?
+  **Prerequisites** – What logs and detection mechanisms are required for this incident scenario? What is the expected notification? 
+  **Stakeholder information** – Who is involved and what is their contact information? What are each of the stakeholders’ responsibilities? 
+  **Response steps** – Across phases of incident response, what tactical steps should be taken? What queries should an analyst run? What code should be run to achieve the desired outcome? 
  +  **Detect** – How will the incident be detected? 
  +  **Analyze** – How will the scope of impact be determined? 
  +  **Contain** – How will the incident be isolated to limit scope? 
  +  **Eradicate** – How will the threat be removed from the environment? 
  +  **Recover** – How will the affected system or resource be brought back into production? 
+  **Expected outcomes** – After queries and code are run, what is the expected result of the playbook? 

 To keep information consistent across playbooks, it can be helpful to create a playbook template to use for all of your security playbooks. Some of the previously listed items, such as stakeholder information, can be shared across multiple playbooks. If that is the case, you can create centralized documentation for that information and reference it in the playbook, then enumerate the explicit differences in the playbook. This will prevent you from having to update the same information in all of your individual playbooks. By creating a template and identifying common or shared information in playbooks, you can simplify and speed up playbook development. Lastly, your playbooks will likely evolve over time; once you have confirmed that the steps are consistent, they can form the requirements for automation. 
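
A playbook template can also be expressed as a data structure, which makes it possible to check drafts for missing sections. This is a sketch under assumptions: the `Playbook` class, its field names, and the `stakeholders_ref` location are hypothetical, and the phase names follow the list in this section.

```python
from dataclasses import dataclass, field

# Response phases, in the order listed in this section.
PHASES = ("detect", "analyze", "contain", "eradicate", "recover")


@dataclass
class Playbook:
    title: str
    overview: str                 # risk or scenario addressed, and the goal
    prerequisites: list[str]      # required logs, detections, notifications
    stakeholders_ref: str         # pointer to shared, centralized contact docs
    response_steps: dict[str, list[str]] = field(default_factory=dict)
    expected_outcomes: list[str] = field(default_factory=list)

    def missing_sections(self) -> list[str]:
        """Return the phases and sections that still have no content."""
        missing = [phase for phase in PHASES if not self.response_steps.get(phase)]
        if not self.expected_outcomes:
            missing.append("expected_outcomes")
        return missing


draft = Playbook(
    title="Compromised IAM credentials",
    overview="Investigate and contain use of leaked access keys.",
    prerequisites=["CloudTrail logs aggregated", "GuardDuty enabled"],
    stakeholders_ref="wiki/ir/stakeholders",  # hypothetical location
    response_steps={
        "detect": ["Confirm the alert source and the affected principal."],
        "analyze": ["Review CloudTrail activity for the affected access key."],
    },
)

print(draft.missing_sections())
# ['contain', 'eradicate', 'recover', 'expected_outcomes']
```

Referencing stakeholder information through a single pointer, rather than copying it into each playbook, reflects the centralized documentation approach described above.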

# Sample playbooks
<a name="sample-playbooks"></a>

 A number of sample playbooks can be found in Appendix B in [Playbook resources](appendix-b-incident-response-resources.md#playbook-resources). The examples here can be used to guide you on what playbooks to create and what to include in your playbooks. However, it’s important you craft playbooks that incorporate the risks most relevant to your business. You need to verify that the steps and workflows within your playbooks include your technologies and processes. 

# Run regular simulations
<a name="run-regular-simulations"></a>

 Organizations grow and evolve over time, as does the threat landscape. Because of this, it’s important to continually review your incident response capabilities. Simulations are one method that can be used to perform this assessment. Simulations use real-world security event scenarios designed to mimic a threat actor’s tactics, techniques, and procedures (TTPs) and allow an organization to exercise and evaluate their incident response capabilities by responding to these mock cyber events as they might occur in reality. 

 Simulations have a variety of benefits, including: 
+  Validating cyber readiness and developing the confidence of your incident responders. 
+  Testing the accuracy and efficiency of tools and workflows. 
+  Refining communication and escalation methods aligned with your incident response plan. 
+  Providing an opportunity to respond to less common vectors. 

# Types of simulations
<a name="types-of-simulations"></a>

 There are three main types of simulations: 
+  **Tabletop exercises** – The tabletop approach to simulations is strictly a discussion-based session involving the various incident response stakeholders to practice roles and responsibilities and use established communication tools and playbooks. Exercise facilitation can typically be accomplished in a full day in a virtual venue, a physical venue, or a combination. Because of its discussion-based nature, the tabletop exercise focuses on processes, people, and collaboration. Technology is an integral part of the discussion; however, the actual use of incident response tools or scripts is generally not a part of the tabletop exercise. 
+  **Purple Team exercises** – Purple Team exercises increase the level of collaboration between the incident responders (*Blue Team*) and simulated threat actors (*Red Team*). The Blue Team is generally comprised of members of the Security Operations Center (SOC), but can also include other stakeholders that would be involved during an actual cyber event. The Red Team is generally comprised of a penetration testing team or key stakeholders that are trained in offensive security. The Red Team works collaboratively with the exercise facilitators when designing a scenario so that the scenario is accurate and feasible. During Purple Team exercises, the primary focus is on the detection mechanisms, the tools, and the standard operating procedures (SOPs) supporting the incident response efforts. 
+  **Red Team exercises** – During a Red Team exercise, the offense (*Red Team*) conducts a simulation to achieve a certain objective or set of objectives from a pre-determined scope. The defenders (*Blue Team*) will not necessarily know the scope and duration of the exercise, which provides a more realistic assessment of how they would respond to an actual incident. Because Red Team exercises can be invasive tests, you should be cautious and implement controls to verify that the exercise does not cause actual harm to your environment. 

**Note**  
AWS requires customers to review the policy for penetration testing available on the [Penetration Testing website](https://aws.amazon.com/security/penetration-testing/) before they conduct Purple Team or Red Team exercises. 

 Table 1 summarizes a few key differences in these types of simulations. It’s important to note that the definitions are generally considered loose definitions and can be customized to fit the needs of your organization. 

*Table 1 – Types of simulations*


|   |  Tabletop exercise  |  Purple Team exercise  |  Red Team exercise  | 
| --- | --- | --- | --- | 
|  Summary  |  Paper-driven exercises that focus on one specific security incident scenario. These can be either high-level or technical, and are driven by a series of paper injects.  |  A more realistic offering compared to tabletop exercises. During Purple Team exercises, facilitators work collaboratively with the participants to increase exercise engagement and offer training where necessary.  |  Generally a more advanced simulation offering. There is usually a high level of covertness, where the participants might not know all of the details of the exercise.  | 
|  Resources required  |  Limited technical resources required  |  Various stakeholders required and high level of technical resources needed  |  Various stakeholders required and high level of technical resources needed  | 
|  Complexity  |  Low  |  Medium  |  High  | 

 Consider facilitating cyber simulations at a regular interval. Each exercise type can provide unique benefits to the participants and the organization as a whole, so you might choose to start with less complex simulation types (such as tabletop exercises) and progress to more complex simulation types (such as Red Team exercises). You should select a simulation type based on your security maturity, resources, and desired outcomes. Some customers might choose not to perform Red Team exercises due to complexity and cost. 

# Exercise lifecycle
<a name="exercise-lifecycle"></a>

 Regardless of the type of simulation you choose, simulations generally follow these steps: 

1.  **Define core exercise elements** – Define the simulation scenario and the objectives of the simulation. Both of these should have leadership acceptance. 

1.  **Identify key stakeholders** – At a minimum, an exercise needs exercise facilitators and participants. Depending on the scenario, additional stakeholders such as legal, communications, or executive leadership might be involved. 

1.  **Build and test the scenario** – The scenario might need to be redefined as it is being built if specific elements aren’t feasible. A finalized scenario is expected as the output of this stage. 

1.  **Facilitate the simulation** – The type of simulation determines the facilitation used (paper-based scenario compared to highly technical, simulated scenario). The facilitators should align their facilitation tactics to the exercise objectives and they should engage all exercise participants wherever possible to provide the most benefit. 

1.  **Develop the after action report (AAR)** – Identify areas that went well, those that can use improvement, and potential gaps. The AAR should measure the effectiveness of the simulation as well as the team’s response to the simulated event so that progress can be tracked over time with future simulations. 
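
To track progress across simulations, AAR results can be recorded in a consistent shape and compared over time. The sketch below is one possible approach; the `AfterActionReport` fields, the choice of time to detect and time to contain as metrics, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AfterActionReport:
    exercise: str
    minutes_to_detect: int     # assumed metric
    minutes_to_contain: int    # assumed metric
    went_well: tuple[str, ...] = ()
    gaps: tuple[str, ...] = ()


def trend(reports, metric: str) -> int:
    """Change in a metric from the first to the latest report.

    A negative value means the team was faster in the latest exercise.
    """
    if len(reports) < 2:
        raise ValueError("need at least two reports to compute a trend")
    values = [getattr(report, metric) for report in reports]
    return values[-1] - values[0]


# Hypothetical results from two exercises, oldest first.
REPORTS = [
    AfterActionReport("tabletop-1", 90, 240, gaps=("No out-of-band channel",)),
    AfterActionReport("tabletop-2", 45, 180, went_well=("Escalation path was clear",)),
]

print(trend(REPORTS, "minutes_to_detect"))
print(trend(REPORTS, "minutes_to_contain"))
```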