# Evolve
<a name="a-evolve"></a>

**Topics**
+ [OPS 11. How do you evolve operations?](ops-11.md)

# OPS 11. How do you evolve operations?
<a name="ops-11"></a>

 Dedicate time and resources for nearly continuous incremental improvement to evolve the effectiveness and efficiency of your operations. 

**Topics**
+ [OPS11-BP01 Have a process for continuous improvement](ops_evolve_ops_process_cont_imp.md)
+ [OPS11-BP02 Perform post-incident analysis](ops_evolve_ops_perform_rca_process.md)
+ [OPS11-BP03 Implement feedback loops](ops_evolve_ops_feedback_loops.md)
+ [OPS11-BP04 Perform knowledge management](ops_evolve_ops_knowledge_management.md)
+ [OPS11-BP05 Define drivers for improvement](ops_evolve_ops_drivers_for_imp.md)
+ [OPS11-BP06 Validate insights](ops_evolve_ops_validate_insights.md)
+ [OPS11-BP07 Perform operations metrics reviews](ops_evolve_ops_metrics_review.md)
+ [OPS11-BP08 Document and share lessons learned](ops_evolve_ops_share_lessons_learned.md)
+ [OPS11-BP09 Allocate time to make improvements](ops_evolve_ops_allocate_time_for_imp.md)

# OPS11-BP01 Have a process for continuous improvement
<a name="ops_evolve_ops_process_cont_imp"></a>

Evaluate your workload against internal and external architecture best practices. Conduct workload reviews at least once per year. Prioritize improvement opportunities into your software development cadence. 

 **Desired outcome:** 
+  You analyze your workload against architecture best practices at least yearly. 
+  Improvement opportunities are given equal priority in your software development process. 

 **Common anti-patterns:** 
+ You have not conducted an architecture review on your workload since it was deployed several years ago.
+ Improvement opportunities are given a lower priority and stay in the backlog.
+  There is no standard for implementing modifications to best practices for the organization. 

 **Benefits of establishing this best practice:** 
+  Your workload is kept up to date on architecture best practices. 
+  Evolving your workload is done in a deliberate manner. 
+  You can leverage organization best practices to improve all workloads. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 On at least a yearly basis, you conduct an architectural review of your workload. Using internal and external best practices, evaluate your workload and identify improvement opportunities. Prioritize improvement opportunities into your software development cadence. 

 **Customer example** 

 All workloads at AnyCompany Retail go through a yearly architecture review process. They developed their own checklist of best practices that apply to all workloads. Using the AWS Well-Architected Tool’s Custom Lens feature, they conduct reviews using the tool and their custom lens of best practices. Improvement opportunities generated from the reviews are given priority in their software sprints. 

 **Implementation steps** 

1.  Conduct periodic architecture reviews of your production workload at least yearly. Use a documented architectural standard that includes AWS-specific best practices. 

   1.  We recommend you use your own internally defined standards it for these reviews. If you do not have an internal standard, we recommend you use the AWS Well-Architected Framework. 

   1.  You can use the AWS Well-Architected Tool to create a Custom Lens of your internal best practices and conduct your architecture review. 

   1.  Customers can contact their AWS Solutions Architect to conduct a guided Well-Architected Framework Review of their workload. 

1.  Prioritize improvement opportunities identified during the review into your software development process. 

 **Level of effort for the implementation plan:** Low. You can use the AWS Well-Architected Framework to conduct your yearly architecture review. 

### Resources
<a name="resources"></a>

 **Related best practices:** 
+  [OPS11-BP02 Perform post-incident analysis](ops_evolve_ops_perform_rca_process.md) - Post-incident analysis is another generator for improvement items. Feed lessons learned into your internal list of architecture best practices. 
+  [OPS11-BP08 Document and share lessons learned](ops_evolve_ops_share_lessons_learned.md) - As you develop your own architecture best practices, share those across your organization. 

 **Related documents:** 
+ [AWS Well-Architected Tool - Custom lenses ](https://docs.aws.amazon.com/wellarchitected/latest/userguide/lenses-custom.html)
+ [AWS Well-Architected Whitepaper - The review process ](https://docs.aws.amazon.com/wellarchitected/latest/framework/the-review-process.html)
+ [ Customize Well-Architected Reviews using Custom Lenses and the AWS Well-Architected Tool](https://aws.amazon.com/blogs/mt/customize-well-architected-reviews-using-custom-lenses-and-the-aws-well-architected-tool/)
+ [ Implementing the AWS Well-Architected Custom Lens lifecycle in your organization ](https://aws.amazon.com/blogs/architecture/implementing-the-aws-well-architected-custom-lens-lifecycle-in-your-organization/)

 **Related videos:** 
+ [ Well-Architected Labs - Level 100: Custom Lenses on AWS Well-Architected Tool](https://www.wellarchitectedlabs.com/well-architectedtool/100_labs/100_custom_lenses_on_watool/)

 **Related examples:** 
+ [ The AWS Well-Architected Tool](https://docs.aws.amazon.com/wellarchitected/latest/userguide/intro.html)

# OPS11-BP02 Perform post-incident analysis
<a name="ops_evolve_ops_perform_rca_process"></a>

 Review customer-impacting events, and identify the contributing factors and preventative actions. Use this information to develop mitigations to limit or prevent recurrence. Develop procedures for prompt and effective responses. Communicate contributing factors and corrective actions as appropriate, tailored to target audiences. 

 **Common anti-patterns:** 
+  You administer an application server. Approximately every 23 hours and 55 minutes all your active sessions are terminated. You have tried to identify what is going wrong on your application server. You suspect it could instead be a network issue but are unable to get cooperation from the network team as they are too busy to support you. You lack a predefined process to follow to get support and collect the information necessary to determine what is going on. 
+  You have had data loss within your workload. This is the first time it has happened and the cause is not obvious. You decide it is not important because you can recreate the data. Data loss starts occurring with greater frequency impacting your customers. This also places addition operational burden on you as you restore the missing data. 

 **Benefits of establishing this best practice:** Having a predefined processes to determine the components, conditions, actions, and events that contributed to an incident helps you to identify opportunities for improvement. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Use a process to determine contributing factors: Review all customer impacting incidents. Have a process to identify and document the contributing factors of an incident so that you can develop mitigations to limit or prevent recurrence and you can develop procedures for prompt and effective responses. Communicate root cause as appropriate, tailored to target audiences. 

# OPS11-BP03 Implement feedback loops
<a name="ops_evolve_ops_feedback_loops"></a>

Feedback loops provide actionable insights that drive decision making. Build feedback loops into your procedures and workloads. This helps you identify issues and areas that need improvement. They also validate investments made in improvements. These feedback loops are the foundation for continuously improving your workload.

 Feedback loops fall into two categories: *immediate feedback* and *retrospective analysis*. Immediate feedback is gathered through review of the performance and outcomes from operations activities. This feedback comes from team members, customers, or the automated output of the activity. Immediate feedback is received from things like A/B testing and shipping new features, and it is essential to failing fast. 

 Retrospective analysis is performed regularly to capture feedback from the review of operational outcomes and metrics over time. These retrospectives happen at the end of a sprint, on a cadence, or after major releases or events. This type of feedback loop validates investments in operations or your workload. It helps you measure success and validates your strategy. 

 **Desired outcome:** You use immediate feedback and retrospective analysis to drive improvements. There is a mechanism to capture user and team member feedback. Retrospective analysis is used to identify trends that drive improvements. 

 **Common anti-patterns:** 
+ You launch a new feature but have no way of receiving customer feedback on it.
+ After investing in operations improvements, you don’t conduct a retrospective to validate them.
+ You collect customer feedback but don’t regularly review it.
+ Feedback loops lead to proposed action items but they aren’t included in the software development process.
+  Customers don’t receive feedback on improvements they’ve proposed. 

 **Benefits of establishing this best practice:** 
+  You can work backwards from the customer to drive new features. 
+  Your organization culture can react to changes faster. 
+  Trends are used to identify improvement opportunities. 
+  Retrospectives validate investments made to your workload and operations. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Implementing this best practice means that you use both immediate feedback and retrospective analysis. These feedback loops drive improvements. There are many mechanisms for immediate feedback, including surveys, customer polls, or feedback forms. Your organization also uses retrospectives to identify improvement opportunities and validate initiatives. 

 **Customer example** 

 AnyCompany Retail created a web form where customers can give feedback or report issues. During the weekly scrum, user feedback is evaluated by the software development team. Feedback is regularly used to steer the evolution of their platform. They conduct a retrospective at the end of each sprint to identify items they want to improve. 

## Implementation steps
<a name="implementation-steps"></a>

1. Immediate feedback
   +  You need a mechanism to receive feedback from customers and team members. Your operations activities can also be configured to deliver automated feedback. 
   +  Your organization needs a process to review this feedback, determine what to improve, and schedule the improvement. 
   +  Feedback must be added into your software development process. 
   +  As you make improvements, follow up with the feedback submitter. 
     +  You can use [AWS Systems Manager OpsCenter](https://docs.aws.amazon.com/systems-manager/latest/userguide/OpsCenter.html) to create and track these improvements as [OpsItems](https://docs.aws.amazon.com/systems-manager/latest/userguide/OpsCenter-working-with-OpsItems.html).

1.  Retrospective analysis 
   +  Conduct retrospectives at the end of a development cycle, on a set cadence, or after a major release. 
   +  Gather stakeholders involved in the workload for a retrospective meeting. 
   +  Create three columns on a whiteboard or spreadsheet: Stop, Start, and Keep. 
     +  *Stop* is for anything that you want your team to stop doing. 
     +  *Start* is for ideas that you want to start doing. 
     +  *Keep* is for items that you want to keep doing. 
   +  Go around the room and gather feedback from the stakeholders. 
   +  Prioritize the feedback. Assign actions and stakeholders to any Start or Keep items. 
   +  Add the actions to your software development process and communicate status updates to stakeholders as you make the improvements. 

 **Level of effort for the implementation plan:** Medium. To implement this best practice, you need a way to take in immediate feedback and analyze it. Also, you need to establish a retrospective analysis process. 

## Resources
<a name="resources"></a>

 **Related best practices:** 
+  [OPS01-BP01 Evaluate external customer needs](ops_priorities_ext_cust_needs.md): Feedback loops are a mechanism to gather external customer needs. 
+  [OPS01-BP02 Evaluate internal customer needs](ops_priorities_int_cust_needs.md): Internal stakeholders can use feedback loops to communicate needs and requirements. 
+  [OPS11-BP02 Perform post-incident analysis](ops_evolve_ops_perform_rca_process.md): Post-incident analyses are an important form of retrospective analysis conducted after incidents. 
+  [OPS11-BP07 Perform operations metrics reviews](ops_evolve_ops_metrics_review.md): Operations metrics reviews identify trends and areas for improvement. 

 **Related documents:** 
+  [7 Pitfalls to Avoid When Building a CCOE](https://aws.amazon.com/blogs/enterprise-strategy/7-pitfalls-to-avoid-when-building-a-ccoe/) 
+  [Atlassian Team Playbook - Retrospectives](https://www.atlassian.com/team-playbook/plays/retrospective) 
+  [Email Definitions: Feedback Loops](https://aws.amazon.com/blogs/messaging-and-targeting/email-definitions-feedback-loops/) 
+  [Establishing Feedback Loops Based on the AWS Well-Architected Framework Review](https://aws.amazon.com/blogs/architecture/establishing-feedback-loops-based-on-the-aws-well-architected-framework-review/) 
+  [IBM Garage Methodology - Hold a retrospective](https://www.ibm.com/garage/method/practices/learn/practice_retrospective_analysis/) 
+  [Investopedia – The PDCS Cycle](https://www.investopedia.com/terms/p/pdca-cycle.asp) 
+  [Maximizing Developer Effectiveness by Tim Cochran](https://martinfowler.com/articles/developer-effectiveness.html) 
+  [Operations Readiness Reviews (ORR) Whitepaper - Iteration](https://docs.aws.amazon.com/wellarchitected/latest/operational-readiness-reviews/iteration.html) 
+  [ITIL CSI - Continual Service Improvement](https://wiki.en.it-processmaps.com/index.php/ITIL_CSI_-_Continual_Service_Improvement)
+  [When Toyota met e-commerce: Lean at Amazon](https://www.mckinsey.com/capabilities/operations/our-insights/when-toyota-met-e-commerce-lean-at-amazon) 

 **Related videos:** 
+  [Building Effective Customer Feedback Loops](https://www.youtube.com/watch?v=zz_VImJRZ3U) 

 **Related examples: ** 
+  [Astuto - Open source customer feedback tool](https://github.com/riggraz/astuto) 
+  [AWS Solutions - QnABot on AWS](https://aws.amazon.com/solutions/implementations/qnabot-on-aws/) 
+  [Fider - A platform to organize customer feedback](https://github.com/getfider/fider) 

 **Related services:** 
+  [AWS Systems Manager OpsCenter](https://docs.aws.amazon.com/systems-manager/latest/userguide/OpsCenter.html) 

# OPS11-BP04 Perform knowledge management
<a name="ops_evolve_ops_knowledge_management"></a>

Knowledge management helps team members find the information to perform their job. In learning organizations, information is freely shared which empowers individuals. The information can be discovered or searched. Information is accurate and up to date. Mechanisms exist to create new information, update existing information, and archive outdated information. The most common example of a knowledge management platform is a content management system like a wiki. 

 **Desired outcome:** 
+  Team members have access to timely, accurate information. 
+  Information is searchable. 
+  Mechanisms exist to add, update, and archive information. 

 **Common anti-patterns:** 
+ There is no centralized knowledge storage. Team members manage their own notes on their local machines.
+  You have a self-hosted wiki but no mechanisms to manage information, resulting in outdated information. 
+  Someone identifies missing information but there’s no process to request adding it the team wiki. They add it themselves but they miss a key step, leading to an outage. 

 **Benefits of establishing this best practice:** 
+  Team members are empowered because information is shared freely. 
+  New team members are onboarded faster because documentation is up to date and searchable. 
+  Information is timely, accurate, and actionable. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Knowledge management is an important facet of learning organizations. To begin, you need a central repository to store your knowledge (as a common example, a self-hosted wiki). You must develop processes for adding, updating, and archiving knowledge. Develop standards for what should be documented and let everyone contribute. 

 **Customer example** 

 AnyCompany Retail hosts an internal Wiki where all knowledge is stored. Team members are encouraged to add to the knowledge base as they go about their daily duties. On a quarterly basis, a cross-functional team evaluates which pages are least updated and determines if they should be archived or updated. 

 **Implementation steps** 

1.  Start with identifying the content management system where knowledge will be stored. Get agreement from stakeholders across your organization. 

   1.  If you don’t have an existing content management system, consider running a self-hosted wiki or using a version control repository as a starting point. 

1.  Develop runbooks for adding, updating, and archiving information. Educate your team on these processes. 

1.  Identify what knowledge should be stored in the content management system. Start with daily activities (runbooks and playbooks) that team members perform. Work with stakeholders to prioritize what knowledge is added. 

1.  On a periodic basis, work with stakeholders to identify out-of-date information and archive it or bring it up to date. 

 **Level of effort for the implementation plan:** Medium. If you don’t have an existing content management system, you can set up a self-hosted wiki or a version-controlled document repository. 

## Resources
<a name="resources"></a>

 **Related best practices:** 
+  [OPS11-BP08 Document and share lessons learned](ops_evolve_ops_share_lessons_learned.md) - Knowledge management facilitates information sharing about lessons learned. 

 **Related documents:** 
+ [ Atlassian - Knowledge Management ](https://www.atlassian.com/itsm/knowledge-management)

 **Related examples:** 
+ [ DokuWiki ](https://www.dokuwiki.org/dokuwiki)
+ [ Gollum ](https://github.com/gollum/gollum)
+ [ MediaWiki ](https://www.mediawiki.org/wiki/MediaWiki)
+ [ Wiki.js ](https://github.com/Requarks/wiki)

# OPS11-BP05 Define drivers for improvement
<a name="ops_evolve_ops_drivers_for_imp"></a>

 Identify drivers for improvement to help you evaluate and prioritize opportunities. 

 On AWS, you can aggregate the logs of all your operations activities, workloads, and infrastructure to create a detailed activity history. You can then use AWS tools to analyze your operations and workload health over time (for example, identify trends, correlate events and activities to outcomes, and compare and contrast between environments and across systems) to reveal opportunities for improvement based on your drivers. 

 You should use CloudTrail to track API activity (through the AWS Management Console, CLI, SDKs, and APIs) to know what is happening across your accounts. Track your AWS developer Tools deployment activities with CloudTrail and CloudWatch. This will add a detailed activity history of your deployments and their outcomes to your CloudWatch Logs log data. 

 [Export your log data to Amazon S3](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/S3Export.html) for long-term storage. Using [AWS Glue](https://aws.amazon.com/glue/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc), you discover and prepare your log data in Amazon S3 for analytics. Use [Amazon Athena](https://aws.amazon.com/athena/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc), through its native integration with AWS Glue, to analyze your log data. Use a business intelligence tool like [Quick](https://aws.amazon.com/quicksight/) to visualize, explore, and analyze your data 

 **Common anti-patterns:** 
+  You have a script that works but is not elegant. You invest time in rewriting it. It is now a work of art. 
+  Your start-up is trying to get another set of funding from a venture capitalist. They want you to demonstrate compliance with PCI DSS. You want to make them happy so you document your compliance and miss a delivery date for a customer, losing that customer. It wasn't a wrong thing to do but now you wonder if it was the right thing to do. 

 **Benefits of establishing this best practice:** By determining the criteria you want to use for improvement, you can minimize the impact of event based motivations or emotional investment. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Understand drivers for improvement: You should only make changes to a system when a desired outcome is supported. 
  +  Desired capabilities: Evaluate desired features and capabilities when evaluating opportunities for improvement. 
    +  [What's New with AWS](https://aws.amazon.com/new/) 
  +  Unacceptable issues: Evaluate unacceptable issues, bugs, and vulnerabilities when evaluating opportunities for improvement. 
    +  [AWS Latest Security Bulletins](https://aws.amazon.com/security/security-bulletins/) 
    +  [AWS Trusted Advisor](https://aws.amazon.com/premiumsupport/trustedadvisor/) 
  +  Compliance requirements: Evaluate updates and changes required to maintain compliance with regulation, policy, or to remain under support from a third party, when reviewing opportunities for improvement. 
    +  [AWS Compliance](https://aws.amazon.com/compliance/) 
    +  [AWS Compliance Programs](https://aws.amazon.com/compliance/programs/) 
    +  [AWS Compliance Latest News](https://aws.amazon.com/compliance/compliance-latest-news/) 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon Athena](https://aws.amazon.com/athena/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc) 
+  [Quick](https://aws.amazon.com/quicksight/) 
+  [AWS Compliance](https://aws.amazon.com/compliance/) 
+  [AWS Compliance Latest News](https://aws.amazon.com/compliance/compliance-latest-news/) 
+  [AWS Compliance Programs](https://aws.amazon.com/compliance/programs/) 
+  [AWS Glue](https://aws.amazon.com/glue/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc) 
+  [AWS Latest Security Bulletins](https://aws.amazon.com/security/security-bulletins/) 
+  [AWS Trusted Advisor](https://aws.amazon.com/premiumsupport/trustedadvisor/) 
+  [Export your log data to Amazon S3](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/S3Export.html) 
+  [What's New with AWS](https://aws.amazon.com/new/) 

# OPS11-BP06 Validate insights
<a name="ops_evolve_ops_validate_insights"></a>

 Review your analysis results and responses with cross-functional teams and business owners. Use these reviews to establish common understanding, identify additional impacts, and determine courses of action. Adjust responses as appropriate. 

 **Common anti-patterns:** 
+  You see that CPU utilization is at 95% on a system and make it a priority to find a way to reduce load on the system. You determine the best course of action is to scale up. The system is a transcoder and the system is scaled to run at 95% CPU utilization all the time. The system owner could have explained the situation to you had you contacted them. Your time has been wasted. 
+  A system owner maintains that their system is mission critical. The system was not placed in a high security environment. To improve security, you implement the additional detective and preventative controls that are required for mission critical systems. You notify the system owner that the work is complete and that he will be charged for the additional resources. In the discussion following this notification, the system owner learns there is a formal definition for mission critical systems that this system does not meet. 

 **Benefits of establishing this best practice:** By validating insights with business owners and subject matter experts, you can establish common understanding and more effectively guide improvement. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Validate insights: Engage with business owners and subject matter experts to ensure there is common understanding and agreement of the meaning of the data you have collected. Identify additional concerns, potential impacts, and determine a courses of action. 

# OPS11-BP07 Perform operations metrics reviews
<a name="ops_evolve_ops_metrics_review"></a>

 Regularly perform retrospective analysis of operations metrics with cross-team participants from different areas of the business. Use these reviews to identify opportunities for improvement, potential courses of action, and to share lessons learned. 

 Look for opportunities to improve in all of your environments (for example, development, test, and production). 

 **Common anti-patterns:** 
+  There was a significant retail promotion that was interrupted by your maintenance window. The business remains unaware that there is a standard maintenance window that could be delayed if there are other business impacting events. 
+  You suffered an extended outage because of your use of a buggy library commonly used in your organization. You have since migrated to a reliable library. The other teams in your organization do not know that they are at risk. If you met regularly and reviewed this incident, they would be aware of the risk. 
+  Performance of your transcoder has been falling off steadily and impacting the media team. It isn't terrible yet. You will not have an opportunity to find out until it is bad enough to cause an incident. Were you to review your operations metrics with the media team, there would be an opportunity for the change in metrics and their experience to be recognized and the issue addressed. 
+  You are not reviewing your satisfaction of customer SLAs. You are trending to not meet your customer SLAs. There are financial penalties related to not meeting your customer SLAs. If you meet regularly to review the metrics for these SLAs, you would have the opportunity to recognize and address the issue. 

 **Benefits of establishing this best practice:** By meeting regularly to review operations metrics, events, and incidents, you maintain common understanding across teams, share lessons learned, and can prioritize and target improvements. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Operations metrics reviews: Regularly perform retrospective analysis of operations metrics with cross-team participants from different areas of the business. Engage stakeholders, including the business, development, and operations teams, to validate your findings from immediate feedback and retrospective analysis, and to share lessons learned. Use their insights to identify opportunities for improvement and potential courses of action. 
  +  [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) 
  +  [Using Amazon CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) 
  +  [Publish custom metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html) 
  +  [Amazon CloudWatch metrics and dimensions reference](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CW_Support_For_AWS.html) 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) 
+  [Amazon CloudWatch metrics and dimensions reference](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CW_Support_For_AWS.html) 
+  [Publish custom metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html) 
+  [Using Amazon CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) 

# OPS11-BP08 Document and share lessons learned
<a name="ops_evolve_ops_share_lessons_learned"></a>

 Document and share lessons learned from the operations activities so that you can use them internally and across teams. 

 You should share what your teams learn to increase the benefit across your organization. You will want to share information and resources to prevent avoidable errors and ease development efforts. This will allow you to focus on delivering desired features. 

 Use AWS Identity and Access Management (IAM) to define permissions permitting controlled access to the resources you wish to share within and across accounts. You should then use version-controlled AWS CodeCommit repositories to share application libraries, scripted procedures, procedure documentation, and other system documentation. Share your compute standards by sharing access to your AMIs and by authorizing the use of your Lambda functions across accounts. You should also share your infrastructure standards as AWS CloudFormation templates. 

 Through the AWS APIs and SDKs, you can integrate external and third-party tools and repositories (for example, GitHub, BitBucket, and SourceForge). When sharing what you have learned and developed, be careful to structure permissions to ensure the integrity of shared repositories. 

 **Common anti-patterns:** 
+  You suffered an extended outage because of your use of a buggy library commonly used in your organization. You have since migrated to a reliable library. The other teams in your organization do not know they are at risk. Were you to document and share your experience with this library, they would be aware of the risk. 
+  You have identified an edge case in an internally shared microservice that causes sessions to drop. You have updated your calls to the service to avoid this edge case. The other teams in your organization do not know that they are at risk. Were you to document and share your experience with this library, they would be aware of the risk. 
+  You have found a way to significantly reduce the CPU utilization requirements for one of your microservices. You do not know if any other teams could take advantage of this technique. Were you to document and share your experience with this library, they would have the opportunity to do so. 

 **Benefits of establishing this best practice:** Share lessons learned to support improvement and to maximize the benefits of experience. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Document and share lessons learned: Have procedures to document the lessons learned from the running of operations activities and retrospective analysis so that they can be used by other teams. 
  +  Share learnings: Have procedures to share lessons learned and associated artifacts across teams. For example, share updated procedures, guidance, governance, and best practices through an accessible wiki. Share scripts, code, and libraries through a common repository. 
    +  [Delegating access to your AWS environment](https://www.youtube.com/watch?v=0zJuULHFS6A&t=849s) 
    +  [Share an AWS CodeCommit repository](https://docs.aws.amazon.com/codecommit/latest/userguide/how-to-share-repository.html) 
    +  [Easy authorization of AWS Lambda functions](https://aws.amazon.com/blogs/compute/easy-authorization-of-aws-lambda-functions/) 
    +  [Sharing an AMI with specific AWS Accounts](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/sharingamis-explicit.html) 
    +  [Speed template sharing with an AWS CloudFormation designer URL](https://aws.amazon.com/blogs/devops/speed-template-sharing-with-an-aws-cloudformation-designer-url/) 
    +  [Using AWS Lambda with Amazon SNS](https://docs.aws.amazon.com/lambda/latest/dg/with-sns-example.html) 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Easy authorization of AWS Lambda functions](https://aws.amazon.com/blogs/compute/easy-authorization-of-aws-lambda-functions/) 
+  [Share an AWS CodeCommit repository](https://docs.aws.amazon.com/codecommit/latest/userguide/how-to-share-repository.html) 
+  [Sharing an AMI with specific AWS Accounts](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/sharingamis-explicit.html) 
+  [Speed template sharing with an AWS CloudFormation designer URL](https://aws.amazon.com/blogs/devops/speed-template-sharing-with-an-aws-cloudformation-designer-url/) 
+  [Using AWS Lambda with Amazon SNS](https://docs.aws.amazon.com/lambda/latest/dg/with-sns-example.html) 

 **Related videos:** 
+  [Delegating access to your AWS environment](https://www.youtube.com/watch?v=0zJuULHFS6A&t=849s) 

# OPS11-BP09 Allocate time to make improvements
<a name="ops_evolve_ops_allocate_time_for_imp"></a>

 Dedicate time and resources within your processes to make continuous incremental improvements possible. 

 On AWS, you can create temporary duplicates of environments, lowering the risk, effort, and cost of experimentation and testing. These duplicated environments can be used to test the conclusions from your analysis, experiment, and develop and test planned improvements. 

 **Common anti-patterns:** 
+  There is a known performance issue in your application server. It is added to the backlog behind every planned feature implementation. If the rate of planned features being added remains constant, the performance issue will never be addressed. 
+  To support continual improvement you approve administrators and developers using all their extra time to select and implement improvements. No improvements are ever completed. 

 **Benefits of establishing this best practice:** By dedicating time and resources within your processes you make continuous incremental improvements possible. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>
+  Allocate time to make improvements: Dedicate time and resources within your processes to make continuous incremental improvements possible. Implement changes to improve and evaluate the results to determine success. If the results do not satisfy the goals, and the improvement is still a priority, pursue alternative courses of action.