# Manage workloads in Incident Detection and Response

An important part of effective incident management is establishing the right processes and procedures to onboard, test, and maintain your monitored workloads. This section covers the key steps: creating comprehensive runbooks and response plans to guide your teams during incidents, thoroughly testing and validating new workloads before onboarding, requesting changes that update workload monitoring, and offboarding workloads properly when needed.

**Topics**
+ [Develop runbooks and response plans](idr-workloads-dev-runbook.md)
+ [Test onboarded workloads](idr-workloads-testing.md)
+ [Request changes to a workload](idr-workloads-change-request.md)
+ [Suppress alarms](idr-workloads-suppress-alarms.md)
+ [Offboard a workload](idr-workloads-offboard.md)

# Develop runbooks and response plans for responding to incidents in Incident Detection and Response

Incident Detection and Response uses information from your onboarding questionnaire to develop runbooks and response plans for managing incidents that affect your workloads. Runbooks document the steps that Incident Managers follow when responding to an incident. A response plan maps to at least one of your workloads. The incident management team creates these templates from the information that you provide during [workload discovery](idr-gs-discovery.md). Response plans are AWS Systems Manager (SSM) document templates that are used to trigger incidents. For more information about SSM documents, see [AWS Systems Manager Documents](https://docs.aws.amazon.com/systems-manager/latest/userguide/sysman-ssm-docs.html). For more information about Incident Manager, see [What Is AWS Systems Manager Incident Manager?](https://docs.aws.amazon.com/incident-manager/latest/userguide/index.html)

**Key outputs:**
+ Definition of your workload in AWS Incident Detection and Response.
+ Definition of alarms, runbooks, and response plans in AWS Incident Detection and Response.

You can also download an example AWS Incident Detection and Response runbook: [aws-idr-runbook-example.zip](samples/aws-idr-runbook-example.zip).

Example runbook:

````
Runbook template for AWS Incident Detection and Response

# Description

This document is intended for [CustomerName] [WorkloadName].

[Insert short description of what the workload is intended for].

## Step: Priority

**Priority actions**

1. When a case is created with Incident Detection and Response, lock the case to yourself, verify the Customer Stakeholders in the Case from *Engagement Plans - Initial Engagement*.
2. Send the first correspondence on the support case to the customer as below. If there is no support case or if it is not possible to use the support case then backup communication details are listed in the steps that follow.

```
Hello,

This is <> from AWS Incident Detection and Response. An alarm has triggered for your workload <>. I am currently investigating and will update you in a few minutes after I have finished initial investigation.

Alarm Identifier -
```

**Compliance and regulatory requirements for the workload**

<>

**Actions required from Incident Detection and Response in complying**

<>

## Step: Information

**Review of common information**

* This section provides a space for defining common information which may be needed through the life of the incident.
* The target user of this information is the Incident Management Engineer and Operations Engineer.
* The following steps may reference this information to complete an action (for example, execute the "Initial Engagement" plan).

---

**Engagement plans**

Describe the engagement plans applicable to this runbook. This section contains only contact details. Engagement plans will be referenced in the step by step **Communication Plans**.

* **Initial engagement**

  AWS Incident Detection and Response Team will add customer stakeholder addresses below to the Support Case. AWS Stakeholders are for additional stakeholders that may need to be made aware of any issues. When updating customer stakeholders details in this plan also update the Backup Mailto links.

  * ***Customer Stakeholders***: customeremail1; customeremail2; etc
  * ***AWS Stakeholders***: aws-idr-oncall@amazon.com; tam-team-email; etc.
  * ***One Time Only Contacts***: [These are email contacts that are included on only the first communication. Remove these contacts after the first communication has gone out. These could be customer paging email addresses such as pager-duty that must not be paged for every correspondence]
  * ***Backup Mailto Impact Template***: <*Insert Impact Template Mailto Link here*>
    * Use the backup Mailto when communication over cases is not possible.
  * ***Backup Mailto No Impact Template***: <*Insert No Impact Mailto Link here*>
    * Use the backup Mailto when communication over cases is not possible.

* **Engagement Escalation**

  AWS Incident Detection and Response will reach out to the following contacts when the contacts from the **Initial engagement** plan do not respond to incidents. For each Escalation Contact indicate if they must be added to the support case, phoned or both.

  * ***First Escalation Contact***: [escalationEmailAddress#1] / [PhoneNumber] - Wait XX Minutes before escalating to this contact.
    * [add Contact to Case / phone] this contact.
  * ***Second Escalation Contact***: [escalationEmailAddress#2] / [PhoneNumber] - Wait XX Minutes before escalating to this contact.
    * [add Contact to Case / phone] this contact.
  * Etc;

---

**Communication plans**

Describe how Incident Management Engineer communicates with designated stakeholders outside the incident call and communication channels.

* **Impact Communication plan**

  This plan is initiated when Incident Detection and Response have determined from step **Triage** that an alert indicates potential impact to a customer. Incident Detection and Response will request the customer to join the predetermined bridge (Chime Bridge / Customer Provided Bridge / Customer Static Bridge) as indicated in **Engagement plans - Incident call setup**. All backup email templates for use when cases can't be used are in **Engagement plans - Initial engagement**.

  * 1 - Before sending the impact notification, verify then remove and/or add customer contacts from the Support Case CC based on the contacts listed in the **Initial engagement** Engagement plan.
  * 2 - Send the engagement notification to the customer based on the following template: (choose one and remove the rest)

    ***Impact Template - Chime Bridge***

    ```
    The following alarm has engaged AWS Incident Detection and Response to an Incident bridge:
    Alarm Identifier -
    Alarm State Change Reason -
    Alarm Start Time -

    Please join the Chime Bridge below so we can start the steps outlined in your Runbook:
    International dial-in numbers: https://chime.aws/dialinnumbers/
    ```

    ***Impact Template - Customer Provided Bridge***

    ```
    The following alarm has engaged AWS Incident Detection and Response:
    Alarm Identifier -
    Alarm State Change Reason -
    Alarm Start Time -

    Please respond with your internal bridge details so we can join and start the steps outlined in your Runbook.
    ```

    ***Impact Template - Customer Static Bridge***

    ```
    The following alarm has engaged AWS Incident Detection and Response to an Incident bridge:
    Alarm Identifier -
    Alarm State Change Reason -
    Alarm Start Time -

    Please join the Bridge below so we can start the steps outlined in your Runbook:
    Conference Number:
    Conference URL :
    ```

  * 3 - Set the Case to Pending Customer Action
  * 4 - Follow **Engagement Escalation** plan as mentioned above.
  * 5 - If the customer does not respond within 30 minutes, disengage and continue to monitor until the alarm recovers.

* **No Impact Communication plan**

  This plan is initiated when an alarm recovers before Incident Detection and Response have completed initial **Triage**.

  * 1 - Before sending the no impact notification, verify then remove and/or add customer contacts from the Support Case CC based on the contacts listed in the **Engagement plans - Initial engagement** Engagement plan.
  * 2 - Send a no engagement notification to the customer based on the below template:

    ***No Impact Template***

    ```
    AWS Incident Detection and Response received an alarm that has recovered for your workload.
    Alarm Identifier -
    Alarm State Change Reason -
    Alarm Start Time -
    Alarm End Time -

    This may indicate a brief customer impact that is currently not ongoing.
    If there is an ongoing impact to your workload, please let us know and we will engage to assist.
    ```

  * 3 - Put the case in to Pending Customer Action.
  * 4 - If the customer does not respond within 30 minutes Resolve the case.

* **Updates**

  If AWS Incident Detection and Response is expected to provide regular updates to customer stakeholders, list those stakeholders here. Updates must be sent via the same support case. Remove this section if not needed.

  * Update Cadence: Every XX minutes
  * External Update Stakeholders: customeremailaddress1; customeremailaddress2; etc
  * Internal Update Stakeholders: awsemailaddress1; awsemailaddress2; etc

---

**Application architecture overview**

This section provides an overview of the application/workload architecture for Incident Management Engineer and Operations Engineer awareness.

* **AWS Accounts and Regions with key services** - list of AWS accounts with regions supporting this application. Assists Engineers in assessing underlying infrastructure supporting the application.
  * 123456789012
    * US-EAST-1 - brief desc as appropriate
      * EC2 - brief desc as appropriate
      * DynamoDB - brief desc as appropriate
      * etc.
    * US-WEST-1 - brief desc as appropriate
      * etc.
  * another-account-etc.
* **Resource identification** - describe how engineers determine resource association with application
  * Resource groups: etc.
  * Tag key/value: AppId=123456
* **CloudWatch Dashboards** - list dashboards relevant to key metrics and services
  * 123456789012
    * us-east-1
      * some-dashboard-name
      * etc.
  * some-other-dashboard-name-in-current-acct

## Step: Triage

**Evaluate incident and impact**

This section provides instructions for triaging of the incident to determine correct impact, description, and overall correct runbook being executed.

* **Evaluation of initial incident information**
  * 1 - Review Incident Alarm, noting time of first detected impact as well as the alarm start time.
  * 2 - Identify which service(s) in the customer application is seeing impact.
  * 3 - Review AWS Service Health for services listed under **AWS Accounts and Regions with key services**.
  * 4 - Review any customer provided dashboards listed under **CloudWatch Dashboards**

---

* **Impact**

  Impact is determined when either the customer's metrics do not recover, appear to be trending worse or if there is indication of AWS Service Impact.

  * 1 - Start **Communication plans - Impact Communication plan**
  * 2 - Start **Engagement plans - Engagement Escalation** if no response is received from the **Initial Engagement** contacts.
  * 3 - Start **Communication plans - Updates** if specified in **Communication plans**

* **No Impact**

  No Impact is determined when the customer's alarm recovers before Triage is complete and there are no indications of AWS service impact or sustained impact on the customer's CloudWatch Dashboards.

  * 1 - Start **Communication plans - No Impact Communication plan**

## Step: Investigate

**Investigation**

This section describes performing investigation of known and unknown symptoms.

**Known issue**

* *List all known issues with the application and their standard actions here*

**Unknown issues**

* Investigate with the customer and AWS Premium Support.
* Escalate internally as required.

## Step: Mitigation

**Collaborate**

* Communicate any changes or important information from the **Investigate** step to the members of the incident call.

**Implement mitigation**

* *List customer failover plans / Disaster Recovery plans / etc. here for implementing mitigation.*

## Step: Recovery

**Monitor customer impact**

* Review metrics to confirm recovery.
* Ensure recovery is across all Availability Zones / Regions / Services
* Get confirmation from the customer that impact is over and the application has recovered.

**Identify action items**

* Record key decisions and actions taken, including temporary mitigation that might have been implemented.
* Ensure outstanding action items have assigned owners.
* Close out any Communication plans that were opened during the incident with a final confirmation of recovery notification.
````

# Test onboarded workloads in Incident Detection and Response

**Note**
The AWS Identity and Access Management (IAM) user or role that you use for alarm testing must have the `cloudwatch:SetAlarmState` permission.

The final step in the onboarding process is to run a gameday for your new workload. After alarm ingestion is complete, AWS Incident Detection and Response confirms the date and time that you chose to start the gameday.

A gameday has two main purposes:
+ **Functional validation:** Confirms that AWS Incident Detection and Response correctly receives your alarm events. Functional validation also confirms that alarm events trigger the appropriate runbook and any other required actions, such as automatic case creation (if you selected it during alarm ingestion).
+ **Simulation:** The gameday is an end-to-end simulation of what might happen during a real incident. AWS Incident Detection and Response follows the steps in your prescribed runbook to give you insight into how a real incident might unfold. The gameday is an opportunity to ask questions or refine instructions to improve engagement.

During alarm testing, AWS Incident Detection and Response works with you to fix any issues that are identified.

## CloudWatch alarms

AWS Incident Detection and Response tests your Amazon CloudWatch alarms by monitoring the alarm's state change. To do this, manually change the alarm to the `ALARM` state using the AWS Command Line Interface (AWS CLI). You can access the AWS CLI from AWS CloudShell. AWS Incident Detection and Response provides a list of AWS CLI commands for you to use during testing.

Example AWS CLI command to set an alarm state:

```
aws cloudwatch set-alarm-state --alarm-name "ExampleAlarm" --state-value ALARM --state-reason "Testing AWS Incident Detection and Response" --region us-east-1
```

For more information about manually changing the state of a CloudWatch alarm, see [SetAlarmState](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_SetAlarmState.html).

For more information about the permissions required for CloudWatch API operations, see [Amazon CloudWatch permissions reference](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/permissions-reference-cw.html).

## Third-party APM alarms

Workloads that use a third-party application performance monitoring (APM) tool, such as Datadog, Splunk, New Relic, or Dynatrace, require different steps to simulate an alarm. At the start of the gameday, AWS Incident Detection and Response asks you to temporarily change your alarm thresholds or comparison operators to force the alarm into the `ALARM` state. This state triggers a payload to AWS Incident Detection and Response.

## Key outputs

+ Alarm ingestion is successful and your alarms are configured correctly.
+ Alarms are successfully generated and received by AWS Incident Detection and Response.
+ A support case is created for the engagement and your prescribed contacts are notified.
+ AWS Incident Detection and Response is available through your prescribed conferencing method.
+ All alarms and support cases generated as part of the gameday are resolved.
+ A go-live email is sent to confirm that your workload is now monitored by AWS Incident Detection and Response.

# Request changes to an onboarded workload in Incident Detection and Response

To request changes to an onboarded workload, complete the following steps to create a support case with AWS Incident Detection and Response.

1. Go to the [AWS Support Center](https://console.aws.amazon.com/support/home#/), and then choose **Create case**, as shown in the following example.

   ![Example of the AWS Support Center.](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/images/workload-change-request1.png)
1. Choose **Technical**.
1. For **Service**, choose **Incident Detection and Response**.
1. For **Category**, choose **Workload Change Request**.
1. For **Severity**, choose **General Guidance**.
1. Enter a **Subject** for this change. For example:

   AWS Incident Detection and Response - *workload_name*
1. Enter a **Description** for this change. For example, enter "This request is for changes to an existing workload onboarded to AWS Incident Detection and Response". Make sure that your request includes the following information:
   + **Workload name:** The name of your workload.
   + **Account IDs:** ID1, ID2, ID3, and so on.
   + **Change details:** The details of the change that you are requesting.
1. In the **Additional contacts - optional** section, enter the email addresses that should receive correspondence about this change. The following is an example of the **Additional contacts - optional** section.

   ![Enter contacts in the highlighted Additional contacts - optional section.](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/images/workload-change-request2.png)

   **Important**
   If you don't add email addresses in the **Additional contacts - optional** section, the change process might be delayed.
1. 
Choose **Submit**.

After you submit the change request, you can add more email addresses from your organization. To add them, choose **Reply** in **Case details**, as shown in the following example.

![Case details page with the Reply button highlighted.](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/images/workload-change-request3.png)

Then add the email addresses in the **Additional contacts - optional** section. The following example shows the **Reply** page where you can enter additional email addresses.

![Reply page where you can add additional emails.](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/images/workload-change-request4.png)

# Suppress alarms from engaging Incident Detection and Response

Specify which of your onboarded workload alarms engage AWS Incident Detection and Response monitoring by suppressing alarms temporarily or on a schedule. For example, you can temporarily suppress workload alarms during planned maintenance so that they don't engage Incident Detection and Response. Or, if you have daily restart activity, you can suppress alarms on a schedule. You can suppress alarms at the alarm source, such as Amazon CloudWatch, or by submitting a workload change request.

**Topics**
+ [Suppress alarms at the alarm source](suppress-alarms-at-source.md)
+ [Submit a workload change request to suppress alarms](suppress-alarms-at-source-wcr.md)
+ [Tutorial: Use a Metric Math function to suppress an alarm](suppress-alarms-tutorial-suppress.md)
+ [Tutorial: Remove a Metric Math function to unsuppress an alarm](suppress-alarms-tutorial-unsuppress.md)

# Suppress alarms at the alarm source

Suppress alarms at the alarm source to specify which alarms engage Incident Detection and Response, and when they engage.

**Topics**
+ [Suppress a CloudWatch alarm using a Metric Math function](#suppress-alarms-at-source-cw)
+ [Remove a Metric Math function to unsuppress a CloudWatch alarm](#suppress-alarms-metric-math-unsuppress)
+ [Example Metric Math functions and associated use cases](#suppress-alarms-example-functions)
+ [Suppress alarms from a third-party APM](#suppress-alarms-third-party-apm)

## Suppress a CloudWatch alarm using a Metric Math function

To suppress Incident Detection and Response monitoring of an Amazon CloudWatch alarm, use a [Metric Math function](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html) to prevent the CloudWatch alarm from entering the `ALARM` state during a specified window.

**Note**
Disabling **Alarm actions** on a CloudWatch alarm doesn't suppress Incident Detection and Response monitoring of that alarm. Alarm state changes are ingested through Amazon EventBridge, not through CloudWatch alarm actions.

To suppress a CloudWatch alarm using a Metric Math function, complete the following steps:

1. Sign in to the AWS Management Console and open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).
1. Choose **Alarms**, and then locate the alarm that you want to add the Metric Math function to.
1. Choose **Actions**, and then choose **Edit** to modify the alarm.
1. Choose **Edit metric** to change the alarm's metric.
1. Choose **Add math**, and then choose **Start with empty expression**.
1. Enter your math expression, and then choose **Apply**.
1. Deselect the existing metric that the alarm monitored.
1. Select the expression that you just created, and then choose **Select metric**.
1. Choose **Skip to Preview and create**.
1. Review your changes to confirm that the Metric Math function is applied as expected, and then choose **Update alarm**.

For a step-by-step example of suppressing a CloudWatch alarm using a Metric Math function, see [Tutorial: Use a Metric Math function to suppress an alarm](suppress-alarms-tutorial-suppress.md).

For more information about syntax and available functions, see [Metric math syntax and functions](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html#metric-math-syntax) in the *Amazon CloudWatch User Guide*.

## Remove a Metric Math function to unsuppress a CloudWatch alarm

Unsuppress a CloudWatch alarm by removing the Metric Math function. To remove a Metric Math function from an alarm, complete the following steps:

1. Sign in to the AWS Management Console and open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).
1. Choose **Alarms**, and then locate the alarm that you want to remove the Metric Math expression from.
1. In the Metric Math section, choose **Edit**.
1. To remove the Metric Math expression from the alarm, choose **Edit** for the metric, and then choose the **x** button next to the Metric Math expression.
1. Select the original metric, and then choose **Select metric**.
1. Choose **Skip to Preview and create**.
1. 
Review your changes to confirm that the Metric Math function has been removed as expected, and then choose **Update alarm**.

## Example Metric Math functions and associated use cases

The following table shows example Metric Math functions, along with their associated use cases and a description of each metric component.

| Metric Math function | Use case | Description |
| --- | --- | --- |
| `IF((DAY(m1) == 2 && HOUR(m1) >= 1 && HOUR(m1) < 3), 0, m1)` | Suppresses the alarm every Tuesday from 1:00 AM to 3:00 AM (UTC) by replacing the actual data points with 0. | [See the AWS documentation website for more details](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/suppress-alarms-at-source.html) |
| `IF((HOUR(m1) >= 23 \|\| HOUR(m1) < 4), 0, m1)` | Suppresses the alarm every day from 11:00 PM to 4:00 AM (UTC) by replacing the actual data points with 0. | [See the AWS documentation website for more details](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/suppress-alarms-at-source.html) |
| `IF((HOUR(m1) >= 11 && HOUR(m1) < 13), 0, m1)` | Suppresses the alarm every day from 11:00 AM to 1:00 PM (UTC) by replacing the actual data points with 0. | [See the AWS documentation website for more details](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/suppress-alarms-at-source.html) |
| `IF((DAY(m1) == 2 && HOUR(m1) >= 1 && HOUR(m1) < 3), 99, m1)` | Suppresses the alarm every Tuesday from 1:00 AM to 3:00 AM (UTC) by replacing the actual data points with 99. | [See the AWS documentation website for more details](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/suppress-alarms-at-source.html) |
| `IF((HOUR(m1) >= 23 \|\| HOUR(m1) < 4), 100, m1)` | Suppresses the alarm every day from 11:00 PM to 4:00 AM (UTC) by replacing the actual data points with 100. | [See the AWS documentation website for more details](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/suppress-alarms-at-source.html) |
| `IF((HOUR(m1) >= 11 && HOUR(m1) < 13), 99, m1)` | Suppresses the alarm every day from 11:00 AM to 1:00 PM (UTC) by replacing the actual data points with 99. | [See the AWS documentation website for more details](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/suppress-alarms-at-source.html) |

## Suppress alarms from a third-party APM

For instructions on how to suppress alarms, see the documentation from your third-party APM vendor. Examples of third-party APM vendors include New Relic, Splunk, Dynatrace, Datadog, and SumoLogic.

# 
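Submit a workload change request to suppress alarms

If you can't suppress alarms at the source as described in the previous section, submit a workload change request to instruct Incident Detection and Response to manually suppress monitoring for some or all of a workload's alarms. For details on how to create a workload change request, see [Request changes to an onboarded workload in Incident Detection and Response](https://docs.aws.amazon.com/IDR/latest/userguide/idr-workloads-change-request.html). When you submit a workload change request to suppress alarms, provide the following required information:

+ **Workload name:** The name of your workload.
+ **Account IDs:** ID1, ID2, ID3, and so on.
+ **Change details:** Suppress alarms
+ **Suppression start time:** Date, time, and time zone.
+ **Suppression end time:** Date, time, and time zone.
+ **Alarms to suppress:** A list of the CloudWatch alarm ARNs or third-party APM event identifiers to suppress.

After you create an alarm suppression workload change request, you receive the following notifications from Incident Detection and Response:

+ Acknowledgement of your workload change request.
+ Notification when the alarms are suppressed.
+ Notification when the alarms are re-enabled for monitoring.

# Tutorial: Use a Metric Math function to suppress an alarm

The following tutorial walks you through how to suppress a CloudWatch alarm using Metric Math.

**Example scenario**

You have a planned activity next Tuesday between 1:00 AM and 3:00 AM (UTC). You want to create a CloudWatch Metric Math function that replaces the actual data points during this time window with 0 (a data point that is below the configured threshold).

1. Assess the criteria that trigger your alarm. The following screenshot shows example alarm criteria.

   ![CloudWatch screen showing alarm details.](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/images/metric-math-assess-alarm-criteria.png)

   The alarm shown in the previous screenshot monitors the `UnHealthyHostCount` metric of an Application Load Balancer target group. The alarm enters the `ALARM` state when the `UnHealthyHostCount` metric is greater than or equal to 3 for 5 out of 5 data points. The alarm treats missing data as bad (breaching the configured threshold).
1. 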
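Create the Metric Math function.

   In this example, the planned activity takes place next Tuesday between 1:00 AM and 3:00 AM (UTC). So, create a CloudWatch Metric Math function that replaces the actual data points during this time window with 0 (a data point that is below the configured threshold).

   The replacement data point that you need depends on your alarm configuration. For example, for an alarm that monitors HTTP success rate and breaches when the value falls below 98, replace the actual data points during your planned activity with a value above the configured threshold, such as 100.

   The following is an example Metric Math function for this scenario:

   ```
   IF((DAY(m1) == 2 && HOUR(m1) >= 1 && HOUR(m1) < 3), 0, m1)
   ```

   The preceding Metric Math function contains the following elements:
   + **DAY(m1) == 2**: Checks that the day is Tuesday (Monday = 1, Sunday = 7).
   + **HOUR(m1) >= 1 && HOUR(m1) < 3**: Specifies the time range from 1:00 AM to 3:00 AM (UTC).
   + **IF(condition, value_if_true, value_if_false)**: If the condition is true, the function replaces the metric value with 0. Otherwise, it returns the original value (m1).

   For more information about syntax and available functions, see [Metric math syntax and functions](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/using-metric-math.html#metric-math-syntax) in the *Amazon CloudWatch User Guide*.
1. Sign in to the AWS Management Console and open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).
1. Choose **Alarms**, and then locate the alarm that you want to add the Metric Math function to.
1. In the Metric Math section, choose **Edit**.
1. Choose **Add math**, and then choose **Start with empty expression**.
1. Enter your math expression, and then choose **Apply**.

   The existing metric that the alarm monitors automatically becomes **m1**, and your math expression becomes **e1**, as shown in the following example.

   ![CloudWatch screen showing the metric math expression.](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/images/metric-math-expression.png)
1. (Optional) Edit the label of the metric math expression so that other users can understand what it does and why it was created, as shown in the following example.

   ![CloudWatch screen showing the metric math expression label being edited.](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/images/metric-math-edit-label.png)
1. Deselect **m1**, select **e1**, and then choose **Select metric**. This sets the alarm to monitor the math expression instead of monitoring the underlying metric directly.
1. Choose **Skip to Preview and create**.
1. 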
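Verify that the alarm is configured as expected, and then choose **Update alarm** to save the change.

In the previous example, if the Metric Math function had not been applied, the actual `UnHealthyHostCount` metric would have been reported during the planned activity. As a result, the CloudWatch alarm would have entered the `ALARM` state and engaged Incident Detection and Response, as shown in the following example.

![CloudWatch screen showing data points that lead to the alarm state.](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/images/metric-math-example-alarm-state.png)

With the Metric Math function in place, the actual data points are replaced with 0 during the activity, the alarm stays in the `OK` state, and the Incident Detection and Response engagement is suppressed.

![CloudWatch screen showing data points with no alarm state.](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/images/metric-math-datapoints-no-alarm.png)

# Tutorial: Remove a Metric Math function to unsuppress an alarm

If you suppress a CloudWatch alarm for a one-time activity, remove the Metric Math function from the alarm after the activity is complete to resume regular monitoring of the alarm. To suppress an alarm on a recurring schedule, for example when a patching routine restarts instances on the same day and at the same time every week, leave the Metric Math function in place.

The following tutorial walks you through how to unsuppress a CloudWatch alarm by removing the Metric Math function.

1. Sign in to the AWS Management Console and open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).
1. Choose **Alarms**, and then locate the alarm that you want to remove the Metric Math function from.
1. In the Metric Math section, choose **Edit**.
1. To remove the suppression from the alarm, choose the **x** button next to the metric math expression.

   ![CloudWatch screen showing the x button for removing the Metric Math function.](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/images/metric-math-unsuppress.png)
1. Select the metric to resume monitoring of the actual metric, and then choose **Select metric**.

   ![CloudWatch screen showing the Select metric button.](http://docs.aws.amazon.com/ja_jp/IDR/latest/userguide/images/metric-math-unsuppress-2.png)
1. Choose **Skip to Preview and create**.
1. 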
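Verify that the alarm is configured as expected, and then choose **Update alarm** to save the change.

# Offboard a workload from Incident Detection and Response

To offboard a workload from AWS Incident Detection and Response, create a new support case for each workload. When you create the support case, keep the following in mind:

+ To offboard a workload that is in a single AWS account, create the support case from either the workload's account or your payer account.
+ To offboard a workload that spans multiple AWS accounts, create the support case from your payer account. In the body of the support case, list all of the account IDs to offboard.

**Important**
If you create the support case to offboard a workload from the wrong account, you might experience delays and requests for additional information before your workload is offboarded.

**Request to offboard a workload**

1. Go to the [AWS Support Center](https://console.aws.amazon.com/support/home#/), and then choose **Create case**.
1. Choose **Technical**.
1. For **Service**, choose **Incident Detection and Response**.
1. For **Category**, choose **Workload Offboarding**.
1. For **Severity**, choose **General Guidance**.
1. Enter a **Subject** for this change. For example:

   [Offboard] AWS Incident Detection and Response - *workload_name*
1. Enter a **Description** for this change. For example, enter "This request is for offboarding an existing workload onboarded to AWS Incident Detection and Response". Make sure that your request includes the following information:
   + **Workload name:** The name of your workload.
   + **Account IDs:** ID1, ID2, ID3, and so on.
   + **Reason for offboarding:** The reason that you are offboarding the workload.
1. In the **Additional contacts - optional** section, enter the email addresses that should receive correspondence about this offboarding request.
1. Choose **Submit**.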
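
If you prepare offboarding requests for several workloads, it can help to generate the case text consistently before pasting it into the Support Center form. The following Python sketch is one possible way to do that, not an official tool. The `OffboardRequest` class, its field names, and the two helper functions are hypothetical names introduced for illustration. The sketch only composes text and applies the single-account versus multi-account rule described above. It does not call any AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OffboardRequest:
    """Details the offboarding steps ask for (class and field names are hypothetical)."""

    workload_name: str
    account_ids: tuple[str, ...]
    reason: str


def case_origin(request: OffboardRequest) -> str:
    """Return which account the support case should be created from."""
    if not request.account_ids:
        raise ValueError("at least one account ID is required")
    if len(request.account_ids) == 1:
        # A single-account workload can use either account.
        return "workload account or payer account"
    # A workload that spans multiple accounts must use the payer account.
    return "payer account"


def build_case_text(request: OffboardRequest) -> dict[str, str]:
    """Compose the Subject and Description fields using the example wording."""
    subject = (
        "[Offboard] AWS Incident Detection and Response - "
        f"{request.workload_name}"
    )
    description = "\n".join(
        [
            "This request is for offboarding an existing workload onboarded to "
            "AWS Incident Detection and Response.",
            f"Workload name: {request.workload_name}",
            f"Account IDs: {', '.join(request.account_ids)}",
            f"Reason for offboarding: {request.reason}",
        ]
    )
    return {"subject": subject, "description": description}


if __name__ == "__main__":
    # Placeholder values for illustration only.
    example = OffboardRequest(
        workload_name="example-workload",
        account_ids=("111122223333", "444455556666"),
        reason="Workload is being decommissioned.",
    )
    print("Create the case from:", case_origin(example))
    print(build_case_text(example)["subject"])
    print(build_case_text(example)["description"])
```

Keeping the account rule in one function makes it harder to open a multi-account request from the wrong account, which the **Important** note above identifies as a cause of delays.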