

# Running sensitive data discovery jobs

With Amazon Macie, you can create and run sensitive data discovery jobs to automate discovery, logging, and reporting of sensitive data in Amazon Simple Storage Service (Amazon S3) general purpose buckets. A *sensitive data discovery job* is a series of automated processing and analysis tasks that Macie performs to detect and report sensitive data in Amazon S3 objects. Each job provides detailed reports of the sensitive data that Macie finds and the analysis that Macie performs. By creating and running jobs, you can build and maintain a comprehensive view of the data that your organization stores in Amazon S3 and any security or compliance risks for that data.

To help you meet and maintain compliance with your data security and privacy requirements, Macie provides several options for scheduling and defining the scope of a job. You can configure a job to run only once for on-demand analysis and assessment, or on a recurring basis for periodic analysis, assessment, and monitoring. You also define the breadth and depth of a job's analysis—specific S3 buckets that you select or buckets that match specific criteria. You can optionally refine the scope of that analysis by choosing additional options. The options include custom criteria that derive from properties of S3 objects, such as tags, prefixes, and when an object was last modified.

For each job, you also specify the types of sensitive data that you want Macie to detect and report. You can configure a job to use [managed data identifiers](managed-data-identifiers.md) that Macie provides, [custom data identifiers](custom-data-identifiers.md) that you define, or a combination of the two. By selecting specific managed and custom data identifiers for a job, you can tailor the analysis to focus on specific types of sensitive data. To fine-tune the analysis, you can also configure a job to use [allow lists](allow-lists.md). Allow lists specify text and text patterns that you want Macie to ignore, typically sensitive data exceptions for your organization's particular scenarios or environment.

Each job produces records of the sensitive data that Macie finds and the analysis that Macie performs—*sensitive data findings* and *sensitive data discovery results*. A *sensitive data finding* is a detailed report of sensitive data that Macie found in an S3 object. A *sensitive data discovery result* is a record that logs details about the analysis of an S3 object. Macie creates a sensitive data discovery result for each object that you configure a job to analyze. This includes objects in which Macie doesn't find sensitive data, and which therefore don't produce sensitive data findings, and objects that Macie can't analyze due to errors or issues. Each type of record adheres to a standardized schema, which can help you query, monitor, and process the records to meet your security and compliance requirements.
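As a sketch of how the pieces above fit together, the following shows the parameters for the Macie `CreateClassificationJob` API operation, expressed as a boto3-style dict. The job name, account ID, and bucket name are placeholders for illustration; verify the parameter names against the Amazon Macie API Reference.

```python
# Request parameters for the Macie CreateClassificationJob operation,
# expressed as a boto3-style dict. The job name, account ID, and bucket
# name below are hypothetical placeholders.
params = {
    "name": "example-discovery-job",
    "jobType": "ONE_TIME",  # or "SCHEDULED", with a scheduleFrequency setting
    "s3JobDefinition": {
        "bucketDefinitions": [
            {"accountId": "111122223333", "buckets": ["amzn-s3-demo-bucket"]}
        ]
    },
    # Detect the managed data identifiers that Macie recommends for jobs;
    # add custom data identifier IDs and allow list IDs as needed.
    "managedDataIdentifierSelector": "RECOMMENDED",
}

# To submit the job (requires AWS credentials and Macie permissions):
#   import boto3
#   response = boto3.client("macie2").create_classification_job(**params)
```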

**Topics**
+ [Scope options for jobs](discovery-jobs-scope.md)
+ [Creating a job](discovery-jobs-create.md)
+ [Reviewing job results](discovery-jobs-manage-results.md)
+ [Managing jobs](discovery-jobs-manage.md)
+ [Monitoring jobs with CloudWatch Logs](discovery-jobs-monitor-cw-logs.md)
+ [Forecasting and monitoring job costs](discovery-jobs-costs.md)
+ [Managed data identifiers recommended for jobs](discovery-jobs-mdis-recommended.md)

# Scope options for sensitive data discovery jobs

With sensitive data discovery jobs, you define the scope of the analysis that Amazon Macie performs to detect and report sensitive data in your Amazon Simple Storage Service (Amazon S3) general purpose buckets. To help you do this, Macie provides several job-specific options that you can choose when you create and configure a job.

**Topics**
+ [S3 buckets or bucket criteria](#discovery-jobs-scope-buckets)
+ [Sampling depth](#discovery-jobs-scope-sampling)
+ [Initial run: Include existing S3 objects](#discovery-jobs-scope-objects)
+ [S3 object criteria](#discovery-jobs-scope-criteria)

## S3 buckets or bucket criteria

When you create a sensitive data discovery job, you specify which S3 buckets store objects that you want Macie to analyze when the job runs. You can do this in two ways: by selecting specific S3 buckets from your bucket inventory, or by specifying custom criteria that derive from properties of S3 buckets.

**Select specific S3 buckets**  
With this option, you explicitly select each S3 bucket to analyze. Then, when the job runs, Macie analyzes objects only in the buckets that you select. If you configure a job to run periodically on a daily, weekly, or monthly basis, Macie analyzes objects in those same buckets each time the job runs.   
This configuration is helpful for cases where you want to perform targeted analysis of a specific set of data. It gives you precise, predictable control over which buckets a job analyzes.

**Specify S3 bucket criteria**  
With this option, you define runtime criteria that determine which S3 buckets to analyze. The criteria consist of one or more conditions that derive from bucket properties, such as public access settings and tags. When the job runs, Macie identifies buckets that match your criteria, and then analyzes objects in those buckets. If you configure a job to run periodically, Macie does this each time the job runs. Consequently, Macie might analyze objects in different buckets each time the job runs, depending on changes to your bucket inventory and the criteria that you define.  
This configuration is helpful for cases where you want the scope of the analysis to dynamically adapt to changes to your bucket inventory. If you configure a job to use bucket criteria and run periodically, Macie automatically identifies new buckets that match the criteria and inspects those buckets for sensitive data.
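As a sketch, the include and exclude behavior described above might be expressed as the `bucketCriteria` portion of a `CreateClassificationJob` request: include publicly accessible buckets, but exclude buckets that have a particular tag. The `Team=Finance` tag is hypothetical, and the field names are boto3-style assumptions to verify against the Amazon Macie API Reference.

```python
# A bucketCriteria sketch for CreateClassificationJob: include buckets
# that are publicly accessible, but exclude buckets tagged Team=Finance
# (a hypothetical tag). Exclude conditions take precedence over includes.
bucket_criteria = {
    "includes": {
        "and": [
            {
                "simpleCriterion": {
                    "key": "S3_BUCKET_EFFECTIVE_PERMISSION",
                    "comparator": "EQ",
                    "values": ["PUBLIC"],
                }
            }
        ]
    },
    "excludes": {
        "and": [
            {
                "tagCriterion": {
                    "comparator": "EQ",
                    "tagValues": [{"key": "Team", "value": "Finance"}],
                }
            }
        ]
    },
}
```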

The topics in this section provide additional details about each option.

**Topics**
+ [Selecting specific S3 buckets](#discovery-jobs-scope-buckets-select)
+ [Specifying S3 bucket criteria](#discovery-jobs-scope-buckets-criteria)

### Selecting specific S3 buckets

If you choose to explicitly select each S3 bucket that you want a job to analyze, Macie provides you with an inventory of your general purpose buckets in the current AWS Region. You can then review your inventory and select the buckets that you want. If you're the Macie administrator for an organization, your inventory includes buckets that your member accounts own. You can select as many as 1,000 of these buckets, spanning as many as 1,000 accounts.

To help you make your bucket selections, the inventory provides details and statistics for each bucket. This includes the amount of data that a job can analyze in each bucket—*classifiable objects* are objects that use a [supported Amazon S3 storage class](discovery-supported-storage.md#discovery-supported-s3-classes) and have a file name extension for a [supported file or storage format](discovery-supported-storage.md#discovery-supported-formats). The inventory also indicates whether you configured any existing jobs to analyze objects in a bucket. These details can help you estimate the breadth of a job and refine your bucket selections.

In the inventory table:
+ **Sensitivity** – Specifies the bucket's current sensitivity score, if [automated sensitive data discovery](discovery-asdd.md) is enabled.
+ **Classifiable objects** – Specifies the total number of objects that the job can analyze in the bucket.
+ **Classifiable size** – Specifies the total storage size of all the objects that the job can analyze in the bucket.

  If the bucket stores compressed objects, this value doesn’t reflect the actual size of those objects after they're decompressed. If versioning is enabled for the bucket, this value is based on the storage size of the latest version of each object in the bucket.
+ **Monitored by job** – Specifies whether you configured any existing jobs to periodically analyze objects in the bucket on a daily, weekly, or monthly basis.

  If the value for this field is **Yes**, the bucket is explicitly included in a periodic job or the bucket matched the criteria for a periodic job within the past 24 hours. In addition, the status of at least one of those jobs is not *Cancelled*. Macie updates this data on a daily basis.
+ **Latest job run** – If you configured any periodic or one-time jobs to analyze objects in the bucket, this field specifies the most recent date and time when one of those jobs started to run. Otherwise, a dash (–) appears in this field.

If the information icon (![\[The information icon, which is a blue circle that has a lowercase letter i in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-info-blue.png)) appears next to any bucket names, we recommend that you retrieve the latest bucket metadata from Amazon S3. To do this, choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) above the table. The information icon indicates that a bucket was created during the past 24 hours, possibly after Macie last retrieved bucket and object metadata from Amazon S3 as part of the daily refresh cycle. For more information, see [Data refreshes](monitoring-s3-how-it-works.md#monitoring-s3-how-it-works-data-refresh).

If the warning icon (![\[The warning icon, which is a red triangle that has an exclamation point in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-warning-red.png)) appears next to a bucket's name, Macie isn't allowed to access the bucket or the bucket's objects. This means that the job won't be able to analyze objects in the bucket. To investigate the issue, review the bucket’s policy and permissions settings in Amazon S3. For example, the bucket might have a restrictive bucket policy. For more information, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md).

To customize your view and find specific buckets more easily, you can filter the table by entering filter criteria in the filter box. The following table provides some examples.


| To show all buckets that... | Apply this filter... | 
| --- | --- | 
| Are owned by a specific account | Account ID = the 12-digit ID for the account | 
| Are publicly accessible | Effective permission = Public | 
| Aren't included in any periodic jobs | Actively monitored by job = False | 
| Aren't included in any periodic or one-time jobs | Defined in job = False | 
| Have a specific tag key¹ | Tag key = the tag key | 
| Have a specific tag value¹ | Tag value = the tag value | 
| Store unencrypted objects (or objects that use client-side encryption) | Object count by encryption is No encryption and From = 1 | 

¹ Tag keys and values are case sensitive. Also, you have to specify a complete, valid value. You can’t specify partial values or use wildcard characters.
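The console filters above have programmatic counterparts. As a sketch, the first two rows of the table might translate to `criteria` for the Macie `DescribeBuckets` API operation. The account ID is a placeholder, and the field names are boto3-style assumptions to verify against the Amazon Macie API Reference.

```python
# Filter criteria for the Macie DescribeBuckets operation, mirroring two
# example rows of the table above. The 12-digit account ID is hypothetical.
criteria = {
    "accountId": {"eq": ["111122223333"]},                   # owned by a specific account
    "publicAccess.effectivePermission": {"eq": ["PUBLIC"]},  # publicly accessible
}

# To run the query (requires AWS credentials and Macie permissions):
#   import boto3
#   response = boto3.client("macie2").describe_buckets(criteria=criteria)
```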

To display additional details for a bucket, choose the bucket's name and refer to the details panel. In the panel, you can also:
+ Pivot and drill down on certain fields by choosing a magnifying glass for the field. Choose ![\[The zoom in icon, which is a magnifying glass that has a plus sign in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-magnifying-glass-plus-sign.png) to show buckets with the same value. Choose ![\[The zoom out icon, which is a magnifying glass that has a minus sign in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-magnifying-glass-minus-sign.png) to show buckets with other values.
+ Retrieve the latest metadata for objects in the bucket. This can be helpful if you recently created a bucket or made significant changes to the bucket's objects during the past 24 hours. To retrieve the data, choose refresh (![\[The refresh button, which is a button that displays an empty, dark gray circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-object-data.png)) in the **Object statistics** section of the panel. This option is available for buckets that store 30,000 or fewer objects.

In certain cases, the panel might not include all the details of a bucket. This can occur if you store more than 10,000 buckets in Amazon S3. Macie maintains complete inventory data for only 10,000 buckets for an account—the 10,000 buckets that were most recently created or changed. You can, however, configure a job to analyze objects in buckets that exceed this quota. To review additional details for these buckets, use Amazon S3.

### Specifying S3 bucket criteria

If you choose to specify bucket criteria for a job, Macie provides options for defining and testing the criteria. These are runtime criteria that determine which S3 buckets store objects to analyze. Each time the job runs, Macie identifies general purpose buckets that match your criteria, and then analyzes objects in the appropriate buckets. If you're the Macie administrator for an organization, this includes buckets that your member accounts own. 

#### Defining bucket criteria


Bucket criteria consist of one or more conditions that derive from properties of S3 buckets. Each condition, also referred to as a *criterion*, consists of the following parts:
+ A property-based field, such as **Account ID** or **Effective permission**.
+ An operator, either *equals* (`eq`) or *not equals* (`neq`).
+ One or more values.
+ An include or exclude statement that indicates whether to analyze (*include*) or skip (*exclude*) buckets that match the condition.

If you specify more than one value for a field, Macie uses OR logic to join the values. If you specify more than one condition for the criteria, Macie uses AND logic to join the conditions. In addition, exclude conditions take precedence over include conditions. For example, if you include buckets that are publicly accessible and exclude buckets that have specific tags, the job analyzes objects in any bucket that's publicly accessible unless the bucket has one of the specified tags.
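The joining rules above can be illustrated with a small simulation. This models only the described logic (OR across a condition's values, AND across conditions, exclude beats include); it isn't Macie's implementation.

```python
# Simulate bucket criteria evaluation. Conditions are (field, values)
# pairs; a bucket is a dict of property values.

def matches(bucket, conditions):
    """True if the bucket satisfies every condition (AND), where a single
    condition matches if the bucket's field equals any listed value (OR)."""
    return all(bucket.get(field) in values for field, values in conditions)

def job_selects(bucket, includes, excludes):
    """Exclude conditions take precedence: a bucket is analyzed only if it
    matches the include conditions and none of the exclude conditions."""
    if excludes and matches(bucket, excludes):
        return False
    return matches(bucket, includes)

# Include public buckets, but exclude buckets tagged "restricted".
includes = [("permission", ["PUBLIC"])]
excludes = [("tag", ["restricted"])]

print(job_selects({"permission": "PUBLIC", "tag": "internal"}, includes, excludes))    # True
print(job_selects({"permission": "PUBLIC", "tag": "restricted"}, includes, excludes))  # False
```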

You can define conditions that derive from any of the following property-based fields for S3 buckets.

**Account ID**   
The unique identifier (ID) for the AWS account that owns a bucket. To specify multiple values for this field, enter the ID for each account and separate each entry with a comma.  
Note that Macie doesn't support use of wildcard characters or partial values for this field.

**Bucket name**  
The name of a bucket. This field correlates to the **Name** field, not the **Amazon Resource Name (ARN)** field, in Amazon S3. To specify multiple values for this field, enter the name of each bucket and separate each entry with a comma.  
Note that values are case sensitive. In addition, Macie doesn't support use of wildcard characters or partial values for this field. 

**Effective permission**  
Specifies whether a bucket is publicly accessible. You can choose one or more of the following values for this field:  
+ **Not public** – The general public doesn't have read or write access to the bucket.
+ **Public** – The general public has read or write access to the bucket.
+ **Unknown** – Macie wasn't able to evaluate the public access settings for the bucket. An issue or quota prevented Macie from retrieving and evaluating the requisite data.
To determine whether a bucket is publicly accessible, Macie analyzes a combination of account- and bucket-level settings for the bucket: the block public access settings for the account, the block public access settings for the bucket, the bucket policy, and the bucket's access control list (ACL). For information about these settings, see [Access control](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-management.html) and [Blocking public access to your Amazon S3 storage](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-control-block-public-access.html) in the *Amazon Simple Storage Service User Guide*.

**Shared access**  
Specifies whether a bucket is shared with another AWS account, an Amazon CloudFront origin access identity (OAI), or a CloudFront origin access control (OAC). You can choose one or more of the following values for this field:  
+ **External** – The bucket is shared with one or more of the following or any combination of the following: a CloudFront OAI, a CloudFront OAC, or an account that's external to (not part of) your organization.
+ **Internal** – The bucket is shared with one or more accounts that are internal to (part of) your organization. It isn't shared with a CloudFront OAI or OAC.
+ **Not shared** – The bucket isn't shared with another account, a CloudFront OAI, or a CloudFront OAC.
+ **Unknown** – Macie wasn't able to evaluate the shared access settings for the bucket. An issue or quota prevented Macie from retrieving and evaluating the requisite data.
To determine whether a bucket is shared with another AWS account, Macie analyzes the bucket policy and ACL for the bucket. In addition, an *organization* is defined as a set of Macie accounts that are centrally managed as a group of related accounts through AWS Organizations or by Macie invitation. For information about Amazon S3 options for sharing buckets, see [Access control](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-management.html) in the *Amazon Simple Storage Service User Guide*.  
To determine whether a bucket is shared with a CloudFront OAI or OAC, Macie analyzes the bucket policy for the bucket. A CloudFront OAI or OAC allows users to access a bucket's objects through one or more specified CloudFront distributions. For information about CloudFront OAIs and OACs, see [Restricting access to an Amazon S3 origin](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/private-content-restricting-access-to-s3.html) in the *Amazon CloudFront Developer Guide*.

**Tags**  
The tags that are associated with a bucket. Tags are labels that you can define and assign to certain types of AWS resources, including S3 buckets. Each tag consists of a required tag key and an optional tag value. For information about tagging S3 buckets, see [Using cost allocation S3 bucket tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/CostAllocTagging.html) in the *Amazon Simple Storage Service User Guide*.  
For a sensitive data discovery job, you can use this type of condition to include or exclude buckets that have a specific tag key, a specific tag value, or a specific tag key and tag value (as a pair). For example:  
+ If you specify **Project** as a tag key and don't specify any tag values for a condition, any bucket that has the *Project* tag key matches the condition’s criteria, regardless of the tag values that are associated with that tag key.
+ If you specify **Development** and **Test** as tag values and don't specify any tag keys for a condition, any bucket that has the *Development* or *Test* tag value matches the condition’s criteria, regardless of the tag keys that are associated with those tag values.
Tag keys and values are case sensitive. In addition, Macie doesn't support use of wildcard characters or partial values in tag conditions.  
To specify multiple tag keys in a condition, enter each tag key in the **Key** field and separate each entry with a comma. To specify multiple tag values in a condition, enter each tag value in the **Value** field and separate each entry with a comma.  
If you store more than 10,000 buckets in Amazon S3, note that Macie doesn't maintain tag data for all the buckets. Macie maintains complete inventory data for only 10,000 buckets for an account—the 10,000 buckets that were most recently created or changed. For all other buckets, any associated tag keys and values aren't included in inventory data. Consequently, those buckets won't match tag-based conditions that use the *equals* (`eq`) operator, and they will match tag-based conditions that use the *not equals* (`neq`) operator.

#### Testing bucket criteria


While you define your bucket criteria, you can test and refine the criteria by previewing the results. To do this, expand the **Preview the criteria results** section that appears below the criteria on the console. This section displays a table of up to 25 general purpose buckets that currently match the criteria.

The table also provides insight into the amount of data that the job can analyze in each bucket—*classifiable objects* are objects that use a [supported Amazon S3 storage class](discovery-supported-storage.md#discovery-supported-s3-classes) and have a file name extension for a [supported file or storage format](discovery-supported-storage.md#discovery-supported-formats). The table also indicates whether you configured any existing jobs to periodically analyze objects in a bucket.

In the table:
+ **Sensitivity** – Specifies the bucket's current sensitivity score, if [automated sensitive data discovery](discovery-asdd.md) is enabled.
+ **Classifiable objects** – Specifies the total number of objects that the job can analyze in the bucket.
+ **Classifiable size** – Specifies the total storage size of all the objects that the job can analyze in the bucket.

  If the bucket stores compressed objects, this value doesn’t reflect the actual size of those objects after they're decompressed. If versioning is enabled for the bucket, this value is based on the storage size of the latest version of each object in the bucket.
+ **Monitored by job** – Specifies whether you configured any existing jobs to periodically analyze objects in the bucket on a daily, weekly, or monthly basis.

  If the value for this field is **Yes**, the bucket is explicitly included in a periodic job or the bucket matched the criteria for a periodic job within the past 24 hours. In addition, the status of at least one of those jobs is not *Cancelled*. Macie updates this data on a daily basis.

If the warning icon (![\[The warning icon, which is a red triangle that has an exclamation point in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-warning-red.png)) appears next to a bucket's name, Macie isn't allowed to access the bucket or the bucket's objects. This means that the job won't be able to analyze objects in the bucket. To investigate the issue, review the bucket’s policy and permissions settings in Amazon S3. For example, the bucket might have a restrictive bucket policy. For more information, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md).

To refine the bucket criteria for the job, use the filter options to add, change, or remove conditions from the criteria. Macie then updates the table to reflect your changes.

## Sampling depth

With this option, you specify the percentage of eligible S3 objects that you want a sensitive data discovery job to analyze. Eligible objects are objects that: use a [supported Amazon S3 storage class](discovery-supported-storage.md#discovery-supported-s3-classes), have a file name extension for a [supported file or storage format](discovery-supported-storage.md#discovery-supported-formats), and match other criteria that you specify for the job.

If this value is less than 100%, Macie selects eligible objects to analyze at random, up to the specified percentage, and analyzes all the data in those objects. For example, if you configure a job to analyze 10,000 objects and you specify a sampling depth of 20%, Macie analyzes approximately 2,000 randomly selected, eligible objects when the job runs.
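The arithmetic in the example above can be sketched as follows. This is an illustration of the described behavior, not Macie's internal selection algorithm.

```python
import math
import random

# With a 20% sampling depth, roughly 2,000 of 10,000 eligible objects are
# chosen at random, and each chosen object is analyzed in full.
eligible_objects = [f"object-{i}" for i in range(10_000)]
sampling_depth = 20  # percent

sample_size = math.ceil(len(eligible_objects) * sampling_depth / 100)
selected = random.sample(eligible_objects, sample_size)  # no repeats

print(sample_size)    # 2000
print(len(selected))  # 2000
```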

Reducing the sampling depth of a job can lower the cost and reduce the duration of a job. It's helpful for cases where the data in objects is highly consistent and you want to determine whether an S3 bucket, rather than each object, stores sensitive data.

Note that this option controls the percentage of *objects* that are analyzed, not the percentage of *bytes* that are analyzed. If you enter a sampling depth that’s less than 100%, Macie analyzes all the data in each selected object, not that percentage of the data in each selected object.

## Initial run: Include existing S3 objects

You can use sensitive data discovery jobs to perform ongoing, incremental analysis of objects in S3 buckets. If you configure a job to run periodically, Macie does this for you automatically—each run analyzes only those objects that were created or changed after the preceding run. With the **Include existing objects** option, you choose the starting point for the first increment:
+ To analyze all existing objects immediately after you finish creating the job, select the checkbox for this option.
+ To wait and analyze only those objects that are created or changed after you create the job and before the first run, clear the checkbox for this option.

  Clearing this checkbox is helpful for cases where you already analyzed the data and want to continue to analyze it periodically. For example, if you previously used another service or application to classify data and you recently started using Macie, you might use this option to ensure continued discovery and classification of your data without incurring unnecessary costs or duplicating classification data.

Each subsequent run of a periodic job automatically analyzes only those objects that are created or changed after the preceding run.

For both periodic and one-time jobs, you can also configure a job to analyze only those objects that are created or changed before or after a certain time or during a certain time range. To do this, add object criteria that use the last modified date for objects.

## S3 object criteria

To fine-tune the scope of a sensitive data discovery job, you can define custom criteria for S3 objects. Macie uses these criteria to determine which objects to analyze (*include*) or skip (*exclude*) when the job runs. The criteria consist of one or more conditions that derive from properties of S3 objects. The conditions apply to objects in all the S3 buckets that are included in the analysis. If a bucket stores multiple versions of an object, the conditions apply to the latest version of the object.

If you define multiple conditions as object criteria, Macie uses AND logic to join the conditions. In addition, exclude conditions take precedence over include conditions. For example, if you include objects that have the .pdf file name extension and exclude objects that are larger than 5 MB, the job analyzes any object that has the .pdf file name extension, unless the object is larger than 5 MB.
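The .pdf and 5 MB example above can be sketched as a predicate. This is illustrative logic only, not Macie's implementation.

```python
# Include objects with the .pdf file name extension; exclude objects
# larger than 5 MB. Exclude takes precedence, so a 6 MB PDF is skipped.
MB = 1024 * 1024

def job_analyzes(key, size_bytes):
    included = key.endswith(".pdf")  # file name extension condition (include)
    excluded = size_bytes > 5 * MB   # storage size condition (exclude)
    return included and not excluded

print(job_analyzes("report.pdf", 2 * MB))  # True
print(job_analyzes("report.pdf", 6 * MB))  # False
print(job_analyzes("report.csv", 2 * MB))  # False
```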

You can define conditions that derive from any of the following properties of S3 objects.

**File name extension**  
This correlates to the file name extension of an S3 object. You can use this type of condition to include or exclude objects based on file type. To do this for multiple types of files, enter the file name extension for each type and separate each entry with a comma—for example: **docx,pdf,xlsx**. If you enter multiple file name extensions as values for a condition, Macie uses OR logic to join the values.  
Note that values are case sensitive. In addition, Macie doesn't support the use of partial values or wildcard characters in this type of condition.  
For information about the types of files that Macie can analyze, see [Supported file and storage formats](discovery-supported-storage.md#discovery-supported-formats).

**Last modified**  
This correlates to the **Last modified** field in Amazon S3. In Amazon S3, this field stores the date and time when an S3 object was created or last changed, whichever is latest.  
For a sensitive data discovery job, this condition can be a specific date, a specific date and time, or an exclusive time range:  
+ To analyze objects that were last modified after a certain date or date and time, enter the values in the **From** fields.
+ To analyze objects that were last modified before a certain date or date and time, enter the values in the **To** fields.
+ To analyze objects that were last modified during a certain time range, use the **From** fields to enter the values for the first date or date and time in the time range. Use the **To** fields to enter the values for the last date or date and time in the time range.
+ To analyze objects that were last modified at any time during a certain single day, enter the date in the **From** date field. Enter the date for the next day in the **To** date field. Then verify that both time fields are blank. (Macie treats a blank time field as `00:00:00`.) For example, to analyze objects that changed on August 9, 2023, enter **2023/08/09** in the **From** date field, enter **2023/08/10** in the **To** date field, and don't enter a value in either time field.
Enter any time values in Coordinated Universal Time (UTC) and use 24-hour notation.
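The single-day example above can be sketched with Python's `datetime` module. Blank time fields are treated as `00:00:00` UTC; treating **From** as inclusive and **To** as exclusive is an assumption made here for illustration.

```python
from datetime import datetime, timezone

# From = 2023/08/09, To = 2023/08/10, both time fields blank (00:00:00 UTC).
# An object modified at any time on August 9, 2023 (UTC) falls in the range.
start = datetime(2023, 8, 9, 0, 0, 0, tzinfo=timezone.utc)   # From
end = datetime(2023, 8, 10, 0, 0, 0, tzinfo=timezone.utc)    # To

def in_range(last_modified):
    return start <= last_modified < end

print(in_range(datetime(2023, 8, 9, 17, 30, tzinfo=timezone.utc)))  # True
print(in_range(datetime(2023, 8, 10, 0, 5, tzinfo=timezone.utc)))   # False
```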

**Prefix**  
This correlates to the **Key** field in Amazon S3. In Amazon S3, this field stores the name of an S3 object, including the object's prefix. A *prefix* is similar to a directory path within a bucket. It enables you to group similar objects together in a bucket, much like you might store similar files together in a folder on a file system. For information about object prefixes and folders in Amazon S3, see [Organizing objects in the Amazon S3 console using folders](https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-folders.html) in the *Amazon Simple Storage Service User Guide*.  
You can use this type of condition to include or exclude objects whose keys (names) begin with a certain value. For example, to exclude all objects whose key begins with *AWSLogs*, enter **AWSLogs** as the value for a **Prefix** condition, and then choose **Exclude**.   
If you enter multiple prefixes as values for a condition, Macie uses OR logic to join the values. For example, if you enter **AWSLogs1** and **AWSLogs2** as values for a condition, any object whose key begins with *AWSLogs1* or *AWSLogs2* matches the condition’s criteria.  
When you enter a value for a **Prefix** condition, keep the following in mind:  
+ Values are case sensitive.
+ Macie doesn't support the use of wildcard characters in these values.
+ In Amazon S3, an object’s key doesn’t include the name of the bucket that stores the object. For this reason, don’t specify bucket names in these values.
+ If a prefix includes a delimiter, include the delimiter in the value. For example, enter **AWSLogs/eventlogs** to define a condition for all objects whose key begins with *AWSLogs/eventlogs*. Macie supports the default Amazon S3 delimiter, which is a slash (/), and custom delimiters.
Also note that an object matches a condition's criteria only if the object's key exactly matches the value that you enter, starting with the first character in the object's key. In addition, Macie applies a condition to the complete **Key** value for an object, including the object's file name.   
For example, if an object's key is *AWSLogs/eventlogs/testlog.csv* and you enter any of the following values for a condition, the object matches the condition's criteria:  
+ **AWSLogs**
+ **AWSLogs/event**
+ **AWSLogs/eventlogs/**
+ **AWSLogs/eventlogs/testlog**
+ **AWSLogs/eventlogs/testlog.csv**
However, if you enter **eventlogs**, the object doesn't match the criteria—the condition's value doesn't include the first part of the key, *AWSLogs/*. Similarly, if you enter **awslogs**, the object doesn't match the criteria due to differences in capitalization.
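The matching rules above reduce to a case-sensitive prefix comparison anchored at the first character of the key, as this sketch shows:

```python
# A Prefix condition matches when the object's full key starts with the
# condition value, compared case sensitively.
key = "AWSLogs/eventlogs/testlog.csv"

for prefix in ["AWSLogs", "AWSLogs/event", "AWSLogs/eventlogs/",
               "AWSLogs/eventlogs/testlog.csv", "eventlogs", "awslogs"]:
    # The first four prefixes match; the last two don't.
    print(prefix, key.startswith(prefix))
```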

**Storage size**  
This corresponds to the **Size** field in Amazon S3. In Amazon S3, this field indicates the total storage size of an S3 object. If an object is a compressed file, this value doesn't reflect the actual size of the file after it's decompressed.  
You can use this type of condition to include or exclude objects that are smaller than a certain size, larger than a certain size, or that fall within a certain size range. Macie applies this type of condition to all types of objects, including compressed or archive files and the files that they contain. For information about size-based restrictions for each supported format, see [Quotas for Macie](macie-quotas.md).

**Tags**  
The tags that are associated with an S3 object. Tags are labels that you can define and assign to certain types of AWS resources, including S3 objects. Each tag consists of a required tag key and an optional tag value. For information about tagging S3 objects, see [Categorizing your storage using tags](https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) in the *Amazon Simple Storage Service User Guide*.  
For a sensitive data discovery job, you can use this type of condition to include or exclude objects that have a specific tag. This can be a specific tag key or a specific tag key and tag value (as a pair). If you specify multiple tags as values for a condition, Macie uses OR logic to join the values. For example, if you specify **Project1** and **Project2** as tag keys for a condition, any object that has the *Project1* or *Project2* tag key matches the condition’s criteria.  
Note that tag keys and values are case sensitive. In addition, Macie doesn't support use of partial values or wildcard characters in this type of condition.
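If you create jobs programmatically, these object-level conditions map to the `scoping` portion of the job's `s3JobDefinition` in the Amazon Macie API. A sketch with illustrative values (a prefix include, plus size and tag excludes; the tag key and value are hypothetical):

```python
# Object-level scoping for a classification job, expressed as the
# `scoping` field of an s3JobDefinition for CreateClassificationJob.
scoping = {
    "includes": {
        "and": [
            {
                "simpleScopeTerm": {
                    "comparator": "STARTS_WITH",
                    "key": "OBJECT_KEY",
                    "values": ["AWSLogs/eventlogs"],  # prefix condition
                }
            }
        ]
    },
    "excludes": {
        "and": [
            {
                "simpleScopeTerm": {
                    "comparator": "GT",
                    "key": "OBJECT_SIZE",
                    "values": [str(5 * 1024 * 1024)],  # storage size, in bytes
                }
            },
            {
                "tagScopeTerm": {
                    "comparator": "EQ",
                    "key": "TAG",
                    "target": "S3_OBJECT",
                    # Hypothetical tag key-value pair to exclude:
                    "tagValues": [{"key": "Classification", "value": "Public"}],
                }
            },
        ]
    },
}
```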

# Creating a sensitive data discovery job

With Amazon Macie, you can create and run sensitive data discovery jobs to automate discovery, logging, and reporting of sensitive data in Amazon Simple Storage Service (Amazon S3) general purpose buckets. A *sensitive data discovery job* is a series of automated processing and analysis tasks that Macie performs to detect and report sensitive data in Amazon S3 objects. As the analysis progresses, Macie provides detailed reports of the sensitive data that it finds and the analysis that it performs: *sensitive data findings*, which report sensitive data that Macie finds in individual S3 objects, and *sensitive data discovery results*, which log details about the analysis of individual S3 objects. For more information, see [Reviewing job results](discovery-jobs-manage-results.md).

When you create a job, you start by specifying which S3 buckets store objects that you want Macie to analyze when the job runs—specific buckets that you select or buckets that match specific criteria. Then you specify how often to run the job—once, or periodically on a daily, weekly, or monthly basis. You can also choose options to refine the scope of the job's analysis. The options include custom criteria that derive from properties of S3 objects, such as tags, prefixes, and when an object was last modified.

After you define the schedule and scope of the job, you specify which managed data identifiers and custom data identifiers to use: 
+ A *managed data identifier* is a set of built-in criteria and techniques that are designed to detect a specific type of sensitive data—for example, credit card numbers, AWS secret access keys, or passport numbers for a particular country or region. These identifiers can detect a large and growing list of sensitive data types for many countries and regions, including multiple types of credentials data, financial information, and personally identifiable information (PII). For more information, see [Using managed data identifiers](managed-data-identifiers.md).
+ A *custom data identifier* is a set of criteria that you define to detect sensitive data. With custom data identifiers, you can detect sensitive data that reflects your organization's particular scenarios, intellectual property, or proprietary data—for example, employee IDs, customer account numbers, or internal data classifications. Custom data identifiers can supplement the managed data identifiers that Macie provides. For more information, see [Building custom data identifiers](custom-data-identifiers.md).

You then optionally select allow lists to use. In Macie, an *allow list* specifies text or a text pattern to ignore. These are typically sensitive data exceptions for your particular scenarios or environment—for example, public names or phone numbers for your organization, or sample data that your organization uses for testing. For more information, see [Defining sensitive data exceptions with allow lists](allow-lists.md).

When you finish choosing these options, you're ready to enter general settings for the job, such as the job's name and description. You can then review and save the job.
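If you prefer to script job creation, these choices map onto the `CreateClassificationJob` operation of the Amazon Macie API. A minimal sketch of the request parameters (the account ID, bucket name, and job name are placeholders):

```python
import uuid

# Request parameters for the macie2 CreateClassificationJob operation.
job_request = {
    "clientToken": str(uuid.uuid4()),  # makes the request idempotent
    "name": "weekly-pii-scan",         # placeholder job name
    "jobType": "SCHEDULED",
    "scheduleFrequency": {"weeklySchedule": {"dayOfWeek": "MONDAY"}},
    "initialRun": True,  # analyze existing objects during the first run
    "s3JobDefinition": {
        "bucketDefinitions": [
            {"accountId": "111122223333", "buckets": ["amzn-s3-demo-bucket"]}
        ]
    },
    "managedDataIdentifierSelector": "RECOMMENDED",
    "samplingPercentage": 100,
}

# With boto3 installed and AWS credentials configured, you would submit it with:
#   import boto3
#   boto3.client("macie2").create_classification_job(**job_request)
```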

**Topics**
+ [Before you begin: Set up key resources](#discovery-jobs-create-prerequisites)
+ [Step 1: Choose S3 buckets](#discovery-jobs-create-step1)
+ [Step 2: Review your S3 bucket selections or criteria](#discovery-jobs-create-step2)
+ [Step 3: Define the schedule and refine the scope](#discovery-jobs-create-step3)
+ [Step 4: Select managed data identifiers](#discovery-jobs-create-step4)
+ [Step 5: Select custom data identifiers](#discovery-jobs-create-step5)
+ [Step 6: Select allow lists](#discovery-jobs-create-step6)
+ [Step 7: Enter general settings](#discovery-jobs-create-step7)
+ [Step 8: Review and create](#discovery-jobs-create-step8)

## Before you begin: Set up key resources


Before you create a job, it's a good idea to take the following steps: 
+ Verify that you configured a repository for your sensitive data discovery results. To do this, choose **Discovery results** in the navigation pane on the Amazon Macie console. To learn about these settings, see [Storing and retaining sensitive data discovery results](discovery-results-repository-s3.md).
+ Create any custom data identifiers that you want the job to use. To learn how, see [Building custom data identifiers](custom-data-identifiers.md).
+ Create any allow lists that you want the job to use. To learn how, see [Defining sensitive data exceptions with allow lists](allow-lists.md).
+ If you want to analyze S3 objects that are encrypted, ensure that Macie can access and use the appropriate encryption keys. For more information, see [Analyzing encrypted S3 objects](discovery-supported-encryption-types.md).
+ If you want to analyze objects in an S3 bucket that has a restrictive bucket policy, ensure that Macie is allowed to access the objects. For more information, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md).

If you do these things before you create a job, you streamline creation of the job and help ensure that the job can analyze the data that you want.

## Step 1: Choose S3 buckets


When you create a job, the first step is to specify which S3 buckets store objects that you want Macie to analyze when the job runs. For this step, you have two options:
+ **Select specific buckets** – With this option, you explicitly select each S3 bucket to analyze. Then, when the job runs, Macie analyzes objects only in the buckets that you select.
+ **Specify bucket criteria** – With this option, you define runtime criteria that determine which S3 buckets to analyze. The criteria consist of one or more conditions that derive from bucket properties. Then, when the job runs, Macie identifies buckets that match your criteria and analyzes objects in those buckets.

For detailed information about these options, see [Scope options for jobs](discovery-jobs-scope.md).

The following sections provide instructions for choosing and configuring each option. Choose the section for the option that you want.

### Select specific buckets


If you choose to explicitly select each S3 bucket to analyze, Macie provides you with an inventory of your general purpose buckets in the current AWS Region. You can then use this inventory to select one or more buckets for the job. To learn about this inventory, see [Selecting specific S3 buckets](discovery-jobs-scope.md#discovery-jobs-scope-buckets-select).

If you're the Macie administrator for an organization, the inventory includes buckets that are owned by member accounts in your organization. You can select as many as 1,000 of these buckets, spanning as many as 1,000 accounts.

**To select specific S3 buckets for the job**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**.

1. Choose **Create job**.

1. On the **Choose S3 buckets** page, choose **Select specific buckets**. Macie displays a table of all the general purpose buckets for your account in the current Region. 

1. In the **Select S3 buckets** section, optionally choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) to retrieve the latest bucket metadata from Amazon S3.

   If the information icon (![\[The information icon, which is a blue circle that has a lowercase letter i in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-info-blue.png)) appears next to any bucket names, we recommend that you do this. This icon indicates that a bucket was created during the past 24 hours, possibly after Macie last retrieved bucket and object metadata from Amazon S3 as part of the [daily refresh cycle](monitoring-s3-how-it-works.md#monitoring-s3-how-it-works-data-refresh).

1. In the table, select the checkbox for each bucket that you want the job to analyze. 
**Tip**  
To find specific buckets more easily, enter filter criteria in the filter box above the table. You can also sort the table by choosing a column heading.
To determine whether you already configured a job to periodically analyze objects in a bucket, refer to the **Monitored by job** field. If **Yes** appears in this field, the bucket is explicitly included in a periodic job or the bucket matched the criteria for a periodic job within the past 24 hours. In addition, the status of at least one of those jobs is not *Cancelled*. Macie updates this data on a daily basis. 
To determine when an existing periodic or one-time job most recently analyzed objects in a bucket, refer to the **Latest job run** field. For additional information about that job, refer to the bucket's details.
To display a bucket's details, choose the bucket's name. In addition to job-related information, the details panel provides statistics and other information about the bucket, such as the bucket's public access settings. To learn more about this data, see [Reviewing your S3 bucket inventory](monitoring-s3-inventory-review.md).

1. When you finish selecting buckets, choose **Next**.

In the next step, you'll review and verify your selections.

### Specify bucket criteria


If you choose to specify runtime criteria that determine which S3 buckets to analyze, Macie provides options to help you choose fields, operators, and values for individual conditions in the criteria. To learn more about these options, see [Specifying S3 bucket criteria](discovery-jobs-scope.md#discovery-jobs-scope-buckets-criteria).

**To specify S3 bucket criteria for the job**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**.

1. Choose **Create job**.

1. On the **Choose S3 buckets** page, choose **Specify bucket criteria**.

1. Under **Specify bucket criteria**, do the following to add a condition to the criteria:

   1. Place your cursor in the filter box, and then choose the bucket property to use for the condition.

   1. In the first box, choose an operator for the condition, **Equals** or **Not equals**.

   1. In the next box, enter one or more values for the property.

      Depending on the type and nature of the bucket property, Macie displays different options for entering values. For example, if you choose the **Effective permission** property, Macie displays a list of values to choose from. If you choose the **Account ID** property, Macie displays a text box in which you can enter one or more AWS account IDs. To enter multiple values in a text box, enter each value and separate each entry with a comma.

   1. Choose **Apply**. Macie adds the condition and displays it below the filter box.

      By default, Macie adds the condition with an include statement. This means that the job is configured to analyze (*include*) objects in buckets that match the condition. To skip (*exclude*) buckets that match the condition, choose **Include** for the condition, and then choose **Exclude**.

   1. Repeat the preceding steps for each additional condition that you want to add to the criteria.

1. To test your criteria, expand the **Preview the criteria results** section. This section displays a table of up to 25 general purpose buckets that currently match the criteria.

1. To refine your criteria, do any of the following: 
   + To remove a condition, choose **X** for the condition.
   + To change a condition, remove the condition by choosing **X** for the condition. Then add a condition that has the correct settings.
   + To remove all conditions, choose **Clear filters**.

   Macie updates the table of criteria results to reflect your changes.

1. When you finish specifying bucket criteria, choose **Next**.

In the next step, you'll review and verify your criteria.
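If you automate job creation, the same runtime criteria correspond to the `bucketCriteria` field of the job's `s3JobDefinition` in the `CreateClassificationJob` operation. A sketch that includes buckets owned by one account but excludes publicly accessible ones (the account ID is a placeholder):

```python
# Runtime bucket criteria for a classification job: analyze buckets in
# account 111122223333, except buckets whose effective permission is PUBLIC.
bucket_criteria = {
    "includes": {
        "and": [
            {
                "simpleCriterion": {
                    "comparator": "EQ",
                    "key": "ACCOUNT_ID",
                    "values": ["111122223333"],  # placeholder account ID
                }
            }
        ]
    },
    "excludes": {
        "and": [
            {
                "simpleCriterion": {
                    "comparator": "EQ",
                    "key": "S3_BUCKET_EFFECTIVE_PERMISSION",
                    "values": ["PUBLIC"],  # skip publicly accessible buckets
                }
            }
        ]
    },
}
```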

## Step 2: Review your S3 bucket selections or criteria


For this step, verify that you chose the correct settings in the preceding step:
+ **Review your bucket selections** – If you selected specific S3 buckets for the job, review the table of buckets and change your bucket selections as necessary. The table provides insight into the projected scope and cost of the job's analysis. The data is based on the size and types of objects that are currently stored in a bucket.

  In the table, the **Estimated cost** field indicates the total estimated cost (in US dollars) of analyzing objects in an S3 bucket. Each estimate reflects the projected amount of uncompressed data that the job will analyze in a bucket. If any objects are compressed or archive files, the estimate assumes that the files use a 3:1 compression ratio and the job can analyze all extracted files. For more information, see [Forecasting and monitoring job costs](discovery-jobs-costs.md).
+ **Review your bucket criteria** – If you specified bucket criteria for the job, review each condition in the criteria. To change the criteria, choose **Previous**, and then use the filter options in the preceding step to enter the correct criteria. When you finish, choose **Next**.
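Under the 3:1 assumption above, for example, a bucket with 10 GB of plain objects and 2 GB of compressed archives is projected to yield 16 GB of data to analyze. A sketch of that arithmetic:

```python
COMPRESSION_RATIO = 3  # the estimate assumes 3:1 for compressed and archive files

def estimated_uncompressed_gb(plain_gb: float, compressed_gb: float) -> float:
    """Projected amount of uncompressed data, in GB, that a job would analyze."""
    return plain_gb + compressed_gb * COMPRESSION_RATIO

# 10 GB of plain objects plus 2 GB of archives:
print(estimated_uncompressed_gb(10, 2))  # 16
```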

When you finish reviewing and verifying the settings, choose **Next**.

## Step 3: Define the schedule and refine the scope


For this step, specify how often you want the job to run—once, or periodically on a daily, weekly, or monthly basis. Also choose various options to refine the scope of the job's analysis. To learn about these options, see [Scope options for jobs](discovery-jobs-scope.md).

**To define the schedule and refine the scope of the job**

1. On the **Refine the scope** page, specify how often you want the job to run: 
   + To run the job only once, immediately after you finish creating it, choose **One-time job**.
   + To run the job periodically on a recurring basis, choose **Scheduled job**. For **Update frequency**, choose whether to run the job daily, weekly, or monthly. Then use the **Include existing objects** option to define the scope of the job's first run:
     + Select this checkbox to analyze all existing objects immediately after you finish creating the job. Each subsequent run analyzes only those objects that are created or changed after the preceding run.
     + Clear this checkbox to skip analysis of all existing objects. The job's first run analyzes only those objects that are created or changed after you finish creating the job and before the first run starts. Each subsequent run analyzes only those objects that are created or changed after the preceding run.

       Clearing this checkbox is helpful for cases where you already analyzed the data and want to continue to analyze it periodically. For example, if you previously used another service or application to classify data and you recently started using Macie, you might use this option to ensure continued discovery and classification of your data without incurring unnecessary costs or duplicating classification data.

1. (Optional) To specify the percentage of objects that you want the job to analyze, enter the percentage in the **Sampling depth** box.

   If this value is less than 100%, Macie selects the objects to analyze at random, up to the specified percentage, and analyzes all the data in those objects. The default value is 100%.

1. (Optional) To add specific criteria that determine which S3 objects are included or excluded from the job's analysis, expand the **Additional settings** section, and then enter the criteria. These criteria consist of individual conditions that derive from properties of objects:
   + To analyze (*include*) objects that meet a specific condition, enter the condition type and value, and then choose **Include**.
   + To skip (*exclude*) objects that meet a specific condition, enter the condition type and value, and then choose **Exclude**.

   Repeat this step for each include or exclude condition that you want.

   If you enter multiple conditions, any exclude conditions take precedence over include conditions. For example, if you include objects that have the .pdf file name extension and exclude objects that are larger than 5 MB, the job analyzes any object that has the .pdf file name extension, unless the object is larger than 5 MB.

1. When you finish, choose **Next**.
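The precedence rule in step 3 can be modeled as follows: an object is analyzed only if it matches an include condition (or no include conditions exist) and matches no exclude condition. A simplified sketch, not Macie's implementation:

```python
def object_in_scope(obj, includes, excludes) -> bool:
    """Exclude conditions take precedence over include conditions."""
    if any(cond(obj) for cond in excludes):
        return False
    return not includes or any(cond(obj) for cond in includes)

def is_pdf(obj):
    return obj["key"].endswith(".pdf")

def larger_than_5mb(obj):
    return obj["size"] > 5 * 1024 * 1024

# A 1 KB PDF is analyzed; a 10 MB PDF is excluded despite matching the include:
print(object_in_scope({"key": "a.pdf", "size": 1024}, [is_pdf], [larger_than_5mb]))          # True
print(object_in_scope({"key": "b.pdf", "size": 10 * 1024**2}, [is_pdf], [larger_than_5mb]))  # False
```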

## Step 4: Select managed data identifiers


For this step, specify which managed data identifiers you want the job to use when it analyzes S3 objects. You have two options:
+ **Use recommended settings** – With this option, the job analyzes S3 objects by using the set of managed data identifiers that we recommend for jobs. This set is designed to detect common categories and types of sensitive data. To review a list of managed data identifiers that are currently in the set, see [Managed data identifiers recommended for jobs](discovery-jobs-mdis-recommended.md). We update that list each time we add or remove a managed data identifier from the set.
+ **Use custom settings** – With this option, the job analyzes S3 objects by using managed data identifiers that you select. This can be all or only some of the managed data identifiers that are currently available. You can also configure the job to not use any managed data identifiers. The job can instead use only custom data identifiers that you select in the next step. To review a list of managed data identifiers that are currently available, see [Quick reference: Managed data identifiers by type](mdis-reference-quick.md). We update that list each time we release a new managed data identifier.

When you choose either option, Macie displays a table of managed data identifiers. In the table, the **Sensitive data type** field specifies the unique identifier (ID) for a managed data identifier. This ID describes the type of sensitive data that the managed data identifier is designed to detect, for example: **USA_PASSPORT_NUMBER** for US passport numbers, **CREDIT_CARD_NUMBER** for credit card numbers, and **PGP_PRIVATE_KEY** for PGP private keys. To find specific identifiers more quickly, you can sort and filter the table by sensitive data category or type.

**To select managed data identifiers for the job**

1. On the **Select managed data identifiers** page, under **Managed data identifier options**, do one of the following:
   + To use the set of managed data identifiers that we recommend for jobs, choose **Recommended**.

     If you choose this option and you configured the job to run more than once, each run automatically uses all the managed data identifiers that are in the recommended set when the run starts. This includes new managed data identifiers that we release and add to the set. It excludes managed data identifiers that we remove from the set and no longer recommend for jobs.
   + To use only specific managed data identifiers that you select, choose **Custom**, and then choose **Use specific managed data identifiers**. Then, in the table, select the checkbox for each managed data identifier that you want the job to use.

     If you choose this option and you configured the job to run more than once, each run uses only the managed data identifiers that you select. In other words, the job uses these same managed data identifiers each time it runs.
   + To use all the managed data identifiers that Macie currently provides, choose **Custom**, and then choose **Use specific managed data identifiers**. Then, in the table, select the checkbox in the selection column heading to select all rows.

     If you choose this option and you configured the job to run more than once, each run uses only the managed data identifiers that you select. In other words, the job uses these same managed data identifiers each time it runs.
   + To not use any managed data identifiers and use only custom data identifiers, choose **Custom**, and then choose **Don't use any managed data identifiers**. Then, in the next step, select the custom data identifiers to use.

1. When you finish, choose **Next**.
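If you create jobs programmatically, these console choices correspond to values of the `managedDataIdentifierSelector` parameter of the `CreateClassificationJob` operation. The mapping below is a sketch of that correspondence, not an official table:

```python
# Console choice -> managedDataIdentifierSelector value. When the selector
# is INCLUDE, you also pass the chosen IDs in managedDataIdentifierIds.
selector_for_console_option = {
    "Recommended": "RECOMMENDED",
    "Use specific managed data identifiers": "INCLUDE",
    "Don't use any managed data identifiers": "NONE",
}
# The API also accepts ALL and EXCLUDE selector values.
```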

## Step 5: Select custom data identifiers


For this step, select any custom data identifiers that you want the job to use when it analyzes S3 objects. The job will use the selected identifiers in addition to any managed data identifiers that you configured the job to use. To learn more about custom data identifiers, see [Building custom data identifiers](custom-data-identifiers.md).

**To select custom data identifiers for the job**

1. On the **Select custom data identifiers** page, select the checkbox for each custom data identifier that you want the job to use. You can select as many as 30 custom data identifiers.
**Tip**  
To review or test the settings for a custom data identifier before you select it, choose the link icon (![\[The link icon, which is a blue box that has an arrow in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-external-link.png)) next to the identifier's name. Macie opens a page that displays the identifier's settings.  
You can also use this page to test the identifier with sample data. To do this, enter up to 1,000 characters of text in the **Sample data** box, and then choose **Test**. Macie evaluates the sample data by using the identifier, and then reports the number of matches.

1. When you finish selecting custom data identifiers, choose **Next**.
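The console's **Test** button evaluates sample data against an identifier's detection criteria. A rough local preview of the core regex step, using a hypothetical employee-ID identifier (the real evaluation, available through the console or the `TestCustomDataIdentifier` API operation, also applies keywords, ignore words, and maximum match distance):

```python
import re

# Hypothetical custom data identifier for employee IDs such as "EMP-12345".
detection_regex = r"EMP-\d{5}"
sample_data = "Contact EMP-10234 or EMP-99871 about the audit."

# Count how many matches the sample data produces.
match_count = len(re.findall(detection_regex, sample_data))
print(match_count)  # 2
```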

## Step 6: Select allow lists


For this step, select any allow lists that you want the job to use when it analyzes S3 objects. To learn more about allow lists, see [Defining sensitive data exceptions with allow lists](allow-lists.md).

**To select allow lists for the job**

1. On the **Select allow lists** page, select the checkbox for each allow list that you want the job to use. You can select as many as 10 lists.
**Tip**  
To review the settings for an allow list before you select it, choose the link icon (![\[The link icon, which is a blue box that has an arrow in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-external-link.png)) next to the list's name. Macie opens a page that displays the list's settings.  
If the list specifies a regular expression (*regex*), you can also use this page to test the regex with sample data. To do this, enter up to 1,000 characters of text in the **Sample data** box, and then choose **Test**. Macie evaluates the sample data by using the regex, and then reports the number of matches.

1. When you finish selecting allow lists, choose **Next**.

## Step 7: Enter general settings


For this step, specify a name and, optionally, a description of the job. You can also assign tags to the job. A *tag* is a label that you define and assign to certain types of AWS resources. Each tag consists of a required tag key and an optional tag value. Tags can help you identify, categorize, and manage resources in different ways, such as by purpose, owner, environment, or other criteria. To learn more, see [Tagging Macie resources](tagging-resources.md).

**To enter general settings for the job**

1. On the **Enter general settings** page, enter a name for the job in the **Job name** box. The name can contain as many as 500 characters. 

1. (Optional) For **Job description**, enter a brief description of the job. The description can contain as many as 200 characters. 

1. (Optional) For **Tags**, choose **Add tag**, and then enter as many as 50 tags to assign to the job.

1. When you finish, choose **Next**.

## Step 8: Review and create


For this final step, review the job's configuration settings and verify that they're correct. This is an important step. After you create a job, you can’t change any of these settings. This helps ensure that you have an immutable history of sensitive data findings and discovery results for data privacy and protection audits or investigations that you perform.

Depending on the job's settings, you can also review the total estimated cost (in US dollars) of running the job once. If you selected specific S3 buckets for the job, the estimate is based on the size and types of objects in the buckets that you selected, and how much of that data the job can analyze. If you specified bucket criteria for the job, the estimate is based on the size and types of objects in as many as 500 buckets that currently match the criteria, and how much of that data the job can analyze. To learn about this estimate, see [Forecasting and monitoring job costs](discovery-jobs-costs.md).

**To review and create the job**

1. On the **Review and create** page, review each setting and verify that it's correct. To change a setting, choose **Edit** in the section that contains the setting, and then enter the correct setting. You can also use the navigation tabs to go to the page that contains a setting.

1. When you finish verifying the settings, choose **Submit** to create and save the job. Macie checks the settings and notifies you of any issues to address.
**Note**  
If you haven’t configured a repository for your sensitive data discovery results, Macie displays a warning and doesn't save the job. To address this issue, choose **Configure** in the **Repository for sensitive data discovery results** section. Then enter the configuration settings for the repository. To learn how, see [Storing and retaining sensitive data discovery results](discovery-results-repository-s3.md). After you enter the settings, return to the **Review and create** page and choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) in the **Repository for sensitive data discovery results** section of the page.  
Although we don't recommend it, you can temporarily override the repository requirement and save the job. If you do this, you risk losing discovery results from the job—Macie retains the results for only 90 days. To temporarily override the requirement, select the checkbox for the override option.

1. If Macie notifies you of issues to address, address the issues, and then choose **Submit** again to create and save the job.

If you configured the job to run once, on a daily basis, or on the current day of the week or month, Macie starts running the job immediately after you save it. Otherwise, Macie prepares to run the job on the specified day of the week or month. To monitor the job, you can [check the status of the job](discovery-jobs-status-check.md).
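If you manage jobs programmatically, the status check can be sketched with the `DescribeClassificationJob` operation. A minimal polling sketch, assuming boto3 is installed and credentials are configured; the job ID in the usage note is a placeholder:

```python
import time

TERMINAL_STATUSES = {"COMPLETE", "CANCELLED"}

def wait_for_job(macie, job_id: str, poll_seconds: int = 60) -> str:
    """Poll DescribeClassificationJob until a one-time job reaches a
    terminal status. (Scheduled jobs return to IDLE between runs instead.)"""
    while True:
        status = macie.describe_classification_job(jobId=job_id)["jobStatus"]
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)

# Usage:
#   import boto3
#   wait_for_job(boto3.client("macie2"), "your-job-id")
```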

# Reviewing the results of a sensitive data discovery job

When you run a sensitive data discovery job, Amazon Macie automatically calculates and reports certain statistical data for the job. For example, Macie reports the number of times that the job has run, and the approximate number of Amazon Simple Storage Service (Amazon S3) objects that the job has yet to process during its current run. Macie also produces several types of results for the job: *log events*, *sensitive data findings*, and *sensitive data discovery results*.

**Topics**
+ [Types of job results](#discovery-jobs-manage-results-types)
+ [Reviewing job statistics and results](#discovery-jobs-manage-results-review)

## Types of results for sensitive data discovery jobs

As a sensitive data discovery job progresses, Amazon Macie produces the following types of results for the job.

**Log event**  
This is a record of an event that occurred while the job was running. Macie automatically logs and publishes data for certain events to Amazon CloudWatch Logs. The data in these logs provides a record of changes to the job's progress or status, such as the exact date and time when the job started or stopped running. The data also provides details about any account- or bucket-level errors that occurred while the job ran.  
Log events can help you monitor a job and address any issues that prevented the job from analyzing the data that you want. If a job uses runtime criteria to determine which S3 buckets to analyze, log events can also help you determine whether and which S3 buckets matched the criteria when the job ran.  
You can access log events by using the Amazon CloudWatch console or the Amazon CloudWatch Logs API. To help you navigate to the log events for a job, the Amazon Macie console provides a link to them. For more information, see [Monitoring jobs with CloudWatch Logs](discovery-jobs-monitor-cw-logs.md).

**Sensitive data finding**  
This is a report of sensitive data that Macie found in an S3 object. Each finding provides a severity rating and details such as:  
+ The date and time when Macie found the sensitive data.
+ The category and types of sensitive data that Macie found.
+ The number of occurrences of each type of sensitive data that Macie found.
+ The unique identifier for the job that produced the finding.
+ The name, public access settings, encryption type, and other information about the affected S3 bucket and object.
Depending on the affected S3 object's file type or storage format, the details can also include the location of as many as 15 occurrences of the sensitive data that Macie found. To report location data, sensitive data findings use a [standardized JSON schema](findings-locate-sd-schema.md).  
A sensitive data finding doesn't include the sensitive data that Macie found. Instead, it provides information that you can use for further investigation and remediation as necessary.  
Macie stores sensitive data findings for 90 days. You can access them by using the Amazon Macie console or the Amazon Macie API. You can also monitor and process them by using other applications, services, and systems. For more information, see [Reviewing and analyzing findings](findings.md).

**Sensitive data discovery result**  
This is a record that logs details about the analysis of an S3 object. Macie automatically creates a sensitive data discovery result for each object that you configure a job to analyze. This includes objects in which Macie doesn't find sensitive data, and that therefore don't produce sensitive data findings, and objects that Macie can't analyze due to errors or issues such as permissions settings or use of an unsupported file or storage format.  
If Macie finds sensitive data in an S3 object, the sensitive data discovery result includes data from the corresponding sensitive data finding. It also provides additional information, such as the location of as many as 1,000 occurrences of each type of sensitive data that Macie found in the object. For example:   
+ The column and row number for a cell or field in a Microsoft Excel workbook, CSV file, or TSV file
+ The path to a field or array in a JSON or JSON Lines file
+ The line number for a line in a non-binary text file other than a CSV, JSON, JSON Lines, or TSV file—for example, an HTML, TXT, or XML file
+ The page number for a page in an Adobe Portable Document Format (PDF) file
+ The record index and the path to a field in a record in an Apache Avro object container or Apache Parquet file
If the affected S3 object is an archive file, such as a .tar or .zip file, the sensitive data discovery result also provides detailed location data for occurrences of sensitive data in individual files that Macie extracted from the archive. Macie doesn’t include this information in sensitive data findings for archive files. To report location data, sensitive data discovery results use a [standardized JSON schema](findings-locate-sd-schema.md).  
A sensitive data discovery result doesn't include the sensitive data that Macie found. Instead, it provides you with an analysis record that can be helpful for data privacy and protection audits or investigations.  
Macie stores your sensitive data discovery results for 90 days. You can’t access them directly on the Amazon Macie console or with the Amazon Macie API. Instead, you configure Macie to encrypt and store them in an S3 bucket. The bucket can serve as a definitive, long-term repository for all of your sensitive data discovery results. You can then optionally access and query the results in that repository. To learn how to configure these settings, see [Storing and retaining sensitive data discovery results](discovery-results-repository-s3.md).  
After you configure the settings, Macie writes your sensitive data discovery results to JSON Lines (.jsonl) files, and it encrypts and adds those files to the S3 bucket as GNU Zip (.gz) files. To help you navigate to the results, the Amazon Macie console provides links to them.
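The following Python sketch shows one way to read these files after you download them from your repository bucket. The sample record and file path are illustrative only; a real discovery result record contains many more fields than shown here.

```python
import gzip
import json
import os
import tempfile

def load_discovery_results(path):
    """Parse a sensitive data discovery result file (.jsonl.gz).

    Each line of the decompressed file is one JSON record that
    describes the analysis of a single S3 object.
    """
    records = []
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records

# Create a small local sample; real files come from the S3 bucket
# that you configured as your discovery results repository.
path = os.path.join(tempfile.gettempdir(), "sample-results.jsonl.gz")
sample = {"jobId": "example-job-id", "status": {"code": "COMPLETE"}}
with gzip.open(path, "wt", encoding="utf-8") as f:
    f.write(json.dumps(sample) + "\n")

records = load_discovery_results(path)
print(records[0]["status"]["code"])  # COMPLETE
```

Because the files are JSON Lines, you can stream them record by record instead of loading an entire results file into memory.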

Sensitive data findings and sensitive data discovery results both adhere to standardized schemas. These schemas make it easier to query, monitor, and process findings and results by using other applications, services, and systems.

**Tips**  
For a detailed, instructional example of how you might query and use sensitive data discovery results to analyze and report potential data security risks, see the following blog post on the *AWS Security Blog*: [How to query and visualize Macie sensitive data discovery results with Amazon Athena and Amazon QuickSight](https://aws.amazon.com/blogs/security/how-to-query-and-visualize-macie-sensitive-data-discovery-results-with-athena-and-quicksight/).  
For samples of Amazon Athena queries that you can use to analyze sensitive data discovery results, visit the [Amazon Macie Results Analytics repository](https://github.com/aws-samples/amazon-macie-results-analytics) on GitHub. This repository also provides instructions for configuring Athena to retrieve and decrypt your results, and scripts for creating tables for the results.

## Reviewing statistics and results for a sensitive data discovery job
Reviewing job statistics and results

To review processing statistics and the results of a sensitive data discovery job, you can use the Amazon Macie console or the Amazon Macie API. Follow these steps to review the statistics and results by using the console.

To access a job's processing statistics programmatically, use the [DescribeClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-jobid.html) operation of the Amazon Macie API. For programmatic access to the findings that a job produced, use the [ListFindings](https://docs.aws.amazon.com/macie/latest/APIReference/findings.html) operation and specify the job's unique identifier in a filter condition for the `classificationDetails.jobId` field. To learn how, see [Creating and applying filters to Macie findings](findings-filter-procedure.md). You can then use the [GetFindings](https://docs.aws.amazon.com/macie/latest/APIReference/findings-describe.html) operation to retrieve the details of the findings.
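As a sketch, the following Python helper builds the finding criteria for that filter condition. The job ID is a placeholder, and the commented-out boto3 calls assume valid AWS credentials and a configured Region.

```python
def build_job_filter(job_id):
    """Finding criteria that match findings produced by one job."""
    return {"criterion": {"classificationDetails.jobId": {"eq": [job_id]}}}

criteria = build_job_filter("0123456789abcdef")  # placeholder job ID

# Sketch of the API calls, assuming configured AWS credentials:
# import boto3
# macie = boto3.client("macie2")
# finding_ids = macie.list_findings(findingCriteria=criteria)["findingIds"]
# findings = macie.get_findings(findingIds=finding_ids)["findings"]
```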

**To review statistics and results for a job**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**.

1. On the **Jobs** page, choose the name of the job whose statistics and results you want to review. The details panel displays statistics, settings, and other information about the job.

1. In the details panel, do any of the following:
   + To review processing statistics for the job, refer to the **Statistics** section of the panel. This section displays statistics such as the number of times that the job has run, and the approximate number of objects that the job has yet to process during its current run.
   + To review log events for the job, choose **Show results** at the top of the panel, and then choose **Show CloudWatch logs**. Macie opens the Amazon CloudWatch console and displays a table of the log events that Macie published for the job.
   + To review all the sensitive data findings that the job produced, choose **Show results** at the top of the panel, and then choose **Show findings**. Macie opens the **Findings** page and displays all the findings from the job. To review the details of a particular finding, choose the finding, and then refer to the details panel.
**Tip**  
In the finding details panel, you can use the link in the **Detailed result location** field to navigate to the corresponding sensitive data discovery result in Amazon S3:  
+ If the finding applies to a large archive or compressed file, the link displays the folder that contains the discovery results for the file. An archive or compressed file is *large* if it generates more than 100 discovery results.
+ If the finding applies to a small archive or compressed file, the link displays the file that contains the discovery results for the file. An archive or compressed file is *small* if it generates 100 or fewer discovery results.
+ If the finding applies to another type of file, the link displays the file that contains the discovery results for the file.
   + To review all the sensitive data discovery results that the job produced, choose **Show results** at the top of the panel, and then choose **Show classifications**. Macie opens the Amazon S3 console and displays the folder that contains all the discovery results for the job. This option is available only after you configure Macie to [store your sensitive data discovery results](discovery-results-repository-s3.md) in an S3 bucket.

# Managing sensitive data discovery jobs
Managing jobs

To help you manage your sensitive data discovery jobs, Amazon Macie maintains a complete inventory of your jobs in each AWS Region. With this inventory, you can manage your jobs as a single collection, and access configuration settings, processing statistics, and the status of individual jobs.

For example, you can identify all the jobs that you configured to run on a recurring basis for periodic analysis, assessment, and monitoring. You can also review a breakdown of the configuration settings for a job. This includes settings that define the scope of the analysis. It also includes settings that specify the types of sensitive data that you want Macie to detect and report when the job runs. If you use the Amazon Macie console to manage your jobs, each job's details also provide direct access to [sensitive data findings and other results](discovery-jobs-manage-results.md) that the job produced.

In addition to these tasks, you can create custom variations of individual jobs. You can copy an existing job, adjust the settings for the copy, and then save the copy as a new job. This can be helpful for cases where you want to analyze different sets of data in the same way, or the same set of data in different ways. It can also be helpful if you want to adjust the configuration settings for an existing job—cancel the existing job, copy it, and then adjust and save the copy as a new job.

**Topics**
+ [Reviewing your job inventory](discovery-jobs-manage-view.md)
+ [Reviewing configuration settings for a job](discovery-jobs-manage-settings.md)
+ [Checking the status of a job](discovery-jobs-status-check.md)
+ [Changing the status of a job](discovery-jobs-status-change.md)
+ [Copying a job](discovery-jobs-manage-copy.md)

# Reviewing your inventory of sensitive data discovery jobs
Reviewing your job inventory

On the Amazon Macie console, you can review a complete inventory of your sensitive data discovery jobs in the current AWS Region. The inventory provides both summary information for all of your jobs and details about individual jobs. Summary information includes the current status of each job; whether a job runs on a scheduled, periodic basis; and whether a job is configured to analyze objects in specific Amazon Simple Storage Service (Amazon S3) buckets or S3 buckets that match runtime criteria. For individual jobs, you can also access details such as a breakdown of the job's configuration settings. If a job has already run, the details also provide direct access to sensitive data findings and other types of results that the job produced.

**To review your job inventory**

Follow these steps to review your job inventory by using the Amazon Macie console. To access your inventory programmatically, use the [ListClassificationJobs](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-list.html) operation of the Amazon Macie API.
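The following Python sketch assembles request parameters for that operation, sorting jobs by creation date and optionally filtering by status; the parameter names follow the ListClassificationJobs request syntax, and the commented-out call assumes configured AWS credentials.

```python
def build_inventory_request(status=None):
    """Request parameters for listing jobs, newest first.

    status, if given, is a job status code such as RUNNING or
    USER_PAUSED; it becomes an include condition in filterCriteria.
    """
    params = {
        "sortCriteria": {"attributeName": "createdAt", "orderBy": "DESC"},
        "maxResults": 50,
    }
    if status:
        params["filterCriteria"] = {
            "includes": [
                {"comparator": "EQ", "key": "jobStatus", "values": [status]}
            ]
        }
    return params

# Sketch of the call, assuming configured AWS credentials:
# import boto3
# macie = boto3.client("macie2")
# jobs = macie.list_classification_jobs(**build_inventory_request("RUNNING"))
```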

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**. The **Jobs** page opens and displays the number of jobs in your inventory and a table of those jobs.

1. At the top of the page, optionally choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) to retrieve the current status of each job.

1. In the **Jobs** table, review summary information for your jobs:
   + **Job name** – The name of the job.
   + **Resources** – Whether the job is configured to analyze objects in specific S3 buckets or buckets that match runtime criteria. If you explicitly selected buckets for the job to analyze, this field indicates the number of buckets that you selected. If you configured the job to use runtime criteria, the value for this field is **Criteria based**.
   + **Job type** – Whether the job is configured to run once (**One time**) or on a scheduled, periodic basis (**Scheduled**). 
   + **Status** – The current status of the job. To learn more about this value, see [Checking the status of a job](discovery-jobs-status-check.md).
   + **Created at** – When the job was created.

1. To analyze your inventory or find a specific job more quickly, do any of the following:
   + To sort the table by a specific field, choose the column heading for the field. To change the sort order, choose the column heading again.
   + To show only those jobs that have a specific value for a field, place your cursor in the filter box. In the menu that appears, choose the field to use for the filter, and enter the value for the filter. Then choose **Apply**.
   + To hide jobs that have a specific value for a field, place your cursor in the filter box. In the menu that appears, choose the field to use for the filter, and enter the value for the filter. Then choose **Apply**. In the filter box, choose the equals icon (![\[The equals icon, which is a solid gray circle.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-operator-equals.png)) for the filter. This changes the filter's operator from *equals* to *not equals* (![\[The not equals icon, which is an empty gray circle that has a backslash in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-operator-not-equals.png)).
   + To remove a filter, choose the remove filter icon (![\[The remove filter condition icon, which is a circle that has an X in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-filter-remove.png)) for the filter to remove.

1. To review additional settings and details for a particular job, choose the job's name. Then refer to the details panel. For information about these details, see [Reviewing configuration settings for a job](discovery-jobs-manage-settings.md).

# Reviewing the settings for a sensitive data discovery job
Reviewing configuration settings for a job

On the Amazon Macie console, you can use the details panel on the **Jobs** page to review configuration settings and other information about individual sensitive data discovery jobs. For example, you can review a list of the Amazon Simple Storage Service (Amazon S3) buckets that a job is configured to analyze. You can also determine which managed and custom data identifiers a job is configured to use when analyzing objects in those buckets.

Note that you can’t change any configuration settings for an existing job. This helps ensure that you have an immutable history of sensitive data findings and discovery results for data privacy and protection audits or investigations that you perform.

If you want to change an existing job, you can [cancel the job](discovery-jobs-status-change.md). Then [copy the job](discovery-jobs-manage-copy.md), configure the copy to use the settings that you want, and save the copy as a new job. If you do this, you should also take steps to ensure that the new job doesn't analyze existing data in the same way again. To do this, note the date and time when you cancel the existing job. Then configure the scope of the new job to include only those objects that are created or changed after you cancel the original job. For example, you can use [object criteria](discovery-jobs-scope.md#discovery-jobs-scope-criteria) to define an exclude condition that specifies when you cancelled the original job.
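If you create the new job programmatically, such an exclude condition is part of the scoping in the job's s3JobDefinition. The following Python sketch builds one; the cutoff timestamp is a placeholder for the date and time when you cancelled the original job.

```python
from datetime import datetime, timezone

def exclude_objects_modified_before(cutoff):
    """Scoping that excludes objects last modified before the cutoff.

    Intended for the "scoping" property of a job's s3JobDefinition
    in a CreateClassificationJob request.
    """
    return {
        "excludes": {
            "and": [
                {
                    "simpleScopeTerm": {
                        "comparator": "LT",
                        "key": "OBJECT_LAST_MODIFIED_DATE",
                        "values": [cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")],
                    }
                }
            ]
        }
    }

# Placeholder for when you cancelled the original job.
cancelled_at = datetime(2024, 5, 1, 13, 30, tzinfo=timezone.utc)
scoping = exclude_objects_modified_before(cancelled_at)
```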

**To review the configuration settings for a job**

Follow these steps to review a job's configuration settings by using the Amazon Macie console. To review the settings programmatically, use the [DescribeClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-jobid.html) operation of the Amazon Macie API.

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**. The **Jobs** page opens and displays the number of jobs in your inventory and a table of those jobs.

1. In the **Jobs** table, choose the name of the job whose settings you want to review. To find the job more quickly, you can filter the table by using the filter options above the table. You can also sort the table in ascending or descending order by certain fields.

When you choose a job in the table, the details panel displays the job's configuration settings and other information about the job. Depending on the job's settings, the panel contains the following sections.

**General information**  
This section provides general information about the job. For example, it shows the Amazon Resource Name (ARN) of the job, when the job most recently started to run, and the current status of the job. If you paused the job, this section also indicates when you paused the job, and when the job or latest job run expired or will expire if you don't resume it.

**Statistics**  
This section shows processing statistics for the job. For example, it specifies the number of times that the job has run, and the approximate number of S3 objects that the job has yet to process during its current run.

**Scope**  
This section indicates how often the job runs. It also shows settings that refine the job's scope—for example, the [sampling depth](discovery-jobs-scope.md#discovery-jobs-scope-sampling), and any [object criteria](discovery-jobs-scope.md#discovery-jobs-scope-criteria) that include or exclude S3 objects from the analysis.

**S3 buckets**  
This section appears in the panel if the job is configured to analyze buckets that you explicitly selected when you created the job. It indicates the number of AWS accounts that the job is configured to analyze data for. It also indicates the number of buckets that the job is configured to analyze and the names of those buckets (grouped by account).  
To show the complete list of accounts and buckets in JSON format, choose the number in the **Total buckets** field.

**S3 bucket criteria**  
This section appears in the panel if the job uses runtime criteria to determine which buckets to analyze. It lists the criteria that the job is configured to use. To show the criteria in JSON format, choose **Details**. Then choose the **Criteria** tab in the window that appears.  
To review a list of buckets that currently match the criteria, choose **Details**. Then choose the **Matching buckets** tab in the window that appears. Optionally choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) to retrieve the latest data. The tab lists up to 25 buckets that currently match the criteria.  
If the job has already run, you can also determine whether any buckets matched the criteria when the job ran and, if so, the names of those buckets. To do this, review log events for the job: choose **Show results** at the top of the panel, and then choose **Show CloudWatch logs**. Macie opens the Amazon CloudWatch console and displays a table of log events for the job. The events include a `BUCKET_MATCHED_THE_CRITERIA` event for each bucket that matched the criteria and was included in the job's analysis. For more information, see [Monitoring jobs with CloudWatch Logs](discovery-jobs-monitor-cw-logs.md).

**Custom data identifiers**  
This section appears in the panel if the job is configured to use one or more [custom data identifiers](custom-data-identifiers.md). It specifies the names of those custom data identifiers.

**Allow lists**  
This section appears in the panel if the job is configured to use one or more [allow lists](allow-lists.md). It specifies the names of those lists. To review the settings and status of a list, choose the link icon (![\[The link icon, which is a blue box that has an arrow in it.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-view-resource-blue.png)) next to the list's name.

**Managed data identifiers**  
This section indicates which [managed data identifiers](managed-data-identifiers.md) the job is configured to use. This is determined by the managed data identifier selection type for the job:  
+ **Recommended** – Use the managed data identifiers that are in the [recommended set](discovery-jobs-mdis-recommended.md) when the job runs.
+ **Include selected** – Use only the managed data identifiers listed in the **Selections** section.
+ **Include all** – Use all the managed data identifiers that are available when the job runs.
+ **Exclude selected** – Use all the managed data identifiers that are available when the job runs, except the ones listed in the **Selections** section.
+ **Exclude all** – Don't use any managed data identifiers. Use only the specified custom data identifiers.
To review these settings in JSON format, choose **Details**.
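In a CreateClassificationJob request, these options map to the managedDataIdentifierSelector property. The following Python sketch assembles that part of the request; the identifier ID shown is only an example.

```python
def managed_identifier_settings(selection, identifier_ids=None):
    """Managed data identifier settings for a job.

    selection is RECOMMENDED, ALL, INCLUDE, EXCLUDE, or NONE;
    identifier_ids lists the managed data identifiers to include
    or exclude when selection is INCLUDE or EXCLUDE.
    """
    settings = {"managedDataIdentifierSelector": selection}
    if selection in ("INCLUDE", "EXCLUDE"):
        settings["managedDataIdentifierIds"] = list(identifier_ids or [])
    return settings

recommended = managed_identifier_settings("RECOMMENDED")
exclude_cards = managed_identifier_settings("EXCLUDE", ["CREDIT_CARD_NUMBER"])
```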

**Tags**  
This section appears in the panel if tags are assigned to the job. It lists those tags. A *tag* is a label that you define and assign to certain types of AWS resources. Each tag consists of a required tag key and an optional tag value. To learn more, see [Tagging Macie resources](tagging-resources.md).

To review and save the job's settings in JSON format, choose the unique identifier for the job (**Job ID**) at the top of the panel. Then choose **Download**.

# Checking the status of a sensitive data discovery job
Checking the status of a job

When you create a sensitive data discovery job, its initial status is **Active (Running)** or **Active (Idle)**, depending on the job's type and schedule. The job then passes through additional states, which you can monitor as the job progresses.

**Tip**  
In addition to monitoring the overall status of a job, you can monitor specific events that occur as a job progresses. You can do this by using logging data that Amazon Macie automatically publishes to Amazon CloudWatch Logs. The data in these logs provides a record of changes to a job's status and details about any account- or bucket-level errors that occur while a job runs. For more information, see [Monitoring jobs with CloudWatch Logs](discovery-jobs-monitor-cw-logs.md).

**To check the status of a job**

Follow these steps to check the status of a job by using the Amazon Macie console. To check a job's status programmatically, use the [DescribeClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-jobid.html) operation of the Amazon Macie API.
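The DescribeClassificationJob response reports the status as an uppercase code. The following Python sketch maps those codes to the console labels that this section describes; the commented-out call assumes configured AWS credentials.

```python
# Console labels for the jobStatus codes that the API returns.
STATUS_LABELS = {
    "IDLE": "Active (Idle)",
    "RUNNING": "Active (Running)",
    "CANCELLED": "Cancelled",
    "COMPLETE": "Complete",
    "PAUSED": "Paused (By Macie)",
    "USER_PAUSED": "Paused (By user)",
}

def status_label(job):
    """Map a DescribeClassificationJob response to its console label."""
    return STATUS_LABELS.get(job["jobStatus"], job["jobStatus"])

# Sketch of the call, assuming configured AWS credentials:
# import boto3
# macie = boto3.client("macie2")
# job = macie.describe_classification_job(jobId="example-job-id")
print(status_label({"jobStatus": "USER_PAUSED"}))  # Paused (By user)
```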

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**. The **Jobs** page opens and displays the number of jobs in your inventory and a table of those jobs.

1. At the top of the page, choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) to retrieve the current status of each job.

1. In the **Jobs** table, locate the job whose status you want to check. To find the job more quickly, you can filter the table by using the filter options above the table. You can also sort the table in ascending or descending order by certain fields.

1. Refer to the **Status** field in the table. This field indicates the job's current status.

A job's status can be one of the following.

**Active (Idle)**  
For a periodic job, the previous run is complete and the next scheduled run is pending. This value doesn't apply to one-time jobs.

**Active (Running)**  
For a one-time job, the job is currently in progress. For a periodic job, a scheduled run is in progress.

**Cancelled**  
For any type of job, the job was stopped permanently (cancelled).  
A job has this status if you explicitly cancelled it or, if it's a one-time job, you paused the job and didn't resume it within 30 days. A job can also have this status if you previously [suspended Macie](suspend-macie.md) in the current AWS Region.

**Complete**  
For a one-time job, the job ran successfully and is now complete. This value doesn't apply to periodic jobs. Instead, the status of a periodic job changes to **Active (Idle)** when each run completes successfully.

**Paused (By Macie)**  
For any type of job, the job was stopped temporarily (paused) by Macie.  
A job has this status if completion of the job or a job run would exceed the monthly [sensitive data discovery quota](macie-quotas.md) for your account. When this happens, Macie automatically pauses the job. Macie automatically resumes the job when the next calendar month starts and the monthly quota is reset for your account, or you increase the quota for your account.  
If you’re the Macie administrator for an organization and you configured the job to analyze data for member accounts, the job can also have this status if completion of the job or a job run would exceed the monthly sensitive data discovery quota for a member account.  
If a job is running and the analysis of eligible objects reaches this quota for a member account, the job stops analyzing objects that are owned by the account. When the job finishes analyzing objects for all other accounts that haven’t met the quota, Macie automatically pauses the job. If it’s a one-time job, Macie automatically resumes the job when the next calendar month starts or the quota is increased for all the affected accounts, whichever occurs first. If it’s a periodic job, Macie automatically resumes the job when the next run is scheduled to start or the next calendar month starts, whichever occurs first. If a scheduled run starts before the next calendar month starts or the quota is increased for an affected account, the job doesn’t analyze objects that are owned by the account.

**Paused (By user)**  
For any type of job, the job was stopped temporarily (paused) by you.  
If you pause a one-time job and you don't resume it within 30 days, the job expires and Macie cancels it. If you pause a periodic job while it's actively running and you don't resume it within 30 days, the job's run expires and Macie cancels the run. To check the expiration date for a paused job or job run, choose the job's name in the table, and then refer to the **Expires** field in the **Status details** section of the details panel.

If a job is cancelled or paused, you can refer to the job's details to determine whether the job started to run or, for a periodic job, ran at least once before it was cancelled or paused. To do this, choose the job's name in the **Jobs** table, and then refer to the details panel. In the panel, the **Number of runs** field indicates the number of times that the job has run. The **Last run time** field indicates the most recent date and time when the job started to run.

Depending on the job’s current status, you can optionally pause, resume, or cancel the job. For more information, see [Changing the status of a job](discovery-jobs-status-change.md).

# Changing the status of a sensitive data discovery job
Changing the status of a job

After you create a sensitive data discovery job, you can pause it temporarily or cancel it permanently. When you pause a job that's actively running, Amazon Macie immediately begins to pause all processing tasks for the job. When you cancel a job that's actively running, Macie immediately begins to stop all processing tasks for the job. You can’t resume or restart a job after it’s cancelled.

If you pause a one-time job, you can resume it within 30 days. When you resume the job, Macie immediately resumes processing from the point where you paused the job. Macie doesn't restart the job from the beginning. If you don't resume a one-time job within 30 days of pausing it, the job expires and Macie cancels it.

If you pause a periodic job, you can resume it at any time. If you resume a periodic job and the job was idle when you paused it, Macie resumes the job according to the schedule and other configuration settings that you chose when you created the job. If you resume a periodic job and the job was actively running when you paused it, how Macie resumes the job depends on when you resume the job:
+ If you resume the job within 30 days of pausing it, Macie immediately resumes the latest scheduled run from the point where you paused the job. Macie doesn't restart the run from the beginning.
+ If you don't resume the job within 30 days of pausing it, the latest scheduled run expires and Macie cancels all remaining processing tasks for the run. When you subsequently resume the job, Macie resumes the job according to the schedule and other configuration settings that you chose when you created the job.

To help you determine when a paused job or job run will expire, Macie adds an expiration date to the job’s details while the job is paused. In addition, we notify you approximately seven days before the job or job run will expire. We notify you again when the job or job run expires and is cancelled. To notify you, we send email to the address that's associated with your AWS account. We also create AWS Health events and Amazon CloudWatch Events for your account. To check the expiration date by using the console, choose the job’s name in the table on the **Jobs** page. Then refer to the **Expires** field in the **Status details** section of the details panel. To check the date programmatically, use the [DescribeClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-jobid.html) operation of the Amazon Macie API. 

**To pause, resume, or cancel a job**

To pause, resume, or cancel a job by using the Amazon Macie console, follow these steps. To do this programmatically, use the [UpdateClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-jobid.html) operation of the Amazon Macie API.
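As a sketch, the following Python helper builds the UpdateClassificationJob parameters for each action. The jobStatus codes for the three actions are USER_PAUSED, RUNNING, and CANCELLED; the commented-out call assumes configured AWS credentials.

```python
# jobStatus values that UpdateClassificationJob accepts for each action.
ACTION_STATUS = {
    "pause": "USER_PAUSED",
    "resume": "RUNNING",
    "cancel": "CANCELLED",
}

def status_update_params(job_id, action):
    """Parameters for pausing, resuming, or cancelling a job."""
    if action not in ACTION_STATUS:
        raise ValueError(f"unsupported action: {action}")
    return {"jobId": job_id, "jobStatus": ACTION_STATUS[action]}

# Sketch of the call, assuming configured AWS credentials:
# import boto3
# macie = boto3.client("macie2")
# macie.update_classification_job(
#     **status_update_params("example-job-id", "pause")
# )
```

As with the console, cancelling through the API is permanent; there is no status code that restarts a cancelled job.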

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**. The **Jobs** page opens and displays the number of jobs in your inventory and a table of those jobs.

1. At the top of the page, choose refresh (![\[The refresh button, which is a button that displays an empty blue circle with an arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/btn-refresh-data.png)) to retrieve the current status of each job.

1. In the **Jobs** table, select the checkbox for the job that you want to pause, resume, or cancel. To find the job more quickly, you can filter the table by using the filter options above the table. You can also sort the table in ascending or descending order by certain fields.

1. On the **Actions** menu, do one of the following:
   + To pause the job temporarily, choose **Pause**. This option is available only if the job's current status is **Active (Idle)**, **Active (Running)**, or **Paused (By Macie)**.
   + To resume the job, choose **Resume**. This option is available only if the job's current status is **Paused (By user)**.
   + To cancel the job permanently, choose **Cancel**. If you choose this option, you can't subsequently resume or restart the job.

# Copying a sensitive data discovery job
Copying a job

To quickly create a sensitive data discovery job that's similar to an existing job, you can create a copy of the existing job. You can then edit the copy's settings, and save the copy as a new job. This can be helpful for cases where you want to analyze different sets of data in the same way, or the same set of data in different ways. It can also be helpful if you want to adjust the configuration settings for an existing job—cancel the existing job, copy it, and then adjust and save the copy as a new job.

**To copy a job**

Follow these steps to copy a job by using the Amazon Macie console. To copy a job programmatically, use the [DescribeClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs-jobid.html) operation of the Amazon Macie API to retrieve the configuration settings for the job that you want to copy. Then use the [CreateClassificationJob](https://docs.aws.amazon.com/macie/latest/APIReference/jobs.html) operation to create a copy of the job.
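The following Python sketch shows that describe-then-create pattern. The set of copyable fields is an assumption based on the CreateClassificationJob request syntax; verify it against the current API reference, because the job description also contains read-only fields (ARN, status, statistics, timestamps) that a create request rejects.

```python
# Fields of a DescribeClassificationJob response that CreateClassificationJob
# also accepts (assumed set; check the current API reference).
COPYABLE_FIELDS = {
    "allowListIds", "customDataIdentifierIds", "description", "initialRun",
    "jobType", "managedDataIdentifierIds", "managedDataIdentifierSelector",
    "name", "s3JobDefinition", "samplingPercentage", "scheduleFrequency",
    "tags",
}

def copy_settings(existing_job, new_name):
    """Settings for a new job, copied from an existing job's description."""
    settings = {k: v for k, v in existing_job.items() if k in COPYABLE_FIELDS}
    settings["name"] = new_name
    return settings

# Abbreviated example of a job description; real responses have more fields.
example = {
    "jobArn": "arn:aws:macie2:example",  # read-only, dropped by the copy
    "jobStatus": "CANCELLED",            # read-only, dropped by the copy
    "jobType": "ONE_TIME",
    "name": "original-job",
    "samplingPercentage": 100,
}
new_settings = copy_settings(example, "original-job-copy")

# Sketch of the calls, assuming configured AWS credentials:
# import boto3
# macie = boto3.client("macie2")
# source = macie.describe_classification_job(jobId="example-job-id")
# macie.create_classification_job(**copy_settings(source, "original-job-copy"))
```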

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. In the navigation pane, choose **Jobs**. The **Jobs** page opens and displays the number of jobs in your inventory and a table of those jobs.

1. In the **Jobs** table, select the checkbox for the job that you want to copy. To find the job more quickly, you can filter the table by using the filter options above the table. You can also sort the table in ascending or descending order by certain fields.

1. On the **Actions** menu, choose **Copy to new**.

1. Complete the steps on the console to review and adjust the settings for the copy of the job. For the **Refine the scope** step, consider choosing options that prevent the job from analyzing existing data in the same way again: 
   + For a one-time job, use [object criteria](discovery-jobs-scope.md#discovery-jobs-scope-criteria) to include only those objects that were created or changed after a certain time. For example, if you're creating a copy of a job that you cancelled, add a **Last modified** condition that specifies the date and time when you cancelled the existing job.
   + For a periodic job, clear the **Include existing objects** checkbox. If you do this, the first run of the job analyzes only those objects that are created or changed after you create the job and before the job's first run. You can also use [object criteria](discovery-jobs-scope.md#discovery-jobs-scope-criteria) to exclude objects that were last modified before a certain date and time.

   For additional details about this and other steps, see [Creating a sensitive data discovery job](discovery-jobs-create.md).

1. When you finish, choose **Submit** to save the copy as a new job.

If you configured the job to run once, on a daily basis, or on the current day of the week or month, Macie starts running the job immediately after you save it. Otherwise, Macie prepares to run the job on the specified day of the week or month. To monitor the job, you can [check the status of the job](discovery-jobs-status-check.md).

# Monitoring sensitive data discovery jobs with CloudWatch Logs

In addition to [monitoring the overall status](discovery-jobs-status-check.md) of a sensitive data discovery job, you can monitor and analyze specific events that occur as a job progresses. You can do this by using near real-time logging data that Amazon Macie automatically publishes to Amazon CloudWatch Logs. The data in these logs provides a record of changes to a job's progress or status. For example, you can use the data to determine the exact date and time when a job started to run, was paused, or finished running.

The log data also provides details about any account- or bucket-level errors that occur while a job runs. For example, Macie logs an event if the permissions settings for an Amazon Simple Storage Service (Amazon S3) bucket prevent a job from analyzing objects in the bucket. The event indicates when the error occurred, and it identifies the affected bucket and the AWS account that owns the bucket. The data for these types of events can help you identify, investigate, and address errors that prevent Macie from analyzing the data that you want.

With Amazon CloudWatch Logs, you can monitor, store, and access log files from multiple systems, applications, and AWS services, including Macie. You can also query and analyze log data, and configure CloudWatch Logs to notify you when certain events occur or thresholds are met. CloudWatch Logs also provides features for archiving log data and exporting the data to Amazon S3. To learn more about CloudWatch Logs, see the [Amazon CloudWatch Logs User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html).

**Topics**
+ [How logging works for jobs](discovery-jobs-monitor-cw-logs-configure.md)
+ [Reviewing logs for jobs](discovery-jobs-monitor-cw-logs-review.md)
+ [Understanding log events for jobs](discovery-jobs-monitor-cw-logs-ref.md)

# How logging works for sensitive data discovery jobs

When you start running sensitive data discovery jobs, Amazon Macie automatically creates and configures the appropriate resources in Amazon CloudWatch Logs to log events for all of your jobs. Macie then publishes event data to those resources automatically when your jobs run. The permissions policy for the Macie [service-linked role](service-linked-roles.md) for your account allows Macie to perform these tasks on your behalf. You don't need to take any steps to create or configure resources in CloudWatch Logs to log event data for your jobs.

In CloudWatch Logs, logs are organized into *log groups*. Each log group contains *log streams*. Each log stream contains *log events*. The general purpose of each of these resources is as follows:
+ A *log group* is a collection of log streams that share the same retention, monitoring, and access control settings—for example, the collection of logs for all of your sensitive data discovery jobs.
+ A *log stream* is a sequence of log events that share the same source—for example, an individual sensitive data discovery job.
+ A *log event* is a record of an activity that was recorded by an application or resource—for example, an individual event that Macie recorded and published for a particular sensitive data discovery job.

Macie publishes events for all of your sensitive data discovery jobs to one log group. Each job has a unique log stream in that log group. The log group has the following prefix and name:

`/aws/macie/classificationjobs`

If this log group already exists, Macie uses it to store log events for your jobs. This can be helpful if your organization uses automated configuration, such as [AWS CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html), to create log groups with predefined retention periods, encryption settings, tags, metric filters, and so on, for job events.

If this log group doesn't exist, Macie creates it with the default settings that CloudWatch Logs uses for new log groups. The settings include a log retention period of **Never Expire**, which means that CloudWatch Logs stores the logs indefinitely. You can change the retention period for the log group. To learn how, see [Working with log groups and log streams](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/Working-with-log-groups-and-streams.html) in the *Amazon CloudWatch Logs User Guide*.

Within this log group, Macie creates a unique log stream for each job the first time that the job runs. The log stream's name is the job's unique identifier, such as `85a55dc0fa6ed0be5939d0408example`, in the following format:

`/aws/macie/classificationjobs/85a55dc0fa6ed0be5939d0408example`

Each log stream contains all the log events that Macie recorded and published for the corresponding job. For periodic jobs, this includes events for all of the job's runs. If you delete the log stream for a periodic job, Macie creates the stream again the next time that the job runs. If you delete the log stream for a one-time job, you can't restore it.

Note that logging is enabled by default for all of your jobs. You can't disable it or otherwise prevent Macie from publishing job events to CloudWatch Logs. If you don't want to store the logs, you can reduce the retention period for the log group to as little as one day. At the end of the retention period, CloudWatch Logs automatically deletes expired event data from the log group.
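For example, a minimal boto3 sketch that lowers the log group's retention period to one day (the helper names are illustrative; `put_retention_policy` is the CloudWatch Logs operation that sets retention):

```python
# Sketch: shorten retention for the Macie job log group so that CloudWatch
# Logs deletes expired job events after one day.
LOG_GROUP = "/aws/macie/classificationjobs"

def build_retention_request(days=1):
    # retentionInDays must be one of the fixed values that CloudWatch Logs
    # supports (for example, 1, 3, 5, 7, 14, 30, 60, 90, 180, 365, 3653).
    return {"logGroupName": LOG_GROUP, "retentionInDays": days}

def set_retention(days=1):
    import boto3  # deferred: this function makes a live AWS API call
    boto3.client("logs").put_retention_policy(**build_retention_request(days))
```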



# Reviewing logs for sensitive data discovery jobs

After you start running sensitive data discovery jobs in Amazon Macie, you can review logs for your jobs by using Amazon CloudWatch Logs. CloudWatch Logs provides features that are designed to help you review, analyze, and monitor log data. You can use these features to work with log streams and events for jobs as you would work with any other type of log data in CloudWatch Logs.

For example, you can search and filter aggregate data to identify specific types of events that occurred for all of your jobs during a specific time range. Or you can perform a targeted review of all the events that occurred for a particular job. CloudWatch Logs also provides options for monitoring log data, defining metric filters, and creating custom alarms.

**Tip**  
To quickly navigate to the log data for a particular job, you can use the Amazon Macie console. To do this, choose the job's name on the **Jobs** page. At the top of the details panel, choose **Show results**, and then choose **Show CloudWatch logs**. Macie opens the Amazon CloudWatch console and displays a table of log events for the job.

**To review logs for sensitive data discovery jobs**

Follow these steps to navigate to and review log data by using the Amazon CloudWatch console. To review the data programmatically, use the [Amazon CloudWatch Logs API](https://docs.aws.amazon.com/AmazonCloudWatchLogs/latest/APIReference/Welcome.html).
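As a programmatic sketch, the following boto3 example pages through log events with the `FilterLogEvents` operation, for a single job or across all jobs. The helper names and the job ID are illustrative; AWS credentials and a default Region are assumed when `review_job_events` is called.

```python
# Sketch: retrieve log events for Macie jobs by using the CloudWatch Logs API.
LOG_GROUP = "/aws/macie/classificationjobs"

def build_filter_kwargs(job_id=None, pattern=None):
    """Build parameters for FilterLogEvents. With no job ID, the request
    searches all log streams (that is, all jobs) in the log group."""
    kwargs = {"logGroupName": LOG_GROUP}
    if job_id:
        # One stream per job; the stream name is the job's unique identifier.
        kwargs["logStreamNames"] = [job_id]
    if pattern:
        kwargs["filterPattern"] = pattern
    return kwargs

def review_job_events(job_id):
    import boto3  # deferred: this function makes live AWS API calls
    logs = boto3.client("logs")
    for page in logs.get_paginator("filter_log_events").paginate(
            **build_filter_kwargs(job_id)):
        for event in page["events"]:
            print(event["timestamp"], event["message"])
```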

1. Open the CloudWatch console at [https://console.aws.amazon.com/cloudwatch/](https://console.aws.amazon.com/cloudwatch/).

1. By using the AWS Region selector in the upper-right corner of the page, choose the Region in which you ran jobs that you want to review logs for.

1. In the navigation pane, choose **Logs**, and then choose **Log groups**.

1. On the **Log groups** page, choose the **/aws/macie/classificationjobs** log group. CloudWatch displays a table of log streams for the jobs that you've run. There is one unique stream for each job. The name of each stream correlates to the unique identifier for a job.

1. On the **Log streams** tab, do one of the following:
   + To review the log events for a particular job, choose the log stream for the job. To find the stream more easily, enter the job's unique identifier in the filter box above the table. After you choose the log stream, CloudWatch displays a table of log events for the job.
   + To review log events for all of your jobs, choose **Search all log streams**. CloudWatch displays a table of log events for all of your jobs.

1. (Optional) In the filter box above the table, enter terms, phrases, or values that specify characteristics of specific events to review. For more information, see [Search log data using filter patterns](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/SearchDataFilterPattern.html) in the *Amazon CloudWatch Logs User Guide*.

1. To review the details of a specific log event, choose expand (![\[The expand row icon, which is a right-facing solid arrow.\]](http://docs.aws.amazon.com/macie/latest/user/images/icon-caret-right-filled.png)) in the row for the event. CloudWatch displays the event's details in JSON format. To learn more about these details, see [Understanding log events for jobs](discovery-jobs-monitor-cw-logs-ref.md).

As you familiarize yourself with the data in the log events, you can perform additional tasks to streamline analysis and monitoring of the data. For example, you can [create metric filters](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/MonitoringLogData.html) that turn log data into numerical CloudWatch metrics. You can also [create custom alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ConsoleAlarms.html) that make it easier to identify and respond to specific log events. For more information, see the [Amazon CloudWatch Logs User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html).
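As a hedged sketch of that approach, the following boto3 example defines a metric filter that counts `BUCKET_ACCESS_DENIED` events in the job log group. The filter name, metric name, and namespace are illustrative choices, not values that Macie requires; `put_metric_filter` is the CloudWatch Logs operation that creates metric filters.

```python
# Sketch: turn BUCKET_ACCESS_DENIED log events into a CloudWatch metric that
# a custom alarm can watch.
def build_metric_filter():
    return {
        "logGroupName": "/aws/macie/classificationjobs",
        "filterName": "MacieBucketAccessDenied",
        # JSON filter pattern: match events whose eventType field has this value.
        "filterPattern": '{ $.eventType = "BUCKET_ACCESS_DENIED" }',
        "metricTransformations": [{
            "metricName": "BucketAccessDeniedEvents",
            "metricNamespace": "Custom/Macie",
            "metricValue": "1",  # count one per matching event
        }],
    }

def create_metric_filter():
    import boto3  # deferred: this function makes a live AWS API call
    boto3.client("logs").put_metric_filter(**build_metric_filter())
```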

# Understanding log events for sensitive data discovery jobs

To help you monitor your sensitive data discovery jobs, Amazon Macie automatically publishes logging data for jobs to Amazon CloudWatch Logs. The data in these logs provides a record of changes to a job's progress or status. For example, you can use the data to determine the exact date and time when a job started to run or finished running. The data also provides details about certain types of errors that can occur while a job runs. This data can help you identify, investigate, and address errors that prevent Macie from analyzing the data that you want.

When you start running jobs, Macie automatically creates and configures the appropriate resources in CloudWatch Logs to log events for all of your jobs. Macie then publishes event data to those resources automatically when your jobs run. For more information, see [How logging works for jobs](discovery-jobs-monitor-cw-logs-configure.md).

By using CloudWatch Logs, you can then query and analyze log data for your jobs. For example, you can search and filter aggregate data to identify specific types of events that occurred for all of your jobs during a specific time range. Or you can perform a targeted review of all the events that occurred for a particular job. CloudWatch Logs also provides options for monitoring log data, defining metric filters, and creating custom alarms. For example, you can configure CloudWatch Logs to notify you if a certain type of event occurs when your jobs run. For more information, see the [Amazon CloudWatch Logs User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html).

**Contents**
+ [Log event schema for jobs](#discovery-jobs-monitor-cw-logs-schema)
+ [Types of log events for jobs](#discovery-jobs-monitor-cw-logs-event-index)
  + [Job status events](#discovery-jobs-monitor-cw-logs-event-index-status)
  + [Account-level error events](#discovery-jobs-monitor-cw-logs-event-index-account-errors)
  + [Bucket-level error events](#discovery-jobs-monitor-cw-logs-event-index-bucket-errors)

## Log event schema for sensitive data discovery jobs

Each log event for a sensitive data discovery job is a JSON object that contains a standard set of fields and conforms to the Amazon CloudWatch Logs event schema. Some types of events have additional fields that provide information that's particularly useful for that type of event. For example, events for account-level errors include the account ID for the affected AWS account. Events for bucket-level errors include the name of the affected Amazon Simple Storage Service (Amazon S3) bucket.

The following example shows the log event schema for sensitive data discovery jobs. In this example, the event reports that Amazon Macie wasn't able to analyze any objects in an S3 bucket because Amazon S3 denied access to the bucket.

```
{
    "adminAccountId": "123456789012",
    "jobId": "85a55dc0fa6ed0be5939d0408example",
    "eventType": "BUCKET_ACCESS_DENIED",
    "occurredAt": "2024-04-14T17:11:30.574809Z",
    "description": "Macie doesn’t have permission to access the affected S3 bucket.",
    "jobName": "My_Macie_Job",
    "operation": "ListObjectsV2",
    "runDate": "2024-04-14T17:08:30.345809Z",
    "affectedAccount": "111122223333",
    "affectedResource": {
        "type": "S3_BUCKET_NAME",
        "value": "amzn-s3-demo-bucket"
    }
}
```

In the preceding example, Macie attempted to list the bucket's objects by using the [ListObjectsV2](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html) operation of the Amazon S3 API. When Macie sent the request to Amazon S3, Amazon S3 denied access to the bucket. 

The following fields are common to all log events for sensitive data discovery jobs:
+ `adminAccountId` – The unique identifier for the AWS account that created the job.
+ `jobId` – The unique identifier for the job.
+ `eventType` – The type of event that occurred.
+ `occurredAt` – The date and time, in Coordinated Universal Time (UTC) and extended ISO 8601 format, when the event occurred.
+ `description` – A brief description of the event.
+ `jobName` – The name of the job.

Depending on the type and nature of an event, a log event can also contain the following fields:
+ `affectedAccount` – The unique identifier for the AWS account that owns the affected resource.
+ `affectedResource` – A JSON object that provides details about the affected resource. In the object, the `type` field specifies the type of metadata that identifies the resource (for example, `S3_BUCKET_NAME`), and the `value` field specifies that metadata's value, such as the name of the bucket.
+ `operation` – The operation that Macie attempted to perform and caused the error.
+ `runDate` – The date and time, in Coordinated Universal Time (UTC) and extended ISO 8601 format, when the applicable job or job run started.

## Types of log events for sensitive data discovery jobs

Amazon Macie publishes log events for three categories of events that can occur for a sensitive data discovery job:
+ Job status events, which record changes to the status or progress of a job or a job run.
+ Account-level error events, which record errors that prevented Macie from analyzing Amazon S3 data for a specific AWS account.
+ Bucket-level error events, which record errors that prevented Macie from analyzing data in a specific S3 bucket.

The topics in this section list and describe the types of events that Macie publishes for each category.
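For scripted log analysis, one illustrative way to sort events into these categories is to match the `eventType` value against the event-type names that the topics in this section describe (the helper name and category labels below are illustrative):

```python
# Illustrative helper: classify a Macie job log event by its eventType value.
ACCOUNT_ERROR_EVENTS = {
    "ACCOUNT_ACCESS_DENIED", "ACCOUNT_DISABLED", "ACCOUNT_DISASSOCIATED",
    "ACCOUNT_ISOLATED", "ACCOUNT_REGION_DISABLED", "ACCOUNT_SUSPENDED",
    "ACCOUNT_TERMINATED",
}
BUCKET_ERROR_EVENTS = {
    "BUCKET_ACCESS_DENIED", "BUCKET_DETAILS_UNAVAILABLE",
    "BUCKET_DOES_NOT_EXIST", "BUCKET_IN_DIFFERENT_REGION",
    "BUCKET_OWNER_CHANGED",
}

def categorize_event(event):
    event_type = event.get("eventType", "")
    if event_type in ACCOUNT_ERROR_EVENTS:
        return "account-level error"
    if event_type in BUCKET_ERROR_EVENTS:
        return "bucket-level error"
    return "job status"  # JOB_CREATED, SCHEDULED_RUN_COMPLETED, and so on
```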

### Job status events

A job status event records a change to the status or progress of a job or a job run. For periodic jobs, Macie logs and publishes these events for both the overall job and individual job runs.

The following example uses sample data to show the structure and nature of the fields in a job status event. In this example, a `SCHEDULED_RUN_COMPLETED` event indicates that a scheduled run of a periodic job finished running. The run started on April 14, 2024, at 17:09:30 UTC, as indicated by the `runDate` field. The run finished on April 14, 2024, at 17:16:30 UTC, as indicated by the `occurredAt` field.

```
{
    "adminAccountId": "123456789012",
    "jobId": "ffad0e71455f38a4c7c220f3cexample",
    "eventType": "SCHEDULED_RUN_COMPLETED",
    "occurredAt": "2024-04-14T17:16:30.574809Z",
    "description": "The scheduled job run finished running.",
    "jobName": "My_Daily_Macie_Job",
    "runDate": "2024-04-14T17:09:30.574809Z"
}
```

The following table lists and describes the types of job status events that Macie logs and publishes to CloudWatch Logs. The **Event type** column indicates the name of each event as it appears in the `eventType` field of an event. The **Description** column provides a brief description of the event as it appears in the `description` field of an event. The **Additional information** column provides information about the type of job that the event applies to. The table is sorted first by the general chronological order in which events might occur, and then in ascending alphabetical order by event type.


| Event type | Description | Additional information | 
| --- | --- | --- | 
| JOB\_CREATED | The job was created. | Applies to one-time and periodic jobs. |
| ONE\_TIME\_JOB\_STARTED | The job started running. | Applies only to one-time jobs. |
| SCHEDULED\_RUN\_STARTED | The scheduled job run started running. | Applies only to periodic jobs. To log the start of a one-time job, Macie publishes a ONE\_TIME\_JOB\_STARTED event, not this type of event. |
| BUCKET\_MATCHED\_THE\_CRITERIA | The affected bucket matched the bucket criteria specified for the job. | Applies to one-time and periodic jobs that use runtime bucket criteria to determine which S3 buckets to analyze. The `affectedResource` object specifies the name of the bucket that matched the criteria and was included in the job's analysis. |
| NO\_BUCKETS\_MATCHED\_THE\_CRITERIA | The job started running but no buckets currently match the bucket criteria specified for the job. The job didn't analyze any data. | Applies to one-time and periodic jobs that use runtime bucket criteria to determine which S3 buckets to analyze. |
| SCHEDULED\_RUN\_COMPLETED | The scheduled job run finished running. | Applies only to periodic jobs. To log completion of a one-time job, Macie publishes a JOB\_COMPLETED event, not this type of event. |
| JOB\_PAUSED\_BY\_USER | The job was paused by a user. | Applies to one-time and periodic jobs that you stopped temporarily (paused). |
| JOB\_RESUMED\_BY\_USER | The job was resumed by a user. | Applies to one-time and periodic jobs that you stopped temporarily (paused) and later resumed. |
| JOB\_PAUSED\_BY\_MACIE\_SERVICE\_QUOTA\_MET | The job was paused by Macie. Completion of the job would exceed a monthly quota for the affected account. | Applies to one-time and periodic jobs that Macie stopped temporarily (paused). Macie automatically pauses a job when additional processing by the job or a job run would exceed the monthly [sensitive data discovery quota](macie-quotas.md) for one or more accounts that the job analyzes data for. To avoid this issue, consider increasing the quota for the affected accounts. |
| JOB\_RESUMED\_BY\_MACIE\_SERVICE\_QUOTA\_LIFTED | The job was resumed by Macie. The monthly service quota was lifted for the affected account. | Applies to one-time and periodic jobs that Macie stopped temporarily (paused) and later resumed. If Macie automatically paused a one-time job, Macie automatically resumes the job when the subsequent month starts or the monthly sensitive data discovery quota is increased for all the affected accounts, whichever occurs first. If Macie automatically paused a periodic job, Macie automatically resumes the job when the next run is scheduled to start or the subsequent month starts, whichever occurs first. |
| JOB\_CANCELLED | The job was cancelled. | Applies to one-time and periodic jobs that you stopped permanently (cancelled) or, for one-time jobs, paused and didn't resume within 30 days. If you suspend or disable Macie, this type of event also applies to jobs that were active or paused when you suspended or disabled Macie. Macie automatically cancels your jobs in an AWS Region if you suspend or disable Macie in the Region. |
| JOB\_COMPLETED | The job finished running. | Applies only to one-time jobs. To log completion of a job run for a periodic job, Macie publishes a SCHEDULED\_RUN\_COMPLETED event, not this type of event. |

### Account-level error events

An account-level error event records an error that prevented Macie from analyzing objects in S3 buckets that are owned by a specific AWS account. The `affectedAccount` field in each event specifies the account ID for that account.

The following example uses sample data to show the structure and nature of the fields in an account-level error event. In this example, an `ACCOUNT_ACCESS_DENIED` event indicates that Macie wasn't able to analyze objects in any S3 buckets that are owned by account `444455556666`.

```
{
    "adminAccountId": "123456789012",
    "jobId": "85a55dc0fa6ed0be5939d0408example",
    "eventType": "ACCOUNT_ACCESS_DENIED",
    "occurredAt": "2024-04-14T17:08:30.585709Z",
    "description": "Macie doesn’t have permission to access S3 bucket data for the affected account.",
    "jobName": "My_Macie_Job",
    "operation": "ListBuckets",
    "runDate": "2024-04-14T17:05:27.574809Z",
    "affectedAccount": "444455556666"
}
```

The following table lists and describes the types of account-level error events that Macie logs and publishes to CloudWatch Logs. The **Event type** column indicates the name of each event as it appears in the `eventType` field of an event. The **Description** column provides a brief description of the event as it appears in the `description` field of an event. The **Additional information** column provides any applicable tips for investigating or addressing the error that occurred. The table is sorted in ascending alphabetical order by event type.


| Event type | Description | Additional information | 
| --- | --- | --- | 
| ACCOUNT\_ACCESS\_DENIED | Macie doesn’t have permission to access S3 bucket data for the affected account. | This typically occurs because the buckets that are owned by the account have restrictive bucket policies. For information about how to address this issue, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md). The value for the `operation` field in the event can help you determine which permissions settings prevented Macie from accessing S3 data for the account. This field indicates the Amazon S3 operation that Macie attempted to perform when the error occurred. |
| ACCOUNT\_DISABLED | The job skipped resources that are owned by the affected account. Macie was disabled for the account. | To address this issue, re-enable Macie for the account in the same AWS Region. |
| ACCOUNT\_DISASSOCIATED | The job skipped resources that are owned by the affected account. The account isn't associated with your Macie administrator account as a member account anymore. | This occurs if you, as a Macie administrator for an organization, configure a job to analyze data for a member account and the account is later removed from your organization. To address this issue, re-associate the affected account with your Macie administrator account as a member account. For more information, see [Managing multiple accounts](macie-accounts.md). |
| ACCOUNT\_ISOLATED | The job skipped resources that are owned by the affected account. The AWS account was isolated. | – |
| ACCOUNT\_REGION\_DISABLED | The job skipped resources that are owned by the affected account. The AWS account isn't active in the current AWS Region. | – |
| ACCOUNT\_SUSPENDED | The job was cancelled or skipped resources that are owned by the affected account. Macie was suspended for the account. | If the specified account is your own account, Macie automatically cancelled the job when you suspended Macie in the same Region. To address the issue, re-enable Macie in the Region. If the specified account is a member account, re-enable Macie for that account in the same Region. |
| ACCOUNT\_TERMINATED | The job skipped resources that are owned by the affected account. The AWS account was terminated. | – |

### Bucket-level error events

A bucket-level error event records an error that prevented Macie from analyzing objects in a specific S3 bucket. The `affectedAccount` field in each event specifies the account ID for the AWS account that owns the bucket. The `affectedResource` object in each event specifies the name of the bucket.

The following example uses sample data to show the structure and nature of the fields in a bucket-level error event. In this example, a `BUCKET_ACCESS_DENIED` event indicates that Macie wasn't able to analyze any objects in the S3 bucket named `amzn-s3-demo-bucket`. When Macie attempted to list the bucket's objects by using the [ListObjectsV2](https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html) operation of the Amazon S3 API, Amazon S3 denied access to the bucket.

```
{
    "adminAccountId": "123456789012",
    "jobId": "85a55dc0fa6ed0be5939d0408example",
    "eventType": "BUCKET_ACCESS_DENIED",
    "occurredAt": "2024-04-14T17:11:30.574809Z",
    "description": "Macie doesn’t have permission to access the affected S3 bucket.",
    "jobName": "My_Macie_Job",
    "operation": "ListObjectsV2",
    "runDate": "2024-04-14T17:09:30.685209Z",
    "affectedAccount": "111122223333",
    "affectedResource": {
        "type": "S3_BUCKET_NAME",
        "value": "amzn-s3-demo-bucket"
    }
}
```

The following table lists and describes the types of bucket-level error events that Macie logs and publishes to CloudWatch Logs. The **Event type** column indicates the name of each event as it appears in the `eventType` field of an event. The **Description** column provides a brief description of the event as it appears in the `description` field of an event. The **Additional information** column provides any applicable tips for investigating or addressing the error that occurred. The table is sorted in ascending alphabetical order by event type.


| Event type | Description | Additional information | 
| --- | --- | --- | 
| BUCKET\_ACCESS\_DENIED | Macie doesn’t have permission to access the affected S3 bucket. | This typically occurs because a bucket has a restrictive bucket policy. For information about how to address this issue, see [Allowing Macie to access S3 buckets and objects](monitoring-restrictive-s3-buckets.md). The value for the `operation` field in the event can help you determine which permissions settings prevented Macie from accessing the bucket. This field indicates the Amazon S3 operation that Macie attempted to perform when the error occurred. |
| BUCKET\_DETAILS\_UNAVAILABLE | A temporary issue prevented Macie from retrieving details about the bucket and the bucket’s objects. | This occurs if a transient issue prevented Macie from retrieving the bucket and object metadata that it needs to analyze a bucket's objects. For example, an Amazon S3 exception occurred when Macie tried to verify that it's allowed to access the bucket. To address the issue for a one-time job, consider creating and running a new, one-time job to analyze objects in the bucket. For a scheduled job, Macie will try to retrieve the metadata again during the next job run. |
| BUCKET\_DOES\_NOT\_EXIST | The affected S3 bucket doesn’t exist anymore. | This typically occurs because a bucket was deleted. |
| BUCKET\_IN\_DIFFERENT\_REGION | The affected S3 bucket was moved to a different AWS Region. | – |
| BUCKET\_OWNER\_CHANGED | The owner of the affected S3 bucket changed. Macie doesn’t have permission to access the bucket anymore. | This typically occurs if ownership of a bucket was transferred to an AWS account that isn't part of your organization. The `affectedAccount` field in the event indicates the account ID for the account that previously owned the bucket. |

# Forecasting and monitoring costs for sensitive data discovery jobs

Amazon Macie pricing is based partly on the amount of data that you analyze by running sensitive data discovery jobs. To forecast and monitor your estimated costs for running sensitive data discovery jobs, you can review cost estimates that Macie provides when you create a job and after you start running jobs. 

To review and monitor your actual costs, you can use AWS Billing and Cost Management. AWS Billing and Cost Management provides features that are designed to help you track and analyze your costs for AWS services, and manage budgets for your account or organization. It also provides features that can help you forecast usage costs based on historical data. To learn more, see the [AWS Billing User Guide](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-what-is.html).

For information about Macie pricing, see [Amazon Macie pricing](https://aws.amazon.com/macie/pricing/).

**Topics**
+ [Forecasting the cost of a job](#discovery-jobs-costs-forecast)
+ [Monitoring estimated costs for jobs](#discovery-jobs-costs-track)

## Forecasting the cost of a sensitive data discovery job

When you create a sensitive data discovery job, Amazon Macie can calculate and display estimated costs during two key steps in the job creation process: when you review the table of S3 buckets that you selected for the job (step 2) and when you review all the settings for the job (step 8). These estimates can help you determine whether to adjust the job's settings before you save the job. The availability and nature of the estimates depends on the settings that you choose for the job.

**Reviewing estimated costs for individual buckets (step 2)**  
If you explicitly select individual buckets for a job to analyze, you can review the estimated cost of analyzing objects in each of those buckets. Macie displays these estimates during step 2 of the job creation process, when you review your bucket selections. In the table for this step, the **Estimated cost** field indicates the total estimated cost (in US dollars) of running the job once to analyze objects in a bucket.  
Each estimate reflects the projected amount of uncompressed data that the job will analyze in a bucket, based on the size and types of objects that are currently stored in the bucket. The estimate also reflects Macie pricing for the current AWS Region.  
Only classifiable objects are included in the cost estimate for a bucket. A *classifiable object* is an S3 object that uses a [supported Amazon S3 storage class](discovery-supported-storage.md#discovery-supported-s3-classes) and has a file name extension for a [supported file or storage format](discovery-supported-storage.md#discovery-supported-formats). If any classifiable objects are compressed or archive files, the estimate assumes that the files use a 3:1 compression ratio and the job can analyze all extracted files.
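The per-bucket estimate logic described above can be sketched as follows. This is an illustrative approximation, not the actual Macie implementation: the price, the extension sets, and the helper name are assumptions, and the real service inspects object metadata rather than file names alone.

```python
# Sketch of the per-bucket cost estimate, under stated assumptions.
ASSUMED_COMPRESSION_RATIO = 3  # Macie's default 3:1 assumption
PRICE_PER_GB_USD = 1.0         # placeholder; see the Macie pricing page

# Illustrative subsets of the supported formats and storage classes.
SUPPORTED_EXTENSIONS = {".csv", ".json", ".txt", ".parquet"}
ARCHIVE_EXTENSIONS = {".gz", ".zip", ".tar"}
SUPPORTED_STORAGE_CLASSES = {
    "STANDARD", "STANDARD_IA", "INTELLIGENT_TIERING",
    "ONEZONE_IA", "REDUCED_REDUNDANCY",
}

def estimated_bucket_cost(objects):
    """objects: iterable of (file_name, size_in_bytes, storage_class)."""
    total_bytes = 0
    for name, size, storage_class in objects:
        ext = "." + name.rsplit(".", 1)[-1].lower() if "." in name else ""
        # Only classifiable objects count: supported storage class + format.
        if storage_class not in SUPPORTED_STORAGE_CLASSES:
            continue
        if ext in ARCHIVE_EXTENSIONS:
            # Compressed/archive files are assumed to expand 3:1.
            total_bytes += size * ASSUMED_COMPRESSION_RATIO
        elif ext in SUPPORTED_EXTENSIONS:
            total_bytes += size
    return (total_bytes / 1024**3) * PRICE_PER_GB_USD
```

For example, a bucket holding a 2 GB CSV file and a 1 GB gzip archive would be estimated at 2 + (1 × 3) = 5 GB of analyzable data, while unsupported formats and storage classes contribute nothing.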

**Reviewing the total estimated cost of a job (step 8)**  
If you create a one-time job or you create and configure a periodic job to include existing S3 objects, Macie calculates and displays the job's total estimated cost during the final step of the job creation process. You can review this estimate while you review and verify all the settings that you selected for the job.  
This estimate indicates the total projected cost (in US dollars) of running the job once in the current Region. The estimate reflects the projected amount of uncompressed data that the job will analyze. It's based on the size and types of objects that are currently stored in buckets that you explicitly selected for the job or up to 500 buckets that currently match bucket criteria that you specified for the job, depending on the job's settings.  
Note that this estimate doesn't reflect any options that you selected to refine and reduce the scope of the job—for example, a lower sampling depth, or criteria that exclude certain S3 objects from the job. It also doesn't reflect your monthly [sensitive data discovery quota](macie-quotas.md), which might limit the scope and cost of the job's analysis, or any discounts that might apply to your account.  
In addition to the total estimated cost of the job, the estimate provides aggregated data that offers insight into the projected scope and cost of the job:  
+ **Size** values indicate the total storage size of the objects that the job can and can't analyze.
+ **Object count** values indicate the total number of objects that the job can and can't analyze.
In these values, a **Classifiable** object is an S3 object that uses a [supported Amazon S3 storage class](discovery-supported-storage.md#discovery-supported-s3-classes) and has a file name extension for a [supported file or storage format](discovery-supported-storage.md#discovery-supported-formats). Only classifiable objects are included in the cost estimate. A **Not classifiable** object is an object that doesn't use a supported storage class or doesn't have a file name extension for a supported file or storage format. These objects aren't included in the cost estimate.   
The estimate provides additional aggregated data for S3 objects that are compressed or archive files. The **Compressed** value indicates the total storage size of objects that use a supported Amazon S3 storage class and have a file name extension for a supported type of compressed or archive file. The **Uncompressed** value indicates the approximate size of these objects if they're decompressed, based on a specified compression ratio. This data is relevant due to the way that Macie analyzes compressed files and archive files.  
When Macie analyzes a compressed or archive file, it inspects both the full file and the contents of the file. To inspect the file’s contents, Macie decompresses the file, and then inspects each extracted file that uses a supported format. The actual amount of data that a job analyzes therefore depends on:  
+ Whether a file uses compression and, if so, the compression ratio that it uses.
+ The number, size, and format of the extracted files.
By default, Macie assumes the following when it calculates cost estimates for a job:   
+ All compressed and archive files use a 3:1 compression ratio.
+ All the extracted files use a supported file or storage format.
These assumptions can result in a larger size estimate for the scope of the data that the job will analyze, and, consequently, a higher cost estimate for the job.   
You can recalculate the job's total estimated cost based on a different compression ratio. To do this, choose the ratio from the **Choose an estimated compression ratio** list in the **Estimated cost** section. Macie then updates the estimate to match your selection.
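In effect, choosing a different ratio scales only the compressed portion of the data. A minimal sketch of that recalculation, with a placeholder price rather than an actual Macie rate:

```python
def recalculate_estimate(uncompressed_gb, compressed_gb, ratio,
                         price_per_gb=1.0):
    """Recompute a job's total cost estimate for a chosen compression ratio.

    uncompressed_gb: total size of classifiable objects that are not
        compressed or archive files.
    compressed_gb: stored (on-disk) size of compressed/archive objects.
    ratio: the value chosen from the estimated-compression-ratio list.
    price_per_gb: placeholder rate, not an actual Macie price.
    """
    projected_gb = uncompressed_gb + compressed_gb * ratio
    return projected_gb * price_per_gb
```

With 10 GB of uncompressed data and 2 GB of archives, the default 3:1 ratio projects 16 GB of analysis; choosing 5:1 instead raises the projection to 20 GB, and the cost estimate scales accordingly.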

For more information about how Macie calculates estimated costs, see [Understanding estimated usage costs](account-mgmt-costs-calculations.md).

## Monitoring estimated costs for sensitive data discovery jobs

If you’re already running sensitive data discovery jobs, the **Usage** page on the Amazon Macie console can help you monitor the estimated cost of those jobs. The page shows your estimated costs (in US dollars) for using Macie in the current AWS Region during the current calendar month. For information about how Macie calculates these estimates, see [Understanding estimated usage costs](account-mgmt-costs-calculations.md).

**To review your estimated costs for running jobs**

1. Open the Amazon Macie console at [https://console.aws.amazon.com/macie/](https://console.aws.amazon.com/macie/).

1. By using the AWS Region selector in the upper-right corner of the page, choose the Region in which you want to review your estimated costs.

1. In the navigation pane, choose **Usage**.

1. On the **Usage** page, refer to the breakdown of estimated costs for your account. The **Sensitive data discovery jobs** item reports the total estimated cost of the jobs that you've run thus far during the current month in the current Region.

   If you're the Macie administrator for an organization, the **Estimated costs** section shows estimated costs for your organization overall for the current month in the current Region. To show a breakdown of estimated costs for a specific account, including the estimated cost of the jobs that were run for the account, choose the account in the table. To show this data for a different account, choose that account in the table instead. To clear your account selection, choose **X** next to the account ID.

To review and monitor your actual costs, use [AWS Billing and Cost Management](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/billing-what-is.html).
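The month-to-date estimates that the **Usage** page shows are also available programmatically through the Macie `GetUsageTotals` API operation. The following sketch parses a response of the documented shape; the sample values are illustrative, and with boto3 you would obtain the real response via `boto3.client("macie2").get_usage_totals(timeRange="MONTH_TO_DATE")`.

```python
def job_cost_estimate(response):
    """Extract the sensitive data discovery estimate from a
    GetUsageTotals response."""
    for total in response.get("usageTotals", []):
        if total.get("type") == "SENSITIVE_DATA_DISCOVERY":
            return float(total["estimatedCost"]), total.get("currency", "USD")
    return 0.0, "USD"

# Illustrative response; real values come from the get_usage_totals call.
sample = {
    "timeRange": "MONTH_TO_DATE",
    "usageTotals": [
        {"currency": "USD", "estimatedCost": "12.50",
         "type": "SENSITIVE_DATA_DISCOVERY"},
        {"currency": "USD", "estimatedCost": "1.25",
         "type": "DATA_INVENTORY_EVALUATION"},
    ],
}
cost, currency = job_cost_estimate(sample)
```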

# Managed data identifiers recommended for sensitive data discovery jobs

To optimize the results of your sensitive data discovery jobs, you can configure individual jobs to automatically use the set of managed data identifiers that we recommend for jobs. A *managed data identifier* is a set of built-in criteria and techniques that are designed to detect a specific type of sensitive data—for example, AWS secret access keys, credit card numbers, or passport numbers for a particular country or region.

The recommended set of managed data identifiers is designed to detect common categories and types of sensitive data. Based on our research, it can detect general categories and types of sensitive data while also optimizing your job results by reducing noise. As we release new managed data identifiers, we add them to this set if they're likely to further optimize your job results. Over time, we might also add existing managed data identifiers to the set or remove identifiers from it. If we add or remove a managed data identifier from the recommended set, we update this page to indicate the nature and timing of the change. For automatic alerts about these changes, you can subscribe to the RSS feed on the [Macie document history](doc-history.md) page.

When you create a sensitive data discovery job, you specify which managed data identifiers you want the job to use to analyze objects in Amazon Simple Storage Service (Amazon S3) buckets. To configure a job to use the recommended set of managed data identifiers, choose the *Recommended* option when you create the job. The job will then automatically use all the managed data identifiers that are in the recommended set when the job starts to run. If you configure a job to run more than once, each run will automatically use all the managed data identifiers that are in the recommended set when the run starts.
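When creating jobs programmatically, the same choice maps to the `managedDataIdentifierSelector` parameter of the `CreateClassificationJob` operation, where the value `RECOMMENDED` selects the recommended set. The sketch below builds such a request; the job name, account ID, and bucket name are caller-supplied placeholders, and you would pass the result to `boto3.client("macie2").create_classification_job(**request)`.

```python
import uuid

def recommended_job_request(job_name, account_id, bucket_names):
    """Build a CreateClassificationJob request that uses the recommended
    set of managed data identifiers. Name, account, and buckets are
    placeholders supplied by the caller."""
    return {
        "jobType": "ONE_TIME",
        "name": job_name,
        "clientToken": str(uuid.uuid4()),  # idempotency token
        # RECOMMENDED tells Macie to use the recommended set, resolved
        # each time the job (or a scheduled run) starts.
        "managedDataIdentifierSelector": "RECOMMENDED",
        "s3JobDefinition": {
            "bucketDefinitions": [
                {"accountId": account_id, "buckets": list(bucket_names)}
            ]
        },
    }

request = recommended_job_request(
    "example-job", "111122223333", ["amzn-s3-demo-bucket"])
```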

The following topics list the managed data identifiers that are currently in the recommended set, organized by sensitive data category and type. They specify the unique identifier (ID) for each managed data identifier in the set. This ID describes the type of sensitive data that a managed data identifier is designed to detect, for example: `PGP_PRIVATE_KEY` for PGP private keys and `USA_PASSPORT_NUMBER` for US passport numbers.

**Topics**
+ [Credentials](#discovery-jobs-mdis-recommended-credentials)
+ [Financial information](#discovery-jobs-mdis-recommended-financial)
+ [Personally identifiable information (PII)](#discovery-jobs-mdis-recommended-pii)
+ [Updates to the recommended set](#discovery-jobs-mdis-recommended-updates)

For details about specific managed data identifiers or a complete list of all the managed data identifiers that Macie currently provides, see [Using managed data identifiers](managed-data-identifiers.md).

## Credentials

To detect occurrences of credentials data in S3 objects, the recommended set uses the following managed data identifiers.


| Sensitive data type | Managed data identifier ID | 
| --- | --- | 
| AWS secret access key | AWS\_CREDENTIALS | 
| HTTP Basic Authorization header | HTTP\_BASIC\_AUTH\_HEADER | 
| OpenSSH private key | OPENSSH\_PRIVATE\_KEY | 
| PGP private key | PGP\_PRIVATE\_KEY | 
| Public Key Cryptography Standard (PKCS) private key | PKCS | 
| PuTTY private key | PUTTY\_PRIVATE\_KEY | 

## Financial information

To detect occurrences of financial information in S3 objects, the recommended set uses the following managed data identifiers.


| Sensitive data type | Managed data identifier ID | 
| --- | --- | 
| Credit card magnetic stripe data | CREDIT\_CARD\_MAGNETIC\_STRIPE | 
| Credit card number | CREDIT\_CARD\_NUMBER (for credit card numbers in proximity of a keyword) | 

## Personally identifiable information (PII)

To detect occurrences of personally identifiable information (PII) in S3 objects, the recommended set uses the following managed data identifiers.


| Sensitive data type | Managed data identifier ID | 
| --- | --- | 
| Driver’s license identification number | CANADA\_DRIVERS\_LICENSE, DRIVERS\_LICENSE (for the US), UK\_DRIVERS\_LICENSE | 
| Electoral roll number | UK\_ELECTORAL\_ROLL\_NUMBER | 
| National identification number | FRANCE\_NATIONAL\_IDENTIFICATION\_NUMBER, GERMANY\_NATIONAL\_IDENTIFICATION\_NUMBER, ITALY\_NATIONAL\_IDENTIFICATION\_NUMBER, SPAIN\_DNI\_NUMBER | 
| National Insurance Number (NINO) | UK\_NATIONAL\_INSURANCE\_NUMBER | 
| Passport number | CANADA\_PASSPORT\_NUMBER, FRANCE\_PASSPORT\_NUMBER, GERMANY\_PASSPORT\_NUMBER, ITALY\_PASSPORT\_NUMBER, SPAIN\_PASSPORT\_NUMBER, UK\_PASSPORT\_NUMBER, USA\_PASSPORT\_NUMBER | 
| Social Insurance Number (SIN) | CANADA\_SOCIAL\_INSURANCE\_NUMBER | 
| Social Security number (SSN) | SPAIN\_SOCIAL\_SECURITY\_NUMBER, USA\_SOCIAL\_SECURITY\_NUMBER | 
| Taxpayer identification or reference number | AUSTRALIA\_TAX\_FILE\_NUMBER, BRAZIL\_CPF\_NUMBER, FRANCE\_TAX\_IDENTIFICATION\_NUMBER, GERMANY\_TAX\_IDENTIFICATION\_NUMBER, SPAIN\_NIE\_NUMBER, SPAIN\_NIF\_NUMBER, SPAIN\_TAX\_IDENTIFICATION\_NUMBER, USA\_INDIVIDUAL\_TAX\_IDENTIFICATION\_NUMBER | 

## Updates to the recommended set

The following table describes changes to the set of managed data identifiers that we recommend for sensitive data discovery jobs. For automatic alerts about these changes, subscribe to the RSS feed on the [Macie document history](doc-history.md) page.


| Change | Description | Date | 
| --- | --- | --- | 
|  General availability  |  Initial release of the recommended set.  |  June 27, 2023  | 