

# Text analysis API operations
<a name="comprehendmedical-textanalysis"></a>

Use Amazon Comprehend Medical to examine clinical documents and gain insights about their content using pre-trained natural language processing (NLP) models. You can analyze single files, or run batch analysis on multiple files stored in an Amazon Simple Storage Service (Amazon S3) bucket.

With Amazon Comprehend Medical, you can perform the following on your documents:
+ [Detect entities (Version 2)](textanalysis-entitiesv2.md): Examine unstructured clinical text to detect textual references to medical information such as medical conditions, treatments, tests and results, and medications. This version uses a different model than the original Detect entities API, and its output differs in a few ways.
+ [Detect PHI](textanalysis-phi.md): Examine unstructured clinical text to detect textual references to protected health information (PHI) such as names and addresses.

Amazon Comprehend Medical also includes multiple API operations that you can use to perform batch text analysis on clinical documents. To learn more about how to use these API operations, see [Text analysis batch APIs](textanalysis-batchapi.md).

**Topics**
+ [Detect entities (Version 2)](textanalysis-entitiesv2.md)
+ [Detect PHI](textanalysis-phi.md)
+ [Text analysis batch APIs](textanalysis-batchapi.md)

# Detect entities (Version 2)
<a name="textanalysis-entitiesv2"></a>

Use the **DetectEntitiesV2** operation to detect entities in a single file, or **StartEntitiesDetectionV2Job** for batch analysis of multiple files. You can detect entities in the following categories:
+ `ANATOMY`: Detects references to the parts of the body or body systems and the locations of those parts or systems.
+ `BEHAVIORAL_ENVIRONMENTAL_SOCIAL`: Detects the behaviors and conditions in the environment that impact a person's health. This includes tobacco usage, alcohol consumption, recreational drug usage, allergies, gender, and race/ethnicity.
+ `MEDICAL_CONDITION`: Detects the signs, symptoms, and diagnoses of medical conditions.
+ `MEDICATION`: Detects medication and dosage information for the patient.
+ `PROTECTED_HEALTH_INFORMATION`: Detects the patient's personal information.
+ `TEST_TREATMENT_PROCEDURE`: Detects the procedures that are used to determine a medical condition.
+ `TIME_EXPRESSION`: Detects entities related to time when they are associated with a detected entity.

The **DetectEntitiesV2** operation detects all seven categories. For analysis specific to PHI, use **DetectPHI** on single files and **StartPHIDetectionJob** for batch analysis.
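As a minimal sketch, assuming you use Python with the AWS SDK for Python (Boto3), both single-file operations are available on the `comprehendmedical` client as `detect_entities_v2` and `detect_phi`, each taking the clinical text as the `Text` parameter. The helper names and the optional `client` parameter are illustrative assumptions, not part of the service API; accepting a client makes the helpers testable without AWS credentials.

```python
from typing import Any, Dict, List, Optional


def _get_client(client: Optional[Any]) -> Any:
    """Return the client that was passed in, or create a Boto3 client."""
    if client is not None:
        return client
    import boto3  # Imported lazily so the helpers can be tested without Boto3.

    return boto3.client("comprehendmedical")


def detect_all_entities(text: str, client: Optional[Any] = None) -> List[Dict[str, Any]]:
    """Call DetectEntitiesV2 and return the entities from every category."""
    response = _get_client(client).detect_entities_v2(Text=text)
    return response["Entities"]


def detect_phi_only(text: str, client: Optional[Any] = None) -> List[Dict[str, Any]]:
    """Call DetectPHI and return only the PHI entities."""
    response = _get_client(client).detect_phi(Text=text)
    return response["Entities"]
```

Calling `detect_phi_only("Patient is John Smith.")` without a `client` argument would use your default AWS credentials and Region.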

 Amazon Comprehend Medical detects information in the following classes:
+ *Entity:* A text reference to the name of relevant objects, such as people, treatments, medications, and medical conditions. For example, `ibuprofen`. 
+ *Category:* The generalized grouping to which an entity belongs. For example, ibuprofen is part of the `MEDICATION` category.
+ *Type:* The type of entity detected within a single category. For example, ibuprofen is in the `GENERIC_NAME` type in the `MEDICATION` category.
+ *Attribute:* Information related to an entity, such as the dosage of a medication. For example, `200 mg` is an attribute of the ibuprofen entity.
+ *Trait:* Something that Amazon Comprehend Medical understands about an entity, based on context. For example, a medication has the `NEGATION` trait if a patient is not taking it.
+ *Relationship Type:* The relationship between an entity and an attribute.
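These classes correspond to fields on each entity in the API response (`Category`, `Type`, `Traits`, and `Attributes`). The following sketch prints one line per entity and counts entities per category and type. The function names are hypothetical, and the ibuprofen entity in the usage note is hand-built for illustration, not real service output.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def describe_entity(entity: Dict[str, Any]) -> str:
    """Render one entity with its category, type, traits, and attributes."""
    traits = ",".join(t["Name"] for t in entity.get("Traits", [])) or "-"
    attributes = ",".join(
        f'{a["Type"]}={a["Text"]}' for a in entity.get("Attributes", [])
    ) or "-"
    return (
        f'{entity["Text"]} [{entity["Category"]}/{entity["Type"]}] '
        f"traits={traits} attributes={attributes}"
    )


def count_by_category_and_type(entities: Iterable[Dict[str, Any]]) -> Counter:
    """Count entities per (Category, Type) pair."""
    return Counter((e["Category"], e["Type"]) for e in entities)
```

For a hand-built entity `{"Text": "ibuprofen", "Category": "MEDICATION", "Type": "GENERIC_NAME", "Traits": [], "Attributes": [{"Type": "DOSAGE", "Text": "200 mg"}]}`, `describe_entity` returns `ibuprofen [MEDICATION/GENERIC_NAME] traits=- attributes=DOSAGE=200 mg`.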

Amazon Comprehend Medical provides the location of each entity in the input text. The Amazon Comprehend Medical console shows the location graphically; the API returns it as numerical character offsets (`BeginOffset` and `EndOffset`).
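A minimal sketch of using the offsets, assuming `BeginOffset` is inclusive and `EndOffset` is exclusive. That assumption is consistent with the DetectPHI example on this page, where "John Smith" in "Patient is John Smith, ..." has offsets 11 and 21. The sketch slices the text back out and flags entities whose offsets don't reproduce their `Text`; the function names are hypothetical, and the check assumes you pass exactly the text you sent to the API.

```python
from typing import Any, Dict, List


def span_text(text: str, entity: Dict[str, Any]) -> str:
    """Return the slice of the input text that an entity's offsets point to."""
    return text[entity["BeginOffset"]:entity["EndOffset"]]


def mismatched_offsets(text: str, entities: List[Dict[str, Any]]) -> List[int]:
    """Return the Ids of entities whose offsets do not reproduce their Text."""
    return [e["Id"] for e in entities if span_text(text, e) != e["Text"]]
```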

Each entity and attribute includes a score that indicates the confidence level that Amazon Comprehend Medical has in the accuracy of the detection. Each attribute also has a relationship score, which indicates the confidence level that Amazon Comprehend Medical has in the accuracy of the relationship between the attribute and its parent entity. Identify the appropriate confidence threshold for your use case, use high-confidence thresholds in situations that require high accuracy, and filter out data that doesn't meet the threshold.
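A minimal sketch of this filtering, assuming the DetectEntitiesV2 response shape in which each entity has a `Score` and each attribute has both a `Score` and a `RelationshipScore`. The function name is hypothetical, and the threshold values you pass in are placeholders for whatever suits your use case.

```python
from typing import Any, Dict, List


def filter_by_confidence(
    entities: List[Dict[str, Any]],
    min_score: float,
    min_relationship_score: float,
) -> List[Dict[str, Any]]:
    """Drop low-confidence entities and, within kept entities, low-confidence attributes."""
    kept = []
    for entity in entities:
        if entity["Score"] < min_score:
            continue  # The detection itself is below the threshold.
        attributes = [
            a
            for a in entity.get("Attributes", [])
            # An attribute must be confidently detected AND confidently linked.
            if a["Score"] >= min_score and a["RelationshipScore"] >= min_relationship_score
        ]
        # Build a new dict so the caller's response data is left unchanged.
        kept.append({**entity, "Attributes": attributes})
    return kept
```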

## Anatomy category
<a name="anatomy-v2"></a>

The `ANATOMY` category detects references to the parts of the body or body systems and the locations of those parts or systems. 

### Types
<a name="anatomy-type-v2"></a>
+ `SYSTEM_ORGAN_SITE`: Body systems, anatomic locations or regions, and body sites.

### Attributes
<a name="anatomy-attribute-v2"></a>
+ `DIRECTION`: Directional terms. For example, left, right, medial, lateral, upper, lower, posterior, anterior, distal, proximal, contralateral, bilateral, ipsilateral, dorsal, ventral, and so on.

## Behavioral, environmental, and social health category
<a name="behavioral-category-v2"></a>

The `BEHAVIORAL_ENVIRONMENTAL_SOCIAL` category detects references to behaviors and conditions in the environment that impact a person's health.

### Types
<a name="behavioral-type-v2"></a>
+ `ALCOHOL_CONSUMPTION`: Defines the patient’s alcohol consumption in terms of use status, frequency, amount, and duration.
+ `ALLERGIES`: Defines the patient’s allergies and responses to allergens.
+ `GENDER`: An identification of the characteristics of gender identity.
+ `RACE_ETHNICITY`: A social-political construct of a patient’s identification with particular racial and ethnic groups.
+ `REC_DRUG_USE`: Defines the patient’s use of recreational drugs in terms of use status, frequency, amount, and duration.
+ `TOBACCO_USE`: Defines the patient’s tobacco usage in terms of use status, frequency, amount, and duration.

### Attributes

The following detected attributes only apply to the types `ALCOHOL_CONSUMPTION`, `TOBACCO_USE`, and `REC_DRUG_USE`:
+ `AMOUNT`: The amount of alcohol, tobacco, or recreational drug used.
+ `DURATION`: How long the alcohol, tobacco, or recreational drug has been used.
+ `FREQUENCY`: How often the alcohol, tobacco, or recreational drug is used.

### Traits
<a name="behavioral-trait-v2"></a>

The following detected traits only apply to the types `ALCOHOL_CONSUMPTION`, `ALLERGIES`, `TOBACCO_USE`, and `REC_DRUG_USE`:
+ `NEGATION`: An indication that a result or action is negative or not being performed.
+ `PAST_HISTORY`: An indication that use of alcohol, tobacco, or recreational drugs is from the patient’s past (prior to the current encounter).

## Medical condition category
<a name="medical-condition-v2"></a>

The `MEDICAL_CONDITION` category detects the signs, symptoms, and diagnoses of medical conditions. The category has one entity type, four attributes, and seven traits. One or more traits can be associated with an entity. Contextual information about attributes and their relationship to the diagnosis is detected and mapped to the `DX_NAME` entity through `RELATIONSHIP_EXTRACTION`. For instance, from the text "chronic pain in left leg", "chronic" is detected as the attribute `ACUITY`, "left" is detected as the attribute `DIRECTION`, and "leg" is detected as the attribute `SYSTEM_ORGAN_SITE`. Each of these attributes is related to the medical condition entity "pain," along with a confidence score.
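As a sketch of consuming these relationships, the following groups each `MEDICAL_CONDITION` entity's attributes by attribute type. The function name is hypothetical, and the test input is a hand-built, abbreviated entity for "chronic pain in left leg" (scores and offsets omitted), not actual service output.

```python
from typing import Any, Dict, List


def condition_details(entities: List[Dict[str, Any]]) -> Dict[str, Dict[str, List[str]]]:
    """Map each medical condition's text to its attribute texts, grouped by attribute Type."""
    details: Dict[str, Dict[str, List[str]]] = {}
    for entity in entities:
        if entity["Category"] != "MEDICAL_CONDITION":
            continue
        grouped = details.setdefault(entity["Text"], {})
        for attribute in entity.get("Attributes", []):
            grouped.setdefault(attribute["Type"], []).append(attribute["Text"])
    return details
```

For the example above, the result would be `{"pain": {"ACUITY": ["chronic"], "DIRECTION": ["left"], "SYSTEM_ORGAN_SITE": ["leg"]}}`.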

### Types
<a name="medical-condition-type-v2"></a>
+ `DX_NAME`: All medical conditions listed. The `DX_NAME` type includes present illness, reason for visit, and medical history.

### Attributes
<a name="medical-condition-attribute-v2"></a>
+ `ACUITY`: Determination of disease instance, such as chronic, acute, sudden, persistent, or gradual. 
+ `DIRECTION`: Directional terms. For example, left, right, medial, lateral, upper, lower, posterior, anterior, distal, proximal, contralateral, bilateral, ipsilateral, dorsal, or ventral.
+ `SYSTEM_ORGAN_SITE`: Anatomical location.
+ `QUALITY`: Any descriptive term of the medical condition, such as stage or grade.

### Traits
<a name="medical-condition-trait-v2"></a>
+ `DIAGNOSIS`: A medical condition that is determined as the cause or result of the symptoms. Symptoms can be found through physical findings, laboratory or radiological reports, or any other means.
+ `HYPOTHETICAL`: An indication that a medical condition is expressed as a hypothesis.
+ `LOW_CONFIDENCE`: An indication that a medical condition is expressed as having high uncertainty. This is not directly related to the confidence scores provided.
+ `NEGATION`: An indication that a result or action is negative or not being performed.
+ `PERTAINS_TO_FAMILY`: An indication that a medical condition is relevant to the patient’s family, not the patient.
+ `SIGN`: A medical condition that the physician reported.
+ `SYMPTOM`: A medical condition that the patient reported.
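Traits are returned as a list of `{"Name": ..., "Score": ...}` objects on each entity. A minimal sketch, assuming you want only conditions that apply to the patient as current findings: it drops entities that carry any excluded trait above a trait-score threshold. Which traits to exclude, the 0.5 default, and the function names are all assumptions to adjust for your use case.

```python
from typing import AbstractSet, Any, Dict, List, Set

# Assumption: traits that mark a condition as not a current finding for the patient.
DEFAULT_EXCLUDED = frozenset({"NEGATION", "HYPOTHETICAL", "LOW_CONFIDENCE", "PERTAINS_TO_FAMILY"})


def trait_names(entity: Dict[str, Any], min_trait_score: float) -> Set[str]:
    """Return the names of an entity's traits whose Score meets the threshold."""
    return {t["Name"] for t in entity.get("Traits", []) if t["Score"] >= min_trait_score}


def patient_entities(
    entities: List[Dict[str, Any]],
    category: str = "MEDICAL_CONDITION",
    excluded: AbstractSet[str] = DEFAULT_EXCLUDED,
    min_trait_score: float = 0.5,
) -> List[Dict[str, Any]]:
    """Keep entities of one category that carry none of the excluded traits."""
    return [
        e
        for e in entities
        if e["Category"] == category and not (trait_names(e, min_trait_score) & excluded)
    ]
```

The same helper can screen medications, for example with `category="MEDICATION"` and `excluded={"NEGATION", "PAST_HISTORY"}`.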

## Medication category
<a name="medication-v2"></a>

The `MEDICATION` category detects medication and dosage information for the patient. One or more attributes can apply to a type.

### Types
<a name="medication-type-v2"></a>
+ `BRAND_NAME`: The trademarked brand name of the medication or therapeutic agent.
+ `GENERIC_NAME`: The non-brand name, ingredient name, or formula mixture of the medication or therapeutic agent.

### Attributes
<a name="medication-attribute-v2"></a>
+ `DOSAGE`: The amount of medication ordered.
+ `DURATION`: How long the medication should be administered.
+ `FORM`: The form of the medication.
+ `FREQUENCY`: How often to administer the medication. 
+ `RATE`: The administration rate of the medication (primarily for medication infusions or IVs).
+ `ROUTE_OR_MODE`: The administration method of the medication.
+ `STRENGTH`: The medication strength.

### Traits
<a name="medication-trait-v2"></a>
+ `NEGATION`: Any indication that the patient is not taking a medication.
+ `PAST_HISTORY`: An indication that a medication detected is from the patient’s past (prior to current encounter).

## Protected health information category
<a name="protected-health-information-v2"></a>

The `PROTECTED_HEALTH_INFORMATION` category detects the patient's personal information. See [Detect PHI](textanalysis-phi.md) to learn more about this operation.

### Types
<a name="protected-health-information-types-v2"></a>
+ `ADDRESS`: All geographical subdivisions of an address of any facility, units, or wards within a facility.
+ `AGE`: All components of age, spans of age, or any age mentioned. This includes those of a patient, family members, or others. The default is in years, unless otherwise noted.
+ `EMAIL`: Any email address.
+ `ID`: Social Security number, medical record number, facility identification number, clinical trial number, certificate or license number, vehicle or device number, and numbers that identify the place of care or provider. This also includes any biometric number of the patient, such as height, weight, or a lab value.
+ `NAME`: All names. Typically, names of the patient, family, or provider.
+ `PHONE_OR_FAX`: Any phone, fax, or pager number. Excludes named phone numbers, such as 1-800-QUIT-NOW and 911.
+ `PROFESSION`: Any profession or employer that pertains to the patient or the patient's family. It does not include the profession of the clinician mentioned in the note. 

## Test, treatment, and procedure category
<a name="test-treatment-procedure-v2"></a>

The `TEST_TREATMENT_PROCEDURE` category detects the procedures that are used to determine a medical condition. One or more attributes can be related to an entity of the `TEST_NAME` type.

### Types
<a name="test-treatment-procedure-types-v2"></a>
+ `PROCEDURE_NAME`: Interventions as a one-time action performed on the patient to treat a medical condition or to provide patient care.
+ `TEST_NAME`: Procedures performed on a patient for diagnostic, measurement, screening, or rating that might have a resulting value. This includes any procedure, process, evaluation, or rating to determine a diagnosis, to rule out or find a condition, or to scale or score a patient.
+ `TREATMENT_NAME`: Interventions performed over a span of time for combating a disease or disorder. This includes groupings of medications, such as antivirals and vaccinations.

### Attributes
<a name="test-treatment-procedure-attributes-v2"></a>
+ `TEST_VALUE`: The result of a test. Applies only to the `TEST_NAME` entity type.
+ `TEST_UNIT`: The unit of measure that might accompany the value of the test. Applies only to the `TEST_NAME` entity type.

### Traits
<a name="test-treatment-procedure-traits-v2"></a>
+ `FUTURE`: An indication that a test, treatment, or procedure refers to an action or event that will occur after the subject of the notes.
+ `HYPOTHETICAL`: An indication that a test, treatment, or procedure is expressed as a hypothesis.
+ `NEGATION`: An indication that a result or action is negative or not being performed.
+ `PAST_HISTORY`: An indication that a test, treatment, or procedure is from the patient’s past (prior to current encounter).

## Time expression category
<a name="time-expression-v2"></a>

The `TIME_EXPRESSION` category detects entities related to time. This includes entities such as dates and time expressions such as "three days ago," "today," "currently," "day of admission," "last month," or "16 days." Results in this category are returned only if they are associated with another entity. For example, "Yesterday, the patient took 200 mg of ibuprofen" returns `Yesterday` as a `TIME_EXPRESSION` entity that overlaps with the `GENERIC_NAME` entity "ibuprofen." However, "yesterday" is not recognized as an entity in "Yesterday, the patient walked their dog."
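A minimal sketch of pairing time expressions with the entities they describe. It assumes, without confirmation from this page, that a time expression appears in the response as an attribute of the entity with `Category` set to `TIME_EXPRESSION`. The relationship type is compared case-insensitively because this page spells it `Overlap`. The function name and test data are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def time_links(entities: List[Dict[str, Any]]) -> List[Tuple[str, str, str]]:
    """Return (time text, entity text, attribute type) for each linked time expression."""
    links = []
    for entity in entities:
        for attribute in entity.get("Attributes", []):
            # Assumption: time expressions arrive as attributes whose Category
            # is TIME_EXPRESSION and whose RelationshipType is Overlap.
            is_time = attribute.get("Category") == "TIME_EXPRESSION"
            overlaps = str(attribute.get("RelationshipType", "")).upper() == "OVERLAP"
            if is_time and overlaps:
                links.append((attribute["Text"], entity["Text"], attribute["Type"]))
    return links
```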

### Types
<a name="time-expression-v2-categories"></a>
+ `TIME_TO_MEDICATION_NAME`: The date a medication was taken. The attributes specific to this type are `BRAND_NAME` and `GENERIC_NAME`.
+ `TIME_TO_DX_NAME`: The date a medical condition occurred. The attribute for this type is `DX_NAME`. 
+ `TIME_TO_TEST_NAME`: The date a test was performed. The attribute for this type is `TEST_NAME`.
+ `TIME_TO_PROCEDURE_NAME`: The date a procedure was performed. The attribute for this type is `PROCEDURE_NAME`.
+ `TIME_TO_TREATMENT_NAME`: The date a treatment was administered. The attribute for this type is `TREATMENT_NAME`.

### Relationship type
<a name="time-expression-v2-relationship-type"></a>
The relationship between an entity and an attribute. The recognized relationship type is the following:
+ `Overlap`: The `TIME_EXPRESSION` concurs with the entity detected.

# Detect PHI
<a name="textanalysis-phi"></a>

Use the **DetectPHI** operation when you want to detect only protected health information (PHI) in clinical text. To detect all available entities in the clinical text, use **DetectEntitiesV2**.

This API is best for use cases that require detecting only PHI entities. For information about the non-PHI categories, see [Detect entities (Version 2)](textanalysis-entitiesv2.md).

**Important**  
 Amazon Comprehend Medical provides confidence scores that indicate the level of confidence in the accuracy of the detected entities. Evaluate these confidence scores and identify the right confidence threshold for your use case. For specific compliance use cases, we recommend that you use additional human review or other methods to confirm the accuracy of detected PHI.

Under HIPAA, PHI that is based on a list of 18 identifiers must be treated with special care. Amazon Comprehend Medical detects entities associated with these identifiers, but these entities don't map 1:1 to the list specified by the Safe Harbor method. Not all identifiers appear in unstructured clinical text, but Amazon Comprehend Medical covers all the relevant identifiers. These identifiers consist of data that can be used to identify an individual patient, as listed in the following table. For more information, see [Health Information Privacy](https://www.hhs.gov/hipaa/for-professionals/privacy/special-topics/de-identification/index.html) on the *U.S. Department of Health and Human Services* website.

Each PHI-related entity includes a score (`Score` in the response) that indicates the level of confidence Amazon Comprehend Medical has in the accuracy of the detection. Identify the right confidence threshold for your use case and filter out entities that do not meet it. When identifying occurrences of PHI, it may be better to use a low confidence threshold for filtering to capture more potential detected entities. This is especially true when not using the values of the detected entities in compliance use cases.

The following PHI-related entities can be detected by running the **DetectPHI** or **DetectEntitiesV2** operations:


**Detected PHI Entities**  

|  Entity  |  Description  |  HIPAA Category  | 
| --- | --- | --- | 
|  AGE  |  All components of age, spans of age, and any age mentioned, be it patient or family member or others involved in the note. Default is in years unless otherwise noted.  |  3. Dates related to an individual  | 
| DATE | Any date related to patient or patient care.  | 3. Dates related to an individual | 
|  NAME  |  All names mentioned in the clinical note, typically belonging to patient, family, or provider.  |  1. Name  | 
|  PHONE_OR_FAX  |  Any phone, fax, pager; excludes named phone numbers such as 1-800-QUIT-NOW as well as 911.  |  4. Phone number 5. FAX number  | 
|  EMAIL  |  Any email address.  |  6. Email addresses  | 
|  ID  |  Any sort of number associated with the identity of a patient. This includes their social security number, medical record number, facility identification number, clinical trial number, certificate or license number, vehicle or device number. It also includes biometric numbers, and numbers identifying the place of care or provider.  |  7. Social Security Number  8. Medical Record number 9. Health Plan number 10. Account numbers 11. Certificate/License numbers 12. Vehicle identifiers 13. Device numbers 16. Biometric information 18. Any other identifying characteristics  | 
|  URL  |  Any web URL.  |  14. URLs  | 
|  ADDRESS  |  This includes all geographical subdivisions of an address of any facility, named medical facilities, or wards within a facility.  |  2. Geographic location  | 
|  PROFESSION  |  Includes any profession or employer mentioned in a note as it pertains to the patient or the patient’s family.  |  18. Any other identifying characteristics  | 



**Example**  


The text "Patient is John Smith, a 48-year-old teacher and resident of Seattle, Washington." returns:
+ "John Smith" as an *entity* of type `NAME` in the `PROTECTED_HEALTH_INFORMATION` category.
+ "48" as an *entity* of type `AGE` in the `PROTECTED_HEALTH_INFORMATION` category.
+ "teacher" as an *entity* of type `PROFESSION` (identifying characteristic) in the `PROTECTED_HEALTH_INFORMATION` category.
+ "Seattle, Washington" as an `ADDRESS` *entity* in the `PROTECTED_HEALTH_INFORMATION` category.

The Amazon Comprehend Medical console displays this result as follows:

![\[Patient information card displaying name, age, profession, and address details.\]](http://docs.aws.amazon.com/comprehend-medical/latest/dev/images/patient.png)


The **DetectPHI** operation returns the following response. When you use the **StartPHIDetectionJob** operation, Amazon Comprehend Medical writes a file with the same structure to the output location.

```
{
    "Entities": [
        {
            "Id": 0,
            "BeginOffset": 11,
            "EndOffset": 21,
            "Score": 0.997368335723877,
            "Text": "John Smith",
            "Category": "PROTECTED_HEALTH_INFORMATION",
            "Type": "NAME",
            "Traits": []
        },
        {
            "Id": 1,
            "BeginOffset": 25,
            "EndOffset": 27,
            "Score": 0.9998362064361572,
            "Text": "48",
            "Category": "PROTECTED_HEALTH_INFORMATION",
            "Type": "AGE",
            "Traits": []
        },
        {
            "Id": 2,
            "BeginOffset": 37,
            "EndOffset": 44,
            "Score": 0.8661606311798096,
            "Text": "teacher",
            "Category": "PROTECTED_HEALTH_INFORMATION",
            "Type": "PROFESSION",
            "Traits": []
        },
        {
            "Id": 3,
            "BeginOffset": 61,
            "EndOffset": 68,
            "Score": 0.9629441499710083,
            "Text": "Seattle",
            "Category": "PROTECTED_HEALTH_INFORMATION",
            "Type": "ADDRESS",
            "Traits": []
        },
        {
            "Id": 4,
            "BeginOffset": 70,
            "EndOffset": 80,
            "Score": 0.38217034935951233,
            "Text": "Washington",
            "Category": "PROTECTED_HEALTH_INFORMATION",
            "Type": "ADDRESS",
            "Traits": []
        }
    ],
    "UnmappedAttributes": []
}
```
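The offsets in this response are enough to redact PHI from the original text. A minimal sketch, assuming entity spans don't overlap: it replaces each PHI entity at or above a score threshold with its type in brackets, working from the end of the text so earlier offsets stay valid. The function name is hypothetical, and the 0.1 default is an arbitrary low placeholder reflecting the advice above to use a low threshold when identifying PHI.

```python
from typing import Any, Dict, List


def redact_phi(text: str, entities: List[Dict[str, Any]], min_score: float = 0.1) -> str:
    """Replace each PHI span at or above min_score with a [TYPE] placeholder."""
    phi = [
        e for e in entities
        if e["Category"] == "PROTECTED_HEALTH_INFORMATION" and e["Score"] >= min_score
    ]
    # Replace from the last span backward so earlier offsets still line up.
    for entity in sorted(phi, key=lambda e: e["BeginOffset"], reverse=True):
        text = text[: entity["BeginOffset"]] + f'[{entity["Type"]}]' + text[entity["EndOffset"]:]
    return text
```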

# Text analysis batch APIs
<a name="textanalysis-batchapi"></a>

Use Amazon Comprehend Medical to analyze medical text stored in an Amazon S3 bucket. You can analyze up to 10 GB of documents in one batch. Use the console to create and manage batch analysis jobs, or use the batch APIs to detect medical entities, including protected health information (PHI). The APIs start, stop, list, and describe batch analysis jobs.

For pricing information for batch analysis and other Amazon Comprehend Medical operations, see [Amazon Comprehend Medical pricing](https://aws.amazon.com/comprehend/medical/pricing/).

## Important notice
<a name="important-notice"></a>

The batch analysis operations of Amazon Comprehend Medical are not a substitute for professional medical advice, diagnosis, or treatment. Identify the right confidence threshold for your use case, and use high confidence thresholds in situations that require high accuracy. For certain use cases, results should be reviewed and verified by appropriately trained human reviewers. All operations of Amazon Comprehend Medical should only be used in patient care scenarios after review for accuracy and sound medical judgment by trained medical professionals.

## Performing batch analysis using the APIs
<a name="performing-batch-api"></a>

You can run a batch analysis job using either the Amazon Comprehend Medical console or the Amazon Comprehend Medical Batch APIs.

**Prerequisites**

When you use the Amazon Comprehend Medical API, create an AWS Identity and Access Management (IAM) policy and attach it to an IAM role. To learn more about IAM roles and trust policies, see [IAM Policies and Permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html).


1. Upload your data into an S3 bucket.

1. To start a new analysis job, use either the StartEntitiesDetectionV2Job operation or the StartPHIDetectionJob operation. When you start the job, specify the input S3 bucket that contains the input files, the output S3 bucket where Amazon Comprehend Medical writes the files after batch analysis, and the IAM role that grants access to those buckets.

1. Monitor the progress of the job by using the console, the DescribeEntitiesDetectionV2Job operation, or the DescribePHIDetectionJob operation. Additionally, ListEntitiesDetectionV2Jobs and ListPHIDetectionJobs show the status of all of your entity detection and PHI detection batch analysis jobs.

1. If you need to stop a job in progress, use StopEntitiesDetectionV2Job or StopPHIDetectionJob to stop analysis.

1. To view the results of your analysis job, see the output S3 bucket that you configured when you started the job.
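A minimal sketch of steps 2 and 3 with Boto3, assuming you pass in a client created with `boto3.client("comprehendmedical")`. It starts a StartEntitiesDetectionV2Job run over a whole input bucket (no `S3Key` prefix), then polls DescribeEntitiesDetectionV2Job until the job reaches `COMPLETED`, `PARTIAL_SUCCESS`, `FAILED`, or `STOPPED`. The function name and the 30-second polling interval are illustrative choices; the PHI equivalents are `start_phi_detection_job` and `describe_phi_detection_job`.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which the job will not change again, so polling can stop.
TERMINAL_STATUSES = {"COMPLETED", "PARTIAL_SUCCESS", "FAILED", "STOPPED"}


def run_entities_batch_job(
    client: Any,
    input_bucket: str,
    output_bucket: str,
    role_arn: str,
    poll_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Start an entities detection (V2) batch job and wait for it to finish."""
    job_id = client.start_entities_detection_v2_job(
        InputDataConfig={"S3Bucket": input_bucket},
        OutputDataConfig={"S3Bucket": output_bucket},
        DataAccessRoleArn=role_arn,  # Role with the S3 policy and trust relationship.
        LanguageCode="en",
    )["JobId"]
    while True:
        response = client.describe_entities_detection_v2_job(JobId=job_id)
        properties = response["ComprehendMedicalAsyncJobProperties"]
        if properties["JobStatus"] in TERMINAL_STATUSES:
            return properties
        sleep(poll_seconds)
```

The `sleep` parameter exists only so the polling loop can be tested without waiting; in normal use, leave it at its default.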

## Performing batch analysis using the console
<a name="batch-api-console"></a>


1. Upload your data into an S3 bucket.

1. To start a new analysis job, choose the type of analysis to perform. Then provide the name of the S3 bucket that contains the input files and the name of the S3 bucket where you want to send the output files.

1. Monitor the status of your job while it is running. From the console, you can view all batch analysis operations and their status, including when analysis started and ended.

1. To view the results of your analysis job, see the output S3 bucket that you configured when you started the job.

## IAM Policies for batch operations
<a name="batch-iam"></a>

The IAM role that calls the Amazon Comprehend Medical batch APIs must have a policy that grants access to the S3 buckets that contain the input and output files. It must also be assigned a trust relationship that enables the Amazon Comprehend Medical service to assume the role. To learn more about IAM roles and trust policies, see [IAM Roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html).

The role must have the following policy.


```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::input-bucket/*"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::input-bucket",
                "arn:aws:s3:::output-bucket"
            ],
            "Effect": "Allow"
        },
        {
            "Action": [
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::output-bucket/*"
            ],
            "Effect": "Allow"
        }
    ]
}
```


The role must have the following trust relationship. We recommend that you use the `aws:SourceAccount` and `aws:SourceArn` condition keys to prevent the confused deputy security issue. To learn more about the confused deputy problem and how to protect your AWS account, see [The confused deputy problem](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html) in the IAM documentation.


```
{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Principal":{
            "Service":[
               "comprehendmedical.amazonaws.com"
            ]
         },
         "Action":"sts:AssumeRole",
         "Condition": {
            "StringEquals": {
               "aws:SourceAccount": "account_id"
            },
            "ArnLike": {
               "aws:SourceArn": "arn:aws:comprehendmedical:us-east-1:account_id:*"
            }
         }
      }
   ]
}
```
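The trust policy contains an `account_id` placeholder and a hard-coded `us-east-1` Region. As a minimal sketch, assuming you want to fill both in programmatically, the following builds the same document as a Python dictionary. The function name is hypothetical, and the 12-digit check reflects the format of AWS account IDs.

```python
import json
from typing import Any, Dict


def comprehend_medical_trust_policy(account_id: str, region: str) -> Dict[str, Any]:
    """Build the Amazon Comprehend Medical trust policy for one account and Region."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": ["comprehendmedical.amazonaws.com"]},
                "Action": "sts:AssumeRole",
                # Confused deputy protection: only this account's jobs may assume the role.
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {"aws:SourceArn": f"arn:aws:comprehendmedical:{region}:{account_id}:*"},
                },
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(comprehend_medical_trust_policy("123456789012", "us-east-1"), indent=2))
```

With Boto3, you could pass `json.dumps(...)` of this dictionary as the `AssumeRolePolicyDocument` argument of the IAM client's `create_role` call.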


## Batch analysis output files
<a name="batch-ouput"></a>

Amazon Comprehend Medical creates one output file for each input file in the batch. The file has the extension `.out`. Amazon Comprehend Medical first creates a directory in the output S3 bucket named *AwsAccountId*-*JobType*-*JobId*, and then writes all of the output files for the batch to this directory. Creating a new directory for each job prevents the output of one job from overwriting the output of another.

The output from a batch operation produces the same output as a synchronous operation. For examples of the output generated by Amazon Comprehend Medical, see [Detect entities (Version 2)](textanalysis-entitiesv2.md).

Each batch operation produces three manifest files that contain information about the job. 
+ `Manifest` – Summarizes the job. Provides information about the parameters used for the job, the total size of the job, and the number of files processed.
+ `success` – Provides information about the files that were successfully processed. Includes the input and output file name and the size of the input file.
+ `unprocessed` – Lists files the batch job did not process, including error codes and error messages per file.

Amazon Comprehend Medical writes the files to the output directory that you specified for the batch job. The summary manifest file is written to the output folder, along with a folder titled `Manifest_AccountId-Operation-JobId`. The manifest folder contains a `success` folder, which holds the success manifest, and a `failed` folder, which holds the unprocessed file manifest. The following sections show the structure of the manifest files.

### Batch manifest file
<a name="batch-manifest"></a>

The following is the JSON structure of the batch manifest file.

```
{"Summary" : 
    {"Status" : "COMPLETED | FAILED | PARTIAL_SUCCESS | STOPPED", 
    "JobType" : "EntitiesDetection | PHIDetection", 
    "InputDataConfiguration" : {
        "Bucket" : "input bucket", 
        "Path" : "path to files"
    }, "OutputDataConfiguration" : {
        "Bucket" : "output bucket",
        "Path" : "path to files/account ID-job type-job ID"
    }, 
    "InputFileCount" : number of files in input bucket, 
    "TotalMeteredCharacters" : total characters processed from all files, 
    "UnprocessedFilesCount" : number of files not processed, 
    "SuccessFilesCount" : total number of files processed, 
    "TotalDurationSeconds" : time required for processing, 
    "SuccessfulFilesListLocation" : "path to file", 
    "UnprocessedFilesListLocation" : "path to file",
    "FailedJobErrorMessage": "error message, or if not applicable, the status of the job is completed"
    } 
}
```
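A minimal sketch of checking a job from its summary manifest, assuming the manifest you download is valid JSON with numeric counts in place of the placeholders shown above. The function name and the test values are hypothetical.

```python
import json


def summarize_manifest(manifest_json: str) -> str:
    """Return a one-line summary of a batch job's summary manifest."""
    summary = json.loads(manifest_json)["Summary"]
    status = summary["Status"]
    line = (
        f'{status}: {summary["SuccessFilesCount"]} of {summary["InputFileCount"]} files '
        f'succeeded, {summary["UnprocessedFilesCount"]} unprocessed'
    )
    if summary["UnprocessedFilesCount"]:
        # Point at the unprocessed-files manifest when any file was not processed.
        line += f' (see {summary["UnprocessedFilesListLocation"]})'
    return line
```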

### Success manifest file
<a name="batch-success"></a>

The following is the JSON structure of the file that contains information about successfully processed files.

```
{
        "Files": [{
               "Input": "input path/input file name",
               "Output": "output path/output file name",
               "InputSize": size in bytes of input file
        }, {
               "Input": "input path/input file name",
               "Output": "output path/output file name",
               "InputSize": size in bytes of input file
        }]
}
```

### Unprocessed manifest file
<a name="batch-unprocessed"></a>

The following is the JSON structure of the manifest file that contains information about unprocessed files.

```
{
  "Files" : [ {
      "Input": "file_name_that_failed",
      "ErrorCode": "error code for exception",
      "ErrorMessage": "explanation of the error code and suggestions"
  }, 
  { ...}
  ]
}
```
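A minimal sketch of reading the success and unprocessed manifests once you have their contents as strings (for example, downloaded with the Boto3 S3 client's `get_object`). It totals the input bytes that were processed and groups unprocessed files by `ErrorCode`. The function names are hypothetical, and the error codes in the test are placeholders, not real service error codes.

```python
import json
from collections import defaultdict
from typing import Dict, List


def total_input_bytes(success_manifest: str) -> int:
    """Sum InputSize over the files listed in a success manifest."""
    return sum(f["InputSize"] for f in json.loads(success_manifest)["Files"])


def failures_by_error_code(unprocessed_manifest: str) -> Dict[str, List[str]]:
    """Group the input files in an unprocessed manifest by their ErrorCode."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry in json.loads(unprocessed_manifest)["Files"]:
        grouped[entry["ErrorCode"]].append(entry["Input"])
    return dict(grouped)
```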