

# Plain-text annotation files
<a name="cer-annotation-csv"></a>

For plain-text annotations, you create a comma-separated value (CSV) file that contains a list of annotations. The CSV file must contain the following columns if your training file input format is **one document per line**.


| File | Line | Begin offset | End offset | Type | 
| --- | --- | --- | --- | --- | 
|  The name of the file containing the document. For example, if one of the document files is located at `s3://my-S3-bucket/test-files/documents.txt`, the value in the `File` column is `documents.txt`. You must include the file extension (in this case, `.txt`) as part of the file name.  |  The line number containing the entity. Omit this column if your input format is one document per file.  |  The character offset in the input text (relative to the beginning of the line) that shows where the entity begins. The first character is at position 0.  |  The character offset in the input text that shows where the entity ends. This is the position immediately after the entity's last character, so End offset minus Begin offset equals the length of the entity.  |  The customer-defined entity type. Entity types must be an uppercase, underscore-separated string. We recommend using descriptive entity types such as `MANAGER`, `SENIOR_MANAGER`, or `PRODUCT_CODE`. Up to 25 entity types can be trained per model.  | 

If your training file input format is **one document per file**, omit the `Line` column. The **Begin offset** and **End offset** values are then the entity's offsets from the start of the document.
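To see how the two formats relate, a line-relative offset becomes a document-relative offset by adding the number of characters that precede the line. The following minimal sketch shows that arithmetic on two sample lines. It assumes the lines of a document are separated by a single newline character, and `to_document_offsets` is a hypothetical helper name, not part of any AWS API.

```python
def to_document_offsets(lines, line_number, begin, end):
    """Convert line-relative offsets to document-relative offsets.

    Assumes the document is the lines joined by a single newline character,
    so every preceding line contributes len(line) + 1 characters.
    """
    shift = sum(len(line) + 1 for line in lines[:line_number])
    return begin + shift, end + shift


lines = [
    "Diego Ramirez is an engineer in the high tech industry.",
    "Emilio Johnson has been an engineer for 14 years.",
]
document = "\n".join(lines)

# "Emilio Johnson" spans offsets 0-14 on line 1 (one document per line).
doc_begin, doc_end = to_document_offsets(lines, 1, 0, 14)
print(doc_begin, doc_end)           # 56 70
print(document[doc_begin:doc_end])  # Emilio Johnson
```

The first line is 55 characters long, so with its newline it shifts every offset on the second line by 56.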

The following example is for one document per line. The file `documents.txt` contains four lines (rows 0, 1, 2, and 3):

```
Diego Ramirez is an engineer in the high tech industry.
Emilio Johnson has been an engineer for 14 years.
J Doe is a judge on the Washington Supreme Court.
Our latest new employee, Mateo Jackson, has been a manager in the industry for 4 years.
```

The CSV file with the list of annotations is as follows: 

```
File, Line, Begin Offset, End Offset, Type
documents.txt, 0, 0, 13, ENGINEER
documents.txt, 1, 0, 14, ENGINEER
documents.txt, 3, 25, 38, MANAGER
```

**Note**  
In the annotations file, the line number containing the entity starts with line 0. In this example, the CSV file contains no entry for line 2 because there is no entity in line 2 of `documents.txt`.
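Because the end offset is the begin offset plus the length of the entity (for `Diego Ramirez` on line 0, 0 + 13 = 13), you can compute both offsets from the entity text instead of counting characters by hand. The following sketch does this with Python's `str.find`. The `entities` mapping and the `annotation_rows` helper are illustrative assumptions; `str.find` returns only the first match on a line, so an entity mentioned twice on the same line would need extra handling.

```python
# Contents of documents.txt from the example above, one document per line.
lines = [
    "Diego Ramirez is an engineer in the high tech industry.",
    "Emilio Johnson has been an engineer for 14 years.",
    "J Doe is a judge on the Washington Supreme Court.",
    "Our latest new employee, Mateo Jackson, has been a manager in the industry for 4 years.",
]

# Hypothetical input: the entity text and type to annotate on each line.
entities = {
    0: [("Diego Ramirez", "ENGINEER")],
    1: [("Emilio Johnson", "ENGINEER")],
    3: [("Mateo Jackson", "MANAGER")],
}


def annotation_rows(file_name, lines, entities):
    """Yield [File, Line, Begin Offset, End Offset, Type] rows."""
    for line_number, spans in sorted(entities.items()):
        text = lines[line_number]
        for entity_text, entity_type in spans:
            begin = text.find(entity_text)  # first occurrence only
            if begin == -1:
                raise ValueError(f"{entity_text!r} not found on line {line_number}")
            end = begin + len(entity_text)  # position just after the last character
            yield [file_name, line_number, begin, end, entity_type]


for row in annotation_rows("documents.txt", lines, entities):
    print(row)
# ['documents.txt', 0, 0, 13, 'ENGINEER']
# ['documents.txt', 1, 0, 14, 'ENGINEER']
# ['documents.txt', 3, 25, 38, 'MANAGER']
```

The printed rows match the CSV entries shown above, and line 2 produces no row because it has no entity.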

**Creating your data files**

It's important to put your annotations in a properly configured CSV file to reduce the risk of errors. If you create your CSV file manually, make sure that:
+ UTF-8 encoding is explicitly specified, even though it's the default in most cases.
+ The first line contains the column headers: `File`, `Line` (optional), `Begin Offset`, `End Offset`, `Type`.
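Before uploading a manually built file, you might check it against these requirements and the entity-type rules in the column table. The following is a minimal sketch, not an official validator: `check_annotations_csv` is a hypothetical name, and it checks only the rules stated on this page (header row, integer offsets, uppercase underscore-separated types, at most 25 types). The `[A-Z]+(_[A-Z]+)*` pattern is a conservative assumption about what "uppercase, underscore-separated" permits.

```python
import csv
import os
import re
import tempfile

HEADER_PER_LINE = ["File", "Line", "Begin Offset", "End Offset", "Type"]
HEADER_PER_FILE = ["File", "Begin Offset", "End Offset", "Type"]
# Assumed reading of the type rule: uppercase words joined by single underscores.
TYPE_PATTERN = re.compile(r"[A-Z]+(?:_[A-Z]+)*")
MAX_ENTITY_TYPES = 25


def check_annotations_csv(path):
    """Return a list of problems found in an annotations CSV (empty if none)."""
    problems = []
    types = set()
    # Open explicitly as UTF-8; skipinitialspace accepts "a,b" and "a, b".
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, skipinitialspace=True)
        header = next(reader, None)
        if header not in (HEADER_PER_LINE, HEADER_PER_FILE):
            return [f"unexpected header: {header}"]
        for row_number, row in enumerate(reader, start=2):
            if len(row) != len(header):
                problems.append(f"row {row_number}: expected {len(header)} columns, got {len(row)}")
                continue
            record = dict(zip(header, row))
            try:
                begin = int(record["Begin Offset"])
                end = int(record["End Offset"])
            except ValueError:
                problems.append(f"row {row_number}: offsets must be integers")
            else:
                if not 0 <= begin < end:
                    problems.append(f"row {row_number}: invalid offsets {begin}, {end}")
            if not TYPE_PATTERN.fullmatch(record["Type"]):
                problems.append(f"row {row_number}: bad entity type {record['Type']!r}")
            types.add(record["Type"])
    if len(types) > MAX_ENTITY_TYPES:
        problems.append(f"{len(types)} entity types; at most {MAX_ENTITY_TYPES} allowed")
    return problems


# Usage: a sample file with one lowercase (invalid) entity type.
sample = (
    "File, Line, Begin Offset, End Offset, Type\n"
    "documents.txt, 0, 0, 13, ENGINEER\n"
    "documents.txt, 3, 25, 38, manager\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "annotations.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(sample)
    result = check_annotations_csv(path)

print(result)  # ["row 3: bad entity type 'manager'"]
```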

We highly recommend that you generate the CSV input files programmatically to avoid potential issues.

The following example uses Python to generate a CSV for the annotations shown earlier:

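```
import csv
import os

# Create the output directory if it doesn't already exist.
os.makedirs("./annotations", exist_ok=True)

# Specify UTF-8 explicitly; newline="" lets the csv module control line endings.
with open("./annotations/annotations.csv", "w", encoding="utf-8", newline="") as csv_file:
    csv_writer = csv.writer(csv_file)
    csv_writer.writerow(["File", "Line", "Begin Offset", "End Offset", "Type"])
    csv_writer.writerow(["documents.txt", 0, 0, 13, "ENGINEER"])
    csv_writer.writerow(["documents.txt", 1, 0, 14, "ENGINEER"])
    csv_writer.writerow(["documents.txt", 3, 25, 38, "MANAGER"])
```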
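After generating the file, you can read it back and confirm that each offset pair selects the intended text. The following sketch assumes one document per line and that each file named in the `File` column sits in a local directory; `verify_spans` is a hypothetical helper, and the usage below builds throwaway files in a temporary directory, including a deliberately wrong end offset to show how a mistake surfaces.

```python
import csv
import os
import tempfile


def verify_spans(csv_path, documents_dir):
    """Return (Type, selected text) for each row of a one-document-per-line CSV.

    Assumes row["Line"] indexes the newline-separated lines of the file named
    in row["File"], which is stored in documents_dir.
    """
    cache = {}  # file name -> list of lines, so each document is read once
    results = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, skipinitialspace=True):
            name = row["File"]
            if name not in cache:
                with open(os.path.join(documents_dir, name), encoding="utf-8") as doc:
                    cache[name] = doc.read().split("\n")
            line = cache[name][int(row["Line"])]
            begin, end = int(row["Begin Offset"]), int(row["End Offset"])
            results.append((row["Type"], line[begin:end]))
    return results


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "documents.txt"), "w", encoding="utf-8") as f:
        f.write("Diego Ramirez is an engineer in the high tech industry.\n"
                "Emilio Johnson has been an engineer for 14 years.\n")
    csv_path = os.path.join(tmp, "annotations.csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["File", "Line", "Begin Offset", "End Offset", "Type"])
        writer.writerow(["documents.txt", 0, 0, 13, "ENGINEER"])
        writer.writerow(["documents.txt", 1, 0, 5, "ENGINEER"])  # wrong end offset
    spans = verify_spans(csv_path, tmp)

for entity_type, text in spans:
    print(entity_type, repr(text))
# ENGINEER 'Diego Ramirez'
# ENGINEER 'Emili'
```

A truncated span such as `'Emili'` points to an offset that needs fixing before training.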