

# Preparing training data for Amazon Personalize

After you [choose a domain use case or recipe](use-cases-and-recipes.md) and note its data requirements, you are ready to start preparing your data. Amazon Personalize can use the following types of data:
+ [**Item interactions**](interactions-datasets.md) – In Amazon Personalize, an *item interaction* is a positive interaction event between a user and an item in your catalog. For example, a user watching a movie, viewing a listing, or purchasing a pair of shoes.
+ [**Items**](items-datasets.md) – Item metadata might include information such as price, SKU type, description, or availability for each item in your catalog.
+ [**Users**](users-datasets.md) – User metadata might include information such as age, gender, loyalty membership, and interest for each of your users.
+ [**Actions**](actions-datasets.md) – An *action* is an engagement activity that you might want to recommend to your customers. Actions might include installing your mobile app, completing a membership profile, joining your loyalty program, or signing up for promotional emails. For the Next-Best-Action recipe, the Actions dataset is required. No other custom recipe or domain use case uses Actions data. 
+ [**Action interactions**](action-interactions-datasets.md) – An action interaction is an interaction event between a user and an action. The Next-Best-Action recipe uses this data and the data in your Actions dataset to recommend actions to your users. No other custom recipe or domain use case uses Action-interactions data. 

Amazon Personalize stores data in *datasets*, one for each type of data. Each dataset has different requirements. When you import data into an Amazon Personalize dataset, you can choose to import records in bulk, individually, or both. Bulk imports involve importing a large number of historical records stored in one or more CSV files in an Amazon S3 bucket.
+ If you don't have bulk data, you can use individual import operations to collect data and stream events until you meet Amazon Personalize training requirements and the data requirements of your domain use case or recipe. For information about recording events, see [Recording real-time events to influence recommendations](recording-events.md). For information about importing individual records, see [Importing individual records into an Amazon Personalize dataset](incremental-data-updates.md). 
+ If you aren't sure you have enough data or if you have questions about its quality, you can import your data into an Amazon Personalize dataset and use Amazon Personalize to analyze it. For more information, see [Analyzing quality and quantity of data in Amazon Personalize datasets](analyzing-data.md).

 The following sections provide data requirements for each Amazon Personalize dataset type and guidelines for preparing bulk data. If you don't have bulk data, review the sections to understand the required and optional data you can import with individual import operations. If you need additional help formatting your data, you can use Amazon SageMaker AI Data Wrangler (Data Wrangler) to prepare your data. For more information, see [Preparing and importing bulk data using Amazon SageMaker AI Data Wrangler](preparing-importing-with-data-wrangler.md).

After you finish preparing your data, you are ready to create a schema JSON file. This file tells Amazon Personalize about the structure of your data. For more information, see [Creating schema JSON files for Amazon Personalize schemas](how-it-works-dataset-schema.md). 

**Topics**
+ [Bulk data format guidelines for all types of data](#general-formatting-guidelines)
+ [Preparing item interaction data for training](interactions-datasets.md)
+ [Preparing item metadata for training](items-datasets.md)
+ [Preparing user metadata for training](users-datasets.md)
+ [Preparing action metadata for training](actions-datasets.md)
+ [Preparing action interaction data for training](action-interactions-datasets.md)

## Bulk data format guidelines for all types of data


The following guidelines and requirements can help you make sure your bulk data is formatted correctly.
+ Your input data must be in a CSV (comma-separated values) file. 
+ The first row of your CSV file must contain your column headers. Don't enclose headers in quotation marks ("). 
+  Columns must have unique alphanumeric names. For example, you can't add both a `GENRES_FIELD_1` field and a `GENRESFIELD1` field. 
+ If you are importing multiple CSV files, all column headers must match across all files. 
+ Make sure you have the required fields for your dataset type and make sure that their names align with Amazon Personalize requirements. For example, your Items data might have a column called `ITEM_IDENTIFICATION_NUMBER` with IDs for each of your items. To use this column as an `ITEM_ID` field, rename the column to `ITEM_ID`. If you use Data Wrangler to format your data, you can use the **Map columns for Amazon Personalize** Data Wrangler transform to make sure your columns are named correctly.

   For information about using Data Wrangler to prepare your data, see [Preparing and importing bulk data using Amazon SageMaker AI Data Wrangler](preparing-importing-with-data-wrangler.md).
+  Each record in your CSV file must be on a single line. 
+ Amazon Personalize doesn't support complex data types such as arrays and maps.
+ To have Amazon Personalize use boolean data when training or filtering, use string values `"True"` and `"False"` or numeric values `1` for true and `0` for false. 
+ If you use Data Wrangler to format your data, you can use the Data Wrangler transform [Parse Value as Type](https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html#data-wrangler-transform-cast-type) to convert the data types.
+ `TIMESTAMP` and `CREATION_TIMESTAMP` data must be in *UNIX epoch* time format. For more information, see [Timestamp data](interactions-datasets.md#timestamp-data).
+ Avoid including any `"` characters or special characters in item ID, user ID, and action ID data.
+ If your data includes any non-ASCII encoded characters, your CSV file must be encoded in UTF-8 format.
+ Make sure you format any textual data as described in [Unstructured text metadata](items-datasets.md#text-data).
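
Several of the guidelines above can be checked before you upload a file. The following is a minimal sketch, not an official validator. The `check_headers` function and the `REQUIRED_COLUMNS` table are hypothetical names, and the collision rule (comparing names by their alphanumeric characters only) is an assumption inferred from the `GENRES_FIELD_1` and `GENRESFIELD1` example.

```python
import csv
import io
import re

# Required columns per dataset type, taken from the column requirements in this guide.
# Hypothetical lookup table; extend it for the other dataset types as needed.
REQUIRED_COLUMNS = {
    "item_interactions": {"USER_ID", "ITEM_ID", "TIMESTAMP"},
    "items": {"ITEM_ID"},
}


def check_headers(csv_text, dataset_type):
    """Return a list of problems found in the header row of a CSV string."""
    first_line = csv_text.splitlines()[0]
    problems = []

    # Headers must not be enclosed in quotation marks.
    if '"' in first_line:
        problems.append("header row contains quotation marks")

    header = next(csv.reader(io.StringIO(first_line)))

    # Assumption: two names collide when they match after removing every
    # non-alphanumeric character (GENRES_FIELD_1 vs. GENRESFIELD1).
    seen = {}
    for name in header:
        key = re.sub(r"[^0-9A-Za-z]", "", name).upper()
        if key in seen:
            problems.append(f"{name!r} collides with {seen[key]!r}")
        else:
            seen[key] = name

    missing = REQUIRED_COLUMNS[dataset_type] - set(header)
    if missing:
        problems.append(f"missing required columns: {sorted(missing)}")
    return problems
```

An empty list means the header row passed these particular checks. It doesn't guarantee that the import will succeed.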

# Preparing item interaction data for training

An *item interaction* is a positive interaction event between a user and an item in your catalog. For example, a user watching a movie, viewing a listing, or purchasing a pair of shoes. You import data about your users' interactions with your items into an *Item interactions dataset*. You can record multiple event types, such as *click*, *watch*, or *purchase*.

For example, if a user *clicks* a particular item and then *likes* the item, you can have Amazon Personalize use these events as training data. For each event, you would record the user's ID, the item's ID, the timestamp (in Unix epoch time format), and the event type (*click* or *like*). You would then add both item interaction events to an *Item interactions dataset*.

For all domain use cases and custom recipes, your bulk item interactions data must be in a CSV file. Each row should represent a single interaction between a user and an item. After you finish preparing your data, you are ready to create a schema JSON file. This file tells Amazon Personalize about the structure of your data. For more information, see [Creating schema JSON files for Amazon Personalize schemas](how-it-works-dataset-schema.md).

The following sections provide more information on how to prepare your item interaction data for Amazon Personalize. For bulk data format guidelines for all types of data, see [bulk data format guidelines](preparing-training-data.md#general-formatting-guidelines).

**Topics**
+ [Item interaction data requirements](#item-interaction-requirements)
+ [Timestamp data](#timestamp-data)
+ [Event type and event value data](#event-type-and-event-value-data)
+ [Contextual metadata](#interactions-contextual-metadata)
+ [Impressions data](#interactions-impressions-data)
+ [Interactions data example](#interactions-data-schema-example)

## Item interaction data requirements


The following sections list item interaction data requirements for Amazon Personalize. For additional quotas, see [Amazon Personalize endpoints and quotas](limits.md).



### Minimum training requirements


For all domain use cases and custom recipes, your bulk item interactions data must have the following: 
+ At minimum 1000 item interactions records from users interacting with items in your catalog. These interactions can be from bulk imports, or streamed events, or both.
+ At minimum 25 unique user IDs with at least two item interactions for each.

 For quality recommendations, we recommend that you have at minimum 50,000 item interactions from at least 1,000 users with two or more item interactions each. 

 To create a recommender or a custom solution, you must at minimum create an *Item interactions dataset*. 
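
As a rough pre-import check, the two minimum requirements above can be counted from the USER_ID column of your interactions data. This is a minimal sketch that assumes you have already loaded the user IDs into a list, one entry per interaction record. The function name `meets_minimum_requirements` is hypothetical.

```python
from collections import Counter


def meets_minimum_requirements(user_ids, min_interactions=1000, min_users=25):
    """Check the minimum training requirements for item interactions.

    user_ids: one entry per interaction record (the USER_ID column).
    """
    per_user = Counter(user_ids)
    total_interactions = sum(per_user.values())
    # Only users with at least two interactions count toward the user minimum.
    repeat_users = sum(1 for count in per_user.values() if count >= 2)
    return total_interactions >= min_interactions and repeat_users >= min_users
```

To check against the recommended volumes instead, you could pass `min_interactions=50000` and `min_users=1000`.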

### Column requirements


Your item interactions data must have the following columns.
+ USER_ID – The unique identifier of the user who interacted with the item. Every event must have a USER_ID. It must be a `string` with a max length of 256 characters.
+ ITEM_ID – The unique identifier of the item that the user interacted with. Every event must have an item ID. It must be a `string` with a max length of 256 characters.
+ TIMESTAMP – The time the event occurred (in Unix epoch time format in seconds). Every interaction must have a TIMESTAMP. For more information, see [Timestamp data](#timestamp-data).
+ EVENT_TYPE – The nature of the item interaction event, such as *click*, *watch*, or *purchase*. For domain recommenders, you must have an event type column and every interaction must have an event type. For all custom recipes, an EVENT_TYPE column is recommended but optional. If you add it, every event must have an event type. For more information, see [Event type and event value data](#event-type-and-event-value-data). 

You are free to add additional custom columns depending on your use case and your data. The maximum number of optional metadata columns is 5. These columns can include empty/null values. We recommend that these columns be at minimum 70 percent complete.
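
One way to check the 70 percent guideline is to compute, for each column, the share of rows that have a non-empty value. The following sketch uses only the Python standard library and treats empty or whitespace-only cells as missing. The function name `column_completeness` is hypothetical.

```python
import csv
import io


def column_completeness(csv_text):
    """Return {column name: fraction of rows with a non-empty value}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {
        column: sum(1 for row in rows if (row[column] or "").strip()) / len(rows)
        for column in rows[0]
    }
```

A column with a result below `0.7` falls under the recommended completeness.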

## Timestamp data


Timestamp data must be in Unix epoch time format in seconds. For example, the Unix epoch timestamp in seconds for July 31, 2020 (23:30:43 UTC) is 1596238243. To convert dates to Unix epoch timestamps, use an [Epoch converter - Unix timestamp converter](https://www.epochconverter.com). 

Amazon Personalize uses timestamp data to calculate recency and identify any time-based patterns. It helps Amazon Personalize keep recommendations up-to-date with users' evolving preferences.
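
If your source data stores dates rather than epoch values, you can also convert them in code. The following is a minimal sketch using the Python standard library. The function name `to_epoch_seconds` is hypothetical, and the sketch rejects datetimes without a time zone so that the result doesn't depend on the local time zone of the machine that runs it.

```python
from datetime import datetime, timezone


def to_epoch_seconds(dt):
    """Convert a timezone-aware datetime to whole seconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime to avoid local-time ambiguity")
    return int(dt.timestamp())


# July 31, 2020 at 23:30:43 UTC
print(to_epoch_seconds(datetime(2020, 7, 31, 23, 30, 43, tzinfo=timezone.utc)))  # 1596238243
```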

## Event type and event value data


An Item interactions dataset can store event type and event value data for each interaction. Only custom resources use event value data.

### Event type data


An item interaction's event type provides context about its nature and significance. Event type examples might be *click*, *watch* or *purchase*. Amazon Personalize uses event type data, such as *click* or *purchase* data, to identify user intent and interest. The maximum number of distinct event types combined with total number of optional metadata columns in an Item interactions dataset is 10. 

For domain recommenders, you must have an event type column and every interaction must have an event type. For all custom recipes, an EVENT_TYPE column is recommended but optional. If you add it, every event must have an event type.

If you create custom resources, you can choose the events used for training by event type. If your dataset has multiple event types in an EVENT_TYPE column, and you do not provide an event type when you configure a custom solution, Amazon Personalize uses all item interactions data for training with equal weight regardless of type. For more information, see [Choosing the item interaction data used for training](event-values-types.md).

If you have multiple event types and use the User-Personalization-v2 recipe or Personalized-Ranking-v2 recipe, when you configure a custom solution you can specify different weights for different types. For example, you can configure a solution to give more weight to purchase events than click events. For more information, see [Optimizing a solution with events configuration](optimizing-solution-events-config.md).

The following use cases have specific event type requirements: 

VIDEO_ON_DEMAND domain use cases
+ Because you watched X requires at minimum 1000 `Watch` events. 
+ Most popular requires at minimum 1000 `Watch` events. 

ECOMMERCE domain use cases
+ Most viewed requires at minimum 1000 `View` events. 
+ Best sellers requires at minimum 1000 `Purchase` events. 

#### Positive and negative event types


 Amazon Personalize assumes any interaction is a positive one. Interactions with a negative event type, such as *dislike*, won't necessarily keep the item from appearing in the user's future recommendations.

The following are ways to have negative events and users' disinterest influence recommendations:
+  For all domain use cases and the [User-Personalization](native-recipe-new-item-USER_PERSONALIZATION.md) recipe, Amazon Personalize can use impressions data. When an item appears in impressions data and a user doesn't choose it, the item is less likely to appear in recommendations. For more information, see [Impressions data](#interactions-impressions-data). 
+ If you use custom resources and import positive and negative event types, you can train on only positive event types and then filter out items the user interacted with negatively. For more information, see [Choosing the item interaction data used for training](event-values-types.md) and [Filtering recommendations and user segments](filter.md). 

### Event value data (custom resources)


Event value data might be the percentage of a movie that a user watched or a rating out of 10. If you create custom solutions, you can choose records used for training based on data in EVENT_TYPE and EVENT_VALUE columns. With domain recommenders, Amazon Personalize doesn't use event value data and you can't filter events before training. 

To choose records based on type and value, record event type and event value data for events. Not all events must have an event value. The value you choose for each event depends on what data you want to exclude and what event types you are recording. For example, for *watch* event types, the value might reflect the user's activity, such as the percentage of the video that the user watched. 

When you configure a solution, you set a specific value as a threshold to exclude records from training. For example, suppose your EVENT_VALUE data for events with an EVENT_TYPE of *watch* is the portion of a video that a user watched (for example, 0.5 for 50 percent). If you set the event value threshold to 0.5 and the event type to *watch*, Amazon Personalize trains the model using only *watch* interaction events with an EVENT_VALUE greater than or equal to 0.5. 

For more information, see [Choosing the item interaction data used for training](event-values-types.md). 
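
To preview which records a given threshold would keep, you can apply the same rule to your data locally. This sketch only illustrates the selection rule described above. Amazon Personalize applies the threshold itself when you configure the solution, and the function name `select_training_events` and the record layout (dictionaries keyed by column name) are assumptions.

```python
def select_training_events(records, event_type, threshold):
    """Keep records of the given event type whose EVENT_VALUE meets the threshold.

    records: list of dicts with EVENT_TYPE and EVENT_VALUE keys.
    Records without an event value (None) are excluded.
    """
    return [
        record
        for record in records
        if record["EVENT_TYPE"] == event_type
        and record["EVENT_VALUE"] is not None
        and record["EVENT_VALUE"] >= threshold
    ]
```

For example, with an event type of `watch` and a threshold of `0.5`, a *watch* record with an EVENT_VALUE of `0.25` is dropped and one with `0.5` is kept.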

## Contextual metadata


 With certain recipes and recommender use cases, Amazon Personalize can use contextual metadata when identifying underlying patterns that reveal the most relevant items for your users. Contextual metadata is interactions data you collect on the user's environment at the time of an event, such as their location or device type. You can also specify a user's context when you get recommendations for the user. 

Include contextual metadata to provide a more personalized experience for your users and decrease the cold-start phase for new users. The cold-start phase is when recommendations are less relevant due to a lack of historical user data.

For example, if your item interactions CSV file includes a DEVICE_TYPE column with `tablet` and `phone` values, Amazon Personalize can learn how customers shop differently with different devices. When you get recommendations for a user, you can specify their device and recommendations will be more relevant, even if the user has no interaction history. 

The following shows how you would format an item interactions CSV file with a DEVICE_TYPE column as contextual metadata.

```
ITEM_ID,USER_ID,TIMESTAMP,DEVICE_TYPE,EVENT_TYPE
shoe12345,12,1428624000,Tablet,CLICK
shoe12346,12,1420416000,Tablet,CLICK
shoe12347,12,1410652800,Tablet,BUY
shoe4444,13,1409961600,Phone,CLICK
shoe4445,13,1402876800,Phone,BUY
shoe4336,13,1402185600,Phone,CLICK
.....
```

For Domain dataset groups, the following recommender use cases can use contextual metadata:
+ [Recommended for you](ECOMMERCE-use-cases.md#recommended-for-you-use-case) (ECOMMERCE domain)
+ [Top picks for you](VIDEO_ON_DEMAND-use-cases.md#top-picks-use-case) (VIDEO_ON_DEMAND domain)

 For custom resources, recipes that use contextual metadata include the following:
+  [User-Personalization-v2](native-recipe-user-personalization-v2.md) and [User-Personalization](native-recipe-new-item-USER_PERSONALIZATION.md) 
+  [Personalized-Ranking-v2](native-recipe-personalized-ranking-v2.md) and [Personalized-Ranking](native-recipe-search.md)

For information about including context when you get recommendations, see [Increasing recommendation relevance with contextual metadata](contextual-metadata.md). For an end to end example that shows how to use contextual metadata, see the following AWS Machine Learning Blog post: [ Increasing the relevance of your Amazon Personalize recommendations by leveraging contextual information](https://aws.amazon.com/blogs/machine-learning/increasing-the-relevance-of-your-amazon-personalize-recommendations-by-leveraging-contextual-information/). 

## Impressions data


Impressions are lists of items that were visible to a user when they interacted with (for example, clicked or watched) a particular item. If you use a domain use case that provides personalization or the [User-Personalization](native-recipe-new-item-USER_PERSONALIZATION.md) recipe, Amazon Personalize can use impressions data to guide exploration.

 With exploration, recommendations include some items or actions that would be typically less likely to be recommended for the user, such as new items or actions, items or actions with few interactions, or items or actions less relevant for the user based on their previous behavior. The more frequently an item occurs in impressions data, the less likely it is that Amazon Personalize includes the item in exploration. 

Amazon Personalize doesn't train your models with impressions data, so it always excludes this data from training when you create a recommender or solution. Instead, it uses impressions data when you get recommendations to guide exploration for the user.

 Impression values can have at most 1000 characters (including the vertical bar character). For Domain dataset groups, the following recommender use cases can use impressions data:
+ [Recommended for you](ECOMMERCE-use-cases.md#recommended-for-you-use-case) (ECOMMERCE domain)
+ [Top picks for you](VIDEO_ON_DEMAND-use-cases.md#top-picks-use-case) (VIDEO_ON_DEMAND domain)

For more information about exploration see [Exploration](use-case-recipe-features.md#about-exploration). Amazon Personalize can model two types of impressions: [Implicit impressions](#implicit-impressions-info) and [Explicit impressions](#explicit-impressions-info). 

### Explicit impressions


*Explicit impressions* are impressions that you manually record and send to Amazon Personalize. Use explicit impressions to manipulate results from Amazon Personalize. The order of the items has no impact. 

 For example, you might have a shopping application that provides recommendations for shoes. If you only recommend shoes that are currently in stock, you can specify these items using explicit impressions. Your recommendation workflow using explicit impressions might be as follows:

1. You request recommendations for one of your users using the Amazon Personalize [GetRecommendations](API_RS_GetRecommendations.md) API.

1. Amazon Personalize generates recommendations for the user using your model (solution version) and returns them in the API response.

1. You show the user only the recommended shoes that are in stock.

1. For real-time incremental data import, when your user interacts with (for example, clicks) a pair of shoes, you record the choice in a call to the [PutEvents](API_UBS_PutEvents.md) API and list the recommended items that are in stock in the `impression` parameter. For a code sample see [Recording item interaction events with impressions data](putevents-including-impressions-data.md).

   For importing impressions in historical item interactions data, you can list explicit impressions in your CSV file and separate each item with a '|' character. The vertical bar character counts towards the 1000 character limit. For an example, see [Formatting explicit impressions](#data-prep-including-explicit-impressions).

1. Amazon Personalize uses the impression data to guide exploration, where future recommendations include new shoes with less interactions data or relevance. 

#### Formatting explicit impressions


To include explicit impressions in your CSV file, add an IMPRESSION column. For each item interaction, add a list of item IDs separated with the vertical bar character ('|'). The vertical bar character counts toward the 1000 character limit for impressions data. If you include explicit impressions in a [PutEvents](API_UBS_PutEvents.md) operation, you specify the items in an array of strings. 

The following is a short excerpt from a CSV file that includes explicit impressions in the `IMPRESSION` column.


| EVENT_TYPE | IMPRESSION | ITEM_ID | TIMESTAMP | USER_ID | 
| --- | --- | --- | --- | --- | 
| click | 73\|70\|17\|95\|96 | 73 | 1586731606 | USER_1 | 
| click | 35\|82\|78\|57\|20\|63\|1\|90\|76\|75\|49\|71\|26\|24\|25\|6 | 35 | 1586735164 | USER_2 | 
| ... | ... | ... | ... | ... | 

The application showed user `USER_1` items `73`, `70`, `17`, `95`, and `96` and the user ultimately chose item `73`. When you create a new solution version based on this data, items `70`, `17`, `95`, and `96` will be less frequently recommended to user `USER_1`.
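
When you generate the IMPRESSION column programmatically, a small helper can join the item IDs and enforce the character limit. This is a minimal sketch. The function name `format_impression` is hypothetical, and rejecting item IDs that contain the vertical bar character is an added safeguard rather than a documented rule.

```python
MAX_IMPRESSION_CHARS = 1000  # limit stated above, separators included


def format_impression(item_ids):
    """Join item IDs into a single IMPRESSION value for a CSV row."""
    ids = [str(item_id) for item_id in item_ids]
    # A separator inside an ID would make the value ambiguous when it is split again.
    if any("|" in item_id for item_id in ids):
        raise ValueError("item IDs must not contain the '|' character")
    value = "|".join(ids)
    if len(value) > MAX_IMPRESSION_CHARS:
        raise ValueError(f"impression value has {len(value)} characters, limit is {MAX_IMPRESSION_CHARS}")
    return value


print(format_impression([73, 70, 17, 95, 96]))  # 73|70|17|95|96
```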

### Implicit impressions


*Implicit impressions* are the recommendations, retrieved from Amazon Personalize, that you show the user. Your CSV file doesn't need to include IMPRESSION or RECOMMENDATION_ID columns to use implicit impressions. Instead, you include the `RecommendationId` (returned by the [GetRecommendations](API_RS_GetRecommendations.md) and [GetPersonalizedRanking](API_RS_GetPersonalizedRanking.md) operations) in [PutEvents](API_UBS_PutEvents.md) requests. Amazon Personalize derives the implicit impressions based on your recommendation data. 

 For example, you might have an application that provides recommendations for streaming video. Your recommendation workflow using implicit impressions might be as follows:

1. You request video recommendations for one of your users using the Amazon Personalize [GetRecommendations](API_RS_GetRecommendations.md) API operation.

1. Amazon Personalize generates recommendations for the user using your model (solution version) and returns them with a `recommendationId` in the API response.

1. You show the video recommendations to your user in your application.

1. When your user interacts with (for example, clicks) a video, record the choice in a call to the [PutEvents](API_UBS_PutEvents.md) API and include the `recommendationId` as a parameter. For a code sample see [Recording item interaction events with impressions data](putevents-including-impressions-data.md).

1. Amazon Personalize uses the `recommendationId` to derive the impression data from the previous video recommendations, and then uses the impression data to guide exploration, where future recommendations include new videos with less interactions data or relevance. 

   For more information on recording events with implicit impression data, see [Recording item interaction events with impressions data](putevents-including-impressions-data.md).
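
The event recorded in the workflows above can be sketched as a plain dictionary that is passed in the `eventList` of a PutEvents request. The `build_event` helper is hypothetical. The field names (`eventType`, `itemId`, `sentAt`, `recommendationId`, `impression`) follow the PutEvents API as understood here, so verify them against the API reference before you rely on them. The actual call is left as a comment because it requires the AWS SDK for Python (Boto3), credentials, and a tracking ID from your event tracker.

```python
from datetime import datetime, timezone


def build_event(item_id, event_type, sent_at, recommendation_id=None, impression=None):
    """Build one entry for the eventList parameter of a PutEvents request."""
    event = {"eventType": event_type, "itemId": item_id, "sentAt": sent_at}
    if recommendation_id is not None:
        # Implicit impressions: Amazon Personalize derives them from this ID.
        event["recommendationId"] = recommendation_id
    if impression is not None:
        # Explicit impressions: an array of item ID strings.
        event["impression"] = list(impression)
    return event


event = build_event(
    item_id="73",
    event_type="click",
    sent_at=datetime(2020, 7, 31, 23, 30, 43, tzinfo=timezone.utc),
    impression=["73", "70", "17", "95", "96"],
)

# Sending the event (placeholder IDs, shown as an assumption-labeled example):
# import boto3
# personalize_events = boto3.client("personalize-events")
# personalize_events.put_events(
#     trackingId="<your-tracking-id>",
#     userId="USER_1",
#     sessionId="session-1",
#     eventList=[event],
# )
```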

## Interactions data example


The following interactions data represents historical user activity from a streaming video website. You might use the data to train a model that provides movie recommendations based on users' interaction data. Note that some values for EVENT_VALUE are null.

```
USER_ID,ITEM_ID,EVENT_TYPE,EVENT_VALUE,TIMESTAMP
196,242,watch,.50,881250949
186,302,watch,.75,891717742
22,377,click,,878887116
244,51,click,,880606923
166,346,watch,.50,886397596
298,474,watch,.25,884182806
115,265,click,,881171488
253,465,watch,.50,891628467
305,451,watch,.75,886324817
```

Amazon Personalize requires the `USER_ID`, `ITEM_ID`, and `TIMESTAMP` columns. `USER_ID` is the identifier for a user of your application. `ITEM_ID` is the identifier for a movie. `EVENT_TYPE` and `EVENT_VALUE` describe each user interaction. In the sample data, the events are `watch` and `click` events and the values are the portion of a video that a user watched. The `TIMESTAMP` represents the Unix epoch time that the interaction took place.

After you finish preparing your data, you are ready to create a schema JSON file. This file tells Amazon Personalize about the structure of your data. For more information, see [Creating schema JSON files for Amazon Personalize schemas](how-it-works-dataset-schema.md). This is what the schema JSON file would look like for the sample data.

```
{
  "type": "record",
  "name": "Interactions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {
      "name": "USER_ID",
      "type": "string"
    },
    {
      "name": "ITEM_ID",
      "type": "string"
    },
    {
      "name": "EVENT_TYPE",
      "type": "string"
    },
    {
      "name": "EVENT_VALUE",
      "type": "float"
    },
    {
      "name": "TIMESTAMP",
      "type": "long"
    }
  ],
  "version": "1.0"
}
```
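
Before you import, it can help to confirm that the CSV header and the schema name the same columns. The following is a minimal sketch with a hypothetical function name, `schema_mismatches`. It compares names only, ignores column order, and doesn't check types, so treat it as a sanity check rather than a replacement for the validation that Amazon Personalize performs during import.

```python
import csv
import io
import json


def schema_mismatches(csv_text, schema_json):
    """Compare CSV header names with the field names in a schema JSON string."""
    header = set(next(csv.reader(io.StringIO(csv_text))))
    fields = {field["name"] for field in json.loads(schema_json)["fields"]}
    return {
        "missing_from_csv": sorted(fields - header),
        "missing_from_schema": sorted(header - fields),
    }
```

If both lists in the result are empty, every schema field has a matching column and every column has a matching schema field.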

# Preparing item metadata for training

 Item metadata includes numerical and categorical data about the items your users interact with. Examples of item metadata include creation timestamp, price, genre, description, and availability. You import metadata about your items into an Amazon Personalize *Items dataset*. 

Depending on your domain use case or custom recipe, item metadata can help Amazon Personalize recommend more relevant items to users, more accurately predict similar items, or recommend more meaningful user segments. And it can help Amazon Personalize feature new items in recommendations. Item metadata is required for some domain use cases and optional for all custom recipes. For more information, see the data requirements for your domain use case or recipe in [Matching your use case to Amazon Personalize resources](use-cases-and-recipes.md).

 When training, Amazon Personalize doesn't use non-categorical string item data, such as item titles or author data. However, importing this data can still enhance recommendations. For more information, see [Non-categorical string data](#item-string-data). 

The maximum number of items Amazon Personalize considers during training depends on your use case or recipe. Only items considered during training can appear in recommendations.
+ For User-Personalization-v2 or Personalized-Ranking-v2, the maximum number of items that are considered by a model during training is 5 million. These items are from both the Items and Item interactions dataset.
+ For all domain use cases and custom recipes other than User-Personalization-v2 and Personalized-Ranking-v2, the maximum number of items that are considered by a model during training and generating recommendations is 750,000.

For all domain use cases and custom recipes, your bulk item data must be in a CSV file. Each row in the file should represent a unique item. After you finish preparing your data, you are ready to create a schema JSON file. This file tells Amazon Personalize about the structure of your data. For more information, see [Creating schema JSON files for Amazon Personalize schemas](how-it-works-dataset-schema.md).

The following sections provide more information on how to prepare your item metadata for Amazon Personalize. For bulk data format guidelines for all types of data, see [bulk data format guidelines](preparing-training-data.md#general-formatting-guidelines).

**Topics**
+ [Item data requirements](#item-data-requirements)
+ [Creation timestamp data](#creation-timestamp-data)
+ [Categorical metadata](#item-categorical-data)
+ [Unstructured text metadata](#text-data)
+ [Numerical data](#item-numerical-data)
+ [Non-categorical string data](#item-string-data)
+ [Items metadata example](#items-data-example)

## Item data requirements


 The following are item metadata requirements for Amazon Personalize.

If you aren't sure you have enough data or if you have questions about its quality, you can import your data into an Amazon Personalize dataset and use Amazon Personalize to analyze it. For more information, see [Analyzing quality and quantity of data in Amazon Personalize datasets](analyzing-data.md).
+ For all domain use cases and custom recipes, you must have an ITEM_ID column that stores the unique identifier for each item. Every item must have an item ID. It must be a `string` with a max length of 256 characters.
+ For custom recipes, your data must have at least one categorical string or numerical metadata column. Item metadata columns can include empty/null values. We recommend that these columns be at minimum 70 percent complete.
+ For domain use cases, the required columns depend on your domain. For more information, see [VIDEO_ON_DEMAND domain requirements](#vod-item-data-req) or [ECOMMERCE domain requirements](#retail-item-data-req). 
+ The maximum number of metadata columns is 100.

### VIDEO_ON_DEMAND domain requirements


Item metadata is required for some use cases (see [VIDEO_ON_DEMAND use cases](VIDEO_ON_DEMAND-use-cases.md)). When optional, we still recommend importing item metadata to get the most relevant recommendations. If you import item metadata, your data must include the following columns:
+ ITEM_ID
+ GENRES (categorical `string`)
+ CREATION_TIMESTAMP (in Unix epoch time format)

 The following lists additional recommended columns and their required types. The `null` type indicates that the column can have missing values. We recommend that these columns be at minimum 70 percent complete. Including these columns can improve recommendations.
+ PRICE (`float`)
+ DURATION (`float`)
+ GENRE_L2 (categorical `string`, `null`)
+ GENRE_L3 (categorical `string`, `null`)
+ AVERAGE_RATING (`float`, `null`)
+ PRODUCT_DESCRIPTION (textual `string`, `null`)
+ CONTENT_OWNER (categorical `string`, `null`) – The company that owns the video. For example, values might be HBO, Paramount, and NBC.
+ CONTENT_CLASSIFICATION (categorical `string`, `null`) – The content's rating. For example, values might be G, PG, PG-13, R, NC-17, and unrated.

### ECOMMERCE domain requirements


 Item metadata is optional for all ECOMMERCE use cases. If you have item data, we recommend importing it to get the most relevant recommendations. If you import item metadata, your data must have the following columns:
+ ITEM_ID
+ PRICE (`float`)
+ CATEGORY_L1 (categorical `string`) – For information about formatting categorical data, see [Categorical metadata](#item-categorical-data).

The following list shows additional recommended columns and their required types. The `null` type indicates that the column can have missing values. We recommend that these columns be at least 70 percent complete. Including these columns can improve recommendations.
+ CATEGORY_L2 (categorical `string`, `null`)
+ CATEGORY_L3 (categorical `string`, `null`)
+ PRODUCT_DESCRIPTION (textual `string`, `null`)
+ CREATION_TIMESTAMP (`float`)
+ AGE_GROUP (categorical `string`, `null`) – The age group the item is for. Values might be newborns, infants, children, and adults.
+ ADULT (categorical `string`, `null`) – Whether the item is restricted to only adults, such as alcohol. Values might be yes or no.
+ GENDER (categorical `string`, `null`) – The gender the item is for. Values might be male, female, and unisex.

## Creation timestamp data


Creation timestamp data must be in Unix epoch time format in seconds. For example, the Unix epoch timestamp in seconds for 23:30:43 UTC on July 31, 2020 is 1596238243. To convert dates to Unix epoch timestamps, use an [Epoch converter - Unix timestamp converter](https://www.epochconverter.com). 
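If you prepare your data with a script, you can do the same conversion in code instead of using a web converter. The following is a minimal Python sketch that uses only the standard library. The function name `to_epoch_seconds` is illustrative and not part of any Amazon Personalize API.

```python
from datetime import datetime, timezone


def to_epoch_seconds(dt: datetime) -> int:
    """Return whole seconds since 1970-01-01T00:00:00Z for a timezone-aware datetime."""
    if dt.tzinfo is None:
        # A naive datetime would be interpreted in the local time zone of the
        # machine running the script, which makes the result ambiguous.
        raise ValueError("datetime must be timezone-aware")
    return int(dt.timestamp())


created = datetime(2020, 7, 31, 23, 30, 43, tzinfo=timezone.utc)
print(to_epoch_seconds(created))  # 1596238243
```

Requiring a timezone-aware value is a design choice of this sketch. It avoids silently shifting timestamps when the script runs on machines in different time zones.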

Amazon Personalize uses creation timestamp data (in Unix epoch time format, in seconds) to calculate the age of an item and adjust recommendations accordingly.

If creation timestamp data is missing for one or more items, Amazon Personalize infers this information from interaction data, if any, and uses the timestamp of the item’s oldest interaction data as the item's creation timestamp. If an item has no interaction data, its creation timestamp is set as the timestamp of the latest interaction in the training set and Amazon Personalize considers it a new item. 
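To make the fallback rule concrete, the following Python sketch mirrors the behavior described in the preceding paragraph. It is only an illustration of the rule as stated here, not the service's implementation. The function name and the shape of the two dictionaries are assumptions made for the example.

```python
from typing import Dict, List, Optional


def effective_creation_timestamp(
    item_id: str,
    provided: Dict[str, Optional[int]],
    interactions: Dict[str, List[int]],
) -> int:
    """Pick the creation timestamp that the rule above would assign to an item.

    provided:     item ID -> creation timestamp from the Items data (None if missing)
    interactions: item ID -> timestamps of that item's interactions in the training set
    """
    timestamp = provided.get(item_id)
    if timestamp is not None:
        return timestamp  # the item has its own creation timestamp
    item_events = interactions.get(item_id, [])
    if item_events:
        return min(item_events)  # oldest interaction for this item
    # No interactions for this item: use the latest interaction in the
    # training set (raises ValueError if there are no interactions at all).
    return max(t for events in interactions.values() for t in events)


provided = {"a": 100, "b": None}
interactions = {"a": [150, 200], "b": [120, 130], "d": [300]}
print(effective_creation_timestamp("a", provided, interactions))  # 100
print(effective_creation_timestamp("b", provided, interactions))  # 120
print(effective_creation_timestamp("c", provided, interactions))  # 300
```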

## Categorical metadata


 With certain recipes and all domain use cases, Amazon Personalize uses categorical metadata, such as an item's genre or color, when identifying underlying patterns that reveal the most relevant items for your users. You define your own range of values based on your use case. Categorical metadata can be in any language. 

For items with multiple categories, separate each value with the vertical bar, '|'. For example, for a GENRES field, your data for an item might be `Action|Crime|Biopic`. If you have multiple levels of categorical data and some items have multiple categories for each level in the hierarchy, use a separate column for each level and append a level indicator after each field name: GENRES, GENRE_L2, GENRE_L3. This allows you to filter recommendations based on sub-categories, even if an item belongs to multiple multi-level categories (for information on creating and using filters, see [Filtering recommendations and user segments](filter.md)). For example, a video might have the following data for each category level: 
+ GENRES: Action|Adventure
+ GENRE_L2: Crime|Western
+ GENRE_L3: Biopic

In this example, the video is in the action > crime > biopic hierarchy *and* the adventure > western > biopic hierarchy. We recommend only using up to L3 but you can use more levels if necessary.
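If your catalog stores categories as lists, you need to flatten each level into one vertical-bar-separated field before you write the CSV. The following Python sketch shows one way to do that for the video in the example. The helper name `join_categories` and the dictionary layout are assumptions for illustration.

```python
def join_categories(values):
    """Join the category values for one column into a single '|'-separated field."""
    cleaned = [value.strip() for value in values if value and value.strip()]
    for value in cleaned:
        if "|" in value:
            # A bar inside a value would be read as a separator between two values.
            raise ValueError(f"category value contains '|': {value!r}")
    return "|".join(cleaned)


# One column per hierarchy level, as described above.
video = {
    "GENRES": ["Action", "Adventure"],
    "GENRE_L2": ["Crime", "Western"],
    "GENRE_L3": ["Biopic"],
}
row = {column: join_categories(values) for column, values in video.items()}
print(row)
# {'GENRES': 'Action|Adventure', 'GENRE_L2': 'Crime|Western', 'GENRE_L3': 'Biopic'}
```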

Categorical values can have a maximum of 1000 characters. If you have an item with a categorical value with more than 1000 characters, your dataset import job will fail. We recommend categorical columns have at most 1000 possible values. Importing categorical data with more values can negatively impact recommendations. The following can help you reduce the number of possible values for a categorical column:
+ Make sure values follow a consistent naming convention and check for typos. For example, use "Men's Shoes" rather than having a mix of "Men's Shoes", "Mens Shoes", and "Male Footwear".
+ Consolidate similar categories that use slightly different terms referring to the same underlying category, like "Shoes" and "Sneakers".
+ If your data has a hierarchical structure, where broader categories (like "Footwear") contain more specific subcategories (such as "Men's Shoes", "Women's Shoes", "Children's Shoes"), use a separate column for each level and append a level indicator after each field name. For example, CATEGORY_1, CATEGORY_2, and CATEGORY_3. This can reduce ambiguous or overlapping categories. 
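
Before you import, it can help to audit a categorical column against the limits and suggestions above. The following Python sketch normalizes values with a small alias table and counts the distinct values in a column. The alias table, the function names, and the choice to apply the 1000-character check to the whole field (a conservative reading) are assumptions for illustration.

```python
from collections import Counter

MAX_FIELD_CHARS = 1000            # character limit described above
RECOMMENDED_MAX_DISTINCT = 1000   # recommended maximum number of possible values

# Hypothetical alias table that maps inconsistent spellings to one canonical value.
ALIASES = {"mens shoes": "Men's Shoes", "male footwear": "Men's Shoes"}


def normalize(value: str) -> str:
    """Collapse whitespace and map known aliases to their canonical spelling."""
    collapsed = " ".join(value.split())
    return ALIASES.get(collapsed.lower(), collapsed)


def audit_column(fields):
    """Return (value counts, fields that exceed the character limit) for one column."""
    counts = Counter()
    too_long = []
    for field in fields:
        if len(field) > MAX_FIELD_CHARS:
            too_long.append(field)
        for value in field.split("|"):
            if value.strip():
                counts[normalize(value)] += 1
    return counts, too_long


counts, too_long = audit_column(["Men's Shoes", "Mens Shoes|Hats", "male  footwear"])
print(dict(counts))   # {"Men's Shoes": 3, 'Hats': 1}
print(len(counts) <= RECOMMENDED_MAX_DISTINCT, too_long)  # True []
```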

With all recipes and domains, you can import categorical data and use it to filter recommendations based on an item's attributes. For information about filtering recommendations, see [Filtering recommendations and user segments](filter.md). 

## Unstructured text metadata


With certain recipes and domains, Amazon Personalize can extract meaningful information from unstructured text metadata, such as product descriptions, product reviews, or movie synopses. Amazon Personalize uses unstructured text to identify relevant items for your users, particularly when items are new or have less interactions data. You can add at most one textual field. Include unstructured text data in your Items dataset to increase click-through rates and conversion rates for new items in your catalog. 

When you prepare your unstructured text metadata, wrap the text in double quotes and remove any new line characters. Use the `\` character to escape any double quotes or `\` characters in your data. Amazon Personalize truncates text fields at the character limit. Make sure that the most relevant information in the text is at the start of the field.

Unstructured text values can have at most 20,000 characters in all languages except Chinese and Japanese. For Chinese and Japanese, you can have at most 7,000 characters. Amazon Personalize truncates values that exceed the character limit to the character limit. 
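
The preparation steps above can be sketched in a few lines of Python. This is a minimal example, not a required approach. The function names are illustrative, and escaping literal backslashes in addition to double quotes is an assumption about the escaping rule.

```python
def prepare_text_field(text: str) -> str:
    """Flatten, escape, and quote one unstructured text value for a CSV cell."""
    # Drop new line characters and collapse runs of whitespace into single spaces.
    flat = " ".join(text.split())
    # Escape backslashes first so the backslashes added for quotes are not doubled.
    escaped = flat.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def over_limit(text: str, limit: int = 20_000) -> bool:
    """Report whether a value is longer than the limit and would be truncated."""
    return len(text) > limit


print(prepare_text_field('A "classic"\nwestern.'))  # "A \"classic\" western."
```

Pass `limit=7_000` to `over_limit` for Chinese or Japanese text. Because values over the limit are truncated rather than rejected, a check like this is mainly useful for finding descriptions whose most relevant information might be cut off.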

You can submit unstructured text items in multiple languages, but each item's text should be in only one language. Text can be in the following languages: 
+ Chinese (Simplified)
+ Chinese (Traditional)
+ English
+ French
+ German
+ Japanese
+ Portuguese
+ Spanish

## Numerical data


 Amazon Personalize can use numerical item metadata, such as price or video duration, to generate more relevant recommendations for users. This numerical data can be represented as whole numbers or decimal values.

If you use the [User-Personalization](native-recipe-new-item-USER_PERSONALIZATION.md) or [Personalized-Ranking](native-recipe-search.md) custom recipes, you can optimize an Amazon Personalize solution for an objective related to item metadata, such as maximizing revenue, in addition to maximum relevance. When you configure your solution, you choose the numerical metadata column in your Items dataset that is related to your objective. For example, you might choose a VIDEO_LENGTH column to maximize streaming minutes or a PRICE column to maximize revenue. 

For more information, see [Optimizing a solution for an additional objective](optimizing-solution-for-objective.md).

## Non-categorical string data


 Except for item IDs, Amazon Personalize doesn't use non-categorical non-textual string data when training, such as item titles or author data. However, Amazon Personalize can use it with the following features. Non-categorical values can have a maximum of 1000 characters. 
+ Amazon Personalize can include item metadata in recommendations, including non-categorical string values. You might use metadata to enrich recommendations in your user interface, such as adding the director's name to a movie recommendations carousel. For more information, see [Item metadata in recommendations](campaigns.md#create-campaign-return-metadata).
+  If you use [Similar-Items](native-recipe-similar-items.md), you can generate batch recommendations with themes. When you generate batch recommendations with themes, you must specify an item name column in the batch inference job. For more information, see [Batch recommendations with themes from Content Generator](themed-batch-recommendations.md). 
+  You can create filters to include or remove items from recommendations based on non-categorical string data. For more information about filters, see [Filtering recommendations and user segments](filter.md). 

## Items metadata example


The first few lines of movie metadata in a CSV file might look like the following.

```
ITEM_ID,GENRES,CREATION_TIMESTAMP,DESCRIPTION
1,Adventure|Animation|Children|Comedy|Fantasy,1570003267,"This is an animated movie that features action, comedy, and fantasy. Audience is children. This movie was released in 2004."
2,Adventure|Children|Fantasy,1571730101,"This is an adventure movie with elements of fantasy. Audience is children. This movie was released in 2010."
3,Comedy|Romance,1560515629,"This is a romantic comedy. The movie was released in 1999. Audience is young women."
4,Comedy|Drama|Romance,1581670067,"This movie includes elements of both comedy and drama as well as romance. This movie was released in 2020."
...
...
```

The `ITEM_ID` column is required and stores a unique identifier for each item. The `GENRES` column stores categorical metadata for each movie, and the `DESCRIPTION` column stores unstructured textual metadata. The `CREATION_TIMESTAMP` column stores each item's creation time in Unix epoch time format in seconds.

After you finish preparing your data, you are ready to create a schema JSON file. This file tells Amazon Personalize about the structure of your data. For more information, see [Creating schema JSON files for Amazon Personalize schemas](how-it-works-dataset-schema.md). This is what the schema JSON file would look like for the above sample data.

```
{
  "type": "record",
  "name": "Items",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {
      "name": "ITEM_ID",
      "type": "string"
    },
    {
      "name": "GENRES",
      "type": [
        "null",
        "string"
      ],
      "categorical": true
    },
    {
      "name": "CREATION_TIMESTAMP",
      "type": "long"
    },
    {
      "name": "DESCRIPTION",
      "type": [
        "null",
        "string"
      ],
      "textual": true
    }
  ],
  "version": "1.0"
}
```
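
A mismatch between the CSV header and the schema field names is easy to introduce, for example by writing `GENRE` where the schema says `GENRES`. The following Python sketch compares the two before you upload anything. The function name `header_problems` is illustrative, and the sketch only reports differences. It does not claim to reproduce the validation that Amazon Personalize performs during import.

```python
import csv
import io
import json

# The Items schema from the example above.
ITEMS_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Items",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {"name": "ITEM_ID", "type": "string"},
    {"name": "GENRES", "type": ["null", "string"], "categorical": true},
    {"name": "CREATION_TIMESTAMP", "type": "long"},
    {"name": "DESCRIPTION", "type": ["null", "string"], "textual": true}
  ],
  "version": "1.0"
}
""")


def header_problems(csv_text: str, schema: dict) -> list:
    """Compare the CSV header row with the schema's field names."""
    header = next(csv.reader(io.StringIO(csv_text)))
    expected = [field["name"] for field in schema["fields"]]
    problems = [f"missing column: {name}" for name in expected if name not in header]
    problems += [f"column not in schema: {name}" for name in header if name not in expected]
    return problems


sample = (
    "ITEM_ID,GENRE,CREATION_TIMESTAMP,DESCRIPTION\n"
    '3,Comedy|Romance,1560515629,"This is a romantic comedy."\n'
)
print(header_problems(sample, ITEMS_SCHEMA))
# ['missing column: GENRES', 'column not in schema: GENRE']
```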

# Preparing user metadata for training
User metadata

 The user data that you can import into Amazon Personalize includes numerical data, such as user age, and categorical metadata, such as gender or loyalty membership. You import metadata about your users into an Amazon Personalize *Users dataset*. 

Depending on your domain use case or custom recipe, user metadata can help Amazon Personalize recommend more relevant items to users or recommend more meaningful user segments. And after training, it can help your model recommend items for users without any interactions data. For more information about what use cases or recipes use user metadata, see the data requirements for your domain use case or recipe in [Matching your use case to Amazon Personalize resources](use-cases-and-recipes.md).

When training, Amazon Personalize doesn't use non-categorical string user data, such as users' names, keywords about the user, or tags. However, importing this data can still enhance recommendations. For more information, see [Non-categorical string data](#user-string-data). 

For all domain use cases and custom recipes, your bulk user data must be in a CSV file. Each row in the file should represent a unique user. After you finish preparing your data, you are ready to create a schema JSON file. This file tells Amazon Personalize about the structure of your data. For more information, see [Creating schema JSON files for Amazon Personalize schemas](how-it-works-dataset-schema.md).

The following sections provide more information on how to prepare your user data for Amazon Personalize. For bulk data format guidelines for all types of data, see [bulk data format guidelines](preparing-training-data.md#general-formatting-guidelines).

**Topics**
+ [User data requirements](#user-data-requirements)
+ [Categorical metadata](#user-categorical-data)
+ [Non-categorical string data](#user-string-data)
+ [Users metadata example](#users-data-example)

## User data requirements


 The following are user data requirements for Amazon Personalize. You are free to add additional custom columns depending on your use case and your data.
+ Your data must have a USER_ID column that stores the unique identifier for each user. Every user must have a user ID. It must be a `string` with a max length of 256 characters.
+ Your data must have at least one categorical string or numerical metadata column. User metadata columns can include empty/null values for some users. We recommend that these columns be at least 70 percent complete.
+ The maximum number of metadata columns is 25.

If you aren't sure you have enough data or if you have questions about its quality, you can import your data into an Amazon Personalize dataset and use Amazon Personalize to analyze it. For more information, see [Analyzing quality and quantity of data in Amazon Personalize datasets](analyzing-data.md).
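
For a quick local check of the 70 percent completeness recommendation before you import anything, you can compute the share of non-empty values per column. The following Python sketch does this for a small in-memory CSV. The function name and the sample rows are made up for illustration.

```python
import csv
import io


def column_completeness(csv_text: str) -> dict:
    """Return {column: fraction of rows that have a non-empty value}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {
        column: sum(1 for row in rows if (row[column] or "").strip()) / len(rows)
        for column in rows[0]
    }


sample = "USER_ID,AGE,GENDER\n1,34,Male\n2,,Female\n3,56,\n4,,Male\n"
report = column_completeness(sample)
sparse = [column for column, share in report.items() if share < 0.70]
print(report)  # {'USER_ID': 1.0, 'AGE': 0.5, 'GENDER': 0.75}
print(sparse)  # ['AGE']
```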

## Categorical metadata


With some recipes and all domain use cases, Amazon Personalize uses categorical metadata, such as a user's gender, interests, or membership status, when identifying underlying patterns that reveal the most relevant items for your users. You define your own range of values based on your use case. Categorical metadata can be in any language. 

For users with multiple categories, separate each value with the vertical bar, '|'. For example, for an INTERESTS field, your data for a user might be `Movies|TV Shows|Music`.

With all recipes and domains, you can import categorical metadata and use it to filter recommendations based on a user's attributes. For information about filtering recommendations see [Filtering recommendations and user segments](filter.md). 

Categorical values can have at most 1000 characters. If you have a user with a categorical value with more than 1000 characters, your dataset import job will fail.

## Non-categorical string data


Except for user IDs, Amazon Personalize doesn't use non-categorical string data when training, such as users' names, keywords about the user, or tags. However, Amazon Personalize can use it when filtering recommendations. You can create filters to include or remove items from recommendations based on non-categorical string data about the user you are getting recommendations for (the CurrentUser). For more information about filters, see [Filtering recommendations and user segments](filter.md). Non-categorical values can have a maximum of 1000 characters. 

## Users metadata example


The first few lines of user metadata in a CSV file might look like the following.

```
USER_ID,AGE,GENDER,INTEREST
5,34,Male,hiking
6,56,Female,music
8,65,Male,movies|TV shows|music
...
...
```

The `USER_ID` column is required and stores unique identifiers for each individual user. The `AGE` column is numerical metadata. The `GENDER` and `INTEREST` columns store categorical metadata for each user. 

After you finish preparing your data, you are ready to create a schema JSON file. This file tells Amazon Personalize about the structure of your data. For more information, see [Creating schema JSON files for Amazon Personalize schemas](how-it-works-dataset-schema.md). This is what the schema JSON file would look like for the above sample data.

```
{
  "type": "record",
  "name": "Users",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
      {
          "name": "USER_ID",
          "type": "string"
      },
      {
          "name": "AGE",
          "type": "int"
      },
      {
          "name": "GENDER",
          "type": "string",
          "categorical": true
      },
      {
          "name": "INTEREST",
          "type": "string",
          "categorical": true
      }
  ],
  "version": "1.0"
}
```

# Preparing action metadata for training
Action metadata

 An *action* is an engagement or revenue generating activity that you might want to recommend to your users. Actions might include installing your mobile app, completing a membership profile, joining your loyalty program, or signing up for promotional emails. You import data about your actions into an Amazon Personalize *Actions dataset*. Examples of data for an action include a unique ID for the action, the action's estimated value, or the action's expiration timestamp.

If you use [Next-Best-Action](native-recipe-next-best-action.md), you must import action metadata. With this recipe, Amazon Personalize predicts the next best action from the actions you import into your Actions dataset. No other recipes or use cases use action metadata. You can't create an Actions dataset in a domain dataset group. 

 When training, Amazon Personalize doesn't use non-categorical string action data, such as action titles or tags. However, importing this data can still enhance recommendations. For more information, see [Non-categorical string data](#action-string-data). 

Your bulk action data must be in a CSV file. Each row in the file should represent a unique action. After you finish preparing your data, you are ready to create a schema JSON file. This file tells Amazon Personalize about the structure of your data. For more information, see [Creating schema JSON files for Amazon Personalize schemas](how-it-works-dataset-schema.md).

The following sections provide more information on how to prepare your action metadata for Amazon Personalize. For bulk data format guidelines for all types of data, see [bulk data format guidelines](preparing-training-data.md#general-formatting-guidelines).

**Topics**
+ [Action data requirements](#action-data-requirements)
+ [Action expiration timestamp data](#action-expiration-timestamp-data)
+ [Repeat frequency data](#action-repeat-frequency)
+ [Value data](#action-value-data)
+ [Creation timestamp data](#action-creation-timestamp-data)
+ [Categorical metadata](#action-categorical-data)
+ [Non-categorical string data](#action-string-data)
+ [Actions metadata example](#actions-data-example)

## Action data requirements


 The following are action data requirements for Amazon Personalize.
+ Your data must have an ACTION_ID column that stores the unique identifier for each action. Every action must have an action ID. It must be a `string` with a max length of 256 characters.
+ Your data must have at least one categorical string or numerical metadata column. Action metadata columns can include empty/null values. We recommend that these columns be at minimum 70 percent complete.
+ During model training, Amazon Personalize considers a maximum of 1000 actions. If you import more than 1000 actions, Amazon Personalize decides which actions to include in training, with priority given to new actions (actions you recently added with no interactions) and existing actions with recent interactions data.
+ The maximum number of columns is 10.

## Action expiration timestamp data


 An action expiration timestamp specifies the date at which an action is no longer valid. You provide action expiration timestamp data in Unix epoch time format, in seconds. If an action has expired, Amazon Personalize won't include it in recommendations. 

 Specify an action expiration timestamp for your actions if you want to limit their appearance in recommendations to a certain time frame. For example, you might have an application that is running a membership drive through a certain month. You might set an expiration timestamp for the *enroll* action for the end of that month. Amazon Personalize automatically stops recommending this action when this date is reached. 

If you set the expiration timestamp to a time in the past for a new action, or if you update an action's expiration timestamp to a time in the past, it can take up to 2 hours to remove the action from recommendations. 
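
As an illustration of the membership drive example, the following Python sketch computes an expiration timestamp for the end of a month and checks whether an action counts as expired at a given time. The date, the function name, and the choice to treat the exact expiration second as expired are assumptions for the example. The sketch does not model the removal delay described above.

```python
import time
from datetime import datetime, timezone
from typing import Optional


def is_expired(expiration_ts: Optional[int], now: Optional[int] = None) -> bool:
    """Return True if the expiration time has been reached (no timestamp = never expires)."""
    if expiration_ts is None:
        return False
    current = int(time.time()) if now is None else now
    return current >= expiration_ts


# Hypothetical membership drive that runs through June 2024: the enroll action
# expires at the first second of July 1, 2024 (UTC).
enroll_expiration = int(datetime(2024, 7, 1, tzinfo=timezone.utc).timestamp())

during_drive = int(datetime(2024, 6, 15, tzinfo=timezone.utc).timestamp())
after_drive = int(datetime(2024, 7, 2, tzinfo=timezone.utc).timestamp())
print(is_expired(enroll_expiration, now=during_drive))  # False
print(is_expired(enroll_expiration, now=after_drive))   # True
```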

## Repeat frequency data


 Repeat frequency data specifies how many days Amazon Personalize should wait to recommend a particular action after a user interacts with it, based on the user's history in your Action interactions dataset. You specify an action's repeat frequency in days, with a maximum of 30. 

For example, you might have an ecommerce application where each user creates an account and a profile. If you have a `complete profile` action and you want to wait a week after a user interacts with it before recommending it again, you would specify 7 days as the action's `REPEAT_FREQUENCY`. After 7 days, Amazon Personalize starts considering the action for recommendations again. 

 If you don't provide a repeat frequency for an action, Amazon Personalize will not set any limits on the number of times it appears in recommendations. 
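
The waiting period can be expressed as simple timestamp arithmetic. The following Python sketch shows the idea for a single user and action. It is an illustration of the rule as described, not the service's logic, and the function name and the exact boundary handling are assumptions.

```python
from typing import Optional

SECONDS_PER_DAY = 86_400
MAX_REPEAT_FREQUENCY_DAYS = 30  # maximum stated above


def eligible_again(
    last_interaction_ts: int, repeat_frequency_days: Optional[int], now: int
) -> bool:
    """Return True if the action can be considered for this user again at time `now`."""
    if repeat_frequency_days is None:
        return True  # no repeat frequency: no waiting period
    if repeat_frequency_days > MAX_REPEAT_FREQUENCY_DAYS:
        raise ValueError("repeat frequency can be at most 30 days")
    return now >= last_interaction_ts + repeat_frequency_days * SECONDS_PER_DAY


last_seen = 1_700_000_000  # made-up time of the user's last interaction
print(eligible_again(last_seen, 7, last_seen + 3 * SECONDS_PER_DAY))  # False
print(eligible_again(last_seen, 7, last_seen + 7 * SECONDS_PER_DAY))  # True
```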

## Value data


Value data is the business value or importance of each action. An action's `value` can be from 1 to 10, where 10 is the most valuable action in your dataset.

 For example, you might have two actions, one for enrolling in your basic subscription and one for enrolling in your premium service. For the basic service, you might specify a value of `5` and for the premium, a value of `10`.

 Amazon Personalize uses value data as one input when determining the best action to recommend to your users. For example, if a user is equally likely to take one action or another, Amazon Personalize ranks the action with the highest value higher in recommendations. 

## Creation timestamp data


Amazon Personalize uses creation timestamp data (in Unix epoch time format, in seconds) to calculate the age of an action and adjust recommendations accordingly.

If you don't have creation timestamp data, Amazon Personalize infers this information from any action interaction data. It uses the timestamp of the action’s oldest interaction data as the action's creation timestamp. If an action has no interaction data, its creation timestamp is set as the timestamp of the latest interaction in the training set, and Amazon Personalize considers it a new action. 

## Categorical metadata


 Amazon Personalize uses categorical metadata about actions, such as seasonality or action exclusivity, when identifying the underlying patterns that reveal the best actions for your users. You define your own range of values based on your use case. Categorical metadata can be in any language. 

 You can import categorical data and use it to filter recommendations based on an action's attributes. For information about filtering recommendations, see [Filtering recommendations and user segments](filter.md). 

Categorical values can have a maximum of 1000 characters. If you have an action with a categorical value with more than 1000 characters, your dataset import job will fail. 

## Non-categorical string data


 Except for action IDs, Amazon Personalize doesn't use non-categorical string data when training, such as an action's name, keywords about the action, or tags. However, Amazon Personalize can use it when filtering recommendations. You can create filters to include or remove actions from recommendations based on non-categorical string data. For more information about filters, see [Filtering recommendations and user segments](filter.md). Non-categorical values can have a maximum of 1000 characters. 

## Actions metadata example


The first few lines of action metadata in a CSV file might look like the following.

```
ACTION_ID,VALUE,MEMBERSHIP_LEVEL,CREATION_TIMESTAMP,REPEAT_FREQUENCY
1,10,Deluxe|Premium,1510003267,7
2,5,Basic,1580003267,7
3,5,Preview,1590003267,3
4,10,Deluxe|Platinum,1560003267,4
...
...
```

The `ACTION_ID` column is required. The `MEMBERSHIP_LEVEL` column is a categorical string field. The `VALUE`, `CREATION_TIMESTAMP`, and `REPEAT_FREQUENCY` fields are reserved keywords with the required types.

 After you finish preparing your data, you are ready to create a schema JSON file. This file tells Amazon Personalize about the structure of your data. For more information, see [Creating schema JSON files for Amazon Personalize schemas](how-it-works-dataset-schema.md). This is what the schema JSON file would look like for the above sample data.

```
{
  "type": "record",
  "name": "Actions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
    {
      "name": "ACTION_ID",
      "type": "string"
    },
    {
      "name": "VALUE",
      "type": [
        "null",
        "long"
      ]
    },
    {
      "name": "MEMBERSHIP_LEVEL",
      "type": [
        "null",
        "string"
      ],
      "categorical": true
    },
    {
      "name": "CREATION_TIMESTAMP",
      "type": "long"
    },
    {
      "name": "REPEAT_FREQUENCY",
      "type": [
        "long",
        "null"
      ]
    }
  ],
  "version": "1.0"
}
```
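
Before you upload an actions CSV, you might check each row against the limits from the preceding sections: an `ACTION_ID` of at most 256 characters, a `VALUE` from 1 to 10, and a `REPEAT_FREQUENCY` of at most 30 days. The following Python sketch is one way to do that. The function name and the message wording are illustrative, and the sketch assumes that numeric fields contain integers.

```python
import csv
import io


def action_row_problems(csv_text: str) -> list:
    """Check each actions row against the limits described in the sections above."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        action_id = row.get("ACTION_ID") or ""
        if not action_id or len(action_id) > 256:
            problems.append(f"line {line_no}: ACTION_ID must be 1-256 characters")
        value = row.get("VALUE")
        if value and not 1 <= int(value) <= 10:
            problems.append(f"line {line_no}: VALUE {value} is outside 1-10")
        repeat = row.get("REPEAT_FREQUENCY")
        if repeat and int(repeat) > 30:
            problems.append(f"line {line_no}: REPEAT_FREQUENCY {repeat} exceeds 30 days")
    return problems


sample = (
    "ACTION_ID,VALUE,MEMBERSHIP_LEVEL,CREATION_TIMESTAMP,REPEAT_FREQUENCY\n"
    "1,10,Deluxe|Premium,1510003267,7\n"
    "2,11,Basic,1580003267,45\n"
)
for problem in action_row_problems(sample):
    print(problem)
# line 3: VALUE 11 is outside 1-10
# line 3: REPEAT_FREQUENCY 45 exceeds 30 days
```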

# Preparing action interaction data for training
Action interaction data

 If you use the [Next-Best-Action](native-recipe-next-best-action.md) custom recipe, Amazon Personalize uses action interactions data to identify user interest and predict the actions they will most likely take. An *action interaction* is an interaction involving a user and an action in your [Actions dataset](actions-datasets.md). For example, if you have an *enroll* action in your Actions dataset, and a user takes this action, you would record the user's ID, the action's ID, the timestamp, and for event type, record `TAKEN`. 

You import action interactions into an Amazon Personalize *Action interactions dataset*. You can import action interaction events in bulk with a dataset import job, or you can stream them in real time with the [PutActionInteractions](API_UBS_PutActionInteractions.md) API operation. You can't create next best action resources, including Actions and Action Interactions datasets, in a domain dataset group.

Your bulk action interactions data must be in a CSV file. Each row in the file should represent a unique interaction between a user and an action. After you finish preparing your data, you are ready to create a schema JSON file. This file tells Amazon Personalize about the structure of your data. For more information, see [Creating schema JSON files for Amazon Personalize schemas](how-it-works-dataset-schema.md).

 The following sections provide more information on how to prepare your action interaction data for Amazon Personalize. For bulk data format guidelines for all types of data, see [bulk data format guidelines](preparing-training-data.md#general-formatting-guidelines).

**Topics**
+ [Action interaction data requirements](#action-interaction-requirements)
+ [Event type data](#action-interaction-event-type-data)
+ [Action interactions data example](#action-interactions-data-schema-example)

## Action interaction data requirements


There is no minimum requirement for action interactions data. We recommend that you import it for quality action recommendations. If you don't have action interaction data, you can create an empty Action interactions dataset and record your customers' interactions with actions by using the [PutActionInteractions](API_UBS_PutActionInteractions.md) API operation. 

Your action interactions data must have at minimum the following columns. You are free to add additional custom columns depending on your use case and your data.
+ USER_ID – The unique identifier of the user who interacted with the action. Every event must have a USER_ID. It must be a `string` with a max length of 256 characters.
+ ACTION_ID – The unique identifier of the action that the user interacted with. Every event must have an ACTION_ID. It must be a `string` with a max length of 256 characters.
+ TIMESTAMP – The time the event occurred (in Unix epoch time format in seconds). Every action interaction must have a TIMESTAMP. For more information, see [Timestamp data](interactions-datasets.md#timestamp-data).
+ EVENT_TYPE – Whether the action was Taken, Not taken, or Viewed. Every action interaction must have an event type. For more information, see [Event type data](#action-interaction-event-type-data).

Until you import action interaction data, Amazon Personalize recommends actions without personalization, and propensity scores are 0.0. An action will have a score after the action has the following: 
+  At least 50 action interactions with the TAKEN event type. 
+  At least 50 action interactions with the NOT_TAKEN or VIEWED event type. 

These action interactions must be present at the latest solution version training, and must occur within a span of 6 weeks from the latest interaction timestamp in the Action interactions dataset. 
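
To estimate which actions already have enough data, you can count recent events per action. The following Python sketch applies the thresholds above to a list of `(action_id, event_type, timestamp)` tuples. It assumes that both conditions must hold and that event types are spelled `TAKEN`, `NOT_TAKEN`, and `VIEWED`. The function name is illustrative, and the sketch does not account for which events were present at the latest solution version training.

```python
from collections import Counter

SIX_WEEKS_SECONDS = 6 * 7 * 86_400


def actions_with_enough_data(events, min_count: int = 50) -> set:
    """Return the IDs of actions that meet both thresholds within the 6-week window."""
    events = list(events)
    if not events:
        return set()
    # The window is measured back from the latest interaction in the dataset.
    cutoff = max(ts for _, _, ts in events) - SIX_WEEKS_SECONDS
    taken, not_taken_or_viewed = Counter(), Counter()
    for action_id, event_type, ts in events:
        if ts < cutoff:
            continue  # too old to count
        if event_type == "TAKEN":
            taken[action_id] += 1
        elif event_type in ("NOT_TAKEN", "VIEWED"):
            not_taken_or_viewed[action_id] += 1
    return {
        action_id
        for action_id in taken
        if taken[action_id] >= min_count and not_taken_or_viewed[action_id] >= min_count
    }


latest = 1_700_000_000  # made-up timestamp of the latest interaction
events = (
    [("enroll", "TAKEN", latest)] * 50
    + [("enroll", "VIEWED", latest)] * 50
    + [("survey", "TAKEN", latest)] * 50
    + [("survey", "NOT_TAKEN", latest)] * 49
    + [("survey", "NOT_TAKEN", latest - 7 * 7 * 86_400)]  # outside the window
)
print(actions_with_enough_data(events))  # {'enroll'}
```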

## Event type data


Amazon Personalize can use patterns in event type data to identify the actions your users will most likely take. For example, if a customer frequently ignores an email subscription action (indicated with the NOT_TAKEN event type), Amazon Personalize might adjust recommendations to feature fewer of this type of action. 

 You can use only the following event types for action interaction events. Amazon Personalize uses these events to learn about your user and calculate what actions to recommend next.
+ Taken – Record *Taken* events when a user takes a recommended action.
+ Not Taken – Record *Not Taken* events when your user makes a deliberate choice to not take the action after viewing it. For example, if they choose *No* when you show them the action. *Not Taken* events can indicate the customer isn’t interested in the action.
+ Viewed – Record *Viewed* events when you show a user an action before they make a choice to take or not take an action. Amazon Personalize uses *Viewed* events to learn about your users' interests. For example, if a user views an action but doesn't take it, this user might not be interested in this action in the future. 

## Action interactions data example


The first few lines of a CSV file with action interaction data and all required columns might look like the following.

```
USER_ID,ACTION_ID,EVENT_TYPE,TIMESTAMP
35,73,Viewed,1586731606
54,35,Not taken,1586731609
9,33,Viewed,1586735158
23,10,Taken,1586735697
27,11,Taken,1586735763
...
...
```

After you finish preparing your data, you are ready to create a schema JSON file. This file tells Amazon Personalize about the structure of your data. For more information, see [Creating schema JSON files for Amazon Personalize schemas](how-it-works-dataset-schema.md). This is what the schema JSON file would look like for the above sample data.

```
{
  "type": "record",
  "name": "ActionInteractions",
  "namespace": "com.amazonaws.personalize.schema",
  "fields": [
      {
          "name": "USER_ID",
          "type": "string"
      },
      {
          "name": "ACTION_ID",
          "type": "string"
      },
      {
          "name": "EVENT_TYPE",
          "type": "string"
      },
      {
          "name": "TIMESTAMP",
          "type": "long"
      }
  ],
  "version": "1.0"
}
```