

# Getting batch user segments with custom resources
<a name="getting-user-segments"></a>

 To get *user segments*, you use a batch segment job. A *batch segment job* is a tool that imports your batch input data from an Amazon S3 bucket and uses your solution version trained with a USER_SEGMENTATION recipe to generate *user segments* for each row of input data.

Depending on the recipe, the input data is a list of items or item metadata attributes in JSON format. For item attributes, your input data can include expressions to create user segments based on multiple metadata attributes. A batch segment job exports user segments to an output Amazon S3 bucket. Each user segment is sorted in descending order based on the probability that each user will interact with the item in your input data. 

When generating user segments, Amazon Personalize considers data in datasets from bulk and individual imports:
+ For bulk data, Amazon Personalize generates segments using only the bulk data present at the last full solution version training. It uses only bulk data that you imported with an import mode of FULL (replacing existing data).
+ For data from individual data import operations, Amazon Personalize generates user segments using the data present at the last full solution version training. To have newer records impact user segments, create a new solution version and then create a batch segment job.

Generating user segments works as follows:

1.  Prepare and upload your input data in JSON format to an Amazon S3 bucket. The format of your input data depends on the recipe you use and the job you are creating. See [Preparing input data for user segments](prepare-input-data-user-segment.md). 

1.  Create a separate location for your output data, either a different folder or a different Amazon S3 bucket. 

1.  Create a batch segment job. See [Getting user segments with a batch segment job](creating-batch-seg-job.md). 

1.  When the batch segment job is complete, retrieve the user segments from your output location in Amazon S3. 

**Topics**
+ [Guidelines and requirements for getting user segments](#batch-seg-permissions-req)
+ [Preparing input data for user segments](prepare-input-data-user-segment.md)
+ [Getting user segments with a batch segment job](creating-batch-seg-job.md)
+ [Batch segment job output format examples](batch-segment-job-output-examples.md)

## Guidelines and requirements for getting user segments
<a name="batch-seg-permissions-req"></a>

The following are guidelines and requirements for getting user segments:
+  You must use a USER_SEGMENTATION recipe. 
+ Your Amazon Personalize IAM service role needs permission to read and add files to your Amazon S3 buckets. For information on granting permissions, see [Service role policy for batch workflows](granting-personalize-s3-access.md#role-policy-for-batch-workflows). For more information on bucket permissions, see [User policy examples](https://docs.aws.amazon.com/AmazonS3/latest/userguide/example-policies-s3.html) in the *Amazon Simple Storage Service Developer Guide*. 

   If you use AWS Key Management Service (AWS KMS) for encryption, you must grant Amazon Personalize and your Amazon Personalize IAM service role permission to use your key. For more information, see [Giving Amazon Personalize permission to use your AWS KMS key](granting-personalize-key-access.md).
+  You must create a custom solution and solution version before you create a batch segment job. However, you don't need to create an Amazon Personalize campaign. If you created a Domain dataset group, you can still create custom resources. 
+  Your input data must be formatted as described in [Preparing input data for user segments](prepare-input-data-user-segment.md). 
+  If you use the Item-Attribute-Affinity recipe, the attributes in your input data can't include unstructured textual item metadata, such as a product description. 
+ If you use a filter with placeholder parameters, you must include the values for the parameters in your input data in a `filterValues` object. For more information, see [Providing filter values in your input JSON](filter-batch.md#providing-filter-values). 
+ We recommend that you use a different location for your output data (either a folder or a different Amazon S3 bucket) than your input data. 

# Preparing input data for user segments
<a name="prepare-input-data-user-segment"></a>

Batch segment jobs use a solution version to make user segments based on data that you provide in an input JSON file. Before you can get user segments, you must prepare and upload your JSON file to an Amazon S3 bucket. We recommend that you create an output folder in your Amazon S3 bucket or use a separate output Amazon S3 bucket. You can then run multiple batch segment jobs using the same input data location. 

 If you use a filter with placeholder parameters, such as `$GENRE`, you must provide the values for the parameters in a `filterValues` object in your input JSON. For more information, see [Providing filter values in your input JSON](filter-batch.md#providing-filter-values). 

**To prepare and import data**

1. Format your batch input data depending on the recipe your solution uses. Separate each input data element with a new line. Your input data is either a list of itemIds (Item-Affinity recipe) or item attributes (Item-Attribute-Affinity recipe).
   + For item attributes, input data can include logical expressions with the `AND` operator to get users for multiple items or attributes per query. For more information, see [Specifying item attributes for the Item-Attribute-Affinity recipe](#specifying-item-attributes). 
   +  For item attributes, use the `\` character to escape any special characters and single or double quotes in your input data. 
   + For input data examples for both recipes, see [Batch segment job input and output JSON examples](#batch-segment-job-json-examples).

1.  Upload your input JSON to an input folder in your Amazon S3 bucket. For more information, see [Uploading files and folders by using drag and drop](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/upload-objects.html) in the *Amazon Simple Storage Service User Guide*. 

1.  Create a separate location for your output data, either a folder or a different Amazon S3 bucket. By creating a separate location for the output JSON, you can run multiple batch segment jobs with the same input data location.
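The input file described in step 1 is newline-delimited JSON, one inference query per line. As a minimal sketch (the helper name and item IDs are illustrative, not part of the Amazon Personalize API), the following Python snippet builds an Item-Affinity input file body:

```python
import json

def to_input_jsonl(item_ids):
    """Build newline-delimited JSON input for the Item-Affinity recipe."""
    return "\n".join(json.dumps({"itemId": str(item_id)}) for item_id in item_ids)

lines = to_input_jsonl(["105", "106", "441"])
print(lines)
# Each line is a separate inference query, e.g. {"itemId": "105"}
```

You could then write `lines` to a `.json` file and upload it to your input folder, for example with the Amazon S3 console or an S3 client.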

 After you have prepared your input data and uploaded it to an Amazon S3 bucket, you are ready to generate user segments with a batch segment job. For more information, see [Getting user segments with a batch segment job](creating-batch-seg-job.md). 

**Topics**
+ [Specifying item attributes for the Item-Attribute-Affinity recipe](#specifying-item-attributes)
+ [Batch segment job input and output JSON examples](#batch-segment-job-json-examples)

## Specifying item attributes for the Item-Attribute-Affinity recipe
<a name="specifying-item-attributes"></a>

If you use the Item-Attribute-Affinity recipe, your input data is a list of item attributes. You can mix different columns of metadata. For example, one row might be a numerical column and the next might be a categorical column. You can't use unstructured textual item metadata as an item attribute. 

Your input item metadata can include logical expressions with the `AND` operator to get a user segment for multiple attributes. For example, a line of your input data might be `{"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.genres = \"Action\""}` or `{"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.audience = \"teen\""}`.

When you combine two attributes with the `AND` operator, you create a user segment of users who are more likely to interact with items that have both attributes, based on those users' interaction histories. Unlike filter expressions (which use the `IN` operator for string equality), batch segment input expressions support only the `=` symbol for string matching. 
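Because each attribute value must appear in escaped double quotes inside a JSON string, it can be easier to build these lines programmatically and let a JSON serializer handle the escaping. A minimal Python sketch (the helper name is illustrative):

```python
import json

def attribute_query(*conditions):
    """Join attribute = value conditions with AND and wrap the result in an
    itemAttributes JSON line; json.dumps escapes the inner double quotes."""
    expression = " AND ".join(f'{attr} = "{value}"' for attr, value in conditions)
    return json.dumps({"itemAttributes": expression})

line = attribute_query(("ITEMS.genres", "Comedy"), ("ITEMS.genres", "Action"))
print(line)
# {"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.genres = \"Action\""}
```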

## Batch segment job input and output JSON examples
<a name="batch-segment-job-json-examples"></a>

For a batch segment job, your input data must be either a list of itemIds (Item-Affinity recipe) or item attributes (Item-Attribute-Affinity recipe). Each line of input data is a separate inference query. Each user segment is sorted in descending order based on the probability that each user will interact with items in your inventory.

 If you use a filter with placeholder parameters, such as `$GENRE`, you must provide the values for the parameters in a `filterValues` object in your input JSON. For more information, see [Providing filter values in your input JSON](filter-batch.md#providing-filter-values). 

The following are correctly formatted JSON input and output examples for batch segment jobs organized by recipe.

**Item-Affinity**

------
#### [ Input ]

Your input data can have a maximum of 500 items. Separate each `itemId` with a new line as follows.

```
{"itemId": "105"}
{"itemId": "106"}
{"itemId": "441"}
...
```

------
#### [ Output ]

```
{"input": {"itemId": "105"}, "output": {"recommendedUsers": ["106", "107", "49"]}}
{"input": {"itemId": "106"}, "output": {"recommendedUsers": ["105", "107", "49"]}}
{"input": {"itemId": "441"}, "output": {"recommendedUsers": ["2", "442", "435"]}}
...
```

------

**Item-Attribute-Affinity**

------
#### [ Input ]

Your input data can have a maximum of 10 queries, where each query is one or more non-textual item attributes. Separate each attribute or attribute expression with a new line as follows.

```
{"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.genres = \"Action\""}
{"itemAttributes": "ITEMS.genres = \"Comedy\""}
{"itemAttributes": "ITEMS.genres = \"Horror\" AND ITEMS.genres = \"Action\""}
...
```

------
#### [ Output ]

```
{"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.genres = \"Action\"", "output": {"recommendedUsers": ["25", "78", "108"]}}
{"itemAttributes": "ITEMS.genres = \"Adventure\"", "output": {"recommendedUsers": ["87", "31", "129"]}}
{"itemAttributes": "ITEMS.genres = \"Horror\" AND ITEMS.genres = \"Action\"", "output": {"recommendedUsers": ["8", "442", "435"]}}
...
```

------

# Getting user segments with a batch segment job
<a name="creating-batch-seg-job"></a>

 If you used a USER_SEGMENTATION recipe, you can create batch segment jobs to get user segments with your solution version. Each user segment is sorted in descending order based on the probability that each user will interact with items in your inventory. Depending on the recipe, your input data must be a list of items ([Item-Affinity recipe](item-affinity-recipe.md)) or item attributes ([Item-Attribute-Affinity recipe](item-attribute-affinity-recipe.md)) in JSON format. You can create a batch segment job with the Amazon Personalize console, the AWS Command Line Interface (AWS CLI), or AWS SDKs. 

 When you create a batch segment job, you specify the Amazon S3 paths to your input and output locations. Amazon S3 is prefix based. If you provide a prefix for the input data location, Amazon Personalize uses all files matching that prefix as input data. For example, if you provide `s3://amzn-s3-demo-bucket/folderName` and your bucket also has a folder with a path of `s3://amzn-s3-demo-bucket/folderName_test`, Amazon Personalize uses all files in both folders as input data. To use only the files within a specific folder as input data, end the Amazon S3 path with a prefix delimiter, such as `/`: `s3://amzn-s3-demo-bucket/folderName/`. For more information about how Amazon S3 organizes objects, see [Organizing, listing, and working with your objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/organizing-objects.html). 
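The prefix behavior above can be illustrated with plain string matching (the bucket layout and object keys here are hypothetical, chosen only to mirror the example):

```python
# Illustrative object keys within one bucket; names are hypothetical.
keys = [
    "folderName/input1.json",
    "folderName/input2.json",
    "folderName_test/input3.json",
]

# A prefix without a trailing delimiter matches both folders.
print([k for k in keys if k.startswith("folderName")])   # all three keys

# Ending the prefix with "/" restricts matching to the one folder.
print([k for k in keys if k.startswith("folderName/")])  # first two keys only
```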

**Topics**
+ [Creating a batch segment job (console)](#batch-segment-console)
+ [Creating a batch segment job (AWS CLI)](#batch-segment-cli)
+ [Creating a batch segment job (AWS SDKs)](#batch-segment-sdk)

## Creating a batch segment job (console)
<a name="batch-segment-console"></a>

 After you have completed [Preparing input data for batch recommendations](batch-data-upload.md), you are ready to create a batch segment job. This procedure assumes that you have already created a solution and a solution version (trained model) with a USER_SEGMENTATION recipe.

**To create a batch segment job (console)**

1. Open the Amazon Personalize console at [https://console.aws.amazon.com/personalize/home](https://console.aws.amazon.com/personalize/home) and sign in to your account.

1. On the **Dataset groups** page, choose your dataset group.

1. Choose **Batch segment jobs** in the navigation pane, then choose **Create batch segment job**.

1. In **Batch segment job details**, for **Batch segment job name**, specify a name for your batch segment job.

1. For **Solution**, choose the solution and then choose the **Solution version ID** that you want to use to generate the user segments. You can create batch segment jobs only if you used a USER_SEGMENTATION recipe. 

1. For **Number of users**, optionally specify the number of users Amazon Personalize generates for each user segment. The default is 25. The maximum is 5 million.

1.  For **Input source**, specify the Amazon S3 path to your input file or use **Browse S3** to choose your Amazon S3 bucket.

   Use the following syntax: **s3://amzn-s3-demo-bucket/<folder name>/<input JSON file name>.json**

    Your input data must be in the correct format for the recipe your solution uses. For input data examples, see [Batch segment job input and output JSON examples](prepare-input-data-user-segment.md#batch-segment-job-json-examples). 

1. For **Output destination**, specify the path to your output location or use **Browse S3** to choose your Amazon S3 bucket. We recommend using a different location for your output data (either a folder or a different Amazon S3 bucket).

    Use the following syntax: **s3://amzn-s3-demo-bucket/<output folder name>/** 

1. For **IAM role**, choose one of the following:
   +  Choose **Create and use new service role** and enter the **Service role name** to create a new role, or
   +  If you've already created a role with the correct permissions, choose **Use an existing service role** and choose the IAM role. 

    The role you use must have read and write access to your input and output Amazon S3 buckets respectively.

1.  For **Filter configuration**, optionally choose a filter to apply to the user segments. If your filter uses placeholder parameters, make sure the values for the parameters are included in your input JSON. For more information, see [Providing filter values in your input JSON](filter-batch.md#providing-filter-values). 

1. For **Tags**, optionally add any tags. For more information about tagging Amazon Personalize resources, see [Tagging Amazon Personalize resources](tagging-resources.md).

1.  Choose **Create batch segment job**. Batch segment job creation starts and the **Batch segment jobs** page appears with the **Batch segment job detail** section displayed.

1.  When the batch segment job's status changes to **Active**, you can retrieve the job's output from the designated output Amazon S3 bucket. The output file's name will be of the format `input-name.out`. 

## Creating a batch segment job (AWS CLI)
<a name="batch-segment-cli"></a>

After you have completed [Preparing input data for batch recommendations](batch-data-upload.md), you are ready to create a batch segment job using the following `create-batch-segment-job` code. Specify a job name, replace `Solution version ARN` with the Amazon Resource Name (ARN) of your solution version, and replace the `IAM service role ARN` with the ARN of the IAM service role you created for Amazon Personalize during setup. This role must have read and write access to your input and output Amazon S3 buckets respectively. For `num-results`, specify the number of users you want Amazon Personalize to predict for each line of input data. The default is 25. The maximum is 5 million. Optionally provide a `filter-arn` to filter user segments. If your filter uses placeholder parameters, make sure the values for the parameters are included in your input JSON. For more information, see [Filtering batch recommendations and user segments (custom resources)](filter-batch.md). 

Replace `S3 input path` and `S3 output path` with the Amazon S3 path to your input file and output locations. We recommend using a different location for your output data (either a folder or a different Amazon S3 bucket). Use the following syntax for input and output locations: **s3://amzn-s3-demo-bucket/<folder name>/<input JSON file name>.json** and **s3://amzn-s3-demo-bucket/<output folder name>/**. 

```
aws personalize create-batch-segment-job \
                --job-name Job name \
                --solution-version-arn Solution version ARN \
                --num-results The number of predicted users \
                --filter-arn Filter ARN \
                --job-input s3DataSource={path=s3://S3 input path} \
                --job-output s3DataDestination={path=s3://S3 output path} \
                --role-arn IAM service role ARN
{
   "batchSegmentJobArn": "arn:aws:personalize:us-west-2:acct-id:batch-segment-job/batchSegmentJobName"
}
```

## Creating a batch segment job (AWS SDKs)
<a name="batch-segment-sdk"></a>

After you have completed [Preparing input data for batch recommendations](batch-data-upload.md), you are ready to create a batch segment job with the `CreateBatchSegmentJob` operation. The following code shows how to create a batch segment job. Give the job a name, specify the Amazon Resource Name (ARN) of the solution version to use, specify the ARN for your Amazon Personalize IAM role, and specify the Amazon S3 path to your input file and output locations. Your IAM service role must have read and write access to your input and output Amazon S3 buckets respectively. 

We recommend using a different location for your output data (either a folder or a different Amazon S3 bucket). Use the following syntax for input and output locations: **s3://amzn-s3-demo-bucket/<folder name>/<input JSON file name>.json** and **s3://amzn-s3-demo-bucket/<output folder name>/**. 

 For `numResults`, specify the number of users you want Amazon Personalize to predict for each line of input data. The default is 25. The maximum is 5 million. Optionally provide a `filterArn` to filter user segments. If your filter uses placeholder parameters, make sure the values for the parameters are included in your input JSON. For more information, see [Filtering batch recommendations and user segments (custom resources)](filter-batch.md). 

------
#### [ SDK for Python (Boto3) ]

```
import boto3

personalize_rec = boto3.client(service_name='personalize')

personalize_rec.create_batch_segment_job (
    solutionVersionArn = "Solution version ARN",
    jobName = "Job name",
    numResults = 25,
    filterArn = "Filter ARN",
    roleArn = "IAM service role ARN",
    jobInput = 
       {"s3DataSource": {"path": "s3://amzn-s3-demo-bucket/<folder name>/<input JSON file name>.json"}},
    jobOutput = 
       {"s3DataDestination": {"path": "s3://amzn-s3-demo-bucket/<output folder name>/"}}
)
```

------
#### [ SDK for Java 2.x ]

```
public static String createBatchSegmentJob(PersonalizeClient personalizeClient,
                                                        String solutionVersionArn,
                                                        String jobName,
                                                        String filterArn,
                                                        int numResults,
                                                        String s3InputDataSourcePath,
                                                        String s3DataDestinationPath,
                                                        String roleArn) {

  long waitInMilliseconds = 60 * 1000;
  String status;
  String batchSegmentJobArn;

  try {
      // Set up data input and output parameters.
      S3DataConfig inputSource = S3DataConfig.builder()
              .path(s3InputDataSourcePath)
              .build();
      S3DataConfig outputDestination = S3DataConfig.builder()
              .path(s3DataDestinationPath)
              .build();

      BatchSegmentJobInput jobInput = BatchSegmentJobInput.builder()
              .s3DataSource(inputSource)
              .build();
      BatchSegmentJobOutput jobOutputLocation = BatchSegmentJobOutput.builder()
              .s3DataDestination(outputDestination)
              .build();


      CreateBatchSegmentJobRequest createBatchSegmentJobRequest = CreateBatchSegmentJobRequest.builder()
              .solutionVersionArn(solutionVersionArn)
              .filterArn(filterArn)
              .jobInput(jobInput)
              .jobOutput(jobOutputLocation)
              .jobName(jobName)
              .numResults(numResults)
              .roleArn(roleArn)
              .build();

      batchSegmentJobArn = personalizeClient.createBatchSegmentJob(createBatchSegmentJobRequest)
              .batchSegmentJobArn();
      DescribeBatchSegmentJobRequest describeBatchSegmentJobRequest = DescribeBatchSegmentJobRequest.builder()
              .batchSegmentJobArn(batchSegmentJobArn)
              .build();

      long maxTime = Instant.now().getEpochSecond() + 3 * 60 * 60;

      // wait until the batch segment job is complete.
      while (Instant.now().getEpochSecond() < maxTime) {

          BatchSegmentJob batchSegmentJob = personalizeClient
                  .describeBatchSegmentJob(describeBatchSegmentJobRequest)
                  .batchSegmentJob();

          status = batchSegmentJob.status();
          System.out.println("batch segment job status: " + status);

          if (status.equals("ACTIVE") || status.equals("CREATE FAILED")) {
              break;
          }
          try {
              Thread.sleep(waitInMilliseconds);
          } catch (InterruptedException e) {
              System.out.println(e.getMessage());
          }
      }
      return batchSegmentJobArn;

  } catch (PersonalizeException e) {
      System.out.println(e.awsErrorDetails().errorMessage());
  }
  return "";
}
```

------
#### [ SDK for JavaScript v3 ]

```
// Get service clients module and commands using ES6 syntax.
import { CreateBatchSegmentJobCommand } from "@aws-sdk/client-personalize";
import { personalizeClient } from "./libs/personalizeClients.js";

// Or, create the client here.
// const personalizeClient = new PersonalizeClient({ region: "REGION"});

// Set the batch segment job's parameters.

export const createBatchSegmentJobParam = {
  jobName: "NAME",
  jobInput: {
    s3DataSource: {
      path: "INPUT_PATH",
    },
  },
  jobOutput: {
    s3DataDestination: {
      path: "OUTPUT_PATH",
    },
  },
  roleArn: "ROLE_ARN",
  solutionVersionArn: "SOLUTION_VERSION_ARN",
  numResults: 20,
};

export const run = async () => {
  try {
    const response = await personalizeClient.send(
      new CreateBatchSegmentJobCommand(createBatchSegmentJobParam),
    );
    console.log("Success", response);
    return response; // For unit tests.
  } catch (err) {
    console.log("Error", err);
  }
};
run();
```

------

Processing the batch job might take a while to complete. You can check a job's status by calling [DescribeBatchSegmentJob](API_DescribeBatchSegmentJob.md) and passing a `batchSegmentJobArn` as the input parameter. You can also list all Amazon Personalize batch segment jobs in your AWS environment by calling [ListBatchSegmentJobs](API_ListBatchSegmentJobs.md). 
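The polling loop shown in the Java example can be sketched in Python as well. In this sketch, `describe_status` is any zero-argument callable that returns the job's current status string; the function and parameter names are illustrative, and with boto3 you might pass something like `lambda: client.describe_batch_segment_job(batchSegmentJobArn=arn)["batchSegmentJob"]["status"]`:

```python
import time

# Batch segment jobs stop in one of these terminal statuses.
TERMINAL_STATUSES = {"ACTIVE", "CREATE FAILED"}

def wait_for_job(describe_status, poll_seconds=60, max_polls=180):
    """Poll describe_status() until the job reaches a terminal status,
    sleeping poll_seconds between checks."""
    for _ in range(max_polls):
        status = describe_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("batch segment job did not finish in time")
```

Injecting the describe call as a parameter keeps the waiting logic testable without contacting AWS.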

# Batch segment job output format examples
<a name="batch-segment-job-output-examples"></a>

A batch segment job imports your batch input data from an Amazon S3 bucket, uses your solution version trained with a USER_SEGMENTATION recipe to generate *user segments*, and exports the segments to an Amazon S3 bucket.

The following sections list JSON output examples for batch segment jobs by recipe.

**Topics**
+ [Item-Affinity](#batch-segment-output-item-affinity)
+ [Item-Attribute-Affinity](#batch-segment-output-item-attribute-affinity)

## Item-Affinity
<a name="batch-segment-output-item-affinity"></a>

 The following example shows the format of the output JSON file for the Item-Affinity recipe. 

```
{"input": {"itemId": "105"}, "output": {"recommendedUsers": ["106", "107", "49"]}}
{"input": {"itemId": "106"}, "output": {"recommendedUsers": ["105", "107", "49"]}}
{"input": {"itemId": "441"}, "output": {"recommendedUsers": ["2", "442", "435"]}}
...
```

## Item-Attribute-Affinity
<a name="batch-segment-output-item-attribute-affinity"></a>

 The following example shows the format of the output JSON file for the Item-Attribute-Affinity recipe. 

```
{"itemAttributes": "ITEMS.genres = \"Comedy\" AND ITEMS.genres = \"Action\"", "output": {"recommendedUsers": ["25", "78", "108"]}}
{"itemAttributes": "ITEMS.genres = \"Adventure\"", "output": {"recommendedUsers": ["87", "31", "129"]}}
{"itemAttributes": "ITEMS.genres = \"Horror\" AND ITEMS.genres = \"Action\"", "output": {"recommendedUsers": ["8", "442", "435"]}}
...
```