

# Input Data Setup


When you create a video frame labeling job, you need to let Ground Truth know where to look for your input data. You can do this in one of two ways:
+ You can store your input data in Amazon S3 and have Ground Truth automatically detect the input dataset used for your labeling job. See [Set up Automated Video Frame Input Data](sms-video-automated-data-setup.md) to learn more about this option. 
+ You can create an input manifest file and sequence files and upload them to Amazon S3. See [Set up Video Frame Input Data Manually](sms-video-manual-data-setup.md) to learn more about this option. 

**Topics**
+ [Set up Automated Video Frame Input Data](sms-video-automated-data-setup.md)
+ [Set up Video Frame Input Data Manually](sms-video-manual-data-setup.md)

# Set up Automated Video Frame Input Data


You can use the Ground Truth automated data setup to automatically detect video files in your Amazon S3 bucket and extract video frames from those files. To learn how, see [Provide Video Files](sms-point-cloud-video-input-data.md#sms-point-cloud-video-frame-extraction).

If you already have video frames in Amazon S3, you can use the automated data setup to use these video frames in your labeling job. For this option, all video frames from a single video must be stored using a unique prefix. To learn about the requirements to use this option, see [Provide Video Frames](sms-point-cloud-video-input-data.md#sms-video-provide-frames).

Select one of the following sections to learn how to set up your automatic input dataset connection with Ground Truth.

## Provide Video Files and Extract Frames


Use the following procedure to connect your video files with Ground Truth and automatically extract video frames from those files for video frame object detection and object tracking labeling jobs.

**Note**  
If you use the automated data setup console tool to extract video frames from more than 10 video files, you must modify the manifest file that the tool generates, or create a new one, so that it includes 10 or fewer video frame sequence files. To learn more, see [Provide Video Files](sms-point-cloud-video-input-data.md#sms-point-cloud-video-frame-extraction).

Make sure your video files are stored in an Amazon S3 bucket in the same AWS Region that you perform the automated data setup in. 

**Automatically connect your video files in Amazon S3 with Ground Truth and extract video frames:**

1. Navigate to the **Create labeling job** page in the Amazon SageMaker AI console: [https://console.aws.amazon.com/sagemaker/groundtruth](https://console.aws.amazon.com/sagemaker/groundtruth).

   Your input and output S3 buckets must be located in the same AWS Region that you create your labeling job in. This link puts you in the US East (N. Virginia) (us-east-1) Region. If your input data is in an Amazon S3 bucket in another Region, switch to that Region. To change your AWS Region, on the [navigation bar](https://docs.aws.amazon.com/awsconsolehelpdocs/latest/gsg/getting-started.html#select-region), choose the name of the currently displayed Region.

1. Select **Create labeling job**.

1. Enter a **Job name**. 

1. In the section **Input data setup**, select **Automated data setup**.

1. Enter an Amazon S3 URI for **S3 location for input datasets**. An S3 URI looks like the following: `s3://amzn-s3-demo-bucket/path-to-files/`. This URI should point to the Amazon S3 location where your video files are stored.

1. Specify your **S3 location for output datasets**. This is where your output data is stored. You can choose to store your output data in the **Same location as input dataset**, or choose **Specify a new location** and enter the S3 URI of the location where you want to store your output data.

1. Choose **Video Files** for your **Data type** using the dropdown list.

1. Choose **Yes, extract frames for object tracking and detection tasks**. 

1. Choose a method of **Frame extraction**.
   + When you choose **Use all frames extracted from the video to create a labeling task**, Ground Truth extracts all frames from each video in your **S3 location for input datasets**, up to 2,000 frames. If a video in your input dataset contains more than 2,000 frames, the first 2,000 are extracted and used for that labeling task. 
   + When you choose **Use every *x* frame from a video to create a labeling task**, Ground Truth extracts every *x*th frame from each video in your **S3 location for input datasets**. 

     For example, if your video is 2 seconds long and has a [frame rate](https://en.wikipedia.org/wiki/Frame_rate) of 30 frames per second, there are 60 frames in your video. If you specify 10 here, Ground Truth extracts every 10th frame from your video. This means the 1st, 11th, 21st, 31st, 41st, and 51st frames are extracted. 

1. Choose or create an IAM execution role. Make sure that this role has permission to access your Amazon S3 locations for input and output data specified in steps 5 and 6. 

1. Select **Complete data setup**.
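The every *x*th frame option described in the frame extraction step above amounts to even sampling. The following sketch only models that arithmetic, assuming sampling starts from the first frame; the exact frames that Ground Truth extracts may differ:

```python
def sampled_frames(total_frames, every_x):
    """Model even sampling: keep the first frame, then every x-th frame after it."""
    return list(range(1, total_frames + 1, every_x))

# A 2-second video at 30 frames per second has 60 frames.
# Sampling every 10th frame yields frames 1, 11, 21, 31, 41, and 51.
print(sampled_frames(60, 10))
```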

## Provide Video Frames


Use the following procedure to connect your sequences of video frames with Ground Truth for video frame object detection and object tracking labeling jobs. 

Make sure your video frames are stored in an Amazon S3 bucket in the same AWS Region that you perform the automated data setup in. Each sequence of video frames should have a unique prefix. For example, if you have two sequences stored in `s3://amzn-s3-demo-bucket/video-frames/sequences/`, each should have a unique prefix, such as `sequence1` and `sequence2`, and both should be located directly under the `/sequences/` prefix. In this example, the locations of the two sequences are `s3://amzn-s3-demo-bucket/video-frames/sequences/sequence1/` and `s3://amzn-s3-demo-bucket/video-frames/sequences/sequence2/`. 

**Automatically connect your video frames in Amazon S3 with Ground Truth:**

1. Navigate to the **Create labeling job** page in the Amazon SageMaker AI console: [https://console.aws.amazon.com/sagemaker/groundtruth](https://console.aws.amazon.com/sagemaker/groundtruth).

   Your input and output S3 buckets must be located in the same AWS Region that you create your labeling job in. This link puts you in the US East (N. Virginia) (us-east-1) Region. If your input data is in an Amazon S3 bucket in another Region, switch to that Region. To change your AWS Region, on the [navigation bar](https://docs.aws.amazon.com/awsconsolehelpdocs/latest/gsg/getting-started.html#select-region), choose the name of the currently displayed Region.

1. Select **Create labeling job**.

1. Enter a **Job name**. 

1. In the section **Input data setup**, select **Automated data setup**.

1. Enter an Amazon S3 URI for **S3 location for input datasets**. 

   This should be the Amazon S3 location where your sequences are stored. For example, if you have two sequences stored in `s3://amzn-s3-demo-bucket/video-frames/sequences/sequence1/` and `s3://amzn-s3-demo-bucket/video-frames/sequences/sequence2/`, enter `s3://amzn-s3-demo-bucket/video-frames/sequences/` here.

1. Specify your **S3 location for output datasets**. This is where your output data is stored. You can choose to store your output data in the **Same location as input dataset**, or choose **Specify a new location** and enter the S3 URI of the location where you want to store your output data.

1. Choose **Video frames** for your **Data type** using the dropdown list. 

1. Choose or create an IAM execution role. Make sure that this role has permission to access your Amazon S3 locations for input and output data specified in steps 5 and 6. 

1. Select **Complete data setup**.

These procedures create an input manifest file in the Amazon S3 location for input datasets that you specified in step 5. If you create a labeling job using the SageMaker API, the AWS CLI, or an AWS SDK, use the Amazon S3 URI of this input manifest file as the value of the `ManifestS3Uri` parameter.
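For example, when calling the `CreateLabelingJob` API through an SDK such as boto3, the manifest's S3 URI belongs in the input configuration. The sketch below only builds that `InputConfig` structure; the bucket name and manifest file name are placeholders, and a real request requires additional parameters (output configuration, role ARN, label attribute name, human task configuration, and so on):

```python
# Placeholder URI: substitute the manifest that the data setup created for you.
manifest_uri = "s3://amzn-s3-demo-bucket/video-frames/sequences/dataset-manifest.json"

# Where ManifestS3Uri fits in a CreateLabelingJob request.
input_config = {
    "DataSource": {
        "S3DataSource": {
            "ManifestS3Uri": manifest_uri
        }
    }
}

# A real call would then look roughly like:
# sagemaker_client.create_labeling_job(
#     LabelingJobName="my-video-frame-job",
#     InputConfig=input_config,
#     # ... OutputConfig, RoleArn, LabelAttributeName, HumanTaskConfig, etc.
# )
```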

# Set up Video Frame Input Data Manually


Choose the manual data setup option if you have created sequence files for each of your video frame sequences and a manifest file that lists references to those sequence files.

## Create a Video Frame Input Manifest File


Ground Truth uses the input manifest file to identify the location of your input dataset when creating labeling tasks. For video frame object detection and object tracking labeling jobs, each line in the input manifest file identifies the location of a video frame sequence file. Each sequence file identifies the images included in a single sequence of video frames.

Use this page to learn how to create a video frame sequence file and an input manifest file for video frame object tracking and object detection labeling jobs.

If you want Ground Truth to automatically generate your sequence files and input manifest file, see [Set up Automated Video Frame Input Data](sms-video-automated-data-setup.md). 

### Create a Video Frame Sequence Input Manifest


In the video frame sequence input manifest file, each line in the manifest is a JSON object, with a `"source-ref"` key that references a sequence file. Each sequence file identifies the location of a sequence of video frames. This is the manifest file formatting required for all video frame labeling jobs. 

The following example demonstrates the syntax used for an input manifest file:

```
{"source-ref": "s3://amzn-s3-demo-bucket/example-folder/seq1.json"}
{"source-ref": "s3://amzn-s3-demo-bucket/example-folder/seq2.json"}
```
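Each line of the manifest must be a standalone JSON object. If you generate the file programmatically, a minimal sketch (the bucket and file names are hypothetical) looks like this:

```python
import json

# Hypothetical sequence file locations; replace with your own S3 URIs.
sequence_uris = [
    "s3://amzn-s3-demo-bucket/example-folder/seq1.json",
    "s3://amzn-s3-demo-bucket/example-folder/seq2.json",
]

# Write one JSON object per line, each with a "source-ref" key.
with open("dataset-manifest.json", "w") as f:
    for uri in sequence_uris:
        f.write(json.dumps({"source-ref": uri}) + "\n")
```

After uploading the resulting file to Amazon S3, you would reference its S3 URI when creating the labeling job.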

### Create a Video Frame Sequence File


The data for each sequence of video frames needs to be stored in a JSON data object. The following is an example of the format you use for a sequence file. Information about each frame is included as a JSON object and is listed in the `frames` list. The following JSON has been expanded for readability. 

```
{
 "seq-no": 1,
 "prefix": "s3://amzn-s3-demo-bucket/prefix/video1/",
 "number-of-frames": 3,
 "frames":[
   {"frame-no": 1, "unix-timestamp": 1566861644, "frame": "frame0001.jpg" },
   {"frame-no": 2, "unix-timestamp": 1566861644, "frame": "frame0002.jpg" }, 
   {"frame-no": 3, "unix-timestamp": 1566861644, "frame": "frame0003.jpg" }   
 ]
}
```

The following table provides details about the parameters shown in this code example. 



|  Parameter  |  Required  |  Accepted Values  |  Description  | 
| --- | --- | --- | --- | 
|  `seq-no`  |  Yes  |  Integer  |  The ordered number of the sequence.   | 
|  `prefix`  |  Yes  |  String **Accepted Values**: `s3://<bucket-name>/<prefix>/`  |  The Amazon S3 location where the sequence files are located.  The prefix must end with a forward slash: `/`.  | 
|  `number-of-frames`  |  Yes  |  Integer  |  The total number of frames included in the sequence file. This number must match the total number of frames listed in the `frames` parameter in the next row.  | 
|  `frames`  |  Yes  |  List of JSON objects **Required**: `frame-no`, `frame` **Optional**: `unix-timestamp`  |  A list of frame data. The length of the list must equal `number-of-frames`. In the worker UI, frames in a sequence are ordered in [UTF-8](https://en.wikipedia.org/wiki/UTF-8) binary order. To learn more about this ordering, see [Provide Video Frames](sms-point-cloud-video-input-data.md#sms-video-provide-frames).  | 
|  `frame-no`  |  Yes  |  Integer  |  The frame order number. This determines the order of a frame in the sequence.   | 
|  `unix-timestamp`  |  No  |  Integer  |  The Unix timestamp of the frame: the number of seconds elapsed since 00:00:00 UTC on January 1, 1970, at the time the frame was captured.   | 
|  `frame`  |  Yes  |  String  |  The name of a video frame image file.   | 
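Before uploading a sequence file, you may want to sanity-check it against the requirements in the table. The following sketch checks the points the table calls out (required keys, the trailing slash on `prefix`, and `number-of-frames` matching the length of `frames`); it is illustrative, not an official validator:

```python
def validate_sequence(seq):
    """Check a sequence file dict against the documented requirements."""
    for key in ("seq-no", "prefix", "number-of-frames", "frames"):
        assert key in seq, f"missing required key: {key}"
    # The prefix must end with a forward slash.
    assert seq["prefix"].endswith("/"), "prefix must end with a forward slash"
    # number-of-frames must equal the length of the frames list.
    assert seq["number-of-frames"] == len(seq["frames"]), \
        "number-of-frames must equal the length of the frames list"
    for frame in seq["frames"]:
        # frame-no and frame are required; unix-timestamp is optional.
        assert "frame-no" in frame and "frame" in frame, \
            "each frame needs frame-no and frame"

seq = {
    "seq-no": 1,
    "prefix": "s3://amzn-s3-demo-bucket/prefix/video1/",
    "number-of-frames": 3,
    "frames": [
        {"frame-no": 1, "unix-timestamp": 1566861644, "frame": "frame0001.jpg"},
        {"frame-no": 2, "unix-timestamp": 1566861644, "frame": "frame0002.jpg"},
        {"frame-no": 3, "unix-timestamp": 1566861644, "frame": "frame0003.jpg"},
    ],
}
validate_sequence(seq)  # passes silently for a well-formed sequence
```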