

# Custom labeling workflows
<a name="sms-custom-templates"></a>

These topics help you set up a Ground Truth labeling job that uses a custom labeling template. A custom labeling template allows you to create a custom worker portal UI that workers use to label data. Templates can be created using HTML, CSS, JavaScript, [Liquid template language](https://shopify.github.io/liquid/), and [Crowd HTML Elements](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-ui-template-reference.html).

## Overview
<a name="sms-custom-templates-overview"></a>

If this is your first time creating a custom labeling workflow in Ground Truth, the following list is a high-level summary of the steps required.

1. *Set up your workforce* – To create a custom labeling workflow you need a workforce. This topic teaches you about configuring a workforce.

1. *Creating a custom template* – To create a custom template you must map the data from your input manifest file correctly to the variables in your template.

1. *Using optional processing Lambda functions* – Optionally, use pre-annotation and post-annotation Lambda functions to control how data from your input manifest is added to your worker template, and how worker annotations are recorded in your job's output file.

This topic also has three end-to-end demos to help you better understand how to use custom labeling templates.

**Note**  
The examples in the links below all include pre-annotation and post-annotation Lambda functions. These Lambda functions are optional.
+ [Demo template: Annotation of images with `crowd-bounding-box`](sms-custom-templates-step2-demo1.md)
+ [Demo Template: Labeling Intents with `crowd-classifier`](sms-custom-templates-step2-demo2.md)
+ [Build a custom data labeling workflow with Amazon SageMaker Ground Truth](https://aws.amazon.com/blogs/machine-learning/build-a-custom-data-labeling-workflow-with-amazon-sagemaker-ground-truth/)

**Topics**
+ [Overview](#sms-custom-templates-overview)
+ [Set up your workforce](sms-custom-templates-step1.md)
+ [Creating a custom worker task template](sms-custom-templates-step2.md)
+ [Adding automation with Liquid](sms-custom-templates-step2-automate.md)
+ [Processing data in a custom labeling workflow with AWS Lambda](sms-custom-templates-step3.md)
+ [Demo template: Annotation of images with `crowd-bounding-box`](sms-custom-templates-step2-demo1.md)
+ [Demo Template: Labeling Intents with `crowd-classifier`](sms-custom-templates-step2-demo2.md)
+ [Create a custom workflow using the API](sms-custom-templates-step4.md)

# Set up your workforce
<a name="sms-custom-templates-step1"></a>

In this step you use the console to establish which worker type to use and make the necessary sub-selections for the worker type. It assumes you have already completed the steps up to this point in the [Getting started: Create a bounding box labeling job with Ground Truth](sms-getting-started.md) section and have chosen the **Custom labeling task** as the **Task type**.

**To configure your workforce:**

1. First, choose an option under **Worker types**. There are three types currently available:
   + **Public** uses an on-demand workforce of independent contractors, powered by Amazon Mechanical Turk. They are paid on a per-task basis.
   + **Private** uses your employees or contractors for handling data that needs to stay within your organization.
   + **Vendor** uses third-party vendors that specialize in providing data labeling services, available through the AWS Marketplace.

1. If you choose the **Public** option, you are asked to set the **number of workers per dataset object**. Having more than one worker perform the same task on the same object can help increase the accuracy of your results. The default is three. You can raise or lower that depending on the accuracy you need.

   You are also asked to set a **price per task** by using a drop-down menu. The menu recommends price points based on how long it will take to complete the task.

   The recommended method to determine this is to first run a short test of your task with a **Private** workforce. The test provides a realistic estimate of how long the task takes to complete. You can then select the range your estimate falls within on the **Price per task** menu. If your average time is more than 5 minutes, consider breaking your task into smaller units.
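
   For rough budgeting, the arithmetic is straightforward. The numbers below are purely illustrative, not actual Mechanical Turk price points:

   ```
   # Illustrative cost estimate for a public-workforce job. The price and
   # counts below are hypothetical examples, not real Ground Truth values.
   num_objects = 10_000        # data objects in the input manifest
   workers_per_object = 3      # the default number of workers per object
   price_per_task = 0.036      # example per-task price in USD

   total_tasks = num_objects * workers_per_object
   estimated_cost = total_tasks * price_per_task
   print(f"{total_tasks} tasks, about ${estimated_cost:.2f}")
   ```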

## Next
<a name="templates-step1-next"></a>

[Creating a custom worker task template](sms-custom-templates-step2.md)

# Creating a custom worker task template
<a name="sms-custom-templates-step2"></a>

To create a custom labeling job, you need to update the worker task template, map the input data from your manifest file to the variables used in the template, and map the output data to Amazon S3. To learn more about advanced features that use Liquid automation, see [Adding automation with Liquid](sms-custom-templates-step2-automate.md).

The following sections describe each of the required steps.

## Worker task template
<a name="sms-custom-templates-step2-template"></a>

A *worker task template* is a file used by Ground Truth to customize the worker user interface (UI). You can create a worker task template using HTML, CSS, JavaScript, [Liquid template language](https://shopify.github.io/liquid/), and [Crowd HTML Elements](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-ui-template-reference.html). Liquid is used to automate the template. Crowd HTML Elements are used to include common annotation tools and provide the logic to submit to Ground Truth.

Use the following topics to learn how you can create a worker task template. You can see a repository of example Ground Truth worker task templates on [GitHub](https://github.com/aws-samples/amazon-sagemaker-ground-truth-task-uis).

### Using the base worker task template in the SageMaker AI console
<a name="sms-custom-templates-step2-base"></a>

You can use a template editor in the Ground Truth console to start creating a template. This editor includes a number of pre-designed base templates. It supports autofill for HTML and Crowd HTML Element code.

**To access the Ground Truth custom template editor:**

1. Follow the instructions in [Create a Labeling Job (Console)](sms-create-labeling-job-console.md).

1. Then select **Custom** for the labeling job **Task type**.

1. Choose **Next**, and then you can access the template editor and base templates in the **Custom labeling task setup** section. 

1. (Optional) Select a base template from the drop-down menu under **Templates**. If you prefer to create a template from scratch, choose **Custom** from the drop down-menu for a minimal template skeleton.

Use the following section to learn how to visualize a template developed in the console locally.

#### Visualizing your worker task templates locally
<a name="sms-custom-template-step2-UI-local"></a>

You must use the console to test how your template processes incoming data. To test the look and feel of your template's HTML and custom elements you can use your browser.

**Note**  
Variables will not be parsed. You may need to replace them with sample content while viewing your content locally.

The following example code snippet loads the necessary code to render the custom HTML elements. Use this if you want to develop your template's look and feel in your preferred editor rather than in the console.

**Example**  

```
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
```

### Creating a simple HTML task sample
<a name="sms-custom-templates-step2-sample"></a>

Now that you have the base worker task template, you can use this topic to create a simple HTML-based task template.

The following is an example entry from an input manifest file.

```
{
  "source": "This train is really late.",
  "labels": [ "angry" , "sad", "happy" , "inconclusive" ],
  "header": "What emotion is the speaker feeling?"
}
```

In the HTML task template, you map the variables from the input manifest file to the template. The variables from the example input manifest are referenced in the template as `task.input.source`, `task.input.labels`, and `task.input.header`.

The following is a simple example HTML worker task template for tweet analysis. All tasks begin and end with the `<crowd-form> </crowd-form>` elements. Like standard HTML `<form>` elements, all of your form code should go between them. Unless you implement a pre-annotation Lambda, Ground Truth generates the workers' tasks directly from the content specified in the template. The `taskInput` object returned by Ground Truth or your [Pre-annotation Lambda](sms-custom-templates-step3-lambda-requirements.md#sms-custom-templates-step3-prelambda) is the `task.input` object in your templates.

For a simple tweet-analysis task, use the `<crowd-classifier>` element. It requires the following attributes:
+ `name` – The name of your output variable. Worker annotations are saved to this variable name in your output manifest.
+ `categories` – A JSON-formatted array of the possible answers.
+ `header` – The title of the annotation tool.

The `<crowd-classifier>` element requires the following three child elements.
+ `<classification-target>` – The text the worker will classify, based on the options specified in the `categories` attribute above.
+ `<full-instructions>` – Instructions that are available from the "View full instructions" link in the tool. This can be left blank, but it is recommended that you provide thorough instructions to get better results.
+ `<short-instructions>` – A brief description of the task that appears in the tool's sidebar. This can be left blank, but it is recommended that you provide clear instructions to get better results.

A simple version of this tool would look like the following. The variable `{{ task.input.source }}` specifies the source data from your input manifest file. The variable `{{ task.input.labels | to_json }}` is an example of a filter that turns the array into a JSON representation; the `categories` attribute must be JSON.

**Example of using `crowd-classifier` with the sample input manifest json**  

```
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
  <crowd-classifier
    name="tweetFeeling"
    categories='{{ task.input.labels | to_json }}'
    header='{{ task.input.header }}'
  >
     <classification-target>
       {{ task.input.source }}
     </classification-target>

    <full-instructions header="Sentiment Analysis Instructions">
      Try to determine the sentiment the author
      of the tweet is trying to express.
      If none seem to match, choose "cannot determine."
    </full-instructions>

    <short-instructions>
      Pick the term that best describes the sentiment of the tweet.
    </short-instructions>

  </crowd-classifier>
</crowd-form>
```

You can copy and paste the code into the editor in the Ground Truth labeling job creation workflow to preview the tool, or try out a [demo of this code on CodePen.](https://codepen.io/MTGT/full/OqBvJw)

## Input data, external assets and your task template
<a name="sms-custom-templates-step2-template-input"></a>

The following sections describe the use of external assets, input data format requirements, and when to consider using pre-annotation Lambda functions.

### Input data format requirements
<a name="sms-custom-template-input-manifest"></a>

When you create an input manifest file to use in your custom Ground Truth labeling job, you must store the data in Amazon S3. The input manifest file must be saved in the same AWS Region in which you run your custom Ground Truth labeling job. It can be stored in any Amazon S3 bucket that is accessible to the IAM service role that you use to run your custom labeling job in Ground Truth.

Input manifest files must use the newline-delimited JSON (JSON Lines) format. Each line is delimited by a standard line break, `\n` or `\r\n`. Each line must also be a valid JSON object.

Furthermore, each JSON object in the manifest file must contain one of the following keys: `source-ref` or `source`. The values of these keys are interpreted as follows:
+ `source-ref` – The source of the object is the Amazon S3 object specified in the value. Use this value when the object is a binary object, such as an image.
+ `source` – The source of the object is the value. Use this value when the object is a text value.

To learn more about formatting your input manifest files, see [Input manifest files](sms-input-data-input-manifest.md).
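
The requirements above can be checked programmatically. The following Python sketch (the helper name is ours) validates one manifest line, including the 100-kilobyte per-line limit noted later for `dataObject`:

```
import json

def validate_manifest_line(line):
    """Validate one input-manifest line: valid JSON, has a 'source' or
    'source-ref' key, and stays within the 100 KB per-line limit."""
    if len(line.encode("utf-8")) > 100 * 1024:
        raise ValueError("manifest line exceeds 100 KB")
    obj = json.loads(line)  # raises a ValueError subclass on invalid JSON
    if "source" not in obj and "source-ref" not in obj:
        raise ValueError("manifest line needs a 'source' or 'source-ref' key")
    return obj

print(validate_manifest_line('{"source": "This train is really late."}'))
```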

### Pre-annotation Lambda function
<a name="sms-custom-template-input-lambda"></a>

You can optionally specify a *pre-annotation Lambda* function to manage how data from your input manifest file is handled prior to labeling. If you want to use the `isHumanAnnotationRequired` key-value pair, you must use a pre-annotation Lambda function. When Ground Truth invokes the pre-annotation Lambda function, it sends a JSON-formatted request using one of the following schemas.

**Example data object identified with the `source-ref` key-value pair**  

```
{
  "version": "2018-10-16",
  "labelingJobArn": "arn:aws:sagemaker:us-west-2:555555555555:labeling-job/my-labeling-job",
  "dataObject" : {
    "source-ref": "s3://input-data-bucket/data-object-file-name"
  }
}
```

**Example data object identified with the `source` key-value pair**  

```
{
  "version": "2018-10-16",
  "labelingJobArn": "arn:aws:sagemaker:us-west-2:555555555555:labeling-job/my-labeling-job",
  "dataObject" : {
    "source": "Sue purchased 10 shares of the stock on April 10th, 2020"
  }
}
```

The following is the expected response from the Lambda function when `isHumanAnnotationRequired` is used.

```
{
  "taskInput": {
    "source": "This train is really late.",
    "labels": [ "angry" , "sad" , "happy" , "inconclusive" ],
    "header": "What emotion is the speaker feeling?"
  },
  "isHumanAnnotationRequired": false
}
```

### Using External Assets
<a name="sms-custom-template-step2-UI-external"></a>

Amazon SageMaker Ground Truth custom templates allow external scripts and style sheets to be embedded. For example, the following code block demonstrates how you would add a script located at `https://www.example.com/my-enhancement-script.js` and a style sheet located at `https://www.example.com/my-enhancement-styles.css` to your template.

**Example**  

```
<script src="https://www.example.com/my-enhancement-script.js"></script>
<link rel="stylesheet" type="text/css" href="https://www.example.com/my-enhancement-styles.css">
```

If you encounter errors, ensure that your originating server is sending the correct MIME type and encoding headers with the assets.

For example, the MIME type and encoding for remote scripts is `application/javascript;CHARSET=UTF-8`, and for remote stylesheets it is `text/css;CHARSET=UTF-8`.

## Output data and your task template
<a name="sms-custom-templates-step2-template-output"></a>

The following sections describe the output data from a custom labeling job, and when to consider using a post-annotation Lambda function.

### Output data
<a name="sms-custom-templates-data"></a>

When your custom labeling job is finished, the data is saved in the Amazon S3 bucket specified when the labeling job was created. The data is saved in an `output.manifest` file.

**Note**  
*labelAttributeName* is a placeholder variable. In your output it is either the name of your labeling job, or the label attribute name you specify when you create the labeling job.
+ `source` or `source-ref` – The string (`source`) or the S3 URI (`source-ref`) of the object workers were asked to label.
+ `labelAttributeName` – A dictionary containing consolidated label content from the [post-annotation Lambda function](sms-custom-templates-step3-lambda-requirements.md#sms-custom-templates-step3-postlambda). If a post-annotation Lambda function is not specified, this dictionary will be empty.
+ `labelAttributeName-metadata` – Metadata from your custom labeling job added by Ground Truth. 
+ `worker-response-ref` – The S3 URI of the bucket where the data is saved. If a post-annotation Lambda function is specified, this key-value pair will not be present.

In the following example, the JSON object is formatted for readability. In the actual output file, each JSON object is on a single line.

```
{
  "source" : "This train is really late.",
  "labelAttributeName" : {},
  "labelAttributeName-metadata": { # These key values pairs are added by Ground Truth
    "job_name": "test-labeling-job",
    "type": "groundTruth/custom",
    "human-annotated": "yes",
    "creation_date": "2021-03-08T23:06:49.111000",
    "worker-response-ref": "s3://amzn-s3-demo-bucket/test-labeling-job/annotations/worker-response/iteration-1/0/2021-03-08_23:06:49.json"
  }
}
```
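
Because each line of `output.manifest` is a standalone JSON object, you can parse the file line by line. The following is a minimal Python sketch using an abbreviated copy of the record above:

```
import json

# One (abbreviated) line from an output.manifest file.
line = ('{"source": "This train is really late.", '
        '"labelAttributeName": {}, '
        '"labelAttributeName-metadata": {"job_name": "test-labeling-job", '
        '"human-annotated": "yes"}}')

record = json.loads(line)
metadata = record["labelAttributeName-metadata"]
print(metadata["job_name"])
```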

### Using a post annotation Lambda to consolidate the results from your workers
<a name="sms-custom-templates-consolidation"></a>

By default Ground Truth saves worker responses unprocessed in Amazon S3. To have more fine-grained control over how responses are handled, you can specify a *post-annotation Lambda function*. For example, a post-annotation Lambda function could be used to consolidate annotation if multiple workers have labeled the same data object. To learn more about creating post-annotation Lambda functions, see [Post-annotation Lambda](sms-custom-templates-step3-lambda-requirements.md#sms-custom-templates-step3-postlambda).

If you want to use a post-annotation Lambda function, it must be specified as part of [AnnotationConsolidationConfig](https://docs.aws.amazon.com//sagemaker/latest/APIReference/API_AnnotationConsolidationConfig.html) in a `CreateLabelingJob` request.

To learn more about how annotation consolidation works, see [Annotation consolidation](sms-annotation-consolidation.md).

# Adding automation with Liquid
<a name="sms-custom-templates-step2-automate"></a>

Our custom template system uses [Liquid](https://shopify.github.io/liquid/) for automation. It is an open-source inline markup language. In Liquid, text between curly brace and percent delimiters (`{% ... %}`) is an instruction or *tag* that performs an operation like control flow or iteration. Text between double curly braces (`{{ ... }}`) is a variable or *object* that outputs its value.

The most common use of Liquid will be to parse the data coming from your input manifest file, and pull out the relevant variables to create the task. Ground Truth automatically generates the tasks unless a pre-annotation Lambda is specified. The `taskInput` object returned by Ground Truth or your [Pre-annotation Lambda](sms-custom-templates-step3-lambda-requirements.md#sms-custom-templates-step3-prelambda) is the `task.input` object in your templates.

The properties in your input manifest are passed into your template as the `event.dataObject`.

**Example manifest data object**  

```
{
  "source": "This is a sample text for classification",
  "labels": [ "angry" , "sad" , "happy" , "inconclusive" ],
  "header": "What emotion is the speaker feeling?"
}
```

**Example sample HTML using variables**  

```
<crowd-classifier 
  name='tweetFeeling'
  categories='{{ task.input.labels | to_json }}'
  header='{{ task.input.header }}' >
<classification-target>
  {{ task.input.source }}
</classification-target>
```

Note the addition of ` | to_json` to the `labels` property above. That is a filter that turns the input manifest array into a JSON representation of the array. Variable filters are explained in the next section.

The following list includes two types of Liquid tags that you may find useful to automate template input data processing. If you select one of the following tag-types, you will be redirected to the Liquid documentation.
+ [Control flow](https://shopify.github.io/liquid/tags/control-flow/): Includes programming logic operators like `if/else`, `unless`, and `case/when`.
+ [Iteration](https://shopify.github.io/liquid/tags/iteration/): Enables you to run blocks of code repeatedly using statements like for loops. 

  For an example of an HTML template that uses Liquid elements to create a for loop, see [translation-review-and-correction.liquid.html](https://github.com/aws-samples/amazon-sagemaker-ground-truth-task-uis/blob/8ae02533ea5a91087561b1daecd0bc22a37ca393/text/translation-review-and-correction.liquid.html) in GitHub. 

For more information and documentation, visit the [Liquid homepage](https://shopify.github.io/liquid/).

## Variable filters
<a name="sms-custom-templates-step2-automate-filters"></a>

In addition to the standard [Liquid filters](https://shopify.github.io/liquid/filters/abs/) and actions, Ground Truth offers a few additional filters. Filters are applied by placing a pipe (`|`) character after the variable name, then specifying a filter name. Filters can be chained in the form of:

**Example**  

```
{{ <content> | <filter> | <filter> }}
```

### Autoescape and explicit escape
<a name="sms-custom-templates-step2-automate-filters-autoescape"></a>

By default, inputs will be HTML escaped to prevent confusion between your variable text and HTML. You can explicitly add the `escape` filter to make it more obvious to someone reading the source of your template that the escaping is being done.

### escape_once
<a name="sms-custom-templates-step2-automate-escapeonce"></a>

`escape_once` ensures that if you've already escaped your code, it doesn't get re-escaped on top of that. For example, it prevents `&amp;` from becoming `&amp;amp;`.
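
The effect of `escape_once` can be approximated in Python (an analogy for illustration, not Ground Truth's implementation): unescape first, then escape, so entities that are already escaped are left alone.

```
import html

def escape_once(text):
    # Unescape first so entities that are already escaped are not
    # escaped a second time.
    return html.escape(html.unescape(text), quote=False)

print(escape_once("James & the Giant Peach"))
print(escape_once("James &amp; the Giant Peach"))  # same output: no double escaping
```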

### skip_autoescape
<a name="sms-custom-templates-step2-automate-skipautoescape"></a>

`skip_autoescape` is useful when your content is meant to be used as HTML. For example, you might have a few paragraphs of text and some images in the full instructions for a bounding box.

**Use `skip_autoescape` sparingly**  
The best practice in templates is to avoid passing in functional code or markup with `skip_autoescape` unless you are absolutely sure you have strict control over what's being passed. If you're passing user input, you could be opening your workers up to a Cross Site Scripting attack.

### to_json
<a name="sms-custom-templates-step2-automate-tojson"></a>

`to_json` will encode what you feed it to JSON (JavaScript Object Notation). If you feed it an object, it will serialize it.
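
The behavior of `to_json` on the `labels` array from the earlier manifest sample can be approximated in Python with `json.dumps` (an analogy, not Ground Truth's implementation):

```
import json

# Approximates what {{ task.input.labels | to_json }} produces for the
# sample manifest's labels array.
labels = ["angry", "sad", "happy", "inconclusive"]
categories_attr = json.dumps(labels)
print(categories_attr)
```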

### grant_read_access
<a name="sms-custom-templates-step2-automate-grantreadaccess"></a>

`grant_read_access` takes an S3 URI and encodes it into an HTTPS URL with a short-lived access token for that resource. This makes it possible to display to workers the photo, audio, or video objects stored in S3 buckets that are not otherwise publicly accessible.

### s3_presign
<a name="sms-custom-templates-step2-automate-s3"></a>

 The `s3_presign` filter works the same way as the `grant_read_access` filter. `s3_presign` takes an Amazon S3 URI and encodes it into an HTTPS URL with a short-lived access token for that resource. This makes it possible to display to workers the photo, audio, or video objects stored in S3 buckets that are not otherwise publicly accessible.

**Example of the variable filters**  
Input  

```
auto-escape: {{ "Have you read 'James & the Giant Peach'?" }}
explicit escape: {{ "Have you read 'James & the Giant Peach'?" | escape }}
explicit escape_once: {{ "Have you read 'James &amp; the Giant Peach'?" | escape_once }}
skip_autoescape: {{ "Have you read 'James & the Giant Peach'?" | skip_autoescape }}
to_json: {{ jsObject | to_json }}                
grant_read_access: {{ "s3://amzn-s3-demo-bucket/myphoto.png" | grant_read_access }}
s3_presign: {{ "s3://amzn-s3-demo-bucket/myphoto.png" | s3_presign }}
```

**Example**  
Output  

```
auto-escape: Have you read &#39;James &amp; the Giant Peach&#39;?
explicit escape: Have you read &#39;James &amp; the Giant Peach&#39;?
explicit escape_once: Have you read &#39;James &amp; the Giant Peach&#39;?
skip_autoescape: Have you read 'James & the Giant Peach'?
to_json: { "point_number": 8, "coords": [ 59, 76 ] }
grant_read_access: https://s3.amazonaws.com/amzn-s3-demo-bucket/myphoto.png?<access token and other params>
s3_presign: https://s3.amazonaws.com/amzn-s3-demo-bucket/myphoto.png?<access token and other params>
```

**Example of an automated classification template**  
To automate the simple text classification sample, replace the tweet text with a variable.  
The text classification template with automation added follows. The tweet text is supplied by the `{{ task.input.source }}` variable inside the `<classification-target>` element.  

```
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
  <crowd-classifier 
    name="tweetFeeling"
    categories="['positive', 'negative', 'neutral', 'cannot determine']"
    header="Which term best describes this tweet?" 
  >
    <classification-target>
       {{ task.input.source }}
    </classification-target>

    <full-instructions header="Analyzing a sentiment">
      Try to determine the feeling the author 
      of the tweet is trying to express. 
      If none seem to match, choose "other."
    </full-instructions>

    <short-instructions>
      Pick the term best describing the sentiment 
      of the tweet. 
    </short-instructions>

  </crowd-classifier>
</crowd-form>
```
The tweet text in the prior sample is now replaced with an object. The `entry.taskInput` object uses `source` (or another name you specify in your pre-annotation Lambda) as the property name for the text, and it is inserted directly in the HTML by virtue of being between double curly braces.

# Processing data in a custom labeling workflow with AWS Lambda
<a name="sms-custom-templates-step3"></a>

In this topic, you can learn how to deploy optional [AWS Lambda](https://aws.amazon.com/lambda/) functions when creating a custom labeling workflow. You can specify two types of Lambda functions to use with your custom labeling workflow.
+ *Pre-annotation Lambda*: This function pre-processes each data object sent to your labeling job prior to sending it to workers.
+ *Post-annotation Lambda*: This function processes the results once workers submit a task. If you specify multiple workers per data object, this function may include logic to consolidate annotations.

If you are a new user of Lambda and Ground Truth, we recommend that you use the pages in this section as follows:

1. First, review [Using pre-annotation and post-annotation Lambda functions](sms-custom-templates-step3-lambda-requirements.md).

1. Then, use the page [Add required permissions to use AWS Lambda with Ground Truth](sms-custom-templates-step3-lambda-permissions.md) to learn about security and permission requirements to use your pre-annotation and post-annotation Lambda functions in a Ground Truth custom labeling job.

1. Next, you need to visit the Lambda console or use Lambda's APIs to create your functions. Use the section [Create Lambda functions using Ground Truth templates](sms-custom-templates-step3-lambda-create.md) to learn how to create Lambda functions.

1. To learn how to test your Lambda functions, see [Test pre-annotation and post-annotation Lambda functions](sms-custom-templates-step3-lambda-test.md).

1. After you create pre-processing and post-processing Lambda functions, select them from the **Lambda functions** section that comes after the code editor for your custom HTML in the Ground Truth console. To learn how to use these functions in a `CreateLabelingJob` API request, see [Create a Labeling Job (API)](sms-create-labeling-job-api.md).

For a custom labeling workflow tutorial that includes example pre-annotation and post-annotation Lambda functions, see [Demo template: Annotation of images with `crowd-bounding-box`](sms-custom-templates-step2-demo1.md).

**Topics**
+ [Using pre-annotation and post-annotation Lambda functions](sms-custom-templates-step3-lambda-requirements.md)
+ [Add required permissions to use AWS Lambda with Ground Truth](sms-custom-templates-step3-lambda-permissions.md)
+ [Create Lambda functions using Ground Truth templates](sms-custom-templates-step3-lambda-create.md)
+ [Test pre-annotation and post-annotation Lambda functions](sms-custom-templates-step3-lambda-test.md)

# Using pre-annotation and post-annotation Lambda functions
<a name="sms-custom-templates-step3-lambda-requirements"></a>

Use these topics to learn about the syntax of the requests sent to pre-annotation and post-annotation Lambda functions, and the required response syntax that Ground Truth uses in custom labeling workflows.

**Topics**
+ [Pre-annotation Lambda](#sms-custom-templates-step3-prelambda)
+ [Post-annotation Lambda](#sms-custom-templates-step3-postlambda)

## Pre-annotation Lambda
<a name="sms-custom-templates-step3-prelambda"></a>

Before a labeling task is sent to the worker, an optional pre-annotation Lambda function can be invoked.

Ground Truth sends your Lambda function a JSON formatted request to provide details about the labeling job and the data object.

The following are two example JSON-formatted request schemas.

------
#### [ Data object identified with "source-ref" ]

```
{
    "version": "2018-10-16",
    "labelingJobArn": <labelingJobArn>,
    "dataObject" : {
        "source-ref": <s3Uri>
    }
}
```

------
#### [ Data object identified with "source" ]

```
{
    "version": "2018-10-16",
    "labelingJobArn": <labelingJobArn>,
    "dataObject" : {
        "source": <string>
    }
}
```

------

 The following list describes the parameters in the pre-annotation request schemas.
+ `version` (string): This is a version number used internally by Ground Truth.
+ `labelingJobArn` (string): This is the Amazon Resource Name, or ARN, of your labeling job. This ARN can be used to reference the labeling job when using Ground Truth API operations such as `DescribeLabelingJob`.
+ `dataObject` (JSON object): This key contains a single JSON line, either from your input manifest file or sent from Amazon SNS. The JSON line objects in your manifest can be up to 100 kilobytes in size and contain a variety of data. For a very basic image annotation job, the `dataObject` JSON may just contain a `source-ref` key identifying the image to be annotated. If the data object (for example, a line of text) is included directly in the input manifest file, the data object is identified with `source`. If you create a verification or adjustment job, this line may contain label data and metadata from the previous labeling job.

The following tabbed examples show concrete pre-annotation requests that use the parameters described above.

------
#### [ Data object identified with "source-ref" ]

```
{
    "version": "2018-10-16",
    "labelingJobArn": "arn:aws:sagemaker:us-west-2:111122223333:labeling-job/<labeling_job_name>",
    "dataObject" : {
        "source-ref": "s3://input-data-bucket/data-object-file-name"
    }
}
```

------
#### [ Data object identified with "source" ]

```
{
    "version": "2018-10-16",
    "labelingJobArn": "arn:aws:sagemaker:<aws_region>:111122223333:labeling-job/<labeling_job_name>",
    "dataObject" : {
        "source": "Sue purchased 10 shares of the stock on April 10th, 2020"
    }
}
```

------

In return, Ground Truth requires a response formatted like the following:

**Example of expected return data**  

```
{
    "taskInput": <json object>,
    "isHumanAnnotationRequired": <boolean> # Optional
}
```

In the previous example, the `<json object>` needs to contain *all* the data your custom worker task template needs. If you're doing a bounding box task where the instructions stay the same all the time, it may just be the HTTP(S) or Amazon S3 resource for your image file. If it's a sentiment analysis task and different objects may have different choices, it is the object reference as a string and the choices as an array of strings.
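
As a sketch, a pre-annotation Lambda for the sentiment task above might assemble `taskInput` like this. The helper name and the static `labels` and `header` values are illustrative assumptions:

```
def build_task_input(data_object):
    # Hypothetical helper: combine the manifest line's text with the
    # static choices and title the worker task template expects.
    return {
        "source": data_object["source"],
        "labels": ["angry", "sad", "happy", "inconclusive"],
        "header": "What emotion is the speaker feeling?",
    }

def lambda_handler(event, context):
    return {"taskInput": build_task_input(event["dataObject"])}
```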

**Implications of `isHumanAnnotationRequired`**  
This value is optional because it defaults to `true`. The primary use case for explicitly setting it is when you want to exclude this data object from being labeled by human workers. 

If you have a mix of objects in your manifest, with some requiring human annotation and some not, you can include an `isHumanAnnotationRequired` value in each data object. You can add logic to your pre-annotation Lambda to determine dynamically whether an object requires annotation, and set this boolean value accordingly.
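For example, a pre-annotation Lambda could exclude objects that already carry a label from an earlier job. This is a minimal sketch; the `previous-label` key is a hypothetical manifest field used only for illustration:

```python
def lambda_handler(event, context):
    """Sketch of a pre-annotation Lambda that skips already-labeled objects.

    The "previous-label" key is a hypothetical manifest field, not a
    Ground Truth convention.
    """
    data_object = event["dataObject"]

    # Objects that already carry a "previous-label" key do not need to be
    # sent to human workers again.
    needs_annotation = "previous-label" not in data_object

    return {
        "taskInput": data_object,
        "isHumanAnnotationRequired": "true" if needs_annotation else "false"
    }
```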

### Examples of pre-annotation Lambda functions
<a name="sms-custom-templates-step3-prelambda-example"></a>

The following basic pre-annotation Lambda function accesses the JSON object in `dataObject` from the initial request, and returns it in the `taskInput` parameter.

```
import json

def lambda_handler(event, context):
    return {
        "taskInput":  event['dataObject']
    }
```

Assuming the input manifest file uses `"source-ref"` to identify data objects, the worker task template used in the same labeling job as this pre-annotation Lambda must include a Liquid element like the following to ingest `dataObject`:

```
{{ task.input.source-ref | grant_read_access }}
```

If the input manifest file used `source` to identify the data object, the work task template can ingest `dataObject` with the following:

```
{{ task.input.source }}
```

The following pre-annotation Lambda example includes logic to identify the key used in `dataObject`, and to point to that data object using `taskObject` in the Lambda's return statement.

```
import json

def lambda_handler(event, context):

    # Event received
    print("Received event: " + json.dumps(event, indent=2))

    # Get source if specified
    source = event['dataObject']['source'] if "source" in event['dataObject'] else None

    # Get source-ref if specified
    source_ref = event['dataObject']['source-ref'] if "source-ref" in event['dataObject'] else None

    # if source field present, take that otherwise take source-ref
    task_object = source if source is not None else source_ref

    # Build response object
    output = {
        "taskInput": {
            "taskObject": task_object
        },
        "isHumanAnnotationRequired": "true"
    }

    print(output)
    # If neither source nor source-ref is specified, exclude the object from human annotation
    if task_object is None:
        print(" Failed to pre-process {} !".format(event["labelingJobArn"]))
        output["isHumanAnnotationRequired"] = "false"

    return output
```
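Because pre-annotation Lambdas are plain functions of the request event, you can exercise this source/source-ref logic locally before deploying. The following self-contained sketch mirrors the example above (it returns the `isHumanAnnotationRequired` key shown in the expected response schema), and the ARN and S3 URI are placeholder values:

```python
def lambda_handler(event, context):
    # Prefer "source" if present; otherwise fall back to "source-ref"
    data_object = event["dataObject"]
    task_object = data_object.get("source", data_object.get("source-ref"))

    output = {
        "taskInput": {"taskObject": task_object},
        "isHumanAnnotationRequired": "true"
    }
    # Exclude objects with neither key from human annotation
    if task_object is None:
        output["isHumanAnnotationRequired"] = "false"
    return output

# Invoke locally with a sample event; no AWS resources are needed.
event_ref = {
    "version": "2018-10-16",
    "labelingJobArn": "arn:aws:sagemaker:us-west-2:111122223333:labeling-job/example-job",
    "dataObject": {"source-ref": "s3://input-data-bucket/data-object-file-name"}
}
print(lambda_handler(event_ref, None))
# {'taskInput': {'taskObject': 's3://input-data-bucket/data-object-file-name'}, 'isHumanAnnotationRequired': 'true'}
```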

## Post-annotation Lambda
<a name="sms-custom-templates-step3-postlambda"></a>

When all workers have annotated the data object or when [`TaskAvailabilityLifetimeInSeconds`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanLoopConfig.html#SageMaker-Type-HumanLoopConfig-TaskAvailabilityLifetimeInSeconds) has been reached, whichever comes first, Ground Truth sends those annotations to your post-annotation Lambda. This Lambda is generally used for [Annotation consolidation](sms-annotation-consolidation.md).

**Note**  
To see an example of a post-consolidation Lambda function, see [annotation\_consolidation\_lambda.py](https://github.com/aws-samples/aws-sagemaker-ground-truth-recipe/blob/master/aws_sagemaker_ground_truth_sample_lambda/annotation_consolidation_lambda.py) in the [aws-sagemaker-ground-truth-recipe](https://github.com/aws-samples/aws-sagemaker-ground-truth-recipe) GitHub repository.

The following code block contains the post-annotation request schema. Each parameter is described in the following bulleted list.

```
{
    "version": "2018-10-16",
    "labelingJobArn": <string>,
    "labelCategories": [<string>],
    "labelAttributeName": <string>,
    "roleArn" : <string>,
    "payload": {
        "s3Uri": <string>
    }
 }
```
+ `version` (string): A version number used internally by Ground Truth.
+ `labelingJobArn` (string): The Amazon Resource Name, or ARN, of your labeling job. This ARN can be used to reference the labeling job when using Ground Truth API operations such as `DescribeLabelingJob`.
+ `labelCategories` (list of strings): Includes the label categories and other attributes you either specified in the console, or that you include in the label category configuration file.
+ `labelAttributeName` (string): Either the name of your labeling job, or the label attribute name you specify when you create the labeling job.
+ `roleArn` (string): The Amazon Resource Name (ARN) of the IAM execution role you specify when you create the labeling job. 
+ `payload` (JSON object): A JSON object that includes an `s3Uri` key, which identifies the location of the annotation data for that data object in Amazon S3. The second code block below shows an example of this annotation file.

The following code block contains an example of a post-annotation request. Each parameter in this example request is explained below the code block.

**Example of a post-annotation Lambda request**  

```
{
    "version": "2018-10-16",
    "labelingJobArn": "arn:aws:sagemaker:us-west-2:111122223333:labeling-job/labeling-job-name",
    "labelCategories": ["Ex Category1","Ex Category2", "Ex Category3"],
    "labelAttributeName": "labeling-job-attribute-name",
    "roleArn" : "arn:aws:iam::111122223333:role/role-name",
    "payload": {
        "s3Uri": "s3://amzn-s3-demo-bucket/annotations.json"
    }
 }
```

**Note**  
If no worker works on the data object and `TaskAvailabilityLifetimeInSeconds` has been reached, the data object is marked as failed and not included as part of post-annotation Lambda invocation.

The following code block contains the payload schema. This is the file that is indicated by the `s3Uri` parameter in the post-annotation Lambda request `payload` JSON object. For example, if the previous code block is the post-annotation Lambda request, the following annotation file is located at `s3://amzn-s3-demo-bucket/annotations.json`.

Each parameter is described in the following bulleted list.

**Example of an annotation file**  

```
[
    {
        "datasetObjectId": <string>,
        "dataObject": {
            "s3Uri": <string>,
            "content": <string>
        },
        "annotations": [{
            "workerId": <string>,
            "annotationData": {
                "content": <string>,
                "s3Uri": <string>
            }
       }]
    }
]
```
+ `datasetObjectId` (string): A unique ID that Ground Truth assigns to each data object sent to the labeling job.
+ `dataObject` (JSON object): The data object that was labeled. If the data object is included in the input manifest file and identified using the `source` key (for example, a string), `dataObject` includes a `content` key, which identifies the data object. Otherwise, the location of the data object (for example, a link or S3 URI) is identified with `s3Uri`.
+ `annotations` (list of JSON objects): This list contains a single JSON object for each annotation submitted by workers for that `dataObject`. A single JSON object contains a unique `workerId` that can be used to identify the worker that submitted that annotation. The `annotationData` key contains one of the following:
  + `content` (string): Contains the annotation data. 
  + `s3Uri` (string): Contains an S3 URI that identifies the location of the annotation data.

The following table contains examples of the content that you may find in the payload for different types of annotation tasks.

------
#### [ Named Entity Recognition Payload ]

```
[
    {
      "datasetObjectId": "1",
      "dataObject": {
        "content": "Sift 3 cups of flour into the bowl."
      },
      "annotations": [
        {
          "workerId": "private.us-west-2.ef7294f850a3d9d1",
          "annotationData": {
            "content": "{\"crowd-entity-annotation\":{\"entities\":[{\"endOffset\":4,\"label\":\"verb\",\"startOffset\":0},{\"endOffset\":6,\"label\":\"number\",\"startOffset\":5},{\"endOffset\":20,\"label\":\"object\",\"startOffset\":15},{\"endOffset\":34,\"label\":\"object\",\"startOffset\":30}]}}"
          }
        }
      ]
    }
]
```

------
#### [ Semantic Segmentation Payload ]

```
[
    {
      "datasetObjectId": "2",
      "dataObject": {
        "s3Uri": "s3://amzn-s3-demo-bucket/gt-input-data/images/bird3.jpg"
      },
      "annotations": [
        {
          "workerId": "private.us-west-2.ab1234c5678a919d0",
          "annotationData": {
            "content": "{\"crowd-semantic-segmentation\":{\"inputImageProperties\":{\"height\":2000,\"width\":3020},\"labelMappings\":{\"Bird\":{\"color\":\"#2ca02c\"}},\"labeledImage\":{\"pngImageData\":\"iVBOR...\"}}}"
          }
        }
      ]
    }
  ]
```

------
#### [ Bounding Box Payload ]

```
[
    {
      "datasetObjectId": "0",
      "dataObject": {
        "s3Uri": "s3://amzn-s3-demo-bucket/gt-input-data/images/bird1.jpg"
      },
      "annotations": [
        {
          "workerId": "private.us-west-2.ab1234c5678a919d0",
          "annotationData": {
            "content": "{\"boundingBox\":{\"boundingBoxes\":[{\"height\":2052,\"label\":\"Bird\",\"left\":583,\"top\":302,\"width\":1375}],\"inputImageProperties\":{\"height\":2497,\"width\":3745}}}"
          }
        }
      ]
    }
 ]
```

------
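Note that each `content` value in these payloads is itself a serialized JSON string. After parsing the payload file, a second `json.loads` recovers the worker's annotation, as in this sketch using the Named Entity Recognition `content` value from the example above:

```python
import json

# "content" value from the Named Entity Recognition payload above,
# as it appears after the payload file itself has been parsed.
content = ('{"crowd-entity-annotation":{"entities":['
           '{"endOffset":4,"label":"verb","startOffset":0},'
           '{"endOffset":6,"label":"number","startOffset":5},'
           '{"endOffset":20,"label":"object","startOffset":15},'
           '{"endOffset":34,"label":"object","startOffset":30}]}}')

# A second json.loads decodes the worker's annotation
entities = json.loads(content)["crowd-entity-annotation"]["entities"]
print(len(entities))         # 4
print(entities[0]["label"])  # verb
```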

Your post-annotation Lambda function may contain logic similar to the following to loop through and access all annotations contained in the request. For a full example, see [annotation\_consolidation\_lambda.py](https://github.com/aws-samples/aws-sagemaker-ground-truth-recipe/blob/master/aws_sagemaker_ground_truth_sample_lambda/annotation_consolidation_lambda.py) in the [aws-sagemaker-ground-truth-recipe](https://github.com/aws-samples/aws-sagemaker-ground-truth-recipe) GitHub repository. In this GitHub example, you must add your own annotation consolidation logic. 

```
for i in range(len(annotations)):
    worker_id = annotations[i]["workerId"]
    annotation_content = annotations[i]['annotationData'].get('content')
    annotation_s3_uri = annotations[i]['annotationData'].get('s3Uri')
    annotation = annotation_content if annotation_s3_uri is None else s3_client.get_object_from_s3(
        annotation_s3_uri)
    annotation_from_single_worker = json.loads(annotation)

    print("{} Received Annotations from worker [{}] is [{}]"
            .format(log_prefix, worker_id, annotation_from_single_worker))
```

**Tip**  
When you run consolidation algorithms on the data, you can use an AWS database service to store results, or you can pass the processed results back to Ground Truth. The data you return to Ground Truth is stored in consolidated annotation manifests in the S3 bucket specified for output during the configuration of the labeling job.

In return, Ground Truth requires a response formatted like the following:

**Example of expected return data**  

```
[
   {        
        "datasetObjectId": <string>,
        "consolidatedAnnotation": {
            "content": {
                "<labelattributename>": {
                    # ... label content
                }
            }
        }
    },
   {        
        "datasetObjectId": <string>,
        "consolidatedAnnotation": {
            "content": {
                "<labelattributename>": {
                    # ... label content
                }
            }
        }
    }
    .
    .
    .
]
```
All of the data that you return to Ground Truth, other than the `datasetObjectId`, must be inside the `content` object.
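As an illustration, the following sketch builds that return structure from a parsed annotation file using a simple pass-through strategy (keep the first worker's annotation). The function name and the consolidation rule are illustrative, not part of the Ground Truth API:

```python
import json

def consolidate(annotation_file, label_attribute_name):
    """Build the expected post-annotation return structure from a parsed
    annotation file (a list like the payload examples shown earlier).

    Pass-through strategy: keep the first worker's annotation as-is.
    A real job would apply its own consolidation logic here.
    """
    consolidated = []
    for data_object in annotation_file:
        first = data_object["annotations"][0]
        # Each annotation's content is itself a serialized JSON string
        annotation = json.loads(first["annotationData"]["content"])
        consolidated.append({
            "datasetObjectId": data_object["datasetObjectId"],
            "consolidatedAnnotation": {
                "content": {
                    label_attribute_name: annotation
                }
            }
        })
    return consolidated

# Example with a single annotated text object:
annotation_file = [{
    "datasetObjectId": "1",
    "dataObject": {"content": "Sift 3 cups of flour into the bowl."},
    "annotations": [{
        "workerId": "private.us-west-2.ef7294f850a3d9d1",
        "annotationData": {
            "content": '{"crowd-entity-annotation":{"entities":'
                       '[{"endOffset":4,"label":"verb","startOffset":0}]}}'
        }
    }]
}]
result = consolidate(annotation_file, "my-label-attribute")
print(result[0]["datasetObjectId"])  # 1
```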

When you return annotations in `content`, this results in an entry in your job's output manifest like the following:

**Example of label format in output manifest**  

```
{  "source-ref"/"source" : "<s3uri or content>", 
   "<labelAttributeName>": {
        # ... label content from you
    },   
   "<labelAttributeName>-metadata": { # This will be added by Ground Truth
        "job_name": <labelingJobName>,
        "type": "groundTruth/custom",
        "human-annotated": "yes", 
        "creation_date": <date> # Timestamp of when received from Post-labeling Lambda
    }
}
```

Because of the potentially complex nature of a custom template and the data it collects, Ground Truth does not offer further processing of the data.

# Add required permissions to use AWS Lambda with Ground Truth
<a name="sms-custom-templates-step3-lambda-permissions"></a>

You may need to configure some or all of the following to create and use AWS Lambda functions with Ground Truth. 
+ You need to grant an IAM role or user (collectively, an IAM entity) permission to create the pre-annotation and post-annotation Lambda functions using AWS Lambda, and to choose them when creating the labeling job. 
+ The IAM execution role specified when the labeling job is configured needs permission to invoke the pre-annotation and post-annotation Lambda functions. 
+ The post-annotation Lambda functions may need permission to access Amazon S3.

Use the following sections to learn how to create the IAM entities and grant permissions described above.

**Topics**
+ [Grant Permission to Create and Select an AWS Lambda Function](#sms-custom-templates-step3-postlambda-create-perms)
+ [Grant IAM Execution Role Permission to Invoke AWS Lambda Functions](#sms-custom-templates-step3-postlambda-execution-role-perms)
+ [Grant Post-Annotation Lambda Permissions to Access Annotation](#sms-custom-templates-step3-postlambda-perms)

## Grant Permission to Create and Select an AWS Lambda Function
<a name="sms-custom-templates-step3-postlambda-create-perms"></a>

If you do not require granular permissions to develop pre-annotation and post-annotation Lambda functions, you can attach the AWS managed policy `AWSLambda_FullAccess` to a user or role. This policy grants broad permissions to use all Lambda features, as well as permission to perform actions in other AWS services with which Lambda interacts.

To create a more granular policy for security-sensitive use cases, see [Identity-based IAM policies for Lambda](https://docs.aws.amazon.com/lambda/latest/dg/access-control-identity-based.html) in the AWS Lambda Developer Guide to learn how to create an IAM policy that fits your use case. 

**Policies to Use the Lambda Console**

If you want to grant an IAM entity permission to use the Lambda console, see [Using the Lambda console](https://docs.aws.amazon.com/lambda/latest/dg/security_iam_id-based-policy-examples.html#security_iam_id-based-policy-examples-console) in the AWS Lambda Developer Guide.

Additionally, if you want the user to be able to access and deploy the Ground Truth starter pre-annotation and post-annotation functions using the AWS Serverless Application Repository in the Lambda console, you must specify the *`<aws-region>`* where you want to deploy the functions (this should be the same AWS Region used to create the labeling job), and add the following policy to the IAM role.

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "serverlessrepo:ListApplicationVersions",
                "serverlessrepo:GetApplication",
                "serverlessrepo:CreateCloudFormationTemplate"
            ],
            "Resource": "arn:aws:serverlessrepo:us-east-1:838997950401:applications/aws-sagemaker-ground-truth-recipe"
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "serverlessrepo:SearchApplications",
            "Resource": "*"
        }
    ]
}
```

------

**Policies to See Lambda Functions in the Ground Truth Console**

To grant an IAM entity permission to view Lambda functions in the Ground Truth console when the user is creating a custom labeling job, the entity must have the permissions described in [Grant IAM Permission to Use the Amazon SageMaker Ground Truth Console](sms-security-permission-console-access.md), including the permissions described in the section [Custom Labeling Workflow Permissions](sms-security-permission-console-access.md#sms-security-permissions-custom-workflow).

## Grant IAM Execution Role Permission to Invoke AWS Lambda Functions
<a name="sms-custom-templates-step3-postlambda-execution-role-perms"></a>

If you add the IAM managed policy [AmazonSageMakerGroundTruthExecution](https://console.aws.amazon.com/iam/home?#/policies/arn:aws:iam::aws:policy/AmazonSageMakerGroundTruthExecution) to the IAM execution role used to create the labeling job, this role has permission to list and invoke Lambda functions with one of the following strings in the function name: `GtRecipe`, `SageMaker`, `Sagemaker`, `sagemaker`, or `LabelingFunction`. 

If the pre-annotation or post-annotation Lambda function names do not include one of the terms in the preceding paragraph, or if you require more granular permission than those in the `AmazonSageMakerGroundTruthExecution` managed policy, you can add a policy similar to the following to give the execution role permission to invoke pre-annotation and post-annotation functions.

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": [
                "arn:aws:lambda:us-east-1:111122223333:function:<pre-annotation-lambda-name>",
                "arn:aws:lambda:us-east-1:111122223333:function:<post-annotation-lambda-name>"
            ]
        }
    ]
}
```

------

## Grant Post-Annotation Lambda Permissions to Access Annotation
<a name="sms-custom-templates-step3-postlambda-perms"></a>

As described in [Post-annotation Lambda](sms-custom-templates-step3-lambda-requirements.md#sms-custom-templates-step3-postlambda), the post-annotation Lambda request includes the location of the annotation data in Amazon S3. This location is identified by the `s3Uri` string in the `payload` object. To process the annotations as they come in, even for a simple pass-through function, you need to grant the post-annotation [Lambda execution role](https://docs.aws.amazon.com/lambda/latest/dg/lambda-intro-execution-role.html) permission to read files from Amazon S3.

There are many ways that you can configure your Lambda to access annotation data in Amazon S3. Two common ways are:
+ Allow the Lambda execution role to assume the SageMaker AI execution role identified in `roleArn` in the post-annotation Lambda request. This SageMaker AI execution role is the one used to create the labeling job, and has access to the Amazon S3 output bucket where the annotation data is stored.
+ Grant the Lambda execution role permission to access the Amazon S3 output bucket directly.

Use the following sections to learn how to configure these options. 

**Grant Lambda Permission to Assume SageMaker AI Execution Role**

To allow a Lambda function to assume a SageMaker AI execution role, you must attach a policy to the Lambda function's execution role, and modify the trust relationship of the SageMaker AI execution role to allow Lambda to assume it.

1. [Attach the following IAM policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html) to your Lambda function's execution role to allow it to assume the SageMaker AI execution role identified in `Resource`. Replace `222222222222` with an [AWS account ID](https://docs.aws.amazon.com/general/latest/gr/acct-identifiers.html). Replace `sm-execution-role` with the name of the SageMaker AI execution role to be assumed.

------
#### [ JSON ]

****  

   ```
   {
       "Version": "2012-10-17",
       "Statement": {
           "Effect": "Allow",
           "Action": "sts:AssumeRole",
           "Resource": "arn:aws:iam::222222222222:role/sm-execution-role"
       }
   }
   ```

------

1. [Modify the trust policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-console.html#roles-managingrole_edit-trust-policy) of the SageMaker AI execution role to include the following `Statement`. Replace `222222222222` with an [AWS account ID](https://docs.aws.amazon.com/general/latest/gr/acct-identifiers.html). Replace `my-lambda-execution-role` with the name of your Lambda function's execution role.

------
#### [ JSON ]

****  

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::222222222222:role/my-lambda-execution-role"
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

------

**Grant Lambda Execution Role Permission to Access S3**

You can add a policy similar to the following to the post-annotation Lambda function execution role to give it S3 read permissions. Replace *amzn-s3-demo-bucket* with the name of the output bucket you specify when you create a labeling job.

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*"
        }
    ]
}
```

------

To add S3 read permissions to a Lambda execution role in the Lambda console, use the following procedure. 

**Add S3 read permissions to post-annotation Lambda:**

1. Open the [**Functions** page](https://console.aws.amazon.com/lambda/home#/functions) in the Lambda console.

1. Choose the name of the post-annotation function.

1. Choose **Configuration** and then choose **Permissions**.

1. Select the **Role name**. The summary page for that role opens in the IAM console in a new tab. 

1. Select **Attach policies**.

1. Do one of the following:
   + Search for and select **`AmazonS3ReadOnlyAccess`** to give the function permission to read all buckets and objects in the account. 
   + If you require more granular permissions, select **Create policy** and use the policy example in the preceding section to create a policy. Note that you must navigate back to the execution role summary page after you create the policy.

1. If you used the `AmazonS3ReadOnlyAccess` managed policy, select **Attach policy**. 

   If you created a new policy, navigate back to the Lambda execution role summary page and attach the policy you just created.

# Create Lambda functions using Ground Truth templates
<a name="sms-custom-templates-step3-lambda-create"></a>

You can create a Lambda function using the Lambda console, the AWS CLI, or an AWS SDK in a supported programming language of your choice. Use the AWS Lambda Developer Guide to learn more about each of these options:
+ To learn how to create a Lambda function using the console, see [Create a Lambda function with the console](https://docs.aws.amazon.com/lambda/latest/dg/getting-started-create-function.html).
+ To learn how to create a Lambda function using the AWS CLI, see [Using AWS Lambda with the AWS Command Line Interface](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-awscli.html).
+ Select the relevant section in the table of contents to learn more about working with Lambda in the language of your choice. For example, select [Working with Python](https://docs.aws.amazon.com/lambda/latest/dg/lambda-python.html) to learn more about using Lambda with the AWS SDK for Python (Boto3).

Ground Truth provides pre-annotation and post-annotation templates through an AWS Serverless Application Repository (SAR) *recipe*. Use the following procedure to select the Ground Truth recipe in the Lambda console.

**Use the Ground Truth SAR recipe to create pre-annotation and post-annotation Lambda functions:**

1. Open the [**Functions** page](https://console.aws.amazon.com/lambda/home#/functions) on the Lambda console.

1. Select **Create function**.

1. Select **Browse serverless app repository**.

1. In the search text box, enter **aws-sagemaker-ground-truth-recipe** and select that app.

1. Select **Deploy**. The app may take a couple of minutes to deploy. 

   Once the app deploys, two functions appear in the **Functions** section of the Lambda console: `serverlessrepo-aws-sagema-GtRecipePreHumanTaskFunc-<id>` and `serverlessrepo-aws-sagema-GtRecipeAnnotationConsol-<id>`. 

1. Select one of these functions and add your custom logic in the **Code** section.

1. When you are finished making changes, select **Deploy** to deploy them.

# Test pre-annotation and post-annotation Lambda functions
<a name="sms-custom-templates-step3-lambda-test"></a>

You can test your pre-annotation and post-annotation Lambda functions in the Lambda console. If you are new to Lambda, you can learn how to test, or *invoke*, your Lambda functions in the console using the [Create a Lambda function](https://docs.aws.amazon.com/lambda/latest/dg/getting-started-create-function.html#gettingstarted-zip-function) tutorial in the AWS Lambda Developer Guide. You can use the sections on this page to learn how to test the Ground Truth pre-annotation and post-annotation templates provided through the AWS Serverless Application Repository (SAR). 

**Topics**
+ [Prerequisites](#sms-custom-templates-step3-lambda-test-pre)
+ [Test the Pre-annotation Lambda Function](#sms-custom-templates-step3-lambda-test-pre-annotation)
+ [Test the Post-Annotation Lambda Function](#sms-custom-templates-step3-lambda-test-post-annotation)

## Prerequisites
<a name="sms-custom-templates-step3-lambda-test-pre"></a>

You must do the following to use the tests described on this page.
+ You need access to the Lambda console, and you need permission to create and invoke Lambda functions. To learn how to set up these permissions, see [Grant Permission to Create and Select an AWS Lambda Function](sms-custom-templates-step3-lambda-permissions.md#sms-custom-templates-step3-postlambda-create-perms).
+ If you have not deployed the Ground Truth SAR recipe, use the procedure in [Create Lambda functions using Ground Truth templates](sms-custom-templates-step3-lambda-create.md) to do so.
+ To test the post-annotation Lambda function, you must have a data file in Amazon S3 with sample annotation data. For a simple test, you can copy and paste the following code into a file and save it as `sample-annotations.json` and [upload this file to Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/upload-objects.html). Note the S3 URI of this file—you need this information to configure the post-annotation Lambda test.

  ```
  [{"datasetObjectId":"0","dataObject":{"content":"To train a machine learning model, you need a large, high-quality, labeled dataset. Ground Truth helps you build high-quality training datasets for your machine learning models."},"annotations":[{"workerId":"private.us-west-2.0123456789","annotationData":{"content":"{\"crowd-entity-annotation\":{\"entities\":[{\"endOffset\":8,\"label\":\"verb\",\"startOffset\":3},{\"endOffset\":27,\"label\":\"adjective\",\"startOffset\":11},{\"endOffset\":33,\"label\":\"object\",\"startOffset\":28},{\"endOffset\":51,\"label\":\"adjective\",\"startOffset\":46},{\"endOffset\":65,\"label\":\"adjective\",\"startOffset\":53},{\"endOffset\":74,\"label\":\"adjective\",\"startOffset\":67},{\"endOffset\":82,\"label\":\"adjective\",\"startOffset\":75},{\"endOffset\":102,\"label\":\"verb\",\"startOffset\":97},{\"endOffset\":112,\"label\":\"verb\",\"startOffset\":107},{\"endOffset\":125,\"label\":\"adjective\",\"startOffset\":113},{\"endOffset\":134,\"label\":\"adjective\",\"startOffset\":126},{\"endOffset\":143,\"label\":\"object\",\"startOffset\":135},{\"endOffset\":169,\"label\":\"adjective\",\"startOffset\":153},{\"endOffset\":176,\"label\":\"object\",\"startOffset\":170}]}}"}}]},{"datasetObjectId":"1","dataObject":{"content":"Sift 3 cups of flour into the bowl."},"annotations":[{"workerId":"private.us-west-2.0123456789","annotationData":{"content":"{\"crowd-entity-annotation\":{\"entities\":[{\"endOffset\":4,\"label\":\"verb\",\"startOffset\":0},{\"endOffset\":6,\"label\":\"number\",\"startOffset\":5},{\"endOffset\":20,\"label\":\"object\",\"startOffset\":15},{\"endOffset\":34,\"label\":\"object\",\"startOffset\":30}]}}"}}]},{"datasetObjectId":"2","dataObject":{"content":"Jen purchased 10 shares of the stock on Janurary 1st, 2020."},"annotations":[{"workerId":"private.us-west-2.0123456789","annotationData":{"content":"{\"crowd-entity-annotation\":{\"entities\":[{\"endOffset\":3,\"label\":\"person\",\"startOffset\":0},{\"endOffset\":13,\"label\":\"verb\",\"startOffset\":4},{\"endOffset\":16,\"label\":\"number\",\"startOffset\":14},{\"endOffset\":58,\"label\":\"date\",\"startOffset\":40}]}}"}}]},{"datasetObjectId":"3","dataObject":{"content":"The narrative was interesting, however the character development was weak."},"annotations":[{"workerId":"private.us-west-2.0123456789","annotationData":{"content":"{\"crowd-entity-annotation\":{\"entities\":[{\"endOffset\":29,\"label\":\"adjective\",\"startOffset\":18},{\"endOffset\":73,\"label\":\"adjective\",\"startOffset\":69}]}}"}}]}]
  ```
+ You must use the directions in [Grant Post-Annotation Lambda Permissions to Access Annotation](sms-custom-templates-step3-lambda-permissions.md#sms-custom-templates-step3-postlambda-perms) to give your post-annotation Lambda function's execution role permission to assume the SageMaker AI execution role you use to create the labeling job. The post-annotation Lambda function uses the SageMaker AI execution role to access the annotation data file, `sample-annotations.json`, in S3.



## Test the Pre-annotation Lambda Function
<a name="sms-custom-templates-step3-lambda-test-pre-annotation"></a>

Use the following procedure to test the pre-annotation Lambda function created when you deployed the Ground Truth AWS Serverless Application Repository (SAR) recipe. 

**Test the Ground Truth SAR recipe pre-annotation Lambda function**

1. Open the [**Functions** page](https://console.aws.amazon.com/lambda/home#/functions) in the Lambda console.

1. Select the pre-annotation function that was deployed from the Ground Truth SAR recipe. The name of this function is similar to `serverlessrepo-aws-sagema-GtRecipePreHumanTaskFunc-<id>`.

1. In the **Code source** section, select the arrow next to **Test**.

1. Select **Configure test event**.

1. Keep the **Create new test event** option selected.

1. Under **Event template**, select **SageMaker Ground Truth PreHumanTask**. 

1. Give your test an **Event name**.

1. Select **Create**.

1. Select the arrow next to **Test** again and you should see that the test you created is selected, which is indicated with a dot by the event name. If it is not selected, select it. 

1. Select **Test** to run the test. 

After you run the test, you can see the **Execution results**. In the **Function logs**, you should see a response similar to the following:

```
START RequestId: cd117d38-8365-4e1a-bffb-0dcd631a878f Version: $LATEST
Received event: {
  "version": "2018-10-16",
  "labelingJobArn": "arn:aws:sagemaker:us-east-2:123456789012:labeling-job/example-job",
  "dataObject": {
    "source-ref": "s3://sagemakerexample/object_to_annotate.jpg"
  }
}
{'taskInput': {'taskObject': 's3://sagemakerexample/object_to_annotate.jpg'}, 'isHumanAnnotationRequired': 'true'}
END RequestId: cd117d38-8365-4e1a-bffb-0dcd631a878f
REPORT RequestId: cd117d38-8365-4e1a-bffb-0dcd631a878f	Duration: 0.42 ms	Billed Duration: 1 ms	Memory Size: 128 MB	Max Memory Used: 43 MB
```

In this response, you can see that the Lambda function's output matches the required pre-annotation response syntax:

```
{'taskInput': {'taskObject': 's3://sagemakerexample/object_to_annotate.jpg'}, 'isHumanAnnotationRequired': 'true'}
```
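As a sketch of what produces that response, a minimal pass-through pre-annotation handler can be reduced to a few lines. This is a hypothetical reconstruction based on the logged input and output above, not the exact code deployed by the SAR recipe:

```
# Hypothetical minimal pre-annotation handler (the deployed SAR function
# may differ). It lifts the manifest object's source-ref into taskInput
# and marks the object as requiring human annotation.
def lambda_handler(event, context):
    return {
        "taskInput": {"taskObject": event["dataObject"]["source-ref"]},
        "isHumanAnnotationRequired": "true",
    }

# The same test event shown in the function logs above.
event = {
    "version": "2018-10-16",
    "labelingJobArn": "arn:aws:sagemaker:us-east-2:123456789012:labeling-job/example-job",
    "dataObject": {"source-ref": "s3://sagemakerexample/object_to_annotate.jpg"},
}
response = lambda_handler(event, None)
print(response)
```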

## Test the Post-Annotation Lambda Function
<a name="sms-custom-templates-step3-lambda-test-post-annotation"></a>

Use the following procedure to test the post-annotation Lambda function created when you deployed the Ground Truth AWS Serverless Application Repository (SAR) recipe. 

**Test the Ground Truth SAR recipe post-annotation Lambda**

1. Open the [**Functions** page](https://console.aws.amazon.com/lambda/home#/functions) in the Lambda console.

1. Select the post-annotation function that was deployed from the Ground Truth SAR recipe. The name of this function is similar to `serverlessrepo-aws-sagema-GtRecipeAnnotationConsol-<id>`.

1. In the **Code source** section, select the arrow next to **Test**.

1. Select **Configure test event**.

1. Keep the **Create new test event** option selected.

1. Under **Event template**, select **SageMaker Ground Truth AnnotationConsolidation**.

1. Give your test an **Event name**.

1. Modify the template code provided as follows:
   + Replace the Amazon Resource Name (ARN) in `roleArn` with the ARN of the SageMaker AI execution role you used to create the labeling job.
   + Replace the S3 URI in `s3Uri` with the URI of the `sample-annotations.json` file you added to Amazon S3.

   After you make these modifications, your test should look similar to the following:

   ```
   {
     "version": "2018-10-16",
     "labelingJobArn": "arn:aws:sagemaker:us-east-2:123456789012:labeling-job/example-job",
     "labelAttributeName": "example-attribute",
     "roleArn": "arn:aws:iam::222222222222:role/sm-execution-role",
     "payload": {
       "s3Uri": "s3://your-bucket/sample-annotations.json"
     }
   }
   ```

1. Select **Create**.

1. Select the arrow next to **Test** again and you should see that the test you created is selected, which is indicated with a dot by the event name. If it is not selected, select it. 

1. Select **Test** to run the test. 

After you run the test, you should see a `-- Consolidated Output --` section in the **Function Logs**, which contains a list of all annotations included in `sample-annotations.json`.

# Demo template: Annotation of images with `crowd-bounding-box`
<a name="sms-custom-templates-step2-demo1"></a>

When you choose to use a custom template as your task type in the Amazon SageMaker Ground Truth console, you reach the **Custom labeling task panel**. There you can choose from multiple base templates. The templates represent some of the most common tasks and provide a sample to work from as you create your customized labeling task's template. If you are not using the console, or as an additional resource, see [Amazon SageMaker AI Ground Truth Sample Task UIs](https://github.com/aws-samples/amazon-sagemaker-ground-truth-task-uis) for a repository of demo templates for a variety of labeling job task types.

This demonstration works with the **BoundingBox** template. The demonstration also covers the AWS Lambda functions needed for processing your data before and after the task. In the GitHub repository above, to find templates that work with AWS Lambda functions, look for `{{ task.input.<property name> }}` in the template.

**Topics**
+ [Starter Bounding Box custom template](#sms-custom-templates-step2-demo1-base-template)
+ [Your own Bounding Box custom template](#sms-custom-templates-step2-demo1-your-own-template)
+ [Your manifest file](#sms-custom-templates-step2-demo1-manifest)
+ [Your pre-annotation Lambda function](#sms-custom-templates-step2-demo1-pre-annotation)
+ [Your post-annotation Lambda function](#sms-custom-templates-step2-demo1-post-annotation)
+ [The output of your labeling job](#sms-custom-templates-step2-demo1-job-output)

## Starter Bounding Box custom template
<a name="sms-custom-templates-step2-demo1-base-template"></a>

This is the starter bounding box template that is provided.

```
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<crowd-form>
  <crowd-bounding-box
    name="boundingBox"
    src="{{ task.input.taskObject | grant_read_access }}"
    header="{{ task.input.header }}"
    labels="{{ task.input.labels | to_json | escape }}"
  >

    <!-- The <full-instructions> tag is where you will define the full instructions of your task. -->
    <full-instructions header="Bounding Box Instructions" >
      <p>Use the bounding box tool to draw boxes around the requested target of interest:</p>
      <ol>
        <li>Draw a rectangle using your mouse over each instance of the target.</li>
        <li>Make sure the box does not cut into the target, leave a 2 - 3 pixel margin</li>
        <li>
          When targets are overlapping, draw a box around each object,
          include all contiguous parts of the target in the box.
          Do not include parts that are completely overlapped by another object.
        </li>
        <li>
          Do not include parts of the target that cannot be seen,
          even though you think you can interpolate the whole shape of the target.
        </li>
        <li>Avoid shadows, they're not considered as a part of the target.</li>
        <li>If the target goes off the screen, label up to the edge of the image.</li>
      </ol>
    </full-instructions>

    <!-- The <short-instructions> tag allows you to specify instructions that are displayed in the left hand side of the task interface.
    It is a best practice to provide good and bad examples in this section for quick reference. -->
    <short-instructions>
      Use the bounding box tool to draw boxes around the requested target of interest.
    </short-instructions>
  </crowd-bounding-box>
</crowd-form>
```

The custom templates use the [Liquid template language](https://shopify.github.io/liquid/), and each of the items between double curly braces is a variable. The pre-annotation AWS Lambda function should provide an object named `taskInput` and that object's properties can be accessed as `{{ task.input.<property name> }}` in your template.

## Your own Bounding Box custom template
<a name="sms-custom-templates-step2-demo1-your-own-template"></a>

As an example, assume you have a large collection of animal photos in which you know the kind of animal in an image from a prior image-classification job. Now you want to have a bounding box drawn around it.

In the starter sample, there are three variables: `taskObject`, `header`, and `labels`.

Each of these would be represented in different parts of the bounding box.
+ `taskObject` is an HTTP(S) URL or S3 URI for the photo to be annotated. The added `| grant_read_access` is a filter that will convert an S3 URI to an HTTPS URL with short-lived access to that resource. If you're using an HTTP(S) URL, it's not needed.
+ `header` is the text above the photo to be labeled, something like "Draw a box around the bird in the photo."
+ `labels` is an array, represented as `['item1', 'item2', ...]`. These are labels that can be assigned by the worker to the different boxes they draw. You can have one or many.

Each of the variable names comes from the JSON object in the response from your pre-annotation Lambda function. The names above are merely suggestions. Use whatever variable names make sense to you and will promote code readability among your team.

**Only use variables when necessary**  
If a field will not change, you can remove that variable from the template and replace it with plain text. Otherwise, you have to repeat that text as a value in each object in your manifest or code it into your pre-annotation Lambda function.

**Example : Final Customized Bounding Box Template**  
To keep things simple, this template will have one variable, one label, and very basic instructions. Assuming your manifest has an "animal" property in each data object, that value can be re-used in two parts of the template.  

```
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>
<crowd-form>
  <crowd-bounding-box
    name="boundingBox"
    labels="[ '{{ task.input.animal }}' ]"
    src="{{ task.input.source-ref | grant_read_access }}"
    header="Draw a box around the {{ task.input.animal }}."
  >
    <full-instructions header="Bounding Box Instructions" >
      <p>Draw a bounding box around the {{ task.input.animal }} in the image. If 
      there is more than one {{ task.input.animal }} per image, draw a bounding 
      box around the largest one.</p>
      <p>The box should be tight around the {{ task.input.animal }} with 
      no more than a couple of pixels of buffer around the 
      edges.</p>
      <p>If the image does not contain a {{ task.input.animal }}, check the <strong>
      Nothing to label</strong> box.</p>
    </full-instructions>
    <short-instructions>
      <p>Draw a bounding box around the {{ task.input.animal }} in each image. If 
      there is more than one {{ task.input.animal }} per image, draw a bounding 
      box around the largest one.</p>
    </short-instructions>
  </crowd-bounding-box>
</crowd-form>
```
Note the re-use of `{{ task.input.animal }}` throughout the template. If your manifest had all of the animal names beginning with a capital letter, you could use `{{ task.input.animal | downcase }}`, incorporating one of Liquid's built-in filters in sentences where it needed to be presented lowercase.

## Your manifest file
<a name="sms-custom-templates-step2-demo1-manifest"></a>

Your manifest file should provide the variable values you're using in your template. You can do some transformation of your manifest data in your pre-annotation Lambda, but if you don't need to, you maintain a lower risk of errors and your Lambda will run faster. Here's a sample manifest file for the template.

```
{"source-ref": "<S3 image URI>", "animal": "horse"}
{"source-ref": "<S3 image URI>", "animal": "bird"}
{"source-ref": "<S3 image URI>", "animal": "dog"}
{"source-ref": "<S3 image URI>", "animal": "cat"}
```
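Each line of the manifest must be a complete, standalone JSON object (the file is in JSON Lines format). A quick way to catch formatting mistakes before uploading is to parse the file line by line; the sample data and the `required_keys` default below are hypothetical, chosen to match this demo's template:

```
import json

def validate_manifest(text, required_keys=("source-ref", "animal")):
    # Parse every non-empty line and confirm it carries the properties
    # the template expects. json.loads raises an error on a bad line.
    objects = []
    for line in text.splitlines():
        if not line.strip():
            continue
        obj = json.loads(line)
        missing = [k for k in required_keys if k not in obj]
        if missing:
            raise ValueError(f"missing keys {missing}: {line}")
        objects.append(obj)
    return objects

sample = '{"source-ref": "s3://example-bucket/img1.jpg", "animal": "horse"}\n' \
         '{"source-ref": "s3://example-bucket/img2.jpg", "animal": "bird"}'
objects = validate_manifest(sample)
print(len(objects))  # → 2
```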

## Your pre-annotation Lambda function
<a name="sms-custom-templates-step2-demo1-pre-annotation"></a>

As part of the job set-up, provide the ARN of an AWS Lambda function that can be called to process your manifest entries and pass them to the template engine.

**Naming your Lambda function**  
The best practice in naming your function is to use one of the following four strings as part of the function name: `SageMaker`, `Sagemaker`, `sagemaker`, or `LabelingFunction`. This applies to both your pre-annotation and post-annotation functions.

When you're using the console, if you have AWS Lambda functions that are owned by your account, a drop-down list of functions meeting the naming requirements will be provided to choose one.

In this very basic example, you're just passing through the information from the manifest without doing any additional processing on it. This sample pre-annotation function is written for Python 3.7.

```
import json

def lambda_handler(event, context):
    return {
        "taskInput": event['dataObject']
    }
```

The JSON object from your manifest will be provided as a child of the `event` object. The properties inside the `taskInput` object will be available as variables to your template, so simply setting the value of `taskInput` to `event['dataObject']` will pass all the values from your manifest object to your template without having to copy them individually. If you wish to send more values to the template, you can add them to the `taskInput` object.
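As a sketch of that last point, the handler can enrich `taskInput` before returning it. The `header` property computed below is hypothetical, added only to show how a new template variable appears:

```
def lambda_handler(event, context):
    task_input = event["dataObject"]
    # A computed property; it becomes {{ task.input.header }} in the template.
    task_input["header"] = "Draw a box around the " + task_input["animal"] + "."
    return {"taskInput": task_input}

# A hypothetical event shaped like the ones Ground Truth sends.
event = {"dataObject": {"source-ref": "s3://example-bucket/img1.jpg", "animal": "bird"}}
result = lambda_handler(event, None)
print(result["taskInput"]["header"])  # → Draw a box around the bird.
```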

## Your post-annotation Lambda function
<a name="sms-custom-templates-step2-demo1-post-annotation"></a>

As part of the job set-up, provide the ARN of an AWS Lambda function that can be called to process the form data when a worker completes a task. This can be as simple or complex as you want. If you want to do answer consolidation and scoring as the data comes in, you can apply the scoring and/or consolidation algorithms of your choice. If you want to store the raw data for offline processing, that is an option.

**Provide permissions to your post-annotation Lambda**  
The annotation data will be in a file designated by the `s3Uri` string in the `payload` object. To process the annotations as they come in, even for a simple pass through function, you need to assign `S3ReadOnly` access to your Lambda so it can read the annotation files.  
In the Console page for creating your Lambda, scroll to the **Execution role** panel. Select **Create a new role from one or more templates**. Give the role a name. From the **Policy templates** drop-down, choose **Amazon S3 object read-only permissions**. Save the Lambda and the role will be saved and selected.

The following sample is in Python 3.

```
import json
import boto3
from urllib.parse import urlparse

def lambda_handler(event, context):
    consolidated_labels = []

    # The S3 URI of the annotation file arrives in the event payload.
    parsed_url = urlparse(event['payload']['s3Uri'])
    s3 = boto3.client('s3')
    textFile = s3.get_object(Bucket=parsed_url.netloc, Key=parsed_url.path[1:])
    filecont = textFile['Body'].read()
    annotations = json.loads(filecont)
    
    for dataset in annotations:
        for annotation in dataset['annotations']:
            new_annotation = json.loads(annotation['annotationData']['content'])
            label = {
                'datasetObjectId': dataset['datasetObjectId'],
                'consolidatedAnnotation' : {
                'content': {
                    event['labelAttributeName']: {
                        'workerId': annotation['workerId'],
                        'boxesInfo': new_annotation,
                        'imageSource': dataset['dataObject']
                        }
                    }
                }
            }
            consolidated_labels.append(label)
    
    return consolidated_labels
```

The post-annotation Lambda will often receive batches of task results in the event object. That batch will be the `payload` object the Lambda should iterate through. What you send back will be an object meeting the [API contract](sms-custom-templates-step3.md).
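The consolidation loop can be exercised locally by separating it from the S3 fetch. This sketch assumes the annotation file structure the sample Lambda iterates over; the worker ID and annotation content below are hypothetical:

```
import json

def consolidate(annotations, label_attribute_name):
    # Same per-annotation logic as the sample Lambda, minus the S3 read.
    consolidated = []
    for dataset in annotations:
        for annotation in dataset['annotations']:
            content = json.loads(annotation['annotationData']['content'])
            consolidated.append({
                'datasetObjectId': dataset['datasetObjectId'],
                'consolidatedAnnotation': {
                    'content': {
                        label_attribute_name: {
                            'workerId': annotation['workerId'],
                            'boxesInfo': content,
                            'imageSource': dataset['dataObject'],
                        }
                    }
                }
            })
    return consolidated

# A one-task batch shaped like the annotation file the Lambda downloads.
sample = [{
    'datasetObjectId': '0',
    'dataObject': {'s3Uri': 's3://example-bucket/object_to_annotate.jpg'},
    'annotations': [{
        'workerId': 'private.us-east-1.ABCDEFGHIJKLMNOP',
        'annotationData': {'content': '{"boundingBox": {"boundingBoxes": []}}'},
    }],
}]
out = consolidate(sample, 'example-attribute')
print(out[0]['consolidatedAnnotation']['content']['example-attribute']['workerId'])
```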

## The output of your labeling job
<a name="sms-custom-templates-step2-demo1-job-output"></a>

You'll find the output of the job in a folder named after your labeling job in the target S3 bucket you specified. It will be in a subfolder named `manifests`.

For a bounding box task, the output you find in the output manifest will look similar to the example below. The example has been cleaned up for printing. The actual output will be a single line per record.

**Example : JSON in your output manifest**  

```
{
  "source-ref":"<URL>",
  "<label attribute name>":
    {
       "workerId":"<worker ID>",
       "imageSource":"<image URL>",
       "boxesInfo":"{\"boundingBox\":{\"boundingBoxes\":[{\"height\":878, \"label\":\"bird\", \"left\":208, \"top\":6, \"width\":809}], \"inputImageProperties\":{\"height\":924, \"width\":1280}}}"},
  "<label attribute name>-metadata":
    {
      "type":"groundTruth/custom",
      "job_name":"<Labeling job name>",
      "human-annotated":"yes"
    },
  "animal" : "bird"
}
```
Note how the additional `animal` attribute from your original manifest is passed to the output manifest on the same level as the `source-ref` and labeling data. Any properties from your input manifest, whether they were used in your template or not, will be passed to the output manifest.

# Demo Template: Labeling Intents with `crowd-classifier`
<a name="sms-custom-templates-step2-demo2"></a>

If you choose a custom template, you'll reach the **Custom labeling task panel**. There you can select from multiple starter templates that represent some of the more common tasks. The templates provide a starting point to work from in building your customized labeling task's template.

In this demonstration, you work with the **Intent Detection** template, which uses the `crowd-classifier` element, and the AWS Lambda functions needed for processing your data before and after the task.

**Topics**
+ [Starter Intent Detection custom template](#sms-custom-templates-step2-demo2-base-template)
+ [Your Intent Detection custom template](#sms-custom-templates-step2-demo2-your-template)
+ [Your pre-annotation Lambda function](#sms-custom-templates-step2-demo2-pre-lambda)
+ [Your post-annotation Lambda function](#sms-custom-templates-step2-demo2-post-lambda)
+ [Your labeling job output](#sms-custom-templates-step2-demo2-job-output)

## Starter Intent Detection custom template
<a name="sms-custom-templates-step2-demo2-base-template"></a>

This is the intent detection template that is provided as a starting point.

```
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<crowd-form>
  <crowd-classifier
    name="intent"
    categories="{{ task.input.labels | to_json | escape }}"
    header="Pick the most relevant intention expressed by the below text"
  >
    <classification-target>
      {{ task.input.utterance }}
    </classification-target>
    
    <full-instructions header="Intent Detection Instructions">
        <p>Select the most relevant intention expressed by the text.</p>
        <div>
           <p><strong>Example: </strong>I would like to return a pair of shoes</p>
           <p><strong>Intent: </strong>Return</p>
        </div>
    </full-instructions>

    <short-instructions>
      Pick the most relevant intention expressed by the text
    </short-instructions>
  </crowd-classifier>
</crowd-form>
```

The custom templates use the [Liquid template language](https://shopify.github.io/liquid/), and each of the items between double curly braces is a variable. The pre-annotation AWS Lambda function should provide an object named `taskInput` and that object's properties can be accessed as `{{ task.input.<property name> }}` in your template.

## Your Intent Detection custom template
<a name="sms-custom-templates-step2-demo2-your-template"></a>

In the starter template, there are two variables: the `task.input.labels` property in the `crowd-classifier` element opening tag and the `task.input.utterance` in the `classification-target` region's content.

Unless you need to offer different sets of labels with different utterances, avoiding a variable and just using text saves processing time and reduces the possibility of error. The template used in this demonstration removes that variable, but variables and filters like `to_json` are explained in more detail in the [Demo template: Annotation of images with `crowd-bounding-box`](sms-custom-templates-step2-demo1.md) article.

### Styling Your Elements
<a name="sms-custom-templates-step2-demo2-instructions"></a>

Two parts of these custom elements that sometimes get overlooked are the `<full-instructions>` and `<short-instructions>` regions. Good instructions generate good results.

In the elements that include these regions, the `<short-instructions>` appear automatically in the "Instructions" pane on the left of the worker's screen. The `<full-instructions>` are linked from the "View full instructions" link near the top of that pane. Clicking the link opens a modal pane with more detailed instructions.

Not only can you use HTML, CSS, and JavaScript in these sections, you are encouraged to do so if you believe you can provide a strong set of instructions and examples that will help workers complete your tasks with better speed and accuracy. 

**Example Try out a sample with JSFiddle**  
Try out an [example `<crowd-classifier>` task](https://jsfiddle.net/MTGT_Fiddle_Manager/bjc0y1vd/35/). The example is rendered by JSFiddle, therefore all the template variables are replaced with hard-coded values. Click the "View full instructions" link to see a set of examples with extended CSS styling. You can fork the project to experiment with your own changes to the CSS, adding sample images, or adding extended JavaScript functionality.

**Example : Final Customized Intent Detection Template**  
This uses the [example `<crowd-classifier>` task](https://jsfiddle.net/MTGT_Fiddle_Manager/bjc0y1vd/35/), but with a variable for the `<classification-target>`. If you are trying to keep a consistent CSS design among a series of different labeling jobs, you can include an external stylesheet using a `<link rel...>` element the same way you'd do in any other HTML document.  

```
<script src="https://assets.crowd.aws/crowd-html-elements.js"></script>

<crowd-form>
  <crowd-classifier
    name="intent"
    categories="['buy', 'eat', 'watch', 'browse', 'leave']"
    header="Pick the most relevant intent expressed by the text below"
  >
    <classification-target>
      {{ task.input.source }}
    </classification-target>
    
    <full-instructions header="Intent Detection Instructions">
      <p>In the statements and questions provided in this exercise, what category of action is the speaker interested in doing?</p>
          <table>
            <tr>
              <th>Example Utterance</th>
              <th>Good Choice</th>
            </tr>
            <tr>
              <td>When is the Seahawks game on?</td>
              <td>
                eat<br>
                <greenbg>watch</greenbg>
                <botchoice>browse</botchoice>
              </td>
            </tr>
            <tr>
              <th>Example Utterance</th>
              <th>Bad Choice</th>
            </tr>
            <tr>
              <td>When is the Seahawks game on?</td>
              <td>
                buy<br>
                <greenbg>eat</greenbg>
                <botchoice>watch</botchoice>
              </td>
            </tr>
          </table>
    </full-instructions>

    <short-instructions>
      What is the speaker expressing they would like to do next?
    </short-instructions>  
  </crowd-classifier>
</crowd-form>
<style>
  greenbg {
    background: #feee23;
    display: block;
  }

  table {
    *border-collapse: collapse; /* IE7 and lower */
    border-spacing: 0; 
  }

  th, tfoot, .fakehead {
    background-color: #8888ee;
    color: #f3f3f3;
    font-weight: 700;
  }

  th, td, tfoot {
      border: 1px solid blue;
  }

  th:first-child {
    border-radius: 6px 0 0 0;
  }

  th:last-child {
    border-radius: 0 6px 0 0;
  }

  th:only-child{
    border-radius: 6px 6px 0 0;
  }

  tfoot:first-child {
    border-radius: 0 0 6px 0;
  }

  tfoot:last-child {
    border-radius: 0 0 0 6px;
  }

  tfoot:only-child{
    border-radius: 6px 6px;
  }

  td {
    padding-left: 15px ;
    padding-right: 15px ;
  }

  botchoice {
    display: block;
    height: 17px;
    width: 490px;
    overflow: hidden;
    position: relative;
    background: #fff;
    padding-bottom: 20px;
  }

  botchoice:after {
    position: absolute;
    bottom: 0;
    left: 0;  
    height: 100%;
    width: 100%;
    content: "";
    background: linear-gradient(to top,
       rgba(255,255,255, 1) 55%, 
       rgba(255,255,255, 0) 100%
    );
    pointer-events: none; /* so the text is still selectable */
  }
</style>
```

**Example : Your manifest file**  
If you are preparing your manifest file manually for a text-classification task like this, have your data formatted in the following manner.  

```
{"source": "Roses are red"}
{"source": "Violets are Blue"}
{"source": "Ground Truth is the best"}
{"source": "And so are you"}
```

This differs from the manifest file used for the "[Demo template: Annotation of images with `crowd-bounding-box`](sms-custom-templates-step2-demo1.md)" demonstration in that `source-ref` was used as the property name instead of `source`. The use of `source-ref` designates S3 URIs for images or other files that must be converted to HTTPS URLs for workers. Otherwise, `source` should be used, as it is with the text strings above.

## Your pre-annotation Lambda function
<a name="sms-custom-templates-step2-demo2-pre-lambda"></a>

As part of the job set-up, provide the ARN of an AWS Lambda that can be called to process your manifest entries and pass them to the template engine. 

This Lambda function is required to have one of the following four strings as part of the function name: `SageMaker`, `Sagemaker`, `sagemaker`, or `LabelingFunction`. This applies to both your pre-annotation and post-annotation Lambda functions.

When you're using the console, if you have Lambdas that are owned by your account, a drop-down list of functions meeting the naming requirements will be provided to choose one.

In this very basic sample, where you have only one variable, it's primarily a pass-through function. Here's a sample pre-annotation Lambda function using Python 3.7.

```
import json

def lambda_handler(event, context):
    return {
        "taskInput":  event['dataObject']
    }
```

The `dataObject` property of the `event` contains the properties from a data object in your manifest.

In this demonstration, which is a simple pass through, you just pass that straight through as the `taskInput` value. If you add properties with those values to the `event['dataObject']` object, they will be available to your HTML template as Liquid variables with the format `{{ task.input.<property name> }}`.
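A quick local check of the pass-through shows how a manifest property surfaces as a template variable. The event below mirrors this demo's manifest; the value is hypothetical:

```
def lambda_handler(event, context):
    return {"taskInput": event["dataObject"]}

# One manifest line from this demo, wrapped in an event as Ground Truth would.
event = {"dataObject": {"source": "Roses are red"}}
result = lambda_handler(event, None)
# Rendered in the template as {{ task.input.source }}.
print(result["taskInput"]["source"])  # → Roses are red
```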

## Your post-annotation Lambda function
<a name="sms-custom-templates-step2-demo2-post-lambda"></a>

As part of the job set-up, provide the ARN of a Lambda function that can be called to process the form data when a worker completes a task. This can be as simple or complex as you want. If you want to do answer consolidation and scoring as data comes in, you can apply the scoring or consolidation algorithms of your choice. If you want to store the raw data for offline processing, that is an option.

**Set permissions for your post-annotation Lambda function**  
The annotation data will be in a file designated by the `s3Uri` string in the `payload` object. To process the annotations as they come in, even for a simple pass through function, you need to assign `S3ReadOnly` access to your Lambda so it can read the annotation files.  
In the Console page for creating your Lambda, scroll to the **Execution role** panel. Select **Create a new role from one or more templates**. Give the role a name. From the **Policy templates** drop-down, choose **Amazon S3 object read-only permissions**. Save the Lambda and the role will be saved and selected.

The following sample is for Python 3.7.

```
import json
import boto3
from urllib.parse import urlparse

def lambda_handler(event, context):
    consolidated_labels = []

    parsed_url = urlparse(event['payload']['s3Uri'])
    s3 = boto3.client('s3')
    textFile = s3.get_object(Bucket=parsed_url.netloc, Key=parsed_url.path[1:])
    filecont = textFile['Body'].read()
    annotations = json.loads(filecont)
    
    for dataset in annotations:
        for annotation in dataset['annotations']:
            new_annotation = json.loads(annotation['annotationData']['content'])
            label = {
                'datasetObjectId': dataset['datasetObjectId'],
                'consolidatedAnnotation' : {
                'content': {
                    event['labelAttributeName']: {
                        'workerId': annotation['workerId'],
                        'result': new_annotation,
                        'labeledContent': dataset['dataObject']
                        }
                    }
                }
            }
            consolidated_labels.append(label)

    return consolidated_labels
```

## Your labeling job output
<a name="sms-custom-templates-step2-demo2-job-output"></a>

The post-annotation Lambda will often receive batches of task results in the event object. That batch will be the `payload` object the Lambda should iterate through.

You'll find the output of the job in a folder named after your labeling job in the target S3 bucket you specified. It will be in a subfolder named `manifests`.

For an intent detection task, the output in the output manifest will look similar to the example below. The example has been cleaned up and spaced out to be easier for humans to read. The actual output will be more compressed for machine reading.

**Example : JSON in your output manifest**  

```
[
  {
    "datasetObjectId":"<Number representing item's place in the manifest>",
     "consolidatedAnnotation":
     {
       "content":
       {
         "<name of labeling job>":
         {     
           "workerId":"private.us-east-1.XXXXXXXXXXXXXXXXXXXXXX",
           "result":
           {
             "intent":
             {
                 "label":"<label chosen by worker>"
             }
           },
           "labeledContent":
           {
             "content":"<text content that was labeled>"
           }
         }
       }
     }
   },
  {
    "datasetObjectId":"<Number representing item's place in the manifest>",
     "consolidatedAnnotation":
     {
       "content":
       {
         "<name of labeling job>":
         {     
           "workerId":"private.us-east-1.6UDLPKQZHYWJQSCA4MBJBB7FWE",
           "result":
           {
             "intent":
             {
                 "label": "<label chosen by worker>"
             }
           },
           "labeledContent":
           {
             "content": "<text content that was labeled>"
           }
         }
       }
     }
   },
     ...
     ...
     ...
]
```

This should help you create and use your own custom template.

# Create a custom workflow using the API
<a name="sms-custom-templates-step4"></a>

When you have created your custom UI template (Step 2) and processing Lambda functions (Step 3), place the template in an Amazon S3 bucket with a file name in the format `<filename>.liquid.html`. Use the [`CreateLabelingJob`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateLabelingJob.html) action to configure your task. Use the S3 location of your custom template ([Creating a custom worker task template](sms-custom-templates-step2.md)) as the value for the `UiTemplateS3Uri` field in the [`UiConfig`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UiConfig.html) object within the [`HumanTaskConfig`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html) object.

For the AWS Lambda functions described in [Processing data in a custom labeling workflow with AWS Lambda](sms-custom-templates-step3.md), the post-annotation function's ARN is used as the value for the `AnnotationConsolidationLambdaArn` field, and the pre-annotation function's ARN is used as the value for `PreHumanTaskLambdaArn`.
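As a sketch, these fields fit together in the `HumanTaskConfig` object as follows. All ARNs, bucket names, and the workteam below are hypothetical placeholders; substitute your own values before calling the API:

```
# Hypothetical HumanTaskConfig; every ARN and S3 URI is a placeholder.
human_task_config = {
    "WorkteamArn": "arn:aws:sagemaker:us-east-2:111122223333:workteam/private-crowd/example-team",
    "UiConfig": {
        "UiTemplateS3Uri": "s3://your-bucket/templates/custom-template.liquid.html"
    },
    "PreHumanTaskLambdaArn": "arn:aws:lambda:us-east-2:111122223333:function:example-sagemaker-pre-annotation",
    "AnnotationConsolidationConfig": {
        "AnnotationConsolidationLambdaArn": "arn:aws:lambda:us-east-2:111122223333:function:example-sagemaker-post-annotation"
    },
    "TaskTitle": "Example custom labeling task",
    "TaskDescription": "Draw a bounding box around the animal in each image",
    "NumberOfHumanWorkersPerDataObject": 1,
    "TaskTimeLimitInSeconds": 600,
}

# You would pass this object to CreateLabelingJob, for example with boto3:
# boto3.client("sagemaker").create_labeling_job(
#     LabelingJobName="example-job",
#     LabelAttributeName="example-attribute",
#     InputConfig={"DataSource": {"S3DataSource": {"ManifestS3Uri": "s3://your-bucket/manifest.jsonl"}}},
#     OutputConfig={"S3OutputPath": "s3://your-bucket/output/"},
#     RoleArn="arn:aws:iam::111122223333:role/sm-execution-role",
#     HumanTaskConfig=human_task_config,
# )
```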