
AWS::DataBrew::Job

Specifies a new DataBrew job.

Syntax

To declare this entity in your CloudFormation template, use the following syntax:

JSON


{
  "Type" : "AWS::DataBrew::Job",
  "Properties" : {
      "DatabaseOutputs" : [ DatabaseOutput, ... ],
      "DataCatalogOutputs" : [ DataCatalogOutput, ... ],
      "DatasetName" : String,
      "EncryptionKeyArn" : String,
      "EncryptionMode" : String,
      "JobSample" : JobSample,
      "LogSubscription" : String,
      "MaxCapacity" : Integer,
      "MaxRetries" : Integer,
      "Name" : String,
      "OutputLocation" : OutputLocation,
      "Outputs" : [ Output, ... ],
      "ProfileConfiguration" : ProfileConfiguration,
      "ProjectName" : String,
      "Recipe" : Recipe,
      "RoleArn" : String,
      "Tags" : [ Tag, ... ],
      "Timeout" : Integer,
      "Type" : String,
      "ValidationConfigurations" : [ ValidationConfiguration, ... ]
    }
}

YAML


Type: AWS::DataBrew::Job
Properties:
  DatabaseOutputs: 
    - DatabaseOutput
  DataCatalogOutputs: 
    - DataCatalogOutput
  DatasetName: String
  EncryptionKeyArn: String
  EncryptionMode: String
  JobSample: 
    JobSample
  LogSubscription: String
  MaxCapacity: Integer
  MaxRetries: Integer
  Name: String
  OutputLocation: 
    OutputLocation
  Outputs: 
    - Output
  ProfileConfiguration: 
    ProfileConfiguration
  ProjectName: String
  Recipe: 
    Recipe
  RoleArn: String
  Tags: 
    - Tag
  Timeout: Integer
  Type: String
  ValidationConfigurations: 
    - ValidationConfiguration

Properties

DatabaseOutputs

A list of JDBC database output objects that define the destinations a DataBrew recipe job writes to.

Required: No

Type: Array of DatabaseOutput

Minimum: 1

Update requires: No interruption

DataCatalogOutputs

One or more artifacts that represent the AWS Glue Data Catalog output from running the job.

Required: No

Type: Array of DataCatalogOutput

Minimum: 1

Update requires: No interruption

DatasetName

A dataset that the job is to process.

Required: No

Type: String

Minimum: 1

Maximum: 255

Update requires: No interruption

EncryptionKeyArn

The Amazon Resource Name (ARN) of an encryption key that is used to protect the job output. For more information, see Encrypting data written by DataBrew jobs.

Required: No

Type: String

Minimum: 20

Maximum: 2048

Update requires: No interruption

EncryptionMode

The encryption mode for the job, which can be one of the following:

SSE-KMS - Server-side encryption with keys managed by AWS KMS.
SSE-S3 - Server-side encryption with keys managed by Amazon S3.

Required: No

Type: String

Allowed values: SSE-KMS | SSE-S3

Update requires: No interruption
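
To illustrate how EncryptionMode and EncryptionKeyArn fit together, here is a minimal sketch of a profile job that requests SSE-KMS and names a KMS key. The logical ID, job and dataset names, bucket, account ID, role, and key ID are all placeholders chosen for illustration, not values from this reference.

```yaml
Resources:
  EncryptedProfileJob:                # hypothetical logical ID
    Type: AWS::DataBrew::Job
    Properties:
      Type: PROFILE
      Name: encrypted-profile-job     # placeholder job name
      DatasetName: dataset-name       # placeholder; the dataset must already exist
      RoleArn: arn:aws:iam::123456789012:role/DataBrewJobRole   # placeholder role
      # Server-side encryption with a key managed by AWS KMS
      EncryptionMode: SSE-KMS
      # Placeholder key ARN; substitute the ARN of your own key
      EncryptionKeyArn: arn:aws:kms:us-east-1:123456789012:key/11111111-2222-3333-4444-555555555555
      OutputLocation:
        Bucket: example-output-bucket # placeholder bucket
```

With SSE-S3 the keys are managed by Amazon S3, so a sketch for that mode would only change the EncryptionMode value.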

JobSample

A sample configuration for profile jobs only, which determines the number of rows on which the profile job is run. If a JobSample value isn't provided, the default value is used. The default value is CUSTOM_ROWS for the mode parameter and 20,000 for the size parameter.

Required: No

Type: JobSample

Update requires: No interruption
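
Written out explicitly, the default described above would look roughly like the following fragment, which belongs under the job's Properties. It assumes the Mode and Size field names used in the examples at the end of this page.

```yaml
# Fragment only: place under Properties of an AWS::DataBrew::Job of type PROFILE
JobSample:
  Mode: CUSTOM_ROWS   # profile a fixed number of rows rather than the full dataset
  Size: 20000         # the documented default row count
```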

LogSubscription

The current status of Amazon CloudWatch logging for the job.

Required: No

Type: String

Allowed values: ENABLE | DISABLE

Update requires: No interruption

MaxCapacity

The maximum number of nodes that can be consumed when the job processes data.

Required: No

Type: Integer

Update requires: No interruption

MaxRetries

The maximum number of times to retry the job after a job run fails.

Required: No

Type: Integer

Minimum: 0

Update requires: No interruption

Name

The unique name of the job.

Required: Yes

Type: String

Minimum: 1

Maximum: 255

Update requires: Replacement

OutputLocation

The location in Amazon S3 where the job writes its output.

Required: No

Type: OutputLocation

Update requires: No interruption

Outputs

One or more artifacts that represent output from running the job.

Required: No

Type: Array of Output

Minimum: 1

Update requires: No interruption

ProfileConfiguration

Configuration for profile jobs. Use it to select columns, run evaluations, and override the default parameters of evaluations. When no configuration is defined, the profile job applies default settings to all supported columns.

Required: No

Type: ProfileConfiguration

Update requires: No interruption

ProjectName

The name of the project that the job is associated with.

Required: No

Type: String

Minimum: 1

Maximum: 255

Update requires: No interruption

Recipe

A series of data transformation steps that the job runs.

Required: No

Type: Recipe

Update requires: No interruption

RoleArn

The Amazon Resource Name (ARN) of the role to be assumed for this job.

Required: Yes

Type: String

Minimum: 20

Maximum: 2048

Update requires: No interruption

Tags

Metadata tags that have been applied to the job.

Required: No

Type: Array of Tag

Update requires: No interruption

Timeout

The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.

Required: No

Type: Integer

Minimum: 0

Update requires: No interruption
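
The run-time limits (LogSubscription, MaxCapacity, MaxRetries, and Timeout) are often set together. The sketch below shows one way to combine them; the numeric values are arbitrary illustrations rather than recommendations, and the names, role, and bucket are placeholders.

```yaml
Resources:
  BoundedProfileJob:                  # hypothetical logical ID
    Type: AWS::DataBrew::Job
    Properties:
      Type: PROFILE
      Name: bounded-profile-job       # placeholder job name
      DatasetName: dataset-name       # placeholder dataset
      RoleArn: arn:aws:iam::123456789012:role/DataBrewJobRole   # placeholder role
      LogSubscription: ENABLE         # turn on CloudWatch logging
      MaxCapacity: 5                  # illustrative cap on nodes
      MaxRetries: 1                   # retry once after a failed run
      Timeout: 120                    # minutes; a longer run ends with status TIMEOUT
      OutputLocation:
        Bucket: example-output-bucket # placeholder bucket
```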

Type

The type of the job, which must be one of the following:

PROFILE - A job to analyze a dataset, to determine its size, data types, data distribution, and more.
RECIPE - A job to apply one or more transformations to a dataset.

Required: Yes

Type: String

Allowed values: PROFILE | RECIPE

Update requires: Replacement

ValidationConfigurations

A list of validation configurations that are applied to the profile job.

Required: No

Type: Array of ValidationConfiguration

Update requires: No interruption
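
As a rough sketch of what one entry might look like, the fragment below attaches a single ruleset to a profile job. The RulesetArn and ValidationMode field names are not defined on this page; they are assumed from the ValidationConfiguration property type, so check that type's reference before relying on them. The ARN is a placeholder.

```yaml
# Fragment only: place under Properties of an AWS::DataBrew::Job of type PROFILE
ValidationConfigurations:
  # Field names assumed from the ValidationConfiguration property type
  - RulesetArn: arn:aws:databrew:us-east-1:123456789012:ruleset/example-ruleset  # placeholder
    ValidationMode: CHECK_ALL   # assumed value; verify against the property type
```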

Return values

Ref

When you pass the logical ID of this resource to the intrinsic Ref function, Ref returns the resource name. For example:

{ "Ref": "myJob" }

For an AWS Glue DataBrew job named myJob, Ref returns the name of the job.
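
In YAML, the same reference can be surfaced as a stack output. This is a minimal sketch that assumes a job with the logical ID myJob is declared in the same template; the output name JobName is a placeholder.

```yaml
# Assumes the template's Resources section declares an AWS::DataBrew::Job
# with the logical ID myJob.
Outputs:
  JobName:                      # hypothetical output name
    Description: Name of the DataBrew job
    Value: !Ref myJob           # resolves to the job's name
```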

Examples

Creating jobs

The following examples create new DataBrew profile jobs.

YAML



Resources:
  TestDataBrewJob:
    Type: AWS::DataBrew::Job
    Properties:
      Type: PROFILE
      Name: job-name
      DatasetName: dataset-name
      RoleArn: arn:aws:iam::123456789012:role/PassRoleAdmin
      JobSample:
        Mode: 'CUSTOM_ROWS'
        Size: 50000
      OutputLocation:
        Bucket: !Join [ '', ['databrew-cfn-integration-tests-', !Ref 'AWS::Region', '-', !Ref 'AWS::AccountId' ] ]
      Tags: [{Key: key00AtCreate, Value: value001AtCreate}]

JSON



{
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "This CloudFormation template specifies a DataBrew Profile Job",
    "Resources": {
        "MyDataBrewProfileJob": {
            "Type": "AWS::DataBrew::Job",
            "Properties": {
                "Type": "PROFILE",
                "Name": "job-test",
                "DatasetName": "dataset-test",
                "RoleArn": "arn:aws:iam::123456789012:role/PassRoleAdmin",
                "JobSample": {
                    "Mode": "FULL_DATASET"
                },
                "OutputLocation": {
                    "Bucket": "test-output",
                    "Key": "job-output.json"
                },
                "Tags": [
                    {
                        "Key": "key00AtCreate",
                        "Value": "value001AtCreate"
                    }
                ]
            }
        }
    }
}
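
Both templates above create PROFILE jobs. For comparison, a RECIPE job might be declared along the following lines. This is a sketch under assumptions: the nested field names (Recipe Name and Version, and Location with Bucket and Key inside each Outputs entry) come from the Recipe and Output property types, not from this page, and every name, version, and ARN is a placeholder.

```yaml
Resources:
  ExampleRecipeJob:                   # hypothetical logical ID
    Type: AWS::DataBrew::Job
    Properties:
      Type: RECIPE
      Name: recipe-job-name           # placeholder job name
      DatasetName: dataset-name       # placeholder; the dataset must already exist
      RoleArn: arn:aws:iam::123456789012:role/DataBrewJobRole   # placeholder role
      Recipe:
        Name: recipe-name             # placeholder; assumed field of the Recipe type
        Version: '1.0'                # placeholder; assumed field of the Recipe type
      Outputs:
        - Location:                   # assumed field of the Output type
            Bucket: example-output-bucket
            Key: recipe-output/
```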
