# AWS::Glue::Crawler

The `AWS::Glue::Crawler` resource specifies an AWS Glue crawler. For more information, see [Cataloging Tables with a Crawler](https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html) and [Crawler Structure](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-crawling.html#aws-glue-api-crawler-crawling-Crawler) in the *AWS Glue Developer Guide*.

## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "Type" : "AWS::Glue::Crawler",
  "Properties" : {
      "[Classifiers](#cfn-glue-crawler-classifiers)" : [ String, ... ],
      "[Configuration](#cfn-glue-crawler-configuration)" : String,
      "[CrawlerSecurityConfiguration](#cfn-glue-crawler-crawlersecurityconfiguration)" : String,
      "[DatabaseName](#cfn-glue-crawler-databasename)" : String,
      "[Description](#cfn-glue-crawler-description)" : String,
      "[LakeFormationConfiguration](#cfn-glue-crawler-lakeformationconfiguration)" : LakeFormationConfiguration,
      "[Name](#cfn-glue-crawler-name)" : String,
      "[RecrawlPolicy](#cfn-glue-crawler-recrawlpolicy)" : RecrawlPolicy,
      "[Role](#cfn-glue-crawler-role)" : String,
      "[Schedule](#cfn-glue-crawler-schedule)" : Schedule,
      "[SchemaChangePolicy](#cfn-glue-crawler-schemachangepolicy)" : SchemaChangePolicy,
      "[TablePrefix](#cfn-glue-crawler-tableprefix)" : String,
      "[Tags](#cfn-glue-crawler-tags)" : [ [Tag](https://docs.aws.amazon.com/AWSCloudFormation/latest/TemplateReference/aws-properties-resource-tags.html), ... ],
      "[Targets](#cfn-glue-crawler-targets)" : Targets
    }
}
```

### YAML

```
Type: AWS::Glue::Crawler
Properties:
  [Classifiers](#cfn-glue-crawler-classifiers):
    - String
  [Configuration](#cfn-glue-crawler-configuration): String
  [CrawlerSecurityConfiguration](#cfn-glue-crawler-crawlersecurityconfiguration): String
  [DatabaseName](#cfn-glue-crawler-databasename): String
  [Description](#cfn-glue-crawler-description): String
  [LakeFormationConfiguration](#cfn-glue-crawler-lakeformationconfiguration):
    LakeFormationConfiguration
  [Name](#cfn-glue-crawler-name): String
  [RecrawlPolicy](#cfn-glue-crawler-recrawlpolicy):
    RecrawlPolicy
  [Role](#cfn-glue-crawler-role): String
  [Schedule](#cfn-glue-crawler-schedule):
    Schedule
  [SchemaChangePolicy](#cfn-glue-crawler-schemachangepolicy):
    SchemaChangePolicy
  [TablePrefix](#cfn-glue-crawler-tableprefix): String
  [Tags](#cfn-glue-crawler-tags):
    - [Tag](https://docs.aws.amazon.com/AWSCloudFormation/latest/TemplateReference/aws-properties-resource-tags.html)
  [Targets](#cfn-glue-crawler-targets):
    Targets
```

## Properties

`Classifiers`
A list of UTF-8 strings that specify the names of custom classifiers that are associated with the crawler.
*Required*: No
*Type*: Array of String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Configuration`
Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see [Configuring a Crawler](https://docs.aws.amazon.com/glue/latest/dg/crawler-configuration.html).
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`CrawlerSecurityConfiguration`
The name of the `SecurityConfiguration` structure to be used by this crawler.
*Required*: No
*Type*: String
*Minimum*: `0`
*Maximum*: `128`
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`DatabaseName`
The name of the database in which the crawler's output is stored.
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Description`
A description of the crawler.
*Required*: No
*Type*: String
*Pattern*: `[\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\r\n\t]*`
*Minimum*: `0`
*Maximum*: `2048`
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`LakeFormationConfiguration`
Specifies whether the crawler should use AWS Lake Formation credentials instead of the IAM role credentials.
*Required*: No
*Type*: [LakeFormationConfiguration](aws-properties-glue-crawler-lakeformationconfiguration.md)
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Name`
The name of the crawler.
*Required*: No
*Type*: String
*Pattern*: `[\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]*`
*Minimum*: `1`
*Maximum*: `255`
*Update requires*: [Replacement](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-replacement)

`RecrawlPolicy`
A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
*Required*: No
*Type*: [RecrawlPolicy](aws-properties-glue-crawler-recrawlpolicy.md)
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Role`
The Amazon Resource Name (ARN) of an IAM role that's used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data.
*Required*: Yes
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Schedule`
For scheduled crawlers, the schedule when the crawler runs.
*Required*: No
*Type*: [Schedule](aws-properties-glue-crawler-schedule.md)
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`SchemaChangePolicy`
The policy that specifies update and delete behaviors for the crawler. The policy tells the crawler what to do in the event that it detects a change in a table that already exists in the customer's database at the time of the crawl. The `SchemaChangePolicy` does not affect whether or how new tables and partitions are added. New tables and partitions are always created regardless of the `SchemaChangePolicy` on a crawler.
The `SchemaChangePolicy` consists of two components, `UpdateBehavior` and `DeleteBehavior`.
*Required*: No
*Type*: [SchemaChangePolicy](aws-properties-glue-crawler-schemachangepolicy.md)
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`TablePrefix`
The prefix added to the names of tables that are created.
*Required*: No
*Type*: String
*Minimum*: `0`
*Maximum*: `128`
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Tags`
The tags to use with this crawler.
*Required*: No
*Type*: Array of [Tag](https://docs.aws.amazon.com/AWSCloudFormation/latest/TemplateReference/aws-properties-resource-tags.html)
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Targets`
A collection of targets to crawl.
*Required*: Yes
*Type*: [Targets](aws-properties-glue-crawler-targets.md)
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

## Return values

### Ref

When you pass the logical ID of this resource to the intrinsic `Ref` function, `Ref` returns the crawler name.

For more information about using the `Ref` function, see [`Ref`](https://docs.aws.amazon.com/AWSCloudFormation/latest/TemplateReference/intrinsic-function-reference-ref.html).

## Examples

**Topics**
+ [Create a crawler](#aws-resource-glue-crawler--examples--Create_a_crawler)
+ [Crawler Configuration](#aws-resource-glue-crawler--examples--Crawler_Configuration)

### Create a crawler

The following example creates a crawler for an Amazon S3 target.
#### JSON

```
{
  "Description": "AWS Glue crawler test",
  "Resources": {
    "MyRole": {
      "Type": "AWS::IAM::Role",
      "Properties": {
        "AssumeRolePolicyDocument": {
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Principal": {
                "Service": [
                  "glue.amazonaws.com"
                ]
              },
              "Action": [
                "sts:AssumeRole"
              ]
            }
          ]
        },
        "Path": "/",
        "ManagedPolicyArns": ["arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"],
        "Policies": [
          {
            "PolicyName": "S3BucketAccessPolicy",
            "PolicyDocument": {
              "Version": "2012-10-17",
              "Statement": [
                {
                  "Effect": "Allow",
                  "Action": [
                    "s3:GetObject",
                    "s3:PutObject"
                  ],
                  "Resource": {
                    "Fn::Join": [
                      "",
                      [
                        { "Fn::GetAtt": ["MyS3Bucket", "Arn"] },
                        "*"
                      ]
                    ]
                  }
                }
              ]
            }
          }
        ]
      }
    },
    "MyDatabase": {
      "Type": "AWS::Glue::Database",
      "Properties": {
        "CatalogId": { "Ref": "AWS::AccountId" },
        "DatabaseInput": {
          "Name": "dbcrawler",
          "Description": "TestDatabaseDescription",
          "LocationUri": "TestLocationUri",
          "Parameters": {
            "key1": "value1",
            "key2": "value2"
          }
        }
      }
    },
    "MyClassifier": {
      "Type": "AWS::Glue::Classifier",
      "Properties": {
        "GrokClassifier": {
          "Name": "CrawlerClassifier",
          "Classification": "wikiData",
          "GrokPattern": "%{NOTSPACE:language} %{NOTSPACE:page_title} %{NUMBER:hits:long} %{NUMBER:retrieved_size:long}"
        }
      }
    },
    "MyS3Bucket": {
      "Type": "AWS::S3::Bucket",
      "Properties": {
        "BucketName": "crawlertesttarget",
        "AccessControl": "BucketOwnerFullControl"
      }
    },
    "MyCrawler2": {
      "Type": "AWS::Glue::Crawler",
      "Properties": {
        "Name": "testcrawler1",
        "Role": { "Fn::GetAtt": [ "MyRole", "Arn" ] },
        "DatabaseName": { "Ref": "MyDatabase" },
        "Classifiers": [
          { "Ref": "MyClassifier" }
        ],
        "Targets": {
          "S3Targets": [
            { "Path": { "Ref": "MyS3Bucket" } }
          ]
        },
        "SchemaChangePolicy": {
          "UpdateBehavior": "UPDATE_IN_DATABASE",
          "DeleteBehavior": "LOG"
        },
        "Tags": {
          "key1": "value1"
        },
        "Schedule": {
          "ScheduleExpression": "cron(0/10 * ? * MON-FRI *)"
        }
      }
    }
  }
}
```

#### YAML

```
Resources:
  MyRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: "Allow"
            Principal:
              Service:
                - "glue.amazonaws.com"
            Action:
              - "sts:AssumeRole"
      Path: "/"
      ManagedPolicyArns: ['arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole']
      Policies:
        - PolicyName: "S3BucketAccessPolicy"
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: "Allow"
                Action:
                  - "s3:GetObject"
                  - "s3:PutObject"
                Resource: !Join
                  - ''
                  - - !GetAtt MyS3Bucket.Arn
                    - "*"
  MyDatabase:
    Type: AWS::Glue::Database
    Properties:
      CatalogId: !Ref AWS::AccountId
      DatabaseInput:
        Name: "dbcrawler"
        Description: "TestDatabaseDescription"
        LocationUri: "TestLocationUri"
        Parameters:
          key1 : "value1"
          key2 : "value2"
  MyClassifier:
    Type: AWS::Glue::Classifier
    Properties:
      GrokClassifier:
        Name: "CrawlerClassifier"
        Classification: "wikiData"
        GrokPattern: "%{NOTSPACE:language} %{NOTSPACE:page_title} %{NUMBER:hits:long} %{NUMBER:retrieved_size:long}"
  MyS3Bucket:
    Type: AWS::S3::Bucket
    Properties:
      BucketName: "crawlertesttarget"
      AccessControl: "BucketOwnerFullControl"
  MyCrawler2:
    Type: AWS::Glue::Crawler
    Properties:
      Name: "testcrawler1"
      Role: !GetAtt MyRole.Arn
      DatabaseName: !Ref MyDatabase
      Classifiers:
        - !Ref MyClassifier
      Targets:
        S3Targets:
          - Path: !Ref MyS3Bucket
      SchemaChangePolicy:
        UpdateBehavior: "UPDATE_IN_DATABASE"
        DeleteBehavior: "LOG"
      Tags:
        "Key1": "Value1"
      Schedule:
        ScheduleExpression: "cron(0/10 * ? * MON-FRI *)"
```

### Crawler Configuration

The following example specifies a configuration that controls a crawler's behavior.
#### JSON

```
{
  "Type": "AWS::Glue::Crawler",
  "Properties": {
    "Role": "role1",
    "Classifiers": [],
    "Description": "example classifier",
    "SchemaChangePolicy": "",
    "Schedule": "Schedule",
    "DatabaseName": "test",
    "Targets": [],
    "TablePrefix": "test-",
    "Name": "my-crawler",
    "Configuration": "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}}}"
  }
}
```

#### YAML

```
Type: AWS::Glue::Crawler
Properties:
  Role: role1
  Classifiers:
    - ''
  Description: example classifier
  SchemaChangePolicy: ''
  Schedule: Schedule
  DatabaseName: test
  Targets:
    - ''
  TablePrefix: test-
  Name: my-crawler
  Configuration: "{\"Version\":1.0,\"CrawlerOutput\":{\"Partitions\":{\"AddOrUpdateBehavior\":\"InheritFromTable\"},\"Tables\":{\"AddOrUpdateBehavior\":\"MergeNewColumns\"}}}"
```

# AWS::Glue::Crawler CatalogTarget

Specifies an AWS Glue Data Catalog target.

## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "[ConnectionName](#cfn-glue-crawler-catalogtarget-connectionname)" : String,
  "[DatabaseName](#cfn-glue-crawler-catalogtarget-databasename)" : String,
  "[DlqEventQueueArn](#cfn-glue-crawler-catalogtarget-dlqeventqueuearn)" : String,
  "[EventQueueArn](#cfn-glue-crawler-catalogtarget-eventqueuearn)" : String,
  "[Tables](#cfn-glue-crawler-catalogtarget-tables)" : [ String, ... ]
}
```

### YAML

```
[ConnectionName](#cfn-glue-crawler-catalogtarget-connectionname): String
[DatabaseName](#cfn-glue-crawler-catalogtarget-databasename): String
[DlqEventQueueArn](#cfn-glue-crawler-catalogtarget-dlqeventqueuearn): String
[EventQueueArn](#cfn-glue-crawler-catalogtarget-eventqueuearn): String
[Tables](#cfn-glue-crawler-catalogtarget-tables):
  - String
```

## Properties

`ConnectionName`
The name of the connection for an Amazon S3-backed Data Catalog table to be a target of the crawl when using a `Catalog` connection type paired with a `NETWORK` Connection type.
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`DatabaseName`
The name of the database to be synchronized.
*Required*: No
*Type*: String
*Pattern*: `[\u0020-\uD7FF\uE000-\uFFFD\uD800\uDC00-\uDBFF\uDFFF\t]*`
*Minimum*: `1`
*Maximum*: `255`
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`DlqEventQueueArn`
A valid Amazon SQS dead-letter queue ARN. For example, `arn:aws:sqs:region:account:deadLetterQueue`.
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`EventQueueArn`
A valid Amazon SQS ARN. For example, `arn:aws:sqs:region:account:sqs`.
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Tables`
A list of the tables to be synchronized.
*Required*: No
*Type*: Array of String
*Minimum*: `1`
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

# AWS::Glue::Crawler DeltaTarget

Specifies a Delta data store to crawl one or more Delta tables.

## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "[ConnectionName](#cfn-glue-crawler-deltatarget-connectionname)" : String,
  "[CreateNativeDeltaTable](#cfn-glue-crawler-deltatarget-createnativedeltatable)" : Boolean,
  "[DeltaTables](#cfn-glue-crawler-deltatarget-deltatables)" : [ String, ... ],
  "[WriteManifest](#cfn-glue-crawler-deltatarget-writemanifest)" : Boolean
}
```

### YAML

```
[ConnectionName](#cfn-glue-crawler-deltatarget-connectionname): String
[CreateNativeDeltaTable](#cfn-glue-crawler-deltatarget-createnativedeltatable): Boolean
[DeltaTables](#cfn-glue-crawler-deltatarget-deltatables):
  - String
[WriteManifest](#cfn-glue-crawler-deltatarget-writemanifest): Boolean
```

## Properties

`ConnectionName`
The name of the connection to use to connect to the Delta table target.
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`CreateNativeDeltaTable`
Specifies whether the crawler will create native tables, to allow integration with query engines that support querying of the Delta transaction log directly.
*Required*: No
*Type*: Boolean
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`DeltaTables`
A list of the Amazon S3 paths to the Delta tables.
*Required*: No
*Type*: Array of String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`WriteManifest`
Specifies whether to write the manifest files to the Delta table path.
*Required*: No
*Type*: Boolean
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

# AWS::Glue::Crawler DynamoDBTarget

Specifies an Amazon DynamoDB table to crawl.
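
As a rough illustration, the fragment below sketches how a DynamoDB target might be declared under a crawler's `Targets` property, using the `DynamoDBTargets` list of the `Targets` property type. The logical ID `MyDynamoDBCrawler`, the role ARN, the database name, and the table name are placeholder assumptions, not values from this guide.

```yaml
Resources:
  MyDynamoDBCrawler:                 # hypothetical logical ID
    Type: AWS::Glue::Crawler
    Properties:
      Role: arn:aws:iam::111122223333:role/MyGlueCrawlerRole   # placeholder role ARN
      DatabaseName: my_example_database                         # placeholder database name
      Targets:
        DynamoDBTargets:
          - Path: my-example-table   # placeholder name of the DynamoDB table to crawl
            ScanAll: false           # sample rows instead of scanning every record
            ScanRate: 0.5            # share of the configured read capacity units to use
```

Setting `ScanAll` to `false` is one way to keep crawl time down on tables that are not high throughput; omit it to keep the default of scanning all records.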
## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "[Path](#cfn-glue-crawler-dynamodbtarget-path)" : String,
  "[ScanAll](#cfn-glue-crawler-dynamodbtarget-scanall)" : Boolean,
  "[ScanRate](#cfn-glue-crawler-dynamodbtarget-scanrate)" : Number
}
```

### YAML

```
[Path](#cfn-glue-crawler-dynamodbtarget-path): String
[ScanAll](#cfn-glue-crawler-dynamodbtarget-scanall): Boolean
[ScanRate](#cfn-glue-crawler-dynamodbtarget-scanrate): Number
```

## Properties

`Path`
The name of the DynamoDB table to crawl.
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`ScanAll`
Indicates whether to scan all the records, or to sample rows from the table. Scanning all the records can take a long time when the table is not a high-throughput table.
A value of `true` means to scan all records, while a value of `false` means to sample the records. If no value is specified, the value defaults to `true`.
*Required*: No
*Type*: Boolean
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`ScanRate`
The percentage of the configured read capacity units to be used by the AWS Glue crawler. Read capacity units is a term defined by DynamoDB, and is a numeric value that acts as a rate limiter for the number of reads that can be performed on that table per second.
The valid values are null or a value between 0.1 and 1.5. A null value is used when the user does not provide a value, and defaults to 0.5 of the configured read capacity units (for provisioned tables), or 0.25 of the maximum configured read capacity units (for tables using on-demand mode).
*Required*: No
*Type*: Number
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

# AWS::Glue::Crawler HudiTarget

Specifies an Apache Hudi data source.

## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "[ConnectionName](#cfn-glue-crawler-huditarget-connectionname)" : String,
  "[Exclusions](#cfn-glue-crawler-huditarget-exclusions)" : [ String, ... ],
  "[MaximumTraversalDepth](#cfn-glue-crawler-huditarget-maximumtraversaldepth)" : Integer,
  "[Paths](#cfn-glue-crawler-huditarget-paths)" : [ String, ... ]
}
```

### YAML

```
[ConnectionName](#cfn-glue-crawler-huditarget-connectionname): String
[Exclusions](#cfn-glue-crawler-huditarget-exclusions):
  - String
[MaximumTraversalDepth](#cfn-glue-crawler-huditarget-maximumtraversaldepth): Integer
[Paths](#cfn-glue-crawler-huditarget-paths):
  - String
```

## Properties

`ConnectionName`
The name of the connection to use to connect to the Hudi target. If your Hudi files are stored in buckets that require VPC authorization, you can set their connection properties here.
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Exclusions`
A list of glob patterns used to exclude from the crawl. For more information, see [Catalog Tables with a Crawler](https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html).
*Required*: No
*Type*: Array of String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`MaximumTraversalDepth`
The maximum depth of Amazon S3 paths that the crawler can traverse to discover the Hudi metadata folder in your Amazon S3 path. Used to limit the crawler run time.
*Required*: No
*Type*: Integer
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Paths`
An array of Amazon S3 location strings for Hudi, each indicating the root folder in which the metadata files for a Hudi table reside. The Hudi folder may be located in a child folder of the root folder.
The crawler will scan all folders underneath a path for a Hudi folder.
*Required*: No
*Type*: Array of String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

# AWS::Glue::Crawler IcebergTarget

Specifies Apache Iceberg data store targets.

## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "[ConnectionName](#cfn-glue-crawler-icebergtarget-connectionname)" : String,
  "[Exclusions](#cfn-glue-crawler-icebergtarget-exclusions)" : [ String, ... ],
  "[MaximumTraversalDepth](#cfn-glue-crawler-icebergtarget-maximumtraversaldepth)" : Integer,
  "[Paths](#cfn-glue-crawler-icebergtarget-paths)" : [ String, ... ]
}
```

### YAML

```
[ConnectionName](#cfn-glue-crawler-icebergtarget-connectionname): String
[Exclusions](#cfn-glue-crawler-icebergtarget-exclusions):
  - String
[MaximumTraversalDepth](#cfn-glue-crawler-icebergtarget-maximumtraversaldepth): Integer
[Paths](#cfn-glue-crawler-icebergtarget-paths):
  - String
```

## Properties

`ConnectionName`
The name of the connection to use to connect to the Iceberg target.
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Exclusions`
A list of glob patterns used to exclude from the crawl.
*Required*: No
*Type*: Array of String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`MaximumTraversalDepth`
The maximum depth of Amazon S3 paths that the crawler can traverse to discover the Iceberg metadata folder in your Amazon S3 path. Used to limit the crawler run time.
*Required*: No
*Type*: Integer
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Paths`
One or more Amazon S3 paths that contain Iceberg metadata folders, in the form `s3://bucket/prefix`.
*Required*: No
*Type*: Array of String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

# AWS::Glue::Crawler JdbcTarget

Specifies a JDBC data store to crawl.

## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "[ConnectionName](#cfn-glue-crawler-jdbctarget-connectionname)" : String,
  "[EnableAdditionalMetadata](#cfn-glue-crawler-jdbctarget-enableadditionalmetadata)" : [ String, ... ],
  "[Exclusions](#cfn-glue-crawler-jdbctarget-exclusions)" : [ String, ... ],
  "[Path](#cfn-glue-crawler-jdbctarget-path)" : String
}
```

### YAML

```
[ConnectionName](#cfn-glue-crawler-jdbctarget-connectionname): String
[EnableAdditionalMetadata](#cfn-glue-crawler-jdbctarget-enableadditionalmetadata):
  - String
[Exclusions](#cfn-glue-crawler-jdbctarget-exclusions):
  - String
[Path](#cfn-glue-crawler-jdbctarget-path): String
```

## Properties

`ConnectionName`
The name of the connection to use to connect to the JDBC target.
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`EnableAdditionalMetadata`
Specify a value of `RAWTYPES` or `COMMENTS` to enable additional metadata in table responses. `RAWTYPES` provides the native-level datatype. `COMMENTS` provides comments associated with a column or table in the database.
If you do not need additional metadata, keep the field empty.
*Required*: No
*Type*: Array of String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Exclusions`
A list of glob patterns used to exclude from the crawl. For more information, see [Catalog Tables with a Crawler](https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html).
*Required*: No
*Type*: Array of String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Path`
The path of the JDBC target.
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

# AWS::Glue::Crawler LakeFormationConfiguration

Specifies AWS Lake Formation configuration settings for the crawler.
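
The fragment below is a minimal sketch of how these settings might be attached to a crawler for a cross-account crawl. The logical ID `MyLakeFormationCrawler`, both account IDs, the role ARN, the database name, and the bucket path are placeholder assumptions chosen for illustration.

```yaml
Resources:
  MyLakeFormationCrawler:            # hypothetical logical ID
    Type: AWS::Glue::Crawler
    Properties:
      Role: arn:aws:iam::111122223333:role/MyGlueCrawlerRole   # placeholder role ARN
      DatabaseName: my_example_database                         # placeholder database name
      LakeFormationConfiguration:
        UseLakeFormationCredentials: true   # use Lake Formation credentials instead of the IAM role credentials
        AccountId: "444455556666"           # placeholder account that owns the target data
      Targets:
        S3Targets:
          - Path: s3://amzn-s3-demo-bucket/data/   # placeholder Amazon S3 path
```

Quoting `AccountId` keeps YAML from parsing it as a number, which matters because the property is a String and an account ID can begin with a zero.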
## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "[AccountId](#cfn-glue-crawler-lakeformationconfiguration-accountid)" : String,
  "[UseLakeFormationCredentials](#cfn-glue-crawler-lakeformationconfiguration-uselakeformationcredentials)" : Boolean
}
```

### YAML

```
[AccountId](#cfn-glue-crawler-lakeformationconfiguration-accountid): String
[UseLakeFormationCredentials](#cfn-glue-crawler-lakeformationconfiguration-uselakeformationcredentials): Boolean
```

## Properties

`AccountId`
Required for cross-account crawls. For crawls in the same account as the target data, this can be left as null.
*Required*: No
*Type*: String
*Minimum*: `0`
*Maximum*: `12`
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`UseLakeFormationCredentials`
Specifies whether to use AWS Lake Formation credentials for the crawler instead of the IAM role credentials.
*Required*: No
*Type*: Boolean
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

# AWS::Glue::Crawler MongoDBTarget

Specifies an Amazon DocumentDB or MongoDB data store to crawl.

## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "[ConnectionName](#cfn-glue-crawler-mongodbtarget-connectionname)" : String,
  "[Path](#cfn-glue-crawler-mongodbtarget-path)" : String
}
```

### YAML

```
[ConnectionName](#cfn-glue-crawler-mongodbtarget-connectionname): String
[Path](#cfn-glue-crawler-mongodbtarget-path): String
```

## Properties

`ConnectionName`
The name of the connection to use to connect to the Amazon DocumentDB or MongoDB target.
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Path`
The path of the Amazon DocumentDB or MongoDB target (database/collection).
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

# AWS::Glue::Crawler RecrawlPolicy

When crawling an Amazon S3 data source after the first crawl is complete, specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run. For more information, see [Incremental Crawls in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/incremental-crawls.html) in the developer guide.

## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "[RecrawlBehavior](#cfn-glue-crawler-recrawlpolicy-recrawlbehavior)" : String
}
```

### YAML

```
[RecrawlBehavior](#cfn-glue-crawler-recrawlpolicy-recrawlbehavior): String
```

## Properties

`RecrawlBehavior`
Specifies whether to crawl the entire dataset again or to crawl only folders that were added since the last crawler run.
A value of `CRAWL_EVERYTHING` specifies crawling the entire dataset again.
A value of `CRAWL_NEW_FOLDERS_ONLY` specifies crawling only folders that were added since the last crawler run.
A value of `CRAWL_EVENT_MODE` specifies crawling only the changes identified by Amazon S3 events.
*Required*: No
*Type*: String
*Allowed values*: `CRAWL_EVERYTHING | CRAWL_NEW_FOLDERS_ONLY | CRAWL_EVENT_MODE`
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

# AWS::Glue::Crawler S3Target

Specifies a data store in Amazon Simple Storage Service (Amazon S3).
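
For orientation, the fragment below sketches an Amazon S3 target that samples a few files per leaf folder and skips temporary files. The logical ID `MyS3Crawler`, the role ARN, the database name, the bucket path, and the `**.tmp` exclusion pattern are all placeholder assumptions rather than values taken from this guide.

```yaml
Resources:
  MyS3Crawler:                       # hypothetical logical ID
    Type: AWS::Glue::Crawler
    Properties:
      Role: arn:aws:iam::111122223333:role/MyGlueCrawlerRole   # placeholder role ARN
      DatabaseName: my_example_database                         # placeholder database name
      Targets:
        S3Targets:
          - Path: s3://amzn-s3-demo-bucket/logs/   # placeholder Amazon S3 path
            Exclusions:
              - "**.tmp"             # assumed glob pattern; quoted because it starts with an asterisk
            SampleSize: 10           # crawl at most 10 files in each leaf folder
```

Leaving out `SampleSize` makes the crawler read every file, so a small sample is mainly a trade-off between crawl time and schema coverage.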
## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "[ConnectionName](#cfn-glue-crawler-s3target-connectionname)" : String,
  "[DlqEventQueueArn](#cfn-glue-crawler-s3target-dlqeventqueuearn)" : String,
  "[EventQueueArn](#cfn-glue-crawler-s3target-eventqueuearn)" : String,
  "[Exclusions](#cfn-glue-crawler-s3target-exclusions)" : [ String, ... ],
  "[Path](#cfn-glue-crawler-s3target-path)" : String,
  "[SampleSize](#cfn-glue-crawler-s3target-samplesize)" : Integer
}
```

### YAML

```
[ConnectionName](#cfn-glue-crawler-s3target-connectionname): String
[DlqEventQueueArn](#cfn-glue-crawler-s3target-dlqeventqueuearn): String
[EventQueueArn](#cfn-glue-crawler-s3target-eventqueuearn): String
[Exclusions](#cfn-glue-crawler-s3target-exclusions):
  - String
[Path](#cfn-glue-crawler-s3target-path): String
[SampleSize](#cfn-glue-crawler-s3target-samplesize): Integer
```

## Properties

`ConnectionName`
The name of a connection which allows a job or crawler to access data in Amazon S3 within an Amazon Virtual Private Cloud environment (Amazon VPC).
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`DlqEventQueueArn`
A valid Amazon SQS dead-letter queue ARN. For example, `arn:aws:sqs:region:account:deadLetterQueue`.
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`EventQueueArn`
A valid Amazon SQS ARN. For example, `arn:aws:sqs:region:account:sqs`.
*Required*: No
*Type*: String
*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Exclusions`
A list of glob patterns used to exclude from the crawl.
For more information, see [Catalog Tables with a Crawler](https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html).

*Required*: No

*Type*: Array of String

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`Path`

The path to the Amazon S3 target.

*Required*: No

*Type*: String

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`SampleSize`

Sets the number of files in each leaf folder to be crawled when crawling sample files in a dataset. If not set, all the files are crawled. A valid value is an integer between 1 and 249.

*Required*: No

*Type*: Integer

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

# AWS::Glue::Crawler Schedule

A scheduling object using a `cron` statement to schedule an event.

## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "[ScheduleExpression](#cfn-glue-crawler-schedule-scheduleexpression)" : String
}
```

### YAML

```
[ScheduleExpression](#cfn-glue-crawler-schedule-scheduleexpression): String
```

## Properties

`ScheduleExpression`

A `cron` expression used to specify the schedule. For more information, see [Time-Based Schedules for Jobs and Crawlers](https://docs.aws.amazon.com/glue/latest/dg/monitor-data-warehouse-schedule.html). For example, to run something every day at 12:15 UTC, specify `cron(15 12 * * ? *)`.

*Required*: No

*Type*: String

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

# AWS::Glue::Crawler SchemaChangePolicy

The policy that specifies update and delete behaviors for the crawler.
The policy tells the crawler what to do when it detects a change in a table that already exists in the customer's database at the time of the crawl. The `SchemaChangePolicy` does not affect whether or how new tables and partitions are added. New tables and partitions are always created regardless of the `SchemaChangePolicy` on a crawler.

The `SchemaChangePolicy` consists of two components, `UpdateBehavior` and `DeleteBehavior`.

## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "[DeleteBehavior](#cfn-glue-crawler-schemachangepolicy-deletebehavior)" : String,
  "[UpdateBehavior](#cfn-glue-crawler-schemachangepolicy-updatebehavior)" : String
}
```

### YAML

```
[DeleteBehavior](#cfn-glue-crawler-schemachangepolicy-deletebehavior): String
[UpdateBehavior](#cfn-glue-crawler-schemachangepolicy-updatebehavior): String
```

## Properties

`DeleteBehavior`

The deletion behavior when the crawler finds a deleted object.

A value of `LOG` specifies that if a table or partition is found to no longer exist, do not delete it, only log that it was found to no longer exist.

A value of `DELETE_FROM_DATABASE` specifies that if a table or partition is found to have been removed, delete it from the database.

A value of `DEPRECATE_IN_DATABASE` specifies that if a table is found to no longer exist, the crawler adds a property to the table that says "DEPRECATED" and includes a timestamp with the time of deprecation.

*Required*: No

*Type*: String

*Allowed values*: `LOG | DELETE_FROM_DATABASE | DEPRECATE_IN_DATABASE`

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`UpdateBehavior`

The update behavior when the crawler finds a changed schema.

A value of `LOG` specifies that if a table or a partition already exists, and a change is detected, do not update it, only log that a change was detected.
Add new tables and new partitions (including on existing tables).

A value of `UPDATE_IN_DATABASE` specifies that if a table or partition already exists, and a change is detected, update it. Add new tables and partitions.

*Required*: No

*Type*: String

*Allowed values*: `LOG | UPDATE_IN_DATABASE`

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

# AWS::Glue::Crawler Targets

Specifies data stores to crawl.

## Syntax

To declare this entity in your CloudFormation template, use the following syntax:

### JSON

```
{
  "[CatalogTargets](#cfn-glue-crawler-targets-catalogtargets)" : [ CatalogTarget, ... ],
  "[DeltaTargets](#cfn-glue-crawler-targets-deltatargets)" : [ DeltaTarget, ... ],
  "[DynamoDBTargets](#cfn-glue-crawler-targets-dynamodbtargets)" : [ DynamoDBTarget, ... ],
  "[HudiTargets](#cfn-glue-crawler-targets-huditargets)" : [ HudiTarget, ... ],
  "[IcebergTargets](#cfn-glue-crawler-targets-icebergtargets)" : [ IcebergTarget, ... ],
  "[JdbcTargets](#cfn-glue-crawler-targets-jdbctargets)" : [ JdbcTarget, ... ],
  "[MongoDBTargets](#cfn-glue-crawler-targets-mongodbtargets)" : [ MongoDBTarget, ... ],
  "[S3Targets](#cfn-glue-crawler-targets-s3targets)" : [ S3Target, ... ]
}
```

### YAML

```
[CatalogTargets](#cfn-glue-crawler-targets-catalogtargets):
  - CatalogTarget
[DeltaTargets](#cfn-glue-crawler-targets-deltatargets):
  - DeltaTarget
[DynamoDBTargets](#cfn-glue-crawler-targets-dynamodbtargets):
  - DynamoDBTarget
[HudiTargets](#cfn-glue-crawler-targets-huditargets):
  - HudiTarget
[IcebergTargets](#cfn-glue-crawler-targets-icebergtargets):
  - IcebergTarget
[JdbcTargets](#cfn-glue-crawler-targets-jdbctargets):
  - JdbcTarget
[MongoDBTargets](#cfn-glue-crawler-targets-mongodbtargets):
  - MongoDBTarget
[S3Targets](#cfn-glue-crawler-targets-s3targets):
  - S3Target
```

## Properties

`CatalogTargets`

Specifies AWS Glue Data Catalog targets.
*Required*: No

*Type*: Array of [CatalogTarget](aws-properties-glue-crawler-catalogtarget.md)

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`DeltaTargets`

Specifies an array of Delta data store targets.

*Required*: No

*Type*: Array of [DeltaTarget](aws-properties-glue-crawler-deltatarget.md)

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`DynamoDBTargets`

Specifies Amazon DynamoDB targets.

*Required*: No

*Type*: Array of [DynamoDBTarget](aws-properties-glue-crawler-dynamodbtarget.md)

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`HudiTargets`

Specifies Apache Hudi data store targets.

*Required*: No

*Type*: Array of [HudiTarget](aws-properties-glue-crawler-huditarget.md)

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`IcebergTargets`

Specifies Apache Iceberg data store targets.

*Required*: No

*Type*: Array of [IcebergTarget](aws-properties-glue-crawler-icebergtarget.md)

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`JdbcTargets`

Specifies JDBC targets.

*Required*: No

*Type*: Array of [JdbcTarget](aws-properties-glue-crawler-jdbctarget.md)

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`MongoDBTargets`

Specifies Amazon DocumentDB or MongoDB targets.
*Required*: No

*Type*: Array of [MongoDBTarget](aws-properties-glue-crawler-mongodbtarget.md)

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)

`S3Targets`

Specifies Amazon Simple Storage Service (Amazon S3) targets.

*Required*: No

*Type*: Array of [S3Target](aws-properties-glue-crawler-s3target.md)

*Update requires*: [No interruption](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-update-behaviors.html#update-no-interrupt)
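
The property types described on this page (`Targets` with an `S3Target`, `RecrawlPolicy`, `Schedule`, and `SchemaChangePolicy`) might be combined in a single crawler declaration along the following lines. This is a minimal sketch rather than an official example: the logical ID `ExampleCrawler`, the role ARN, the database name `example_db`, the bucket path, the exclusion pattern, and the sample size are all hypothetical placeholders. The schedule expression is the daily 12:15 UTC example given for `ScheduleExpression`.

```yaml
Resources:
  ExampleCrawler:                      # hypothetical logical ID
    Type: AWS::Glue::Crawler
    Properties:
      # Placeholder role ARN and database name; substitute your own.
      Role: arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole
      DatabaseName: example_db
      Targets:
        S3Targets:
          - Path: s3://amzn-s3-demo-bucket/data/   # placeholder bucket path
            Exclusions:
              - "**.tmp"               # illustrative glob pattern
            SampleSize: 10             # must be between 1 and 249
      RecrawlPolicy:
        RecrawlBehavior: CRAWL_EVERYTHING
      Schedule:
        # Run every day at 12:15 UTC.
        ScheduleExpression: "cron(15 12 * * ? *)"
      SchemaChangePolicy:
        UpdateBehavior: UPDATE_IN_DATABASE
        DeleteBehavior: DEPRECATE_IN_DATABASE
```

The sketch uses `CRAWL_EVERYTHING` so that the choice of `UpdateBehavior` and `DeleteBehavior` stays independent of the recrawl mode. If you switch to `CRAWL_NEW_FOLDERS_ONLY` or `CRAWL_EVENT_MODE`, check the AWS Glue Developer Guide for any additional requirements those modes place on the other properties.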