aws-lambda-textract
| Reference Documentation: | https://docs.aws.amazon.com/solutions/latest/constructs/ |
| Language | Package |
|---|---|
Overview
This AWS Solutions Construct implements an AWS Lambda function connected to the Amazon Textract service. For asynchronous document analysis jobs, the construct can optionally create source and destination S3 buckets and an SNS topic for job completion notifications. It also sets up IAM permissions so the Lambda function can work with both buckets and the Amazon Textract service.
Here is a minimal deployable pattern definition:
Example
Pattern Construct Props
| Name | Type | Description |
|---|---|---|
| existingLambdaObj? | | Existing instance of Lambda Function object, providing both this and |
| lambdaFunctionProps? | | Optional - user provided props to override the default props for the Lambda function. Providing both this and |
| asyncJobs? | boolean | Whether to enable asynchronous document analysis jobs. When true, source and destination S3 buckets are created and the Lambda function is granted permissions to start document analysis jobs and get their status. Default: false |
| existingSourceBucketObj? | | Existing instance of S3 Bucket object for source documents. If this is provided, then also providing sourceBucketProps causes an error. Only valid when asyncJobs is true. |
| sourceBucketProps? | | Optional user provided props to override the default props for the source S3 Bucket. Only valid when asyncJobs is true. |
| existingDestinationBucketObj? | | Existing instance of S3 Bucket object for analysis results. If this is provided, then also providing destinationBucketProps causes an error. Only valid when asyncJobs is true. |
| destinationBucketProps? | | Optional user provided props to override the default props for the destination S3 Bucket. Only valid when asyncJobs is true. |
| useSameBucket? | boolean | Whether to use the same S3 bucket for both source and destination files. When true, only the source bucket is created, and it is used for both purposes. Only valid when asyncJobs is true. Default: false |
| createCustomerManagedOutputBucket? | boolean | Whether to create a bucket to receive the output of Textract batch jobs. If true, the construct sets up an S3 bucket for the output; if false, Textract jobs send their output to an AWS managed S3 bucket. Default: true |
| existingVpc? | | An optional, existing VPC into which this pattern should be deployed. When deployed in a VPC, the Lambda function uses ENIs in the VPC to access network resources, and Interface Endpoints are created in the VPC for Amazon Textract. If asyncJobs is true, Interface Endpoints for Amazon S3 are also created. If an existing VPC is provided, the |
| vpcProps? | | Optional user provided properties to override the default properties for the new VPC. |
| deployVpc? | boolean | Whether to create a new VPC based on |
| sourceBucketEnvironmentVariableName? | string | Optional name for the Lambda function environment variable set to the name of the source bucket. Only valid when asyncJobs is true. Default: SOURCE_BUCKET_NAME |
| destinationBucketEnvironmentVariableName? | string | Optional name for the Lambda function environment variable set to the name of the destination bucket. Only valid when asyncJobs is true. Default: DESTINATION_BUCKET_NAME |
| dataAccessRoleArnEnvironmentVariableName? | string | Optional name for the Lambda function environment variable set to the ARN of the IAM role used for asynchronous document analysis jobs. Only valid when asyncJobs is true. Default: SNS_ROLE_ARN |
| snsNotificationTopicArnEnvironmentVariableName? | string | Optional name for the Lambda function environment variable set to the ARN of the SNS topic used for asynchronous job completion notifications. Only valid when asyncJobs is true. Default: SNS_TOPIC_ARN |
| existingNotificationTopicObj? | | Optional - existing instance of SNS topic object, providing both this and |
| existingNotificationTopicEncryptionKey? | | If an existing topic is provided in the |
| notificationTopicProps? | | Optional - user provided properties to override the default properties for the SNS topic. Providing both this and |
| enableNotificationTopicEncryptionWithCustomerManagedKey? | boolean | If no key is provided, this flag determines whether the SNS Topic is encrypted with a new CMK or an AWS managed key. This flag is ignored if any of the following are defined: notificationTopicProps.masterKey, notificationTopicEncryptionKey or notificationTopicEncryptionKeyProps. Only valid when asyncJobs is true. |
| notificationTopicEncryptionKey? | | An optional, imported encryption key to encrypt the SNS Topic with. Only valid when asyncJobs is true. |
| notificationTopicEncryptionKeyProps? | | Optional user provided properties to override the default properties for the KMS encryption key used to encrypt the SNS Topic. Only valid when asyncJobs is true. |
| sourceLoggingBucketProps? | | Optional user provided props to override the default props for the source S3 Logging Bucket. Only valid when asyncJobs is true. |
| destinationLoggingBucketProps? | | Optional user provided props to override the default props for the destination S3 Logging Bucket. Only valid when asyncJobs is true. |
| logSourceS3AccessLogs? | boolean | Whether to turn on Access Logging for the source S3 bucket. Creates an S3 bucket with associated storage costs for the logs. Enabling Access Logging is a best practice. Only valid when asyncJobs is true. Default: true |
| logDestinationS3AccessLogs? | boolean | Whether to turn on Access Logging for the destination S3 bucket. Creates an S3 bucket with associated storage costs for the logs. Enabling Access Logging is a best practice. Only valid when asyncJobs is true. Default: true |
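Several rows in the table above describe combinations that are mutually exclusive or only meaningful when asyncJobs is true. The following is a minimal TypeScript sketch of those rules, assuming a simplified, hypothetical props interface (TextractPropsSketch) and a hypothetical checkPropsSketch function; it is not the construct's own validation code. Only the two bucket pairs the table spells out are checked, and treating an async-only option as an error (rather than ignoring it) is an assumption of the sketch.

```typescript
// Hypothetical, simplified stand-in for the construct's props. Only the
// option names come from the table; object-valued props are typed as
// `unknown` because this sketch only checks whether they are set.
interface TextractPropsSketch {
  asyncJobs?: boolean;
  existingSourceBucketObj?: unknown;
  sourceBucketProps?: unknown;
  existingDestinationBucketObj?: unknown;
  destinationBucketProps?: unknown;
  useSameBucket?: boolean;
  sourceBucketEnvironmentVariableName?: string;
  destinationBucketEnvironmentVariableName?: string;
}

type PropName = keyof TextractPropsSketch;

// Pairs that the table says cause an error when both are provided.
const EXCLUSIVE_PAIRS: Array<[PropName, PropName]> = [
  ["existingSourceBucketObj", "sourceBucketProps"],
  ["existingDestinationBucketObj", "destinationBucketProps"],
];

// Options the table marks "Only valid when asyncJobs is true".
const ASYNC_ONLY: PropName[] = [
  "existingSourceBucketObj",
  "sourceBucketProps",
  "existingDestinationBucketObj",
  "destinationBucketProps",
  "useSameBucket",
  "sourceBucketEnvironmentVariableName",
  "destinationBucketEnvironmentVariableName",
];

// Collects every violated rule instead of stopping at the first one.
function checkPropsSketch(props: TextractPropsSketch): string[] {
  const errors: string[] = [];
  for (const [existing, custom] of EXCLUSIVE_PAIRS) {
    if (props[existing] !== undefined && props[custom] !== undefined) {
      errors.push(`Provide either ${existing} or ${custom}, not both`);
    }
  }
  // Assumption: async-only options are reported as errors when asyncJobs is not true.
  if (props.asyncJobs !== true) {
    for (const name of ASYNC_ONLY) {
      if (props[name] !== undefined) {
        errors.push(`${name} is only valid when asyncJobs is true`);
      }
    }
  }
  return errors;
}
```

For example, `checkPropsSketch({ asyncJobs: true, useSameBucket: true })` returns an empty array, while supplying both `existingSourceBucketObj` and `sourceBucketProps` without `asyncJobs` reports the conflicting pair plus both async-only violations.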
Pattern Properties
| Name | Type | Description |
|---|---|---|
| lambdaFunction | | Returns an instance of the Lambda function created by the pattern. |
| sourceBucket? | | Returns an instance of the source S3 bucket if it is created by the pattern. |
| destinationBucket? | | Returns an instance of the destination S3 bucket if it is created by the pattern. |
| sourceLoggingBucket? | s3.Bucket | Returns an instance of s3.Bucket created by the construct as the logging bucket for the source bucket. |
| destinationLoggingBucket? | s3.Bucket | Returns an instance of s3.Bucket created by the construct as the logging bucket for the destination bucket. |
| snsNotificationTopic? | | Returns an instance of the SNS topic created for asynchronous job completion notifications when asyncJobs is true. |
| notificationTopicEncryptionKey? | kms.IKey | Returns an instance of kms.IKey used for the SNS Topic. |
| vpc? | | Returns an interface on the VPC used by the pattern (if any). This may be a VPC created by the pattern or the VPC supplied to the pattern constructor. |
| sourceBucketInterface? | s3.IBucket | Returns an interface of s3.IBucket used by the construct for the source bucket whether created by the pattern or supplied from the client. |
| destinationBucketInterface? | s3.IBucket | Returns an interface of s3.IBucket used by the construct for the destination bucket whether created by the pattern or supplied from the client. |
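The table distinguishes between sourceBucket/destinationBucket (set only when the pattern creates the bucket) and the ...Interface properties (set whether the bucket was created or supplied). The following TypeScript is a minimal sketch of that distinction, with buckets represented by name only; the types and the resolveBucketsSketch function are hypothetical, and resolving destinationBucketInterface to the source bucket when useSameBucket is true is an assumption based on the useSameBucket description.

```typescript
// Buckets are represented by name only; these types are hypothetical and do
// not use AWS CDK classes.
interface BucketInputSketch {
  existingBucketName?: string; // stands in for existingSourceBucketObj / existingDestinationBucketObj
  createdBucketName: string; // name of the bucket the pattern would create
}

interface ResolvedBucketsSketch {
  sourceBucket?: string; // set only if the pattern created it
  sourceBucketInterface: string; // always set
  destinationBucket?: string;
  destinationBucketInterface: string;
}

function resolveBucketsSketch(
  source: BucketInputSketch,
  destination: BucketInputSketch,
  useSameBucket: boolean,
): ResolvedBucketsSketch {
  // A created bucket populates both properties; a supplied one only the interface.
  const sourceBucket = source.existingBucketName === undefined ? source.createdBucketName : undefined;
  const sourceBucketInterface = source.existingBucketName ?? source.createdBucketName;
  if (useSameBucket) {
    // Assumption: the source bucket also serves as the destination.
    return { sourceBucket, sourceBucketInterface, destinationBucketInterface: sourceBucketInterface };
  }
  const destinationBucket =
    destination.existingBucketName === undefined ? destination.createdBucketName : undefined;
  const destinationBucketInterface = destination.existingBucketName ?? destination.createdBucketName;
  return { sourceBucket, sourceBucketInterface, destinationBucket, destinationBucketInterface };
}
```

Client code that only needs to read or write objects can therefore rely on the ...Interface properties, which are populated in every case modeled here.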
Default settings
An out-of-the-box implementation of the construct, without any overrides, sets the following defaults:
AWS Lambda Function
- Configure a limited-privilege IAM role for the Lambda function
- Enable reusing connections with Keep-Alive for Node.js Lambda functions
- Enable X-Ray tracing
- Set environment variables:
  - (default) SOURCE_BUCKET_NAME (when asyncJobs is true)
  - (default) DESTINATION_BUCKET_NAME (when asyncJobs is true)
  - (default) SNS_ROLE_ARN (when asyncJobs is true)
  - (default) SNS_TOPIC_ARN (when asyncJobs is true)
  - AWS_NODEJS_CONNECTION_REUSE_ENABLED (for Node 10.x and higher functions)
- Grant permissions to use the Amazon Textract service (['textract:DetectDocumentText', 'textract:AnalyzeDocument', 'textract:AnalyzeExpense', 'textract:AnalyzeID'] by default)
- When asyncJobs is true, grant permissions to start document analysis jobs and get their status (textract:StartDocumentAnalysis, textract:StartDocumentTextDetection, textract:StartExpenseAnalysis, textract:GetDocumentAnalysis, textract:GetDocumentTextDetection, textract:GetExpenseAnalysis), to read from the source bucket, and to read from and write to the destination bucket
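The environment variables listed above are what a handler would read to start an asynchronous job. The following TypeScript is a minimal sketch, assuming asyncJobs is true and the default variable names are kept, of turning them into a StartDocumentAnalysis request. The field names mirror the Amazon Textract StartDocumentAnalysis request, but the interface is declared locally so the sketch runs without the AWS SDK; the helper names, the object key, the TABLES/FORMS feature types, and the "results/" prefix are hypothetical choices.

```typescript
// Local mirror of the StartDocumentAnalysis request fields used here, so the
// sketch runs without the AWS SDK installed.
interface StartDocumentAnalysisInputSketch {
  DocumentLocation: { S3Object: { Bucket: string; Name: string } };
  FeatureTypes: string[];
  NotificationChannel: { SNSTopicArn: string; RoleArn: string };
  OutputConfig?: { S3Bucket: string; S3Prefix?: string };
}

type EnvSketch = Record<string, string | undefined>;

// Reads a variable the construct is expected to set, failing loudly if absent.
function requireEnv(env: EnvSketch, name: string): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing environment variable ${name}`);
  }
  return value;
}

// Builds the request from the default variable names. Feature types and the
// "results/" prefix are hypothetical choices.
function buildAnalysisRequest(env: EnvSketch, objectKey: string): StartDocumentAnalysisInputSketch {
  const request: StartDocumentAnalysisInputSketch = {
    DocumentLocation: {
      S3Object: { Bucket: requireEnv(env, "SOURCE_BUCKET_NAME"), Name: objectKey },
    },
    FeatureTypes: ["TABLES", "FORMS"],
    // Textract assumes RoleArn to publish the completion message to the topic.
    NotificationChannel: {
      SNSTopicArn: requireEnv(env, "SNS_TOPIC_ARN"),
      RoleArn: requireEnv(env, "SNS_ROLE_ARN"),
    },
  };
  // With no destination bucket name, OutputConfig is omitted and Textract
  // keeps the results itself (compare createCustomerManagedOutputBucket: false).
  const destination = env["DESTINATION_BUCKET_NAME"];
  if (destination) {
    request.OutputConfig = { S3Bucket: destination, S3Prefix: "results/" };
  }
  return request;
}
```

In a Node.js handler, the returned object would be passed to `new StartDocumentAnalysisCommand(...)` from `@aws-sdk/client-textract`, with `process.env` as the environment; that call is left out here so the sketch has no external dependencies.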
Amazon S3 Buckets (when asyncJobs is true)
- Configure access logging for both S3 buckets
- Enable server-side encryption for both S3 buckets using an AWS managed KMS key
- Enforce encryption of data in transit
- Turn on versioning for both S3 buckets
- Block public access to both S3 buckets
- Retain the S3 buckets when the CloudFormation stack is deleted
- Apply a lifecycle rule that moves noncurrent object versions to Glacier storage after 90 days
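Two of the bucket defaults above can be made concrete by writing out the raw JSON they correspond to. The following TypeScript is a minimal sketch, assuming the S3 lifecycle-rule JSON shape and an IAM bucket-policy statement; the construct itself configures buckets through AWS CDK rather than by emitting these objects. The rule ID and bucket ARN are hypothetical, and the deny-unless-TLS statement is the common way to enforce encryption in transit, assumed here rather than taken from the construct's source.

```typescript
// Shapes mirror the S3 lifecycle-rule JSON and an IAM bucket-policy
// statement; they are declared locally so the sketch has no dependencies.
interface LifecycleRuleSketch {
  ID: string;
  Status: "Enabled" | "Disabled";
  Filter: { Prefix: string };
  NoncurrentVersionTransitions: Array<{ NoncurrentDays: number; StorageClass: "GLACIER" }>;
}

interface DenyStatementSketch {
  Effect: "Deny";
  Principal: "*";
  Action: string;
  Resource: string[];
  Condition: { Bool: { "aws:SecureTransport": "false" } };
}

// Moves noncurrent object versions to Glacier after `days` days. This only
// has an effect because versioning is turned on. The rule ID is hypothetical;
// an empty prefix applies the rule to the whole bucket.
function noncurrentToGlacierRule(days = 90): LifecycleRuleSketch {
  return {
    ID: "noncurrent-versions-to-glacier",
    Status: "Enabled",
    Filter: { Prefix: "" },
    NoncurrentVersionTransitions: [{ NoncurrentDays: days, StorageClass: "GLACIER" }],
  };
}

// Denies any request to the bucket or its objects that is not made over TLS.
function denyInsecureTransport(bucketArn: string): DenyStatementSketch {
  return {
    Effect: "Deny",
    Principal: "*",
    Action: "s3:*",
    Resource: [bucketArn, `${bucketArn}/*`],
    Condition: { Bool: { "aws:SecureTransport": "false" } },
  };
}
```

The deny statement lists both the bucket ARN and the `/*` object ARN because bucket-level and object-level S3 actions are authorized against different resources.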
Amazon SNS Topic (when asyncJobs is true)
- Configure server-side encryption using an AWS managed KMS key
- Create a topic for asynchronous job completion notifications
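The AWS managed key above is only the fallback. The enableNotificationTopicEncryptionWithCustomerManagedKey row in the props table describes how explicit key settings take precedence over it. The following TypeScript is a minimal sketch of that precedence. The input interface and chooseTopicKey are hypothetical, topicPropsMasterKey stands in for notificationTopicProps.masterKey, and the relative order of the first two checks is an assumption.

```typescript
// Which key ends up encrypting the notification topic.
type TopicKeySource =
  | "user-supplied key"
  | "new customer managed key from notificationTopicEncryptionKeyProps"
  | "new customer managed key"
  | "AWS managed key";

interface TopicEncryptionInputsSketch {
  topicPropsMasterKey?: unknown; // hypothetical stand-in for notificationTopicProps.masterKey
  notificationTopicEncryptionKey?: unknown;
  notificationTopicEncryptionKeyProps?: unknown;
  enableNotificationTopicEncryptionWithCustomerManagedKey?: boolean;
}

// Any explicit key setting wins, and the flag is ignored in that case.
function chooseTopicKey(p: TopicEncryptionInputsSketch): TopicKeySource {
  if (p.topicPropsMasterKey !== undefined || p.notificationTopicEncryptionKey !== undefined) {
    return "user-supplied key";
  }
  if (p.notificationTopicEncryptionKeyProps !== undefined) {
    return "new customer managed key from notificationTopicEncryptionKeyProps";
  }
  if (p.enableNotificationTopicEncryptionWithCustomerManagedKey === true) {
    return "new customer managed key";
  }
  return "AWS managed key"; // the documented default
}
```

With no encryption-related props, `chooseTopicKey({})` yields the AWS managed key, which matches the default listed above.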
Amazon Textract Service
- The Lambda function is granted permission to call ['textract:DetectDocumentText', 'textract:AnalyzeDocument', 'textract:AnalyzeExpense', 'textract:AnalyzeID'] operations
- When asyncJobs is true:
  - The Lambda function is additionally granted permission to call [ 'textract:StartDocumentTextDetection', 'textract:GetDocumentTextDetection', 'textract:StartDocumentAnalysis', 'textract:GetDocumentAnalysis', 'textract:StartExpenseAnalysis', 'textract:GetExpenseAnalysis', 'textract:StartLendingAnalysis', 'textract:GetLendingAnalysis' ]
  - An SNS topic is created, and the Lambda function is granted permission to call ['sns:Publish']
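The permissions listed above can be pictured as an IAM policy document attached to the Lambda function's role. The following TypeScript is a minimal sketch that builds such a document from the action lists in this section. The function and type names are hypothetical. Using "*" as the resource for the Textract actions is an assumption of the sketch, and the construct's actual policy statements may be split or scoped differently.

```typescript
// Action lists copied from the "Amazon Textract Service" defaults.
const SYNC_TEXTRACT_ACTIONS = [
  "textract:DetectDocumentText",
  "textract:AnalyzeDocument",
  "textract:AnalyzeExpense",
  "textract:AnalyzeID",
];

const ASYNC_TEXTRACT_ACTIONS = [
  "textract:StartDocumentTextDetection",
  "textract:GetDocumentTextDetection",
  "textract:StartDocumentAnalysis",
  "textract:GetDocumentAnalysis",
  "textract:StartExpenseAnalysis",
  "textract:GetExpenseAnalysis",
  "textract:StartLendingAnalysis",
  "textract:GetLendingAnalysis",
];

interface StatementSketch {
  Effect: "Allow";
  Action: string[];
  Resource: string;
}

interface PolicySketch {
  Version: "2012-10-17";
  Statement: StatementSketch[];
}

function textractPolicySketch(asyncJobs: boolean, topicArn?: string): PolicySketch {
  const statements: StatementSketch[] = [
    {
      Effect: "Allow",
      Action: asyncJobs ? [...SYNC_TEXTRACT_ACTIONS, ...ASYNC_TEXTRACT_ACTIONS] : [...SYNC_TEXTRACT_ACTIONS],
      Resource: "*", // assumption: Textract calls are not scoped to a resource ARN here
    },
  ];
  if (asyncJobs) {
    if (topicArn === undefined) {
      throw new Error("asyncJobs requires the notification topic ARN");
    }
    // Publishing is scoped to the single notification topic.
    statements.push({ Effect: "Allow", Action: ["sns:Publish"], Resource: topicArn });
  }
  return { Version: "2012-10-17", Statement: statements };
}
```

Keeping sns:Publish in its own statement lets it be scoped to one topic ARN even though the Textract statement is not.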
Amazon VPC
- If deployVpc is true, a minimal VPC is created with:
  - Interface Endpoints for Amazon Textract
  - Interface Endpoints for Amazon S3 (when asyncJobs is true)
  - Interface Endpoints for Amazon SNS (when asyncJobs is true)
  - Private subnets for the Lambda function
  - Appropriate security groups and routing
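Which interface endpoints end up in the VPC depends on whether the function runs in a VPC at all and on asyncJobs. The following TypeScript is a minimal sketch, assuming the standard com.amazonaws.&lt;region&gt;.&lt;service&gt; endpoint service naming, that lists the endpoint services implied by the defaults above; interfaceEndpointServices is a hypothetical helper, not part of the construct.

```typescript
// Returns the interface endpoint service names implied by the VPC defaults.
// No endpoints are needed when the Lambda function is not deployed in a VPC.
function interfaceEndpointServices(region: string, inVpc: boolean, asyncJobs: boolean): string[] {
  if (!inVpc) {
    return [];
  }
  const services = ["textract"];
  if (asyncJobs) {
    // S3 and SNS are only reached by the asynchronous job flow.
    services.push("s3", "sns");
  }
  return services.map((service) => `com.amazonaws.${region}.${service}`);
}
```

For example, `interfaceEndpointServices("eu-west-1", true, false)` returns only the Textract endpoint service, because S3 and SNS are used only when asyncJobs is true.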
Architecture
Default Implementation
Default Implementation when asyncJobs = true
© Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.