Using Amazon Textract and an AWS SDK
The following code example shows how to use Amazon Textract to start asynchronous text detection in a document.
- Python
- SDK for Python (Boto3)
Start an asynchronous job to detect text in a document.
import logging

from botocore.exceptions import ClientError

logger = logging.getLogger(__name__)


class TextractWrapper:
    """Encapsulates Textract functions."""

    def __init__(self, textract_client, s3_resource, sqs_resource):
        """
        :param textract_client: A Boto3 Textract client.
        :param s3_resource: A Boto3 Amazon S3 resource.
        :param sqs_resource: A Boto3 Amazon SQS resource.
        """
        self.textract_client = textract_client
        self.s3_resource = s3_resource
        self.sqs_resource = sqs_resource

    def start_detection_job(
            self, bucket_name, document_file_name, sns_topic_arn, sns_role_arn):
        """
        Starts an asynchronous job to detect text elements in an image stored in an
        Amazon S3 bucket. Textract publishes a notification to the specified Amazon SNS
        topic when the job completes.
        The image must be in PNG, JPG, or PDF format.

        :param bucket_name: The name of the Amazon S3 bucket that contains the image.
        :param document_file_name: The name of the document image stored in Amazon S3.
        :param sns_topic_arn: The Amazon Resource Name (ARN) of an Amazon SNS topic
                              where the job completion notification is published.
        :param sns_role_arn: The ARN of an AWS Identity and Access Management (IAM)
                             role that can be assumed by Textract and grants permission
                             to publish to the Amazon SNS topic.
        :return: The ID of the job.
        """
        try:
            # Point Textract at the S3 object and tell it where to publish
            # the completion notification.
            response = self.textract_client.start_document_text_detection(
                DocumentLocation={
                    'S3Object': {'Bucket': bucket_name, 'Name': document_file_name}},
                NotificationChannel={
                    'SNSTopicArn': sns_topic_arn, 'RoleArn': sns_role_arn})
            job_id = response['JobId']
            logger.info(
                "Started text detection job %s on %s.", job_id, document_file_name)
        except ClientError:
            logger.exception("Couldn't detect text in %s.", document_file_name)
            raise
        else:
            return job_id
Find instructions and more code on GitHub.
For API details, see StartDocumentTextDetection in the AWS SDK for Python (Boto3) API Reference.
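The job started by `start_detection_job` reports completion through the Amazon SNS topic. The following is a minimal sketch of reading that notification. It assumes the topic is delivered to an Amazon SQS queue with raw message delivery disabled, so each queue message body is a JSON envelope whose `Message` field holds the Textract notification as a JSON string. The function name `parse_completion_notification` and the sample payload are hypothetical and used only for illustration.

```python
import json


def parse_completion_notification(sqs_body, job_id):
    """Return the job status from an SQS message body.

    Returns None if the notification belongs to a different job.
    Assumes the SNS-to-SQS envelope format (raw message delivery disabled).
    """
    envelope = json.loads(sqs_body)
    # The Textract notification is itself a JSON string inside the envelope.
    notification = json.loads(envelope["Message"])
    if notification.get("JobId") != job_id:
        return None
    return notification.get("Status")


# Hypothetical sample payload, shaped like an SNS envelope delivered to SQS.
sample_body = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({"JobId": "example-job-id", "Status": "SUCCEEDED"}),
})
print(parse_completion_notification(sample_body, "example-job-id"))
```

Matching on the job ID before acting on the status matters when several jobs publish to the same topic.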
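For a complete list of AWS SDK developer guides and code examples, see Using Amazon Textract with an AWS SDK. This topic also includes information about getting started and details about previous SDK versions.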
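Starting the job returns only a job ID. The detected text is fetched separately once the job has finished. The following is a minimal sketch, not the SDK example itself, that pages through `get_document_text_detection` results and keeps the text of each `LINE` block. The helper name `get_detected_lines` is hypothetical, and `textract_client` is assumed to be a Boto3 Textract client.

```python
def get_detected_lines(textract_client, job_id):
    """Collect the text of every LINE block for a finished job.

    :param textract_client: Assumed to be a Boto3 Textract client.
    :param job_id: The ID returned when the job was started.
    :return: A tuple of (job status, list of detected lines). The list is
             empty unless the job status is SUCCEEDED.
    """
    lines = []
    next_token = None
    while True:
        kwargs = {"JobId": job_id}
        if next_token:
            # Results can span several pages; pass the token to get the next one.
            kwargs["NextToken"] = next_token
        response = textract_client.get_document_text_detection(**kwargs)
        status = response["JobStatus"]
        if status != "SUCCEEDED":
            # For example IN_PROGRESS or FAILED: nothing to collect in this sketch.
            return status, []
        lines.extend(
            block["Text"]
            for block in response.get("Blocks", [])
            if block["BlockType"] == "LINE"
        )
        next_token = response.get("NextToken")
        if not next_token:
            return status, lines
```

With a real client, this might be called as `get_detected_lines(boto3.client("textract"), job_id)` after the job has completed. A status other than `SUCCEEDED` is returned to the caller as-is so that the caller can decide whether to wait or report a failure.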