Amazon S3
Amazon S3 is an object storage service that stores data as objects within buckets. You can connect an Amazon S3 bucket as a data source for your managed knowledge base to ingest the objects you store there.
Supported features
- Document metadata fields through separate metadata files
- Inclusion and exclusion content filters using file patterns and S3 key prefixes
- Incremental content syncs for added, updated, and deleted content
- Cross-account Amazon S3 bucket access
- Document-level access control (ACLs), with customer-provided ACL files
Prerequisites
In Amazon S3, make sure you:
- Note the Amazon S3 bucket name and the AWS account ID of the bucket owner. The bucket must be a General Purpose bucket in the same AWS Region as your knowledge base, and you must have permission to access it.
- If the bucket is in a different AWS account from the knowledge base, or if it is encrypted with a customer managed KMS key, configure the bucket policy and (if applicable) the KMS key policy to allow access from your knowledge base service role. See Bucket policies for cross-account and KMS-encrypted access.
In your AWS account, make sure you:
- Add the permissions needed to connect to your data source to the AWS Identity and Access Management (IAM) role or permissions policy for your knowledge base. For information on the required permissions, see Permissions to access your data sources.
Bucket policies for cross-account and KMS-encrypted access
If your Amazon S3 bucket is in a different AWS account from your knowledge base, or if it is encrypted with a customer managed KMS key, add a resource-based policy to grant your knowledge base service role access. The following examples show the minimum statements required.
Cross-account bucket policy
In the account that owns the Amazon S3 bucket, add the following statement to the bucket policy. Replace kb-account-id with the account ID where your knowledge base is created, kb-service-role with the name of your knowledge base service role, and bucket-name with your bucket name.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowBedrockKnowledgeBaseAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::kb-account-id:role/kb-service-role"
      },
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*"
      ]
    }
  ]
}
For more general guidance, see Configure access to Amazon S3 buckets.
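As a rough illustration, the placeholder substitution described above can be scripted so the three values are filled in consistently. The following is a minimal sketch using only the Python standard library; the function name and the sample account ID, role name, and bucket name are hypothetical, and the script only builds the JSON document. It does not attach the policy to a bucket.

```python
import json


def cross_account_bucket_policy(kb_account_id: str, kb_service_role: str, bucket_name: str) -> dict:
    """Build the cross-account bucket policy shown above from the three placeholders."""
    role_arn = f"arn:aws:iam::{kb_account_id}:role/{kb_service_role}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBedrockKnowledgeBaseAccess",
                "Effect": "Allow",
                "Principal": {"AWS": role_arn},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                # The bucket ARN covers s3:ListBucket; the /* ARN covers s3:GetObject.
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


if __name__ == "__main__":
    # Sample values are placeholders, not real resources.
    policy = cross_account_bucket_policy("111122223333", "MyKbServiceRole", "my-docs-bucket")
    print(json.dumps(policy, indent=2))
```

If the bucket already has a policy, add only the generated statement to its existing Statement list rather than replacing the whole document.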
KMS key policy for encrypted buckets
If your Amazon S3 bucket is encrypted with a customer managed KMS key, add the following statement to the key policy. Use the same placeholders as in the cross-account bucket policy. Allow a few minutes for key policy changes to propagate.
{
  "Sid": "AllowBedrockKnowledgeBaseKmsAccess",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::kb-account-id:role/kb-service-role"
  },
  "Action": [
    "kms:Decrypt",
    "kms:DescribeKey"
  ],
  "Resource": "*"
}
Because a key policy applies only to the key it is attached to, "Resource": "*" here refers to that key alone.
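Because the KMS statement is added to an existing key policy rather than replacing it, a small helper can make the merge repeatable. The sketch below is one possible approach, not a prescribed procedure: the function names are hypothetical, it assumes the policy's Statement element is a list, and it works on an in-memory dictionary only, without calling AWS.

```python
import copy


def kms_statement(kb_account_id: str, kb_service_role: str) -> dict:
    """Build the key policy statement shown above."""
    return {
        "Sid": "AllowBedrockKnowledgeBaseKmsAccess",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{kb_account_id}:role/{kb_service_role}"},
        "Action": ["kms:Decrypt", "kms:DescribeKey"],
        "Resource": "*",
    }


def add_statement(key_policy: dict, statement: dict) -> dict:
    """Return a copy of key_policy with the statement added.

    Any existing statement with the same Sid is replaced, so running the
    helper twice does not create duplicates. Assumes Statement is a list.
    """
    updated = copy.deepcopy(key_policy)
    kept = [s for s in updated.get("Statement", []) if s.get("Sid") != statement["Sid"]]
    kept.append(statement)
    updated["Statement"] = kept
    return updated
```

The helper returns a new dictionary and leaves the input untouched, which makes it easy to compare the policy before and after the change.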
How to set up an Amazon S3 data source
Setting up an Amazon S3 data source involves the following steps:
- Prepare your bucket. Confirm the bucket name, the account ID, and (for cross-account access) the bucket policy. If you plan to use document metadata, decide on the Amazon S3 prefix where your .metadata.json files live.
- Connect the data source. Create the Amazon S3 data source in the knowledge base using the AWS Management Console or the API. See Create the data source.
- (Optional) Enable document-level access control. Filter query results by user permissions defined in customer-provided ACL files. See Document-level access controls.
Create the data source
Connector parameters
The data source configuration uses the following connector parameters. To connect to Amazon S3, specify S3 as the connector type in connectorParameters. For the fields that wrap connectorParameters (such as deletionProtectionConfiguration and mediaExtractionConfiguration), see Connect a data source.
| Field | Required | Description |
|---|---|---|
| bucketName | Yes | The name of the Amazon S3 bucket. |
| bucketOwnerAccountId | Conditional | The AWS account ID of the bucket owner. Required for cross-account access. |

| Field | Required | Description |
|---|---|---|
| inclusionPrefixes | No | List of Amazon S3 key prefixes to include (for example, documents/). |
| exclusionPrefixes | No | List of Amazon S3 key prefixes to exclude (for example, archive/). |
| inclusionPatterns | No | List of regular expressions. Only objects whose keys match at least one pattern are ingested. |
| exclusionPatterns | No | List of regular expressions. Objects whose keys match any pattern are not ingested. |
| maxFileSizeInMegaBytes | No | Maximum size, in megabytes, of any single file the connector ingests. Provide as a numeric string (for example, "500"). Defaults to "500". |

| Field | Required | Description |
|---|---|---|
| metadataFilesPrefix | No | The Amazon S3 prefix where document metadata files (.metadata.json) are stored. |
| aclEnabled | No | Set to true to enable document-level access control. You cannot change this setting after you create the data source. For details, see Document-level access controls. |
| aclConfiguration | Conditional | Contains globalAccessControlListS3Uri, the Amazon S3 URI of a JSON file that maps key prefixes to access control entries. Required when aclEnabled is true; ignored when aclEnabled is false. See Document-level access controls. |
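
The conditional rules and filter fields in the tables above can be checked locally before you create the data source. The following sketch rests on several assumptions that this page does not confirm: the fields are shown as a flat dictionary (the exact nesting inside connectorParameters and how the S3 type is declared are not reproduced here), regular expressions are matched anywhere in the key with re.search, exclusions take precedence over inclusions, and the numeric string is a whole number. The function names are hypothetical.

```python
import re
from typing import List


def validate_s3_connector_parameters(params: dict) -> List[str]:
    """Return a list of problems; an empty list means these checks passed."""
    errors = []
    if not params.get("bucketName"):
        errors.append("bucketName is required")
    size = params.get("maxFileSizeInMegaBytes")
    # Assumption: "numeric string" means a string of decimal digits.
    if size is not None and not (isinstance(size, str) and re.fullmatch(r"[0-9]+", size)):
        errors.append('maxFileSizeInMegaBytes must be a numeric string such as "500"')
    if params.get("aclEnabled") is True:
        acl = params.get("aclConfiguration") or {}
        if not acl.get("globalAccessControlListS3Uri"):
            errors.append("aclConfiguration.globalAccessControlListS3Uri is required when aclEnabled is true")
    return errors


def key_is_ingested(key: str, params: dict) -> bool:
    """Approximate the prefix and pattern filters for a single object key."""
    # Assumption: an exclusion match always wins over an inclusion match.
    if any(key.startswith(p) for p in params.get("exclusionPrefixes", [])):
        return False
    if any(re.search(p, key) for p in params.get("exclusionPatterns", [])):
        return False
    inclusion_prefixes = params.get("inclusionPrefixes", [])
    if inclusion_prefixes and not any(key.startswith(p) for p in inclusion_prefixes):
        return False
    inclusion_patterns = params.get("inclusionPatterns", [])
    if inclusion_patterns and not any(re.search(p, key) for p in inclusion_patterns):
        return False
    return True


# Sample values are placeholders, not real resources.
example_params = {
    "bucketName": "my-docs-bucket",
    "bucketOwnerAccountId": "111122223333",
    "inclusionPrefixes": ["documents/"],
    "exclusionPatterns": [r"\.tmp$"],
    "maxFileSizeInMegaBytes": "500",
}
```

Running key_is_ingested over a listing of your bucket keys gives a quick preview of which objects the filters would select under these assumptions; the service's actual matching behavior may differ.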
Document metadata files
You can attach metadata to each document by uploading a sidecar file alongside it. For each document, create a file named filename.extension.metadata.json in the same Amazon S3 path. The metadata file must not exceed 10 KB. For example, alongside report.pdf, upload report.pdf.metadata.json with the following content:
{
  "metadataAttributes": {
    "company": {
      "value": {
        "type": "STRING",
        "stringValue": "BioPharm Innovations"
      }
    },
    "created_date": {
      "value": {
        "type": "NUMBER",
        "numberValue": 20221205
      }
    },
    "author": {
      "value": {
        "type": "STRING",
        "stringValue": "Lisa Thompson"
      }
    }
  }
}
For information on the supported attribute data types and the filtering operators you can apply at query time, see Metadata and filtering.
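When you have many documents, generating the sidecar files from a script keeps the naming and structure consistent. The sketch below is a minimal example under stated assumptions: it covers only the STRING and NUMBER types shown in the sample above (other supported types are described in Metadata and filtering), it treats 10 KB as 10 × 1024 bytes, and the function names are hypothetical. It produces the file content only and does not upload anything.

```python
import json

MAX_METADATA_BYTES = 10 * 1024  # Assumption: 10 KB means 10 * 1024 bytes.


def metadata_key(document_key: str) -> str:
    """Return the sidecar key for a document, for example report.pdf.metadata.json."""
    return f"{document_key}.metadata.json"


def build_metadata(attributes: dict) -> bytes:
    """Serialize plain Python values into the metadataAttributes structure."""
    metadata_attributes = {}
    for name, value in attributes.items():
        if isinstance(value, bool):
            # bool is a subclass of int in Python; this sketch does not handle it.
            raise TypeError(f"unsupported type for {name!r}: bool")
        if isinstance(value, str):
            typed = {"type": "STRING", "stringValue": value}
        elif isinstance(value, (int, float)):
            typed = {"type": "NUMBER", "numberValue": value}
        else:
            raise TypeError(f"unsupported type for {name!r}: {type(value).__name__}")
        metadata_attributes[name] = {"value": typed}
    body = json.dumps({"metadataAttributes": metadata_attributes}, indent=2).encode("utf-8")
    if len(body) > MAX_METADATA_BYTES:
        raise ValueError(f"metadata is {len(body)} bytes; the limit is {MAX_METADATA_BYTES}")
    return body


if __name__ == "__main__":
    print(metadata_key("report.pdf"))
    print(build_metadata({"company": "BioPharm Innovations", "created_date": 20221205}).decode("utf-8"))
```

Checking the size at build time surfaces oversized metadata before upload rather than during a sync.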