Automatically archive items to Amazon S3 using DynamoDB TTL
Tabby Ward, Amazon Web Services
Summary
This pattern provides steps to remove older data from an Amazon DynamoDB table and archive it to an Amazon Simple Storage Service (Amazon S3) bucket on Amazon Web Services (AWS) without having to manage a fleet of servers.
This pattern uses Amazon DynamoDB Time to Live (TTL) to automatically delete old items and Amazon DynamoDB Streams to capture the TTL-expired items. It then connects DynamoDB Streams to AWS Lambda, which runs the code without provisioning or managing any servers.
When new items are added to the DynamoDB stream, the Lambda function is initiated and writes the data to an Amazon Data Firehose delivery stream. Firehose provides a simple, fully managed solution to load the data as an archive into Amazon S3.
DynamoDB is often used to store time series data, such as webpage click-stream data or Internet of Things (IoT) data from sensors and connected devices. Rather than deleting less frequently accessed items, many customers want to archive them for auditing purposes. TTL simplifies this archiving by automatically deleting items based on the timestamp attribute.
Items deleted by TTL can be identified in DynamoDB Streams, which captures a time-ordered sequence of item-level modifications and stores the sequence in a log for up to 24 hours. A Lambda function can consume this data and archive it in an Amazon S3 bucket to reduce storage costs. To reduce costs further, you can create Amazon S3 lifecycle rules that automatically transition the data (as soon as it is created) to lower-cost storage classes.
Prerequisites and limitations
Prerequisites
- An active AWS account. 
- AWS Command Line Interface (AWS CLI) 1.7 or later, installed and configured on macOS, Linux, or Windows. 
- Python 3.7 or later. 
- Boto3, installed and configured. If Boto3 is not already installed, run the python -m pip install boto3 command to install it.
Architecture
Technology stack
- Amazon DynamoDB 
- Amazon DynamoDB Streams 
- Amazon Data Firehose 
- AWS Lambda 
- Amazon S3 

- Items are deleted by TTL. 
- The DynamoDB stream trigger invokes the Lambda stream processor function. 
- The Lambda function puts records in the Firehose delivery stream in batch format. 
- Data records are archived in the S3 bucket. 
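The workflow above could be sketched as the following Lambda stream processor. This is a minimal illustration, not the code from the pattern's repository. It relies on the fact that stream records for TTL deletions are REMOVE events whose userIdentity has type Service and principalId dynamodb.amazonaws.com. The function names, the newline-delimited JSON output, and the stream name passed in by the caller are assumptions made for this example.

```python
import json

# Firehose accepts at most 500 records per PutRecordBatch call.
MAX_BATCH = 500


def is_ttl_delete(record):
    """Return True if a stream record is a deletion performed by TTL."""
    identity = record.get("userIdentity") or {}
    return (
        record.get("eventName") == "REMOVE"
        and identity.get("type") == "Service"
        and identity.get("principalId") == "dynamodb.amazonaws.com"
    )


def to_firehose_records(event):
    """Convert TTL-deleted stream records into Firehose record dicts."""
    records = []
    for record in event.get("Records", []):
        if not is_ttl_delete(record):
            continue
        # OldImage holds the item as it was before deletion. It is present
        # only if the stream view type includes old images.
        old_image = record.get("dynamodb", {}).get("OldImage", {})
        line = json.dumps(old_image) + "\n"
        records.append({"Data": line.encode("utf-8")})
    return records


def archive(event, firehose_client, stream_name):
    """Send TTL-deleted items to Firehose in chunks; return the count sent."""
    records = to_firehose_records(event)
    for start in range(0, len(records), MAX_BATCH):
        firehose_client.put_record_batch(
            DeliveryStreamName=stream_name,
            Records=records[start:start + MAX_BATCH],
        )
    return len(records)
```

In a deployed function, a handler such as lambda_handler(event, context) would call archive(event, boto3.client("firehose"), stream_name). A production version would also inspect FailedPutCount in each put_record_batch response and retry the failed records.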
Tools
- AWS CLI – The AWS Command Line Interface (AWS CLI) is a unified tool to manage your AWS services. 
- Amazon DynamoDB – Amazon DynamoDB is a key-value and document database that delivers single-digit millisecond performance at any scale. 
- Amazon DynamoDB Time to Live (TTL) – Amazon DynamoDB TTL helps you define a per-item timestamp to determine when an item is no longer required. 
- Amazon DynamoDB Streams – Amazon DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours. 
- Amazon Data Firehose – Amazon Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores, and analytics services. 
- AWS Lambda – AWS Lambda runs code without the need to provision or manage servers. You pay only for the compute time you consume. 
- Amazon S3 – Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance. 
Code
The code for this pattern is available in the GitHub Archive items to S3 using DynamoDB TTL repository.
Epics
| Task | Description | Skills required | 
|---|---|---|
| Create a DynamoDB table. | Use the AWS CLI to create a table in DynamoDB called  
 
 | Cloud architect, App developer | 
| Turn on DynamoDB TTL. | Use the AWS CLI to turn on DynamoDB TTL for the  
 | Cloud architect, App developer | 
| Turn on a DynamoDB stream. | Use the AWS CLI to turn on a DynamoDB stream for the  
 This stream will contain records for new items, updated items, deleted items, and items that are deleted by TTL. The records for items that are deleted by TTL contain an additional metadata attribute to distinguish them from items that were deleted manually. The  In this pattern, only the items deleted by TTL are archived, but you could archive only the records where  | Cloud architect, App developer | 
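The three tasks in this epic use the AWS CLI. As a rough Boto3 equivalent, the following sketch builds the request parameters for each step and applies them in order. The key attribute name, the TTL attribute name, on-demand billing, and the NEW_AND_OLD_IMAGES view type are assumptions for illustration. The archive needs a view type that includes the old image, because a deleted item has no new image.

```python
def table_params(table_name, key_name):
    """Arguments for create_table with a single string partition key."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": key_name, "AttributeType": "S"}
        ],
        "KeySchema": [{"AttributeName": key_name, "KeyType": "HASH"}],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


def ttl_params(table_name, ttl_attribute):
    """Arguments for update_time_to_live."""
    return {
        "TableName": table_name,
        "TimeToLiveSpecification": {
            "Enabled": True,
            "AttributeName": ttl_attribute,
        },
    }


def stream_params(table_name):
    """Arguments for update_table that turn on the stream."""
    return {
        "TableName": table_name,
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    }


def configure_table(client, table_name, key_name, ttl_attribute):
    """Create the table, then turn on TTL and the stream."""
    client.create_table(**table_params(table_name, key_name))
    # Wait so the later calls do not hit a table that is still being created.
    client.get_waiter("table_exists").wait(TableName=table_name)
    client.update_time_to_live(**ttl_params(table_name, ttl_attribute))
    client.update_table(**stream_params(table_name))
```

With Boto3 installed and credentials configured, a call could look like configure_table(boto3.client("dynamodb"), "Reservation", "Id", "ExpiresAt"), where Id and ExpiresAt are hypothetical attribute names.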
| Task | Description | Skills required | 
|---|---|---|
| Create an S3 bucket. | Use the AWS CLI to create a destination S3 bucket in your AWS Region, replacing  
 Make sure that your S3 bucket's name is globally unique, because the namespace is shared by all AWS accounts. | Cloud architect, App developer | 
| Create a 30-day lifecycle policy for the S3 bucket. | 
 | Cloud architect, App developer | 
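As a hedged Boto3 sketch of the two tasks above, the following helpers build the bucket creation arguments and a lifecycle configuration that transitions objects after 30 days. The target storage class (GLACIER), the rule ID, and the empty prefix filter are assumptions chosen for illustration. The pattern's own policy file may differ.

```python
def bucket_params(bucket, region):
    """Arguments for create_bucket in the given Region."""
    params = {"Bucket": bucket}
    # us-east-1 is the default location and must not be passed as a
    # LocationConstraint; every other Region needs one.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return params


def lifecycle_configuration(days=30, storage_class="GLACIER"):
    """Lifecycle rule that moves every object after the given number of days."""
    return {
        "Rules": [
            {
                "ID": f"archive-after-{days}-days",
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches all objects
                "Transitions": [
                    {"Days": days, "StorageClass": storage_class}
                ],
            }
        ]
    }


def create_archive_bucket(s3_client, bucket, region, days=30):
    """Create the bucket and attach the lifecycle configuration."""
    s3_client.create_bucket(**bucket_params(bucket, region))
    s3_client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=lifecycle_configuration(days),
    )
```

The s3_client argument would typically be boto3.client("s3", region_name=region).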
| Task | Description | Skills required | 
|---|---|---|
| Create and configure a Firehose delivery stream. | Download and edit the  This code is written in Python and shows you how to create a Firehose delivery stream and an AWS Identity and Access Management (IAM) role. The IAM role will have a policy that can be used by Firehose to write to the destination S3 bucket. To run the script, use the following command and command line arguments. Argument 1=  Argument 2= Your Firehose name (This pilot is using   Argument 3= Your IAM role name (This pilot is using  
 If the specified IAM role does not exist, the script will create an assume role with a trusted relationship policy, as well as a policy that grants sufficient Amazon S3 permission. For examples of these policies, see the Additional information section. | Cloud architect, App developer | 
| Verify the Firehose delivery stream. | Describe the Firehose delivery stream by using the AWS CLI to verify that the delivery stream was successfully created. 
 | Cloud architect, App developer | 
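The repository script is the reference for this epic. Purely as an illustration of the underlying Boto3 calls, the sketch below builds the arguments for a Direct PUT delivery stream with an extended S3 destination and checks the stream status afterward. The function names are hypothetical. The two prefix values are the ones described in the Additional information section.

```python
PREFIX = (
    "firehosetos3example/year=!{timestamp:yyyy}/month=!{timestamp:MM}/"
    "day=!{timestamp:dd}/hour=!{timestamp:HH}/"
)
ERROR_PREFIX = (
    "firehosetos3erroroutputbase/!{firehose:random-string}/"
    "!{firehose:error-output-type}/!{timestamp:yyyy/MM/dd}/"
)


def delivery_stream_params(stream_name, role_arn, bucket_arn):
    """Arguments for create_delivery_stream with an S3 destination."""
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",  # producers call Firehose directly
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
            "Prefix": PREFIX,
            "ErrorOutputPrefix": ERROR_PREFIX,
        },
    }


def is_active(firehose_client, stream_name):
    """Return True once the delivery stream reports the ACTIVE status."""
    response = firehose_client.describe_delivery_stream(
        DeliveryStreamName=stream_name
    )
    description = response["DeliveryStreamDescription"]
    return description["DeliveryStreamStatus"] == "ACTIVE"
```

A caller would pass the result of delivery_stream_params to boto3.client("firehose").create_delivery_stream(**params) and then poll is_active, because a new stream starts in the CREATING status.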
| Task | Description | Skills required | 
|---|---|---|
| Create a trust policy for the Lambda function. | Create a trust policy file with the following information. 
 This gives your function permission to access AWS resources. | Cloud architect, App developer | 
| Create an execution role for the Lambda function. | To create the execution role, run the following code. 
 | Cloud architect, App developer | 
| Add permission to the role. | To add permission to the role, use the  
 | Cloud architect, App developer | 
| Create a Lambda function. | Compress the  
 When you create the Lambda function, you will need the Lambda execution role ARN. To get the ARN, run the following code. 
 To create the Lambda function, run the following code. 
 | Cloud architect, App developer | 
| Configure the Lambda function trigger. | Use the AWS CLI to configure the trigger (DynamoDB Streams), which invokes the Lambda function. The batch size of 400 is to avoid running into Lambda concurrency issues. 
 | Cloud architect, App developer | 
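The steps in this epic can also be expressed with Boto3. The following sketch shows a trust policy that lets the Lambda service assume the execution role, a helper that creates the role and returns its ARN, and the arguments for the DynamoDB Streams trigger with the batch size of 400 mentioned above. The function names and the LATEST starting position are assumptions for this example.

```python
import json


def lambda_trust_policy():
    """Trust policy that allows the Lambda service to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def create_execution_role(iam_client, role_name):
    """Create the execution role and return its ARN."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(lambda_trust_policy()),
    )
    return response["Role"]["Arn"]


def trigger_params(stream_arn, function_name, batch_size=400):
    """Arguments for create_event_source_mapping on a DynamoDB stream."""
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "BatchSize": batch_size,
        # Assumption: process only records written after the trigger exists.
        "StartingPosition": "LATEST",
    }
```

The trigger would then be created with boto3.client("lambda").create_event_source_mapping(**trigger_params(stream_arn, function_name)). A batch of 400 stream records also stays under the 500-record limit of a single Firehose PutRecordBatch call.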
| Task | Description | Skills required | 
|---|---|---|
| Add items with expired timestamps to the Reservation table. | To test the functionality, add items with expired epoch timestamps  to the  The Lambda function is initiated upon DynamoDB Stream activities, and it filters the event to identify  The Firehose delivery stream transfers items to a destination S3 bucket with the  Important: To optimize data retrieval, configure Amazon S3 with the  | Cloud architect | 
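To make the test step concrete, the sketch below builds an item whose TTL attribute already lies in the past. DynamoDB TTL expects a Number attribute that holds a Unix epoch time in seconds. The key name, the TTL attribute name, and the one-hour offset are hypothetical values for illustration.

```python
import time


def expired_item(item_id, key_name, ttl_attribute, seconds_ago=3600, now=None):
    """Build a put_item payload whose TTL timestamp is already in the past."""
    current = int(time.time()) if now is None else int(now)
    return {
        key_name: {"S": item_id},
        # TTL reads a Number attribute containing epoch seconds.
        ttl_attribute: {"N": str(current - seconds_ago)},
    }


def put_expired_item(client, table_name, item_id, key_name, ttl_attribute):
    """Write one already-expired item to the table."""
    client.put_item(
        TableName=table_name,
        Item=expired_item(item_id, key_name, ttl_attribute),
    )
```

For example, put_expired_item(boto3.client("dynamodb"), "Reservation", "r-1", "Id", "ExpiresAt") writes one such item, assuming those attribute names match the table. TTL deletion is asynchronous, so the item might not reach the stream immediately.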
| Task | Description | Skills required | 
|---|---|---|
| Delete all resources. | Delete all the resources to ensure that you aren't charged for any services that you aren't using. | Cloud architect, App developer | 
Related resources
Additional information
Create and configure a Firehose delivery stream – Policy examples
Firehose trusted relationship policy example document
    firehose_assume_role = {
        'Version': '2012-10-17',
        'Statement': [
            {
                'Sid': '',
                'Effect': 'Allow',
                'Principal': {
                    'Service': 'firehose.amazonaws.com'
                },
                'Action': 'sts:AssumeRole'
            }
        ]
    }
S3 permissions policy example
    s3_access = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                    "s3:PutObject"
                ],
                "Resource": [
                    "{your s3_bucket ARN}/*",
                    "{Your s3 bucket ARN}"
                ]
            }
        ]
    }
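As an illustration of how the bucket ARN placeholders in the S3 permissions policy might be filled in, the following sketch builds the policy from a bucket ARN and attaches it to a role as an inline policy. The function names and the policy name are hypothetical. The repository script may attach the policy differently.

```python
import json


def s3_access_policy(bucket_arn):
    """Return the S3 permissions policy for one destination bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Action": [
                    "s3:AbortMultipartUpload",
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:ListBucketMultipartUploads",
                    "s3:PutObject",
                ],
                # Object-level actions need the /* form; bucket-level
                # actions need the bare bucket ARN.
                "Resource": [f"{bucket_arn}/*", bucket_arn],
            }
        ],
    }


def attach_s3_access(iam_client, role_name, bucket_arn, policy_name="firehose-s3-access"):
    """Attach the policy to the role as an inline policy."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(s3_access_policy(bucket_arn)),
    )
```

For a bucket named example-bucket, the bucket ARN passed in would be arn:aws:s3:::example-bucket.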
Test the functionality – Amazon S3 configuration
The Amazon S3 configuration with the following Prefix and ErrorOutputPrefix is chosen to optimize data retrieval. 
Prefix
firehosetos3example/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/hour=!{timestamp:HH}/
Firehose first creates a base folder called firehosetos3example directly under the S3 bucket. It then evaluates the expressions !{timestamp:yyyy}, !{timestamp:MM}, !{timestamp:dd}, and !{timestamp:HH} to year, month, day, and hour using the Java DateTimeFormatter format.
For example, an approximate arrival timestamp of 1604683577 in Unix epoch time corresponds to 2020-11-06 17:26:17 UTC, so it evaluates to year=2020, month=11, day=06, and hour=17 (HH is the 24-hour clock). Therefore, the location in Amazon S3 where data records are delivered evaluates to firehosetos3example/year=2020/month=11/day=06/hour=17/.
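The evaluation can be reproduced locally. The sketch below mimics the prefix expansion with Python's datetime formatting instead of the Java DateTimeFormatter that Firehose uses, and it assumes the timestamp is interpreted in UTC. The function name is hypothetical.

```python
from datetime import datetime, timezone


def evaluate_prefix(base, epoch_seconds):
    """Expand the year/month/day/hour prefix for an arrival timestamp."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # %H is the 24-hour clock, matching the HH pattern in the prefix.
    return f"{base}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"


print(evaluate_prefix("firehosetos3example", 1604683577))
# firehosetos3example/year=2020/month=11/day=06/hour=17/
```

On a 24-hour clock in UTC, the hour component for this timestamp is 17. A value of 05 would correspond to a 12-hour clock pattern.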
ErrorOutputPrefix
firehosetos3erroroutputbase/!{firehose:random-string}/!{firehose:error-output-type}/!{timestamp:yyyy/MM/dd}/
The ErrorOutputPrefix results in a base folder called firehosetos3erroroutputbase directly under the S3 bucket. The expression !{firehose:random-string} evaluates to an 11-character random string such as ztWxkdg3Thg. The location for an Amazon S3 object where failed records are delivered could evaluate to firehosetos3erroroutputbase/ztWxkdg3Thg/processing-failed/2020/11/06/.