Implement the serverless saga pattern by using AWS Step Functions
Tabby Ward, Joe Kern, and Rohan Mehta, Amazon Web Services
Summary
In a microservices architecture, the main goal is to build decoupled and independent components to promote agility, flexibility, and faster time to market for your applications. As a result of decoupling, each microservice component has its own data persistence layer. In a distributed architecture, business transactions can span multiple microservices. Because these microservices cannot use a single atomicity, consistency, isolation, durability (ACID) transaction, you might end up with partial transactions. In this case, some control logic is needed to undo the transactions that have already been processed. The distributed saga pattern is typically used for this purpose.
The saga pattern is a failure management pattern that helps establish consistency in distributed applications and coordinates transactions between multiple microservices to maintain data consistency. When you use the saga pattern, every service that performs a transaction publishes an event that triggers subsequent services to perform the next transaction in the chain. This continues until the last transaction in the chain is complete. If a business transaction fails, saga orchestrates a series of compensating transactions that undo the changes that were made by the preceding transactions.
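The compensation idea described above can be sketched in a few lines of TypeScript. This is a minimal, in-memory illustration and not the code of the sample application: the `SagaStep` interface, the `runSaga` function, and the log format are hypothetical names introduced here. It also assumes that only steps that completed successfully need compensating, whereas the sample workflow also runs a refund step after a payment error.

```typescript
// A saga step pairs a forward action with the compensating action that undoes it.
// (Hypothetical shape, introduced for illustration only.)
interface SagaStep {
  name: string;
  action: () => Promise<void>;
  compensate: () => Promise<void>;
}

// Runs the steps in order. If a step fails, runs the compensations of the
// steps that already completed, in reverse order, and reports the failure.
async function runSaga(steps: SagaStep[]): Promise<{ ok: boolean; log: string[] }> {
  const log: string[] = [];
  const completed: SagaStep[] = [];
  for (const step of steps) {
    try {
      await step.action();
      completed.push(step);
      log.push(`done:${step.name}`);
    } catch {
      log.push(`failed:${step.name}`);
      // Undo the completed steps, most recent first.
      for (const done of completed.reverse()) {
        await done.compensate();
        log.push(`compensated:${done.name}`);
      }
      return { ok: false, log };
    }
  }
  return { ok: true, log };
}

// Usage: the third step fails, so the first two are compensated in reverse.
const noop = async (): Promise<void> => {};
const fail = async (): Promise<void> => {
  throw new Error("payment declined");
};
const demoSteps: SagaStep[] = [
  { name: "ReserveFlight", action: noop, compensate: noop },
  { name: "ReserveCarRental", action: noop, compensate: noop },
  { name: "ProcessPayment", action: fail, compensate: noop },
];
runSaga(demoSteps).then((result) => console.log(result.log));
```

In the pattern itself, Step Functions plays the role of `runSaga`: the state machine, rather than application code, decides which compensating task runs next.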
This pattern demonstrates how to automate the setup and deployment of a sample application (which handles travel reservations) with serverless technologies such as AWS Step Functions, AWS Lambda, and Amazon DynamoDB. The sample application also uses Amazon API Gateway and Amazon Simple Notification Service (Amazon SNS) to implement a saga execution coordinator. The pattern can be deployed with an infrastructure as code (IaC) framework such as the AWS Cloud Development Kit (AWS CDK), the AWS Serverless Application Model (AWS SAM), or Terraform.
For more information about the saga pattern and other data persistence patterns, see the guide Enabling data persistence in microservices on the AWS Prescriptive Guidance website.
Prerequisites and limitations
Prerequisites
- An active AWS account. 
- Permissions to create an AWS CloudFormation stack. For more information, see Controlling access in the CloudFormation documentation. 
- IaC framework of your choice (AWS CDK, AWS SAM, or Terraform) configured with your AWS account so that you can use the framework CLI to deploy the application. 
- Node.js, used to build the application and run it locally. 
- A code editor of your choice (such as Visual Studio Code, Sublime, or Atom). 
Product versions
Limitations
Event sourcing is a natural way to implement the saga orchestration pattern in a microservices architecture where all components are loosely coupled and don’t have direct knowledge of one another. If your transaction involves a small number of steps (three to five), the saga pattern might be a great fit. However, complexity increases with the number of microservices and the number of steps.
Testing and debugging can become difficult when you’re using this design, because you have to have all services running in order to simulate the transaction pattern.
Architecture
Target architecture
The proposed architecture uses AWS Step Functions to build a saga pattern to book flights, book car rentals, and process payments for a vacation.
The following workflow diagram illustrates the typical flow of the travel reservation system. The workflow consists of reserving air travel ("ReserveFlight"), reserving a car ("ReserveCarRental"), processing payments ("ProcessPayment"), confirming flight reservations ("ConfirmFlight"), and confirming car rentals ("ConfirmCarRental") followed by a success notification when these steps are complete. However, if the system encounters any errors in running any of these transactions, it starts to fail backward. For example, an error with payment processing ("ProcessPayment") triggers a refund ("RefundPayment"), which then triggers a cancellation of the rental car and flight ("CancelRentalReservation" and "CancelFlightReservation"), which ends the entire transaction with a failure message.
This pattern deploys separate Lambda functions for each task that is highlighted in the diagram as well as three DynamoDB tables for flights, car rentals, and payments. Each Lambda function creates, updates, or deletes the rows in the respective DynamoDB tables, depending on whether a transaction is confirmed or rolled back. The pattern uses Amazon SNS to send text (SMS) messages to subscribers, notifying them of failed or successful transactions.
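To make the forward and backward paths concrete, the following TypeScript sketch builds a simplified Amazon States Language definition as a plain object. It is not the definition shipped with the sample application. The Lambda function names, account ID, and Region in the ARNs are placeholders, the SNS notification states are replaced by plain `Succeed` and `Fail` states, and the error routing for every step other than "ProcessPayment" is an assumption.

```typescript
// Minimal subset of Amazon States Language used in this sketch.
interface CatchRule { ErrorEquals: string[]; ResultPath: string; Next: string; }
interface TaskState { Type: "Task"; Resource: string; Next: string; Catch?: CatchRule[]; }
interface SucceedState { Type: "Succeed"; }
interface FailState { Type: "Fail"; Error: string; }
type State = TaskState | SucceedState | FailState;
interface Definition { StartAt: string; States: { [name: string]: State }; }

// Placeholder ARN builder; the account ID, Region, and function names are hypothetical.
const lambdaArn = (fn: string): string =>
  `arn:aws:lambda:us-east-1:111122223333:function:${fn}`;

// Builds a Task state that moves to `next` on success and, if `onError` is
// given, to the compensating state `onError` when any error is raised.
function task(fn: string, next: string, onError?: string): TaskState {
  const state: TaskState = { Type: "Task", Resource: lambdaArn(fn), Next: next };
  if (onError) {
    state.Catch = [{ ErrorEquals: ["States.ALL"], ResultPath: "$.error", Next: onError }];
  }
  return state;
}

const sagaDefinition: Definition = {
  StartAt: "ReserveFlight",
  States: {
    // Forward path
    ReserveFlight: task("reserveFlight", "ReserveCarRental", "CancelFlightReservation"),
    ReserveCarRental: task("reserveCarRental", "ProcessPayment", "CancelRentalReservation"),
    ProcessPayment: task("processPayment", "ConfirmFlight", "RefundPayment"),
    ConfirmFlight: task("confirmFlight", "ConfirmCarRental", "RefundPayment"),
    ConfirmCarRental: task("confirmCarRental", "ReservationSucceeded", "RefundPayment"),
    // Backward (compensating) path
    RefundPayment: task("refundPayment", "CancelRentalReservation"),
    CancelRentalReservation: task("cancelRentalReservation", "CancelFlightReservation"),
    CancelFlightReservation: task("cancelFlightReservation", "ReservationFailed"),
    // Terminal states (the sample sends SNS notifications before ending)
    ReservationSucceeded: { Type: "Succeed" },
    ReservationFailed: { Type: "Fail", Error: "ReservationFailed" },
  },
};

console.log(JSON.stringify(sagaDefinition, null, 2));
```

Each forward task carries a `Catch` rule that points at the first compensating state to run, so an error enters the backward chain at the right place and then flows through the remaining cancellations.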

Automation and scale
You can create the configuration for this architecture by using one of the IaC frameworks. Use one of the following links for your preferred IaC.
Tools
AWS services
- AWS Step Functions is a serverless orchestration service that lets you combine AWS Lambda functions and other AWS services to build business-critical applications. Through the Step Functions graphical console, you see your application’s workflow as a series of event-driven steps. 
- Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. You can use DynamoDB to create a database table that can store and retrieve any amount of data, and serve any level of request traffic. 
- AWS Lambda is a compute service that lets you run code without provisioning or managing servers. Lambda runs your code only when needed and scales automatically, from a few requests per day to thousands per second. 
- Amazon API Gateway is an AWS service for creating, publishing, maintaining, monitoring, and securing REST, HTTP, and WebSocket APIs at any scale. 
- Amazon Simple Notification Service (Amazon SNS) is a managed service that provides message delivery from publishers to subscribers. 
- AWS Cloud Development Kit (AWS CDK) is a software development framework for defining your cloud application resources by using familiar programming languages such as TypeScript, JavaScript, Python, Java, and C#/.NET. 
- AWS Serverless Application Model (AWS SAM) is an open-source framework for building serverless applications. It provides shorthand syntax to express functions, APIs, databases, and event source mappings. 
Code
The code for a sample application that demonstrates the saga pattern, including the IaC template (AWS CDK, AWS SAM, or Terraform), the Lambda functions, and the DynamoDB tables can be found in the following links. Follow the instructions in the first epic to install these.
Epics
| Task | Description | Skills required | 
|---|---|---|
| Install the NPM packages. | Create a new directory, navigate to that directory in a terminal, and clone the GitHub repository of your choice from the Code section earlier in this pattern. In the root folder that has the | Developer, Cloud architect | 
| Compile scripts. | In the root folder, run the following command to instruct the TypeScript transpiler to create all necessary JavaScript files: | Developer, Cloud architect | 
| Watch for changes and recompile. | In the root folder, run the following command in a separate terminal window to watch for code changes, and compile the code when it detects a change: | Developer, Cloud architect | 
| Run unit tests (AWS CDK only). | If you’re using the AWS CDK, in the root folder, run the following command to perform the Jest unit tests: | Developer, Cloud architect | 
| Task | Description | Skills required | 
|---|---|---|
| Deploy the demo stack to AWS. | Important: The application is AWS Region-agnostic. If you use a profile, you must declare the Region explicitly in either the AWS Command Line Interface (AWS CLI) profile or through AWS CLI environment variables. In the root folder, run the following command to create a deployment assembly and to deploy it to the default AWS account and Region. AWS CDK: 
 AWS SAM: 
 Terraform: 
 This step might take several minutes to complete. This command uses the default credentials that were configured for the AWS CLI. Note the API Gateway URL that is displayed on the console after deployment is complete. You will need this information to test the saga execution flow. | Developer, Cloud architect | 
| Compare the deployed stack with the current state. | In the root folder, run the following command to compare the deployed stack with the current state after making changes to the source code: AWS CDK: 
 AWS SAM: 
 Terraform: 
 | Developer, Cloud architect | 
| Task | Description | Skills required | 
|---|---|---|
| Test the saga execution flow. | Navigate to the API Gateway URL that you noted in the earlier step, when you deployed the stack. This URL triggers the state machine to start. For more information about how to manipulate the flow of the state machine by passing different URL parameters, see the Additional information section. To  view the results, sign in to the AWS Management Console and navigate to the Step Functions console. Here, you can see every step of the saga state machine. You can also view the DynamoDB table to see the records inserted, updated, or deleted. If you refresh the screen frequently, you can watch the transaction status change from  You can subscribe to the SNS topic by updating the code in the  | Developer, Cloud architect | 
| Task | Description | Skills required | 
|---|---|---|
| Clean up resources. | To clean up the resources deployed for this application, you can use one of the following commands. AWS CDK: 
 AWS SAM: 
 Terraform: 
 | App developer, Cloud architect | 
Related resources
Technical papers
AWS service documentation
Tutorials
Additional information
Code
For testing purposes, this pattern deploys API Gateway and a test Lambda function that triggers the Step Functions state machine. With Step Functions, you can control the functionality of the travel reservation system by passing a run_type parameter to mimic failures in "ReserveFlight," "ReserveCarRental," "ProcessPayment," "ConfirmFlight," and "ConfirmCarRental."
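As a rough illustration of how a task function might honor the run_type value, the following TypeScript sketch throws an error when the incoming event asks for that step to fail. The handler name, the `failIfRequested` helper, the error message, and the return shape are assumptions made for this example and might differ from the functions in the linked repositories.

```typescript
// Fields that this sketch reads from the state machine input.
interface ReservationEvent {
  trip_id: string;
  run_type: string;
}

// Throws when the run_type in the event matches the failure value for this step.
// (Hypothetical helper, introduced for illustration.)
function failIfRequested(event: ReservationEvent, failOn: string, stepName: string): void {
  if (event.run_type === failOn) {
    throw new Error(`${stepName} failed (simulated by run_type=${failOn})`);
  }
}

// Sketch of a "ReserveFlight" task handler.
async function reserveFlightHandler(
  event: ReservationEvent
): Promise<{ status: string; trip_id: string }> {
  failIfRequested(event, "failFlightsReservation", "ReserveFlight");
  // A real function would write the pending reservation to DynamoDB here.
  return { status: "pending", trip_id: event.trip_id };
}

// Usage: a normal run resolves; a simulated failure rejects, which lets the
// state machine route the execution to its compensating steps.
reserveFlightHandler({ trip_id: "demo-trip", run_type: "success" })
  .then((result) => console.log(result.status));
reserveFlightHandler({ trip_id: "demo-trip", run_type: "failFlightsReservation" })
  .catch((err: Error) => console.log(err.message));
```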
The saga Lambda function (sagaLambda.ts) takes input from the query parameters in the API Gateway URL, creates the following JSON object, and passes it to Step Functions for execution:
    let input = {
      "trip_id": tripID, // value taken from query parameter, default is AWS request ID
      "depart_city": "Detroit",
      "depart_time": "2021-07-07T06:00:00.000Z",
      "arrive_city": "Frankfurt",
      "arrive_time": "2021-07-09T08:00:00.000Z",
      "rental": "BMW",
      "rental_from": "2021-07-09T00:00:00.000Z",
      "rental_to": "2021-07-17T00:00:00.000Z",
      "run_type": runType // value taken from query parameter, default is "success"
    };
You can experiment with different flows of the Step Functions state machine by passing the following URL parameters:
- Successful Execution ─ https://{api gateway url} 
- Reserve Flight Fail ─ https://{api gateway url}?runType=failFlightsReservation 
- Confirm Flight Fail ─ https://{api gateway url}?runType=failFlightsConfirmation 
- Reserve Car Rental Fail ─ https://{api gateway url}?runType=failCarRentalReservation 
- Confirm Car Rental Fail ─ https://{api gateway url}?runType=failCarRentalConfirmation 
- Process Payment Fail ─ https://{api gateway url}?runType=failPayment 
- Pass a Trip ID ─ https://{api gateway url}?tripID={by default, trip ID will be the AWS request ID} 
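The mapping from these URL parameters to the state machine input could look roughly like the following TypeScript sketch. The `buildSagaInput` function and the `SagaInput` and `QueryParams` types are hypothetical and are not taken from sagaLambda.ts. The sketch assumes an API Gateway proxy-style event, in which the query string parameters are null when the URL has none.

```typescript
// Shape of the JSON object that is passed to Step Functions.
interface SagaInput {
  trip_id: string;
  depart_city: string;
  depart_time: string;
  arrive_city: string;
  arrive_time: string;
  rental: string;
  rental_from: string;
  rental_to: string;
  run_type: string;
}

// Query string parameters as delivered by an API Gateway proxy integration.
type QueryParams = { [key: string]: string | undefined } | null;

// Builds the saga input, falling back to the AWS request ID for the trip ID
// and to "success" for the run type when the parameters are absent.
function buildSagaInput(query: QueryParams, awsRequestId: string): SagaInput {
  const params: { [key: string]: string | undefined } = query || {};
  return {
    trip_id: params["tripID"] || awsRequestId,
    depart_city: "Detroit",
    depart_time: "2021-07-07T06:00:00.000Z",
    arrive_city: "Frankfurt",
    arrive_time: "2021-07-09T08:00:00.000Z",
    rental: "BMW",
    rental_from: "2021-07-09T00:00:00.000Z",
    rental_to: "2021-07-17T00:00:00.000Z",
    run_type: params["runType"] || "success",
  };
}

// Usage: a request for ...?runType=failPayment with no tripID parameter.
const demoInput = buildSagaInput({ runType: "failPayment" }, "example-request-id");
console.log(demoInput.trip_id, demoInput.run_type);
```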
IaC templates
The linked repositories include IaC templates that you can use to create the entire sample travel reservation application.
DynamoDB tables
Here are the data models for the flights, car rentals, and payments tables.
Flight Data Model:
    var params = {
      TableName: process.env.TABLE_NAME,
      Item: {
        'pk': {S: event.trip_id},
        'sk': {S: flightReservationID},
        'trip_id': {S: event.trip_id},
        'id': {S: flightReservationID},
        'depart_city': {S: event.depart_city},
        'depart_time': {S: event.depart_time},
        'arrive_city': {S: event.arrive_city},
        'arrive_time': {S: event.arrive_time},
        'transaction_status': {S: 'pending'}
      }
    };
Car Rental Data Model:
    var params = {
      TableName: process.env.TABLE_NAME,
      Item: {
        'pk': {S: event.trip_id},
        'sk': {S: carRentalReservationID},
        'trip_id': {S: event.trip_id},
        'id': {S: carRentalReservationID},
        'rental': {S: event.rental},
        'rental_from': {S: event.rental_from},
        'rental_to': {S: event.rental_to},
        'transaction_status': {S: 'pending'}
      }
    };
Payment Data Model:
    var params = {
      TableName: process.env.TABLE_NAME,
      Item: {
        'pk': {S: event.trip_id},
        'sk': {S: paymentID},
        'trip_id': {S: event.trip_id},
        'id': {S: paymentID},
        // hard coded for simplicity as implementing any monetary transaction functionality is beyond the scope of this pattern
        'amount': {S: "750.00"},
        'currency': {S: "USD"},
        'transaction_status': {S: "confirmed"}
      }
    };
Lambda functions
The following functions will be created to support the state machine flow and execution in Step Functions:
- Reserve Flights: Inserts a record into the DynamoDB Flights table with a transaction_status of pending, to book a flight.
- Confirm Flight: Updates the record in the DynamoDB Flights table, to set transaction_status to confirmed, to confirm the flight.
- Cancel Flights Reservation: Deletes the record from the DynamoDB Flights table, to cancel the pending flight. 
- Reserve Car Rentals: Inserts a record into the DynamoDB CarRentals table with a transaction_status of pending, to book a car rental.
- Confirm Car Rentals: Updates the record in the DynamoDB CarRentals table, to set transaction_status to confirmed, to confirm the car rental.
- Cancel Car Rentals Reservation: Deletes the record from the DynamoDB CarRentals table, to cancel the pending car rental. 
- Process Payment: Inserts a record into the DynamoDB Payments table for the payment. 
- Cancel Payment: Deletes the record from the DynamoDB Payments table for the payment. 
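The confirm and cancel operations in the list above can be expressed as DynamoDB request parameters. The following TypeScript sketch only builds the parameter objects in the low-level attribute-value format that the data models use. It makes no AWS calls, the function and type names are hypothetical, and it assumes that pk and sk form the table's primary key, as the data models suggest.

```typescript
// Primary key in the DynamoDB attribute-value format used by the data models.
interface ReservationKey {
  pk: { S: string };
  sk: { S: string };
}

interface ConfirmParams {
  TableName: string;
  Key: ReservationKey;
  UpdateExpression: string;
  ExpressionAttributeValues: { [placeholder: string]: { S: string } };
}

interface CancelParams {
  TableName: string;
  Key: ReservationKey;
}

// Parameters for an UpdateItem call that marks a pending reservation as confirmed.
function buildConfirmParams(tableName: string, tripId: string, reservationId: string): ConfirmParams {
  return {
    TableName: tableName,
    Key: { pk: { S: tripId }, sk: { S: reservationId } },
    UpdateExpression: "SET transaction_status = :status",
    ExpressionAttributeValues: { ":status": { S: "confirmed" } },
  };
}

// Parameters for a DeleteItem call that removes a reservation during rollback.
function buildCancelParams(tableName: string, tripId: string, reservationId: string): CancelParams {
  return {
    TableName: tableName,
    Key: { pk: { S: tripId }, sk: { S: reservationId } },
  };
}

// Usage with placeholder table and ID values.
console.log(JSON.stringify(buildConfirmParams("Flights", "trip-1", "flight-1")));
console.log(JSON.stringify(buildCancelParams("Flights", "trip-1", "flight-1")));
```

Keeping the compensating delete keyed on the same pk and sk as the original insert means that the cancel step can be retried safely, because deleting an item that is already gone has no further effect.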
Amazon SNS
The sample application creates the following topic and subscription for sending SMS messages and notifying the customer about successful or failed reservations. If you want to receive text messages while testing the sample application, update the SMS subscription with your valid phone number in the state machine definition file.
AWS CDK snippet (add the phone number in the second line of the following code):
    const topic = new sns.Topic(this, 'Topic');
    topic.addSubscription(new subscriptions.SmsSubscription('+11111111111'));
    const snsNotificationFailure = new tasks.SnsPublish(this, 'SendingSMSFailure', {
      topic: topic,
      integrationPattern: sfn.IntegrationPattern.REQUEST_RESPONSE,
      message: sfn.TaskInput.fromText('Your Travel Reservation Failed'),
    });
    const snsNotificationSuccess = new tasks.SnsPublish(this, 'SendingSMSSuccess', {
      topic: topic,
      integrationPattern: sfn.IntegrationPattern.REQUEST_RESPONSE,
      message: sfn.TaskInput.fromText('Your Travel Reservation is Successful'),
    });
AWS SAM snippet (replace the +11111111111 strings with your valid phone number):
    StateMachineTopic11111111111:
      Type: 'AWS::SNS::Subscription'
      Properties:
        Protocol: sms
        TopicArn:
          Ref: StateMachineTopic
        Endpoint: '+11111111111'
      Metadata:
        'aws:sam:path': SamServerlessSagaStack/StateMachine/Topic/+11111111111/Resource
Terraform snippet (replace the +11111111111 string with your valid phone number):
    resource "aws_sns_topic_subscription" "sms-target" {
      topic_arn = aws_sns_topic.topic.arn
      protocol  = "sms"
      endpoint  = "+11111111111"
    }
Successful reservations
The following flow illustrates a successful reservation with "ReserveFlight," "ReserveCarRental," and "ProcessPayment" followed by "ConfirmFlight" and "ConfirmCarRental." The customer is notified about the successful booking through SMS messages that are sent to the subscriber of the SNS topic.

Failed reservations
This flow is an example of failure in the saga pattern. If, after booking flights and car rentals, "ProcessPayment" fails, steps are canceled in reverse order. The reservations are released, and the customer is notified of the failure through SMS messages that are sent to the subscriber of the SNS topic.
