Generate Db2 z/OS data insights by using AWS Mainframe Modernization and Amazon Q in QuickSight
Shubham Roy, Roshna Razack, and Santosh Kumar Singh, Amazon Web Services
Summary
If your organization hosts business-critical data in an IBM Db2 mainframe environment, gaining insights from that data is crucial for driving growth and innovation. By unlocking mainframe data, you can build fast, secure, and scalable business intelligence to accelerate data-driven decision-making, growth, and innovation in the Amazon Web Services (AWS) Cloud.
This pattern presents a solution for generating business insights and creating sharable narratives from mainframe data in IBM Db2 for z/OS tables. Mainframe data changes are streamed to an Amazon Managed Streaming for Apache Kafka (Amazon MSK) topic by using AWS Mainframe Modernization Data Replication with Precisely. Using Amazon Redshift streaming ingestion, the Amazon MSK topic data is stored in Amazon Redshift Serverless data warehouse tables for analytics in Amazon QuickSight.
After the data is available in QuickSight, you can use natural language prompts with Amazon Q in QuickSight to create summaries of the data, ask questions, and generate data stories. You don't have to write SQL queries or learn a business intelligence (BI) tool.
Business context
This pattern presents a solution for mainframe data analytics and data insights use cases. Using the pattern, you build a visual dashboard for your company's data. To demonstrate the solution, this pattern uses a health care company that provides medical, dental, and vision plans to its members in the US. In this example, member demographics and plan information are stored in the IBM Db2 for z/OS data tables. The visual dashboard shows the following:
Member distribution by region
Member distribution by gender
Member distribution by age
Member distribution by plan type
Members who have not completed preventive immunization
For examples of member distribution by region and members who have not completed preventive immunization, see the Additional information section.
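For illustration, the last dashboard metric maps to a simple aggregation over the example schema that's shown in the Additional information section. The following is a minimal SQL sketch; the 'No' flag value is an assumption about how the example data encodes immunization status:

```sql
-- Hypothetical sketch: counts members without preventive immunization, by region.
-- Assumes preventive_immunization holds a Yes/No style flag ('No' is an assumed value).
SELECT d.region,
       COUNT(*) AS members_without_immunization
FROM member_dtls d
JOIN member_plans p ON d.memberid = p.memberid
WHERE p.preventive_immunization = 'No'
GROUP BY d.region
ORDER BY members_without_immunization DESC;
```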
After you create the dashboard, you generate a data story that explains the insights from the previous analysis. The data story provides recommendations for increasing the number of members who have completed preventive immunizations.
Prerequisites and limitations
Prerequisites
An active AWS account. This solution was built and tested on Amazon Linux 2 on Amazon Elastic Compute Cloud (Amazon EC2).
A virtual private cloud (VPC) with a subnet that can be accessed by your mainframe system.
A mainframe database with business data. For the example data used to build and test this solution, see the Attachments section.
Change data capture (CDC) enabled on the Db2 z/OS tables. To enable CDC on Db2 z/OS, see the IBM documentation.
Precisely Connect CDC for z/OS installed on the z/OS system that's hosting the source databases. The Precisely Connect CDC for z/OS image is provided as a zip file within the AWS Mainframe Modernization - Data Replication for IBM z/OS Amazon Machine Image (AMI). To install Precisely Connect CDC for z/OS on the mainframe, see the Precisely installation documentation.
Limitations
Your mainframe Db2 data should be in a data type that's supported by Precisely Connect CDC. For a list of supported data types, see the Precisely Connect CDC documentation.
Your data at Amazon MSK should be in a data type that's supported by Amazon Redshift. For a list of supported data types, see the Amazon Redshift documentation.
Amazon Redshift has different behaviors and size limits for different data types. For more information, see the Amazon Redshift documentation.
How current the near real-time data in QuickSight is depends on the refresh interval that's set for the Amazon Redshift database.
Some AWS services aren't available in all AWS Regions. For Region availability, see AWS services by Region.
Amazon Q in QuickSight is currently not available in every Region that supports QuickSight. For specific endpoints, see the Service endpoints and quotas page, and choose the link for the service.
Product versions
AWS Mainframe Modernization Data Replication with Precisely version 4.1.44
Python version 3.6 or later
Apache Kafka version 3.5.1
Architecture
Target architecture
The following diagram shows an architecture for generating business insights from mainframe data by using AWS Mainframe Modernization Data Replication with Precisely.

The diagram shows the following workflow:
The Precisely Log Reader Agent reads data from Db2 logs and writes the data into transient storage on an OMVS file system on the mainframe.
The Publisher Agent reads the raw Db2 logs from transient storage.
The on-premises controller daemon authenticates, authorizes, monitors, and manages operations.
The Apply Agent is deployed on Amazon EC2 by using the preconfigured AMI. It connects with the Publisher Agent through the controller daemon by using TCP/IP, and pushes data to Amazon MSK by using multiple workers for high throughput.
The workers write the data to the Amazon MSK topic in JSON format. As the intermediate target for the replicated messages, Amazon MSK provides high availability and automated failover capabilities.
Amazon Redshift streaming ingestion provides low-latency, high-speed data ingestion from Amazon MSK to an Amazon Redshift Serverless database. A stored procedure in Amazon Redshift reconciles the mainframe data changes (inserts, updates, and deletes) into Amazon Redshift tables. These Amazon Redshift tables serve as the data analytics source for QuickSight.
Users access the data in QuickSight for analytics and insights. You can use Amazon Q in QuickSight to interact with the data by using natural language prompts.
Tools
AWS services
Amazon Elastic Compute Cloud (Amazon EC2) provides scalable computing capacity in the AWS Cloud. You can launch as many virtual servers as you need and quickly scale them out or in.
AWS Key Management Service (AWS KMS) helps you create and control cryptographic keys to help protect your data.
Amazon Managed Streaming for Apache Kafka (Amazon MSK) is a fully managed service that helps you build and run applications that use Apache Kafka to process streaming data.
Amazon QuickSight is a cloud-scale business intelligence (BI) service that helps you visualize, analyze, and report your data in a single dashboard. This pattern uses the generative BI capabilities of Amazon Q in QuickSight.
Amazon Redshift Serverless is a serverless option of Amazon Redshift that makes it more efficient to run and scale analytics in seconds, without the need to set up and manage data warehouse infrastructure.
AWS Secrets Manager helps you replace hardcoded credentials in your code, including passwords, with an API call to Secrets Manager to retrieve the secret programmatically.
Other tools
Precisely Connect CDC collects and integrates data from legacy systems into cloud and data platforms.
Code repository
The code for this pattern is available in the GitHub Mainframe_DataInsights_change_data_reconciliation repository.
Best practices
Follow best practices while setting up your Amazon MSK cluster.
Follow Amazon Redshift data parsing best practices for improving performance.
When you create the AWS Identity and Access Management (IAM) roles for the Precisely setup, follow the principle of least privilege and grant the minimum permissions required to perform a task. For more information, see Grant least privilege and Security best practices in the IAM documentation.
Epics
Task | Description | Skills required |
---|---|---|
Set up a security group. | To connect to the controller daemon and the Amazon MSK cluster, create a security group for the EC2 instance. Add the following inbound and outbound rules:
Note the name of the security group. You will need to reference the name when you launch the EC2 instance and configure the Amazon MSK cluster. | DevOps engineer, AWS DevOps |
Create an IAM policy and an IAM role. |
| DevOps engineer, AWS systems administrator |
Provision an EC2 instance. | To provision an EC2 instance to run Precisely CDC and connect to Amazon MSK, do the following:
| AWS administrator, DevOps engineer |
Task | Description | Skills required |
---|---|---|
Create the Amazon MSK cluster. | To create an Amazon MSK cluster, do the following:
A typical provisioned cluster takes up to 15 minutes to create. After the cluster is created, its status changes from Creating to Active. | AWS DevOps, Cloud administrator |
Set up SASL/SCRAM authentication. | To set up SASL/SCRAM authentication for an Amazon MSK cluster, do the following:
| Cloud architect |
Create the Amazon MSK topic. | To create the Amazon MSK topic, do the following:
| Cloud administrator |
Task | Description | Skills required |
---|---|---|
Set up the Precisely scripts to replicate data changes. | To set up the Precisely Connect CDC scripts to replicate changed data from the mainframe to the Amazon MSK topic, do the following:
For example .ddl files, see the Additional information section. | App developer, Cloud architect |
Generate the network ACL key. | To generate the network access control list (network ACL) key, do the following:
| Cloud architect, AWS DevOps |
Task | Description | Skills required |
---|---|---|
Configure defaults in the ISPF screen. | To configure default settings in the Interactive System Productivity Facility (ISPF), follow the instructions in the Precisely documentation. | Mainframe system administrator |
Configure the controller daemon. | To configure the controller daemon, do the following:
| Mainframe system administrator |
Configure the publisher. | To configure the publisher, do the following:
| Mainframe system administrator |
Update the daemon configuration file. | To update the publisher details in the controller daemon configuration file, do the following:
| Mainframe system administrator |
Create the job to start the controller daemon. | To create the job, do the following:
| Mainframe system administrator |
Generate the capture publisher JCL file. | To generate the capture publisher JCL file, do the following:
| Mainframe system administrator |
Check and update CDC. |
| Mainframe system administrator |
Submit the JCL files. | Submit the following JCL files that you configured in the previous steps:
After you submit the JCL files, you can start the Apply Engine in Precisely on the EC2 instance. | Mainframe system administrator |
Task | Description | Skills required |
---|---|---|
Start the Apply Engine and validate the CDC. | To start the Apply Engine on the EC2 instance and validate the CDC, do the following:
| Cloud architect, App developer |
Validate the records on the Amazon MSK topic. | To read the messages from the Kafka topic, do the following:
| App developer, Cloud architect |
Task | Description | Skills required |
---|---|---|
Set up Amazon Redshift Serverless. | To create an Amazon Redshift Serverless data warehouse, follow the instructions in the AWS documentation. On the Amazon Redshift Serverless dashboard, validate that the namespace and workgroup were created and are available. For this example pattern, the process might take 2‒5 minutes. | Data engineer |
Set up the IAM role and trust policy required for streaming ingestion. | To set up Amazon Redshift Serverless streaming ingestion from Amazon MSK, do the following:
| Data engineer |
Connect Amazon Redshift Serverless to Amazon MSK. | To connect to the Amazon MSK topic, create an external schema in Amazon Redshift Serverless. In Amazon Redshift query editor v2, run the following SQL command, replacing the placeholder values with values for your environment.
| Migration engineer |
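The exact command isn't reproduced above, so the following is a minimal sketch that assumes IAM authentication; msk_schema is an assumed schema name, and the IAM role and cluster ARNs are placeholders. If your cluster uses SASL/SCRAM authentication, adjust the AUTHENTICATION clause as described in the Amazon Redshift documentation.

```sql
-- Minimal sketch, assuming IAM authentication; replace the ARNs with your own.
CREATE EXTERNAL SCHEMA msk_schema
FROM MSK
IAM_ROLE 'arn:aws:iam::111122223333:role/redshift-streaming-role'  -- placeholder role ARN
AUTHENTICATION iam
CLUSTER_ARN 'arn:aws:kafka:us-east-1:111122223333:cluster/your-msk-cluster/abc123';  -- placeholder cluster ARN
```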
Create a materialized view. | To consume the data from the Amazon MSK topic in Amazon Redshift Serverless, create a materialized view. In Amazon Redshift query editor v2, run the following SQL commands, replacing the placeholder values with your own.
| Migration engineer |
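The pattern's exact view definitions aren't reproduced above. The following is a minimal sketch for one topic, assuming the external schema name from the previous sketch and a JSON message payload; member_dtls_mv and the topic name are assumed names:

```sql
-- Minimal sketch: a streaming materialized view over the MSK topic.
-- member_dtls_mv and "your-topic-name" are assumed names; replace with your own.
CREATE MATERIALIZED VIEW member_dtls_mv AUTO REFRESH YES AS
SELECT kafka_partition,
       kafka_offset,
       refresh_time,
       JSON_PARSE(kafka_value) AS payload   -- parse the JSON message into a SUPER column
FROM msk_schema."your-topic-name"
WHERE CAN_JSON_PARSE(kafka_value);
```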
Create target tables in Amazon Redshift. | Amazon Redshift tables provide the input for QuickSight. This pattern uses the tables member_dtls and member_plans. To create the two tables in Amazon Redshift, run the following SQL commands in Amazon Redshift query editor v2:
| Migration engineer |
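The pattern's exact statements aren't shown above; the following sketch mirrors the source Db2 DDL from the Additional information section and is a reasonable starting point:

```sql
-- Sketch of the two target tables, mirroring the source Db2 DDL.
CREATE TABLE member_dtls (
    memberid    INTEGER NOT NULL,
    member_name VARCHAR(50),
    member_type VARCHAR(20),
    age         INTEGER,
    gender      CHAR(1),
    email       VARCHAR(100),
    region      VARCHAR(20)
);

CREATE TABLE member_plans (
    memberid                INTEGER NOT NULL,
    medical_plan            CHAR(1),
    dental_plan             CHAR(1),
    vision_plan             CHAR(1),
    preventive_immunization VARCHAR(20)
);
```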
Create a stored procedure in Amazon Redshift. | This pattern uses a stored procedure to sync up change data (inserts, updates, and deletes) from the streaming materialized view to the target tables. To create the stored procedure in Amazon Redshift, use query editor v2 to run the stored procedure code that's in the GitHub repository. | Migration engineer |
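The authoritative procedure code is in the GitHub repository. As a rough, hypothetical sketch of the reconciliation idea for one table (delete the stale image, then insert the latest one), it might look like the following; sp_sync_member_dtls and member_dtls_mv are assumed names, and the real procedure also handles delete events and tracks which stream records were already processed:

```sql
-- Hypothetical sketch only; see the GitHub repository for the actual procedure.
CREATE OR REPLACE PROCEDURE sp_sync_member_dtls()
AS $$
BEGIN
    -- Remove target rows that have a change record in the stream.
    DELETE FROM member_dtls
    USING member_dtls_mv mv
    WHERE member_dtls.memberid = mv.payload.memberid::INTEGER;

    -- Insert the latest image of each changed row from the SUPER payload.
    INSERT INTO member_dtls
    SELECT payload.memberid::INTEGER,
           payload.member_name::VARCHAR(50),
           payload.member_type::VARCHAR(20),
           payload.age::INTEGER,
           payload.gender::CHAR(1),
           payload.email::VARCHAR(100),
           payload.region::VARCHAR(20)
    FROM member_dtls_mv;
END;
$$ LANGUAGE plpgsql;
```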
Read from the streaming materialized view and load to the target tables. | The stored procedure reads the data changes from the streaming materialized view and loads them to the target tables. To run the stored procedure, use the following command:
You can use Amazon EventBridge to run the stored procedure on a schedule. Another option is to use Amazon Redshift query editor v2 to schedule the refresh. For more information, see Scheduling a query with query editor v2. | Migration engineer |
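For example, assuming the hypothetical procedure name from the earlier sketch (substitute the name used in the GitHub repository code):

```sql
-- Run the reconciliation procedure on demand or on a schedule.
CALL sp_sync_member_dtls();
```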
Task | Description | Skills required |
---|---|---|
Set up QuickSight. | To set up QuickSight, follow the instructions in the AWS documentation. | Migration engineer |
Set up a secure connection between QuickSight and Amazon Redshift. | To set up a secure connection between QuickSight and Amazon Redshift, do the following:
| Migration engineer |
Create a dataset for QuickSight. | To create a dataset for QuickSight from Amazon Redshift, do the following:
| Migration engineer |
Join the dataset. | To create analytics in QuickSight, join the two tables by following the instructions in the AWS documentation. In the Join Configuration pane, choose Left for Join type. Under Join clauses, use memberid from member_dtls and memberid from member_plans. | Migration engineer |
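To make the join semantics concrete, the QuickSight join configured above corresponds to the following SQL (shown for illustration only; QuickSight builds the join for you):

```sql
-- SQL equivalent of the QuickSight left join on memberid.
SELECT d.*,
       p.medical_plan,
       p.dental_plan,
       p.vision_plan,
       p.preventive_immunization
FROM member_dtls d
LEFT JOIN member_plans p
    ON d.memberid = p.memberid;
```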
Task | Description | Skills required |
---|---|---|
Set up Amazon Q in QuickSight. | To set up the Amazon Q in QuickSight Generative BI capability, follow the instructions in the AWS documentation. | Migration engineer |
Analyze mainframe data and build a visual dashboard. | To analyze and visualize your data in QuickSight, do the following:
When you're finished, you can publish your dashboard to share with others in your organization. For examples, see Mainframe visual dashboard in the Additional information section. | Migration engineer |
Task | Description | Skills required |
---|---|---|
Create a data story. | Create a data story to explain insights from the previous analysis, and generate a recommendation to increase preventive immunization for members:
| Migration engineer |
View the generated data story. | To view the generated data story, choose that story on the Data stories page. | Migration engineer |
Edit a generated data story. | To change the formatting, layout, or visuals in a data story, follow the instructions in the AWS documentation. | Migration engineer |
Share a data story. | To share a data story, follow the instructions in the AWS documentation. | Migration engineer |
Troubleshooting
Issue | Solution |
---|---|
For QuickSight to Amazon Redshift dataset creation, | |
Trying to start the Apply Engine on the EC2 instance returns the following error: | Export the |
Trying to start the Apply Engine returns one of the following connection errors: | Check the mainframe spool to make sure that the controller daemon jobs are running. |
Related resources
Additional information
Example .ddl files
members_details.ddl
CREATE TABLE MEMBER_DTLS (
    memberid INTEGER NOT NULL,
    member_name VARCHAR(50),
    member_type VARCHAR(20),
    age INTEGER,
    gender CHAR(1),
    email VARCHAR(100),
    region VARCHAR(20)
);
member_plans.ddl
CREATE TABLE MEMBER_PLANS (
    memberid INTEGER NOT NULL,
    medical_plan CHAR(1),
    dental_plan CHAR(1),
    vision_plan CHAR(1),
    preventive_immunization VARCHAR(20)
);
Example .sqd file
Replace <kafka topic name> with your Amazon MSK topic name.
script.sqd
-- Name: DB2ZTOMSK: DB2z To MSK
JOBNAME DB2ZTOMSK;
REPORT EVERY 1;
OPTIONS CDCOP('I','U','D');
-- Source Descriptions
BEGIN GROUP DB2_SOURCE;
DESCRIPTION DB2SQL /var/precisely/di/sqdata/apply/DB2ZTOMSK/ddl/mem_details.ddl AS MEMBER_DTLS;
DESCRIPTION DB2SQL /var/precisely/di/sqdata/apply/DB2ZTOMSK/ddl/mem_plans.ddl AS MEMBER_PLANS;
END GROUP;
-- Source Datastore
DATASTORE cdc://<zos_host_name>/DB2ZTOMSK/DB2ZTOMSK
OF UTSCDC
AS CDCIN
DESCRIBED BY GROUP DB2_SOURCE;
-- Target Datastore(s)
DATASTORE 'kafka:///<kafka topic name>/key'
OF JSON
AS TARGET
DESCRIBED BY GROUP DB2_SOURCE;
PROCESS INTO TARGET
SELECT
{
    REPLICATE(TARGET)
}
FROM CDCIN;
Mainframe visual dashboard
The following data visual was created by Amazon Q in QuickSight for the analysis question show member distribution by region.

The following data visual was created by Amazon Q in QuickSight for the question show member distribution by Region who have not completed preventive immunization, in pie chart.

Data story output
The following screenshots show sections of the data story created by Amazon Q in QuickSight for the prompt Build a data story about Region with most numbers of members. Also show the member distribution by age, member distribution by gender. Recommend how to motivate members to complete immunization. Include 4 points of supporting data for this pattern.
In the introduction, the data story recommends choosing the region with the most members to gain the greatest impact from immunization efforts.

The data story provides an analysis of member numbers for the four regions. The Northeast, Southwest, and Southeast regions have the most members.

The data story presents an analysis of members by age.

The data story focuses on immunization efforts in the Midwest.


Attachments
To access additional content that is associated with this document, unzip the following file: attachment.zip