

# Data Integration and Analytics


This section provides information about data integration and analytics in relation to RISE with SAP.

**Topics**
+ [

# Data integration
](rise-data-integration.md)
+ [

# Data analytics
](rise-data-analytics.md)

# Data integration


RISE with SAP Extensibility for Data Integration with AWS is a technical framework that enables data flow between SAP systems, AWS services, and third-party solutions. This integration architecture provides standardized APIs, connectors, and protocols to establish secure communication channels, addressing the critical need for seamless enterprise data integration in modern cloud environments.

The RISE with SAP Extensibility for Data Integration outlines two primary data handling and integration mechanisms.

**Topics**
+ [

# Data Replication
](rise-data-replication.md)
+ [

# Replicating data using AWS Services
](rise-data-replication-awsmanaged.md)
+ [

# Replicating data using SAP services
](rise-data-replication-sap.md)
+ [

# Replicating data using Partner Solutions
](rise-data-replication-partner.md)
+ [

# Data Federation using AWS Services
](rise-data-federation.md)

# Data Replication


Data replication from SAP is a crucial step in making the data usable for reporting, analysis, and integration with other systems. The following reference architecture shows how this can be done on AWS.

![\[Overall Data replication\]](http://docs.aws.amazon.com/sap/latest/general/images/rise-data-replication.png)


# Replicating data using AWS Services


![\[Data replication using Managed Services\]](http://docs.aws.amazon.com/sap/latest/general/images/rise-data-replication-aws-services.png)


 **AWS Glue** 

 [AWS Glue](https://aws.amazon.com/glue/) is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources. With AWS Glue, you can discover and connect to SAP using OData and manage your data in a centralized data catalog. You can visually create, run, and monitor extract, transform, and load (ETL) pipelines to load SAP data into your data lakes and data warehouses.

The [Connecting to SAP OData using Glue](https://docs.aws.amazon.com/glue/latest/dg/connecting-to-data-sap-odata.html) user guide offers comprehensive instructions for setting up Glue ETL jobs, configuring SAP OData connections, and reading data from SAP, including handling incremental transfers.
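As an illustration of the incremental-transfer pattern that such a connector automates, the sketch below builds an SAP OData query that filters on a change timestamp and pages through results. The host, entity set, and field names here are hypothetical examples, not values from the guide above.

```python
from urllib.parse import urlencode

def build_incremental_odata_url(base_url, entity_set, delta_field,
                                last_watermark, page_size=1000, skip=0):
    """Build an SAP OData query that fetches only records changed since the
    last successful run, one page at a time (host, entity, and field names
    are hypothetical)."""
    params = {
        "$filter": f"{delta_field} gt datetime'{last_watermark}'",
        "$orderby": delta_field,
        "$top": page_size,   # package size: rows per request
        "$skip": skip,       # offset of the next page
        "$format": "json",
    }
    return f"{base_url}/{entity_set}?{urlencode(params)}"

url = build_incremental_odata_url(
    "https://sap.example.com/sap/opu/odata/sap/API_SALES_ORDER_SRV",
    "A_SalesOrder", "LastChangeDateTime", "2024-01-01T00:00:00")
```

On each run, the watermark would be advanced to the highest change timestamp successfully loaded, so the next run picks up only newer records.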

 [AWS Glue Zero-ETL](https://docs.aws.amazon.com/glue/latest/dg/zero-etl-using.html) is a set of fully managed integrations by AWS that minimizes the need to build ETL data pipelines for common ingestion and replication use cases. It makes data available in Amazon SageMaker Lakehouse and Amazon Redshift from multiple operational, transactional, and application sources. Leveraging the SAP OData connectors, you can create full data replication jobs from SAP, with fully managed replication (inserts, updates, and deletions) as well as schema evolution.

 AWS Glue and AWS Glue Zero-ETL serve distinct roles in data integration, each offering unique advantages for different use cases. AWS Glue excels in complex ETL operations, data discovery, preparation, and extraction, particularly for specialized scenarios like SAP ODP-based replication, while AWS Glue Zero-ETL is designed as a more streamlined, no-code solution for fully managed data replication scenarios.

 AWS Glue requires more hands-on management, including code deployment and maintenance, but offers greater flexibility and control over data transformation processes. AWS Glue performance is enhanced by its serverless, scale-out Apache Spark environment, which allows you to allocate Data Processing Units (DPUs) for scalable compute. This allows parallel processing and event-driven execution.
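To illustrate how a scale-out extraction can exploit that parallelism, the sketch below splits a numeric document-number range into per-worker slices that could each be read independently. This is an illustrative pattern, not the Glue API itself, and the key range is made up.

```python
def partition_key_range(min_key: int, max_key: int, num_workers: int):
    """Split a numeric key range into contiguous, non-overlapping chunks so
    each worker can extract its slice in parallel (illustrative only)."""
    total = max_key - min_key + 1
    base, extra = divmod(total, num_workers)
    bounds, start = [], min_key
    for i in range(num_workers):
        size = base + (1 if i < extra else 0)  # spread the remainder evenly
        bounds.append((start, start + size - 1))
        start += size
    return bounds

# e.g. split document numbers 1..1000 across 4 workers
print(partition_key_range(1, 1000, 4))  # [(1, 250), (251, 500), (501, 750), (751, 1000)]
```

Each tuple could then drive one parallel read task, so the overall extraction time scales with the number of workers rather than the table size.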

# Replicating data using SAP services


![\[Data replication using SAP Services\]](http://docs.aws.amazon.com/sap/latest/general/images/rise-data-replication-sap-services.png)


 **SAP BDC / Datasphere** 

 [SAP Datasphere](https://www.sap.com/products/data-cloud/datasphere.html) offers various connection types, such as SAP ABAP connections, SAP ECC connections, and SAP S/4HANA Cloud connections, supporting the RFC and ODP protocols. Refer to the [SAP BDC / Datasphere documentation](https://help.sap.com/docs/SAP_DATASPHERE/be5967d099974c69b77f4549425ca4c0/eb85e157ab654152bd68a8714036e463.html) to choose the most appropriate connectivity to replicate SAP data. Using [premium outbound integration for Amazon Simple Storage Service (Amazon S3)](https://help.sap.com/docs/SAP_DATASPHERE/be5967d099974c69b77f4549425ca4c0/a7b660a0a4ef4a4fbee57b44f5b2147d.html), configure an SAP Datasphere replication flow to ingest data to Amazon S3.

 **SAP Data Services** 

 [SAP Data Services](https://www.sap.com/products/technology-platform/data-services.html) offers various connections to replicate data from SAP ECC. Refer to the [SAP Data Services documentation](https://help.sap.com/docs/SAP_DATA_SERVICES) to choose the most appropriate connectivity. SAP Data Services offers an [Amazon Redshift datastore](https://help.sap.com/docs/SAP_DATA_SERVICES/af6d8e979d0f40c49175007e486257f0/731d7026ae3b4fef9ebadfbe23ffff12.html) and an [Amazon S3 datastore](https://help.sap.com/docs/SAP_DATA_SERVICES/af6d8e979d0f40c49175007e486257f0/e1ed075446344b5ca098e2382cfca78d.html) to ingest data into AWS. It also offers options for the [Amazon S3 file location protocol](https://help.sap.com/docs/SAP_DATA_SERVICES/af6d8e979d0f40c49175007e486257f0/a611106693ea422eb0b04705298516b7.html), such as encryption type, compression type, batch size, number of threads, and Amazon S3 storage class.

# Replicating data using Partner Solutions


 AWS Partner Solutions offer ready-to-deploy solutions with enhanced features, such as pre-built connectors, specialized data pipelines, and advanced optimization techniques that reduce complexity and speed up deployment.

To find and deploy a solution that fits your specific needs, you can explore the [AWS Partner Solutions Finder](https://partners.amazonaws.com/search/partners) or browse [AWS Marketplace](https://aws.amazon.com/marketplace), where you can search for and quickly deploy partner solutions tailored to your unique SAP use case.

 **Further Resources** 

The [Guidance for SAP Data Integration and Management on AWS](https://aws.amazon.com/solutions/guidance/sap-data-integration-and-management-on-aws/) provides the essential data foundation to build data and analytics solutions. It shows how to integrate data from SAP ERP source systems and AWS in real-time or batch mode, with change data capture, using AWS services, SAP products, and AWS Partner Solutions. It includes an overview reference architecture showing how to ingest data from SAP systems into AWS, in addition to five detailed architectural patterns that complement SAP-supported mechanisms (such as OData, ODP, SLT, and BTP) using the AWS services highlighted above, SAP products, and AWS Partner Solutions.

# Data Federation using AWS Services


Data federation is a data management strategy that enables real-time analytics and a single source of truth, without data duplication or expensive pipelines.

When there is a business requirement for consolidated data across transactional, analytics, and machine learning workloads, it is preferable to access the data at the source rather than replicate it, avoiding latency, inconsistency, and extra storage costs.

In the context of SAP and AWS services, it allows organizations to access, combine, and analyze data from both SAP systems and AWS cloud services seamlessly.

![\[Data Federation\]](http://docs.aws.amazon.com/sap/latest/general/images/rise-data-federation.png)


 **Amazon Athena** 

 [Amazon Athena](https://aws.amazon.com/athena/) is a serverless, scalable, and flexible interactive query service from AWS that allows you to analyze data directly in Amazon S3. Data stored in Amazon S3 from multiple sources can be transformed into tables and views using Amazon Athena and queried to retrieve meaningful information in a structured way.

Data in Athena can be accessed from SAP Datasphere through [data federation](https://discovery-center.cloud.sap/missiondetail/3401/3441/) from SAP Datasphere connections. Users can also access SAP Datasphere tables and views from Athena by [querying SAP HANA](https://aws.amazon.com/blogs/big-data/query-sap-hana-using-athena-federated-query-and-join-with-data-in-your-amazon-s3-data-lake/) using an [Athena Federated Query](https://docs.aws.amazon.com/athena/latest/ug/connect-to-a-data-source.html).
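As a rough illustration of what such a federated query can look like, the sketch below builds an Athena SQL statement joining an SAP HANA table (exposed through a hypothetical `hana` federated data source catalog) with a Glue Data Catalog table over Amazon S3. All catalog, schema, table, and column names are assumptions for illustration.

```python
def federated_join_query(hana_catalog: str, hana_table: str, s3_table: str) -> str:
    """Return an Athena SQL statement joining a federated SAP HANA table with
    a Glue Data Catalog table over Amazon S3 (all names are hypothetical)."""
    return (
        f'SELECT h.customer_id, h.order_value, s.region\n'
        f'FROM "{hana_catalog}"."sapabap1"."{hana_table}" AS h\n'
        f'JOIN "awsdatacatalog"."sales_lake"."{s3_table}" AS s\n'
        f'  ON h.customer_id = s.customer_id'
    )

sql = federated_join_query("hana", "orders", "customer_regions")

# The statement could then be submitted via the Athena API, for example:
#   boto3.client("athena").start_query_execution(
#       QueryString=sql,
#       ResultConfiguration={"OutputLocation": "s3://my-results-bucket/"})
```

The join executes in Athena, pushing the HANA side through the federated connector, so no data is replicated into the lake beforehand.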

Data can also be federated to SAP HANA Cloud by configuring Athena as a remote source using the [Smart Data Access – Athena adapter](https://community.sap.com/t5/technology-blogs-by-sap/federating-queries-in-hana-cloud-from-amazon-athena-using-athena-api/ba-p/13476091). The [Athena Federated Query connection](https://aws.amazon.com/blogs/big-data/query-sap-hana-using-athena-federated-query-and-join-with-data-in-your-amazon-s3-data-lake/) can also be used to read data from a stand-alone SAP HANA Cloud environment.

 **Amazon Redshift** 

 [Amazon Redshift](https://aws.amazon.com/redshift/) is a fully managed, petabyte-scale data warehouse service from AWS. Customers use it to build data warehouses and data models for analytics and reporting.

 [Data federation](https://discovery-center.cloud.sap/missiondetail/3406/3446/) from Amazon Redshift into SAP Datasphere is possible with SAP HANA Smart Data Integration (SDI) or the SAP Data Provisioning Agent. Amazon Redshift data can also be federated through the Athena Federated Query data source connector.

 **Further resources** 

The [Guidance for Data Federation](https://aws.amazon.com/solutions/guidance/data-federation-between-sap-and-aws/) between SAP and AWS outlines the process of federating data between SAP and AWS cloud analytics services, enabling you to establish a data mesh architecture. By federating data between SAP and AWS, you can easily transform and visualize your data in a scalable, secure, and cost-effective way, helping you inform your decision-making.

# Data analytics


SAP customers need business insights in real time to react to business changes and capture untapped business opportunities. This needs to be realized with modern, cloud-native solutions to shift from overnight data processing to real-time analytics. With AWS and SAP solutions, customers can use purpose-built analytics services to gain a competitive advantage in their respective industries.

Modern data architectures, such as [data lakes, data warehouses](https://aws.amazon.com/compare/the-difference-between-a-data-warehouse-data-lake-and-data-mart/), and the [lakehouse](https://aws.amazon.com/sagemaker/lakehouse/), provide a combination of patterns and services that enable organizations to handle large volumes of structured and unstructured data for analysis and reporting, while also providing a solid foundation for artificial intelligence (AI) and machine learning (ML) applications, including generative AI. These architectures provide building blocks that can be implemented independently as well as complementing each other, based on requirements and preferences.

**Topics**
+ [

# Data Lake Architecture
](rise-data-lake-architecture.md)
+ [

# Data Warehouse Architecture
](rise-data-warehouse-architecture.md)

# Data Lake Architecture


The [data lake](https://aws.amazon.com/what-is/data-lake/) architecture provides building blocks that demonstrate how to combine and consolidate SAP and non-SAP data from disparate sources using analytics and machine learning services on AWS.

A data lake enables customers to handle structured and unstructured data. It is designed on a “schema-on-read” approach, meaning data can be stored in raw form, with schema or structure applied only upon consumption (for example, to create a financial report). The structure, including data types and lengths, is defined when the data is read. As a result, storage and compute are decoupled, leveraging low-cost storage that can scale to petabyte sizes at a fraction of the cost of traditional databases.
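A minimal illustration of schema-on-read, using made-up field names: the raw text sits untyped in storage, and types are applied only at the moment a consumer reads it.

```python
import csv, io
from datetime import date

# Raw data lands in the lake exactly as extracted -- plain text, no schema.
raw = "0000100231,20240315,1499.90\n0000100232,20240316,250.00\n"

# The schema exists only on the read path, applied when a consumer needs
# typed columns (field names here are hypothetical).
schema = [
    ("document_no", str),
    ("posting_date", lambda s: date(int(s[:4]), int(s[4:6]), int(s[6:]))),
    ("amount", float),
]

def read_with_schema(raw_text, schema):
    """Apply the consumer's schema while reading the raw, untyped data."""
    for row in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, row)}

rows = list(read_with_schema(raw, schema))
print(rows[0]["amount"] + rows[1]["amount"])  # 1749.9
```

Two different consumers could read the same raw bytes with two different schemas, which is exactly what decoupling storage from compute makes cheap.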

A data lake enables organizations to perform various analytical tasks, such as creating interactive dashboards, generating visual insights, processing large-scale data, conducting real-time analysis, and implementing machine learning algorithms across diverse data sources.

![\[Data Lake Architecture\]](http://docs.aws.amazon.com/sap/latest/general/images/rise-data-lake-architecture.png)


The Data Lake reference architecture provides three distinct layers to transform raw data into valuable insights:

 **Raw Layer** 

The raw layer is the initial layer in a data lake, built on [Amazon S3](https://aws.amazon.com/s3/), where data arrives in its original format directly from source systems without any transformation. The data in this layer is used to determine changes and data to consolidate in the next layer, since it will contain multiple versions of the same data (changes, full loads, etc.).

Data extracted from SAP (via [SAP ODP OData](https://help.sap.com/docs/SAP_NETWEAVER_750/825e9222e7ad4fe1988c6cc600bda779/c1c48cd6d78d4afe8ceb6a1ddc481db1.html) or other mechanisms) needs to be prepared for further processing. The extracted data is packaged into several files (determined by the package or page size of the extraction tool), so a single extraction run can generate multiple files.
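A small sketch of that preparation step, assuming a hypothetical `<run-id>_part_<n>` file-naming convention: the page files of one run are reassembled in numeric order before loading.

```python
def consolidate_run(files: dict) -> list:
    """Merge the page files of one extraction run into a single ordered record
    list. Keys follow a hypothetical '<run-id>_part_<n>' naming convention."""
    ordered_names = sorted(files, key=lambda name: int(name.rsplit("_", 1)[1]))
    records = []
    for name in ordered_names:
        records.extend(files[name])
    return records

run_files = {
    "run42_part_2": [{"order": "C"}],
    "run42_part_10": [{"order": "D"}],  # numeric sort, not lexicographic
    "run42_part_1": [{"order": "A"}, {"order": "B"}],
}
print([r["order"] for r in consolidate_run(run_files)])  # ['A', 'B', 'C', 'D']
```

Sorting on the numeric suffix matters: a plain lexicographic sort would place part 10 before part 2 and scramble the record order.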

 **Enriched Layer** 

The Enriched Layer is built on [Amazon S3](https://aws.amazon.com/s3/) and contains a true representation of the data in the source SAP system, along with logical deletions, stored in [Amazon S3 Tables](https://aws.amazon.com/s3/features/tables/) with built-in [Apache Iceberg format](https://aws.amazon.com/what-is/apache-iceberg/). The Iceberg table format allows the creation of [Glue or Athena tables](https://docs.aws.amazon.com/athena/latest/ug/understanding-tables-databases-and-the-data-catalog.html) within the [Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html), supporting database-type operations such as insert, update, and delete, with the Iceberg format handling the underlying file operations (deletion of records, etc.). Iceberg tables also support [time travel](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-time-travel-and-version-travel-queries.html), which enables querying data as of a specific point in time.

Data from the Raw Layer is inserted or updated in the Enriched Layer in the correct order based on the table key and persisted in its original format (no transformation or changes). Each record needs to be enriched with attributes such as the time of extraction and a record number; this can be achieved with [AWS Glue jobs](https://docs.aws.amazon.com/glue/latest/dg/author-glue-job.html).
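That enrichment step can be sketched in plain Python as follows; the attribute names (`_run_id`, `_extracted_at`, `_record_no`) are illustrative, not a prescribed schema.

```python
from datetime import datetime, timezone

def enrich(records, run_id):
    """Tag each extracted record with the extraction timestamp and a
    sequential record number, so a downstream merge can apply changes to the
    Enriched Layer in the right order (attribute names are hypothetical)."""
    extracted_at = datetime.now(timezone.utc).isoformat()
    return [
        {**rec, "_run_id": run_id,
                "_extracted_at": extracted_at,
                "_record_no": i}
        for i, rec in enumerate(records, start=1)
    ]

batch = enrich([{"order_id": "4711", "status": "open"},
                {"order_id": "4712", "status": "shipped"}], run_id="run42")
```

Sorting by `(_extracted_at, _record_no)` then gives a deterministic apply order when several changes to the same key arrive in one run.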

 **Curated Layer** 

The Curated Layer is where data is stored for consumption. Records deleted in the source are deleted physically. Any calculations (averages, time between dates, etc.) or data manipulation (format changes, lookups from other tables) can be stored in this layer, ready to be consumed. Data is updated in this layer using AWS Glue jobs. Amazon Athena views are created on top of these tables for downstream consumption through Amazon QuickSight or similar tools.
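A minimal sketch of a curated-layer pass, with hypothetical field names: logically deleted records are physically dropped, and a derived measure is computed for consumption.

```python
from datetime import date

def curate(enriched_rows):
    """Build curated records: drop logically deleted rows and derive a
    days-to-deliver measure from two date columns (field names are
    hypothetical)."""
    curated = []
    for row in enriched_rows:
        if row.get("deletion_flag") == "X":  # logical delete from the source
            continue                         # physically removed in this layer
        days = (row["delivery_date"] - row["order_date"]).days
        curated.append({**row, "days_to_deliver": days})
    return curated

rows = curate([
    {"order_id": "1", "order_date": date(2024, 3, 1),
     "delivery_date": date(2024, 3, 5), "deletion_flag": ""},
    {"order_id": "2", "order_date": date(2024, 3, 2),
     "delivery_date": date(2024, 3, 4), "deletion_flag": "X"},
])
print([(r["order_id"], r["days_to_deliver"]) for r in rows])  # [('1', 4)]
```

In the reference architecture this logic would run inside an AWS Glue job, with Athena views exposing the result for reporting tools.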

The [Data Lakes with SAP and Non-SAP Data on AWS Solution Guidance](https://aws.amazon.com/solutions/guidance/data-lakes-with-sap-and-non-sap-data-on-aws/) provides a detailed architecture, implementation steps, and accelerators to fast-track the implementation of a data lake for SAP and non-SAP data. You can refer to the available options for extracting data from SAP into the data lake in the prior Data integration section.

# Data Warehouse Architecture


A [data warehouse](https://aws.amazon.com/what-is/data-warehouse/) is a centralized repository based on a “schema-on-write” approach that aggregates structured, historical data from multiple sources (both SAP and non-SAP) to enable advanced analytics, reporting, and business intelligence (BI). It enables organizations to analyze vast amounts of integrated data for informed decision-making, using architectures optimized for complex queries rather than transactional processing.

Business analysts, data engineers, data scientists, and decision-makers utilize business intelligence (BI) tools, SQL clients, and other analytics applications to access the data warehouse. The architecture comprises three tiers: a front-end client for presenting results, an analytics engine for data access and analysis, and a database server for data loading and storage.

Data is stored in tables and columns within databases, organized by schemas. Data warehouses consolidate data from multiple sources, enabling historical data analysis and ensuring data quality, consistency, and accuracy. Separating analytics processing from transactional databases enhances the performance of both systems, supporting reports, dashboards, and analytics tools by efficiently storing data to minimize I/O and deliver rapid query results to numerous concurrent users.

![\[Data Warehouse Architecture\]](http://docs.aws.amazon.com/sap/latest/general/images/rise-data-warehouse-architecture.png)


Key Characteristics
+ Integrated: Consolidates data from disparate sources (e.g., CRM, ERP) into a unified schema, resolving inconsistencies in formats or naming conventions.
+ Time-variant: Tracks historical data, allowing trend analysis over months or years.
+ Subject-oriented: Organized around business domains like sales or inventory, rather than operational processes.
+ Non-volatile: Data remains static once stored; updates occur via scheduled Extract, Transform, Load (ETL) processes rather than real-time changes.
+ Price-optimized: SAP and non-SAP data is stored in a cost-optimized architecture.

Architecture Components
+ ETL Tools: Automate data extraction from sources, transformation (cleaning and standardizing), and loading into the warehouse.
+ Storage Layer:
  + Relational databases for structured data
  + OLAP (Online Analytical Processing) cubes for multidimensional analysis
+ Metadata: Describes data origins, transformations, and relationships.
+ Access Tools: SQL clients, BI platforms, and machine learning interfaces.

![\[Data Warehouse Layers\]](http://docs.aws.amazon.com/sap/latest/general/images/rise-data-warehouse-layers.png)


Data warehouses utilize a layered architecture to organize data at different levels of granularity, which helps ensure consistency and flexibility. The most common data warehouse architecture layers are the source, staging, warehouse, and consumption layers. SAP systems also employ a layer-based architecture for data warehouses. In the context of building an SAP cloud data warehouse on AWS, the architecture involves several key layers and components for data acquisition, storage, transformation, and consumption.

 **Corporate Memory** 

Amazon S3 Intelligent-Tiering is a storage class that automatically optimizes storage costs by moving data between access tiers based on changing access patterns. This ensures that frequently accessed data is readily available, while less frequently accessed or "colder" data is stored at a lower-cost tier. For more details, refer to [Amazon S3 Storage Classes](https://aws.amazon.com/s3/storage-classes/#topic-0).

 **Operational Data Storage Layer** 

Amazon Redshift is utilized for operational data storage, propagation, and data mart functionalities. Scripts are provided to create schemas and deploy Data Definition Language (DDL) with the necessary structures to load SAP source data. These DDLs can be customized to include SAP-specific fields.

 **Data Propagation Layer** 

Delta data loaded into Amazon S3 via an AWS Glue job is used to generate Slowly Changing Dimension Type 2 (SCD2) tables, which maintain a complete history of changes.
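The SCD2 mechanics can be sketched as follows; the column names and the high-date convention are illustrative assumptions, not the guidance's actual DDL.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # conventional "still current" end date

def apply_scd2(dimension, key, changed_row, change_date):
    """Apply one delta record to an SCD2 table: end-date the open version of
    the key (if any) and append the new version as current, preserving the
    full change history."""
    for row in dimension:
        if row[key] == changed_row[key] and row["valid_to"] == HIGH_DATE:
            row["valid_to"] = change_date          # close the old version
    dimension.append({**changed_row,
                      "valid_from": change_date,
                      "valid_to": HIGH_DATE})      # open the new version
    return dimension

dim = [{"customer_id": "C1", "city": "Berlin",
        "valid_from": date(2023, 1, 1), "valid_to": HIGH_DATE}]
apply_scd2(dim, "customer_id",
           {"customer_id": "C1", "city": "Munich"}, date(2024, 6, 1))
print([(r["city"], r["valid_to"]) for r in dim])
# [('Berlin', datetime.date(2024, 6, 1)), ('Munich', datetime.date(9999, 12, 31))]
```

A point-in-time query then becomes a simple range predicate: the row where `valid_from <= as_of < valid_to`.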

 **Data Mart Layer** 

Architected data mart models are created using Materialized Views in Redshift. Transactional data is enriched with master data (attributes and text), building data models that are ready for data consumption.

The [Building SAP Data Warehouse on AWS Solution Guidance](https://aws.amazon.com/solutions/guidance/building-a-sap-cloud-data-warehouse-on-aws/) provides a detailed architecture, implementation steps, and accelerators to fast-track the implementation of a data warehouse for SAP.