5. Data security and governance for agentic AI systems on AWS - AWS Prescriptive Guidance

Data protection in agentic AI systems extends beyond traditional access controls to include model training data and inference outputs. Proper classification and handling of this data reduce the likelihood of inadvertent exposure through model interactions or training processes.

5.1 Implement pipelines for fine-tuning data (AI-specific)

Fine-tuning data shapes system behavior and responses. For more information, see Machine Learning Lens - AWS Well-Architected Framework. Understanding the lineage, bias, and quality of the data supports informed decisions about which data to include in fine-tuning. Incorporating data from other systems, or data generated by the in-scope system, without appropriate data quality measures can expose the system to poisoning or misinformation attacks. For these reasons, we strongly recommend that you implement a data pipeline that delivers data into the production system and removes human access to this data.
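As an illustration of such a pipeline, the following sketch shows an automated quality gate that checks lineage and integrity before records are delivered to the production fine-tuning store. The record fields, the approved-source allowlist, and the length threshold are hypothetical assumptions for illustration, not part of any AWS API.

```python
import hashlib

# Hypothetical quality gate: the record structure, allowlist, and threshold
# below are illustrative assumptions, not an AWS API.
APPROVED_SOURCES = {"crm-export", "support-tickets"}  # assumed lineage allowlist
MIN_TEXT_LENGTH = 20

def validate_record(record: dict) -> bool:
    """Return True only if the record passes lineage and quality checks."""
    if record.get("source") not in APPROVED_SOURCES:
        return False  # unknown lineage: reject to limit poisoning risk
    text = record.get("text", "")
    if len(text) < MIN_TEXT_LENGTH:
        return False  # too short to be a useful training example
    # Verify the integrity hash recorded when the data was exported.
    expected = record.get("sha256")
    actual = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return expected == actual

def run_pipeline(records: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split records into accepted (delivered to training) and rejected."""
    accepted = [r for r in records if validate_record(r)]
    rejected = [r for r in records if not validate_record(r)]
    return accepted, rejected
```

Because the pipeline alone decides what reaches the training store, humans never need direct write access to the production fine-tuning data.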

Fine-tuning data might be vulnerable to exposure through some attack methods. For this reason, we recommend that you perform a risk evaluation of the data you use. For more information, see Implement data purification filters for model training workflows in the AWS Well-Architected Framework. Some organizations might choose not to use sensitive data for fine-tuning at all, because of the risk of sensitive information disclosure through prompt manipulation. This conservative approach can help protect your organization's data.
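A data purification filter of the kind referenced above might, at its simplest, redact sensitive values before they enter the fine-tuning corpus. The following minimal sketch assumes a few illustrative regex patterns; a production filter would rely on a dedicated detection service rather than hand-written patterns.

```python
import re

# Illustrative purification filter; the patterns and placeholder tokens are
# assumptions, not a complete or production-grade PII detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def purify(text: str) -> str:
    """Replace detected sensitive values with typed placeholders."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

Running the filter over every record before ingestion means the fine-tuned model never sees the raw sensitive values, which limits what prompt manipulation can disclose.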

5.2 Restrict AI operations against sensitive systems (AI-specific)

Restrict AI operations against critical data sources by implementing validation and approval workflows before data access or modification. When downstream systems lack adequate controls, deploy a deterministic broker tool to mediate between agents and data sources.

The following image shows how you can use a deterministic broker tool to inspect user prompts for malicious actions. On the left side of the image, without a deterministic broker tool, the agent might allow a malicious action on the data system. On the right side of the image, a deterministic broker tool provides an additional control layer. It implements adaptive authentication to inspect and govern high-risk data modifications. For more information about adaptive authentication, see Working with adaptive authentication in the Amazon Cognito documentation.

A broker tool inspects prompts for malicious actions and governs high-risk data modifications.
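The broker's behavior can be sketched deterministically: every agent-requested operation is checked against an allowlist of in-scope data sources, and high-risk modifications require an explicit approval step before they proceed. The operation names, risk tiers, and approval callback below are illustrative assumptions, not an AWS API.

```python
from dataclasses import dataclass

# Assumed set of high-risk operations that require step-up approval.
HIGH_RISK_OPS = {"delete", "update", "export"}

@dataclass
class BrokerDecision:
    allowed: bool
    reason: str

def broker(operation: str, table: str, approved_tables: set[str],
           approver=None) -> BrokerDecision:
    """Deterministically gate an agent-requested data operation."""
    if table not in approved_tables:
        return BrokerDecision(False, f"table {table!r} is not in scope")
    if operation in HIGH_RISK_OPS:
        # High-risk modifications require an out-of-band approval step,
        # analogous to adaptive (step-up) authentication.
        if approver is None or not approver(operation, table):
            return BrokerDecision(False, "high-risk operation not approved")
    return BrokerDecision(True, "permitted")
```

Because the gate is ordinary deterministic code rather than model output, a manipulated prompt cannot talk the broker into skipping the approval step.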

5.3 Establish a data governance framework (General)

Authorization systems control access to organizational assets, and data is among the most important resources to protect. Mature data governance provides the foundation that enables agents to operate autonomously while respecting organizational data protection policies. Data governance frameworks should align with authorization systems to eliminate ambiguity and errors when agents evaluate access permissions. For more information, see Data security, lifecycle, and strategy for generative AI applications in AWS Prescriptive Guidance.
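One way to remove that ambiguity is to make the classification-to-clearance rule explicit and machine-checkable, so an agent's access decision is a lookup rather than an interpretation. The following sketch assumes a hypothetical four-level classification scheme; your governance framework defines the actual levels and mappings.

```python
# Assumed classification levels, ordered from least to most sensitive.
# These names and ranks are illustrative, not a standard.
CLASSIFICATION_RANK = {
    "public": 0,
    "internal": 1,
    "confidential": 2,
    "restricted": 3,
}

def agent_may_access(agent_clearance: str, data_classification: str) -> bool:
    """Grant access only when the agent's clearance meets or exceeds the
    data's classification, leaving nothing for the agent to interpret."""
    return (CLASSIFICATION_RANK[agent_clearance]
            >= CLASSIFICATION_RANK[data_classification])
```

Encoding the policy this way keeps the governance framework and the authorization system aligned by construction: both consult the same table.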

5.4 Prevent data loss (General)

Data loss prevention (DLP) technology can act as an additional defense-in-depth layer for agentic AI systems by detecting and preventing unauthorized data exfiltration. DLP implementations vary widely, and their efficacy as a control depends on the data type, volume, and baseline. If your organization has well-established DLP capabilities, extending them to cover agentic AI systems can provide an efficient supplementary control.
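As a minimal sketch of extending DLP to agent traffic, the following check scans an agent's outbound response for sensitive patterns before release. The patterns are illustrative examples only, not a complete detection ruleset, and a mature DLP deployment would use its existing policy engine rather than inline regexes.

```python
import re

# Illustrative sensitive-data patterns; a real DLP ruleset is far broader.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),   # US SSN format
    re.compile(r"\b\d{13,16}\b"),           # possible payment card number
    re.compile(r"AKIA[0-9A-Z]{16}"),        # AWS access key ID shape
]

def scan_outbound(text: str) -> bool:
    """Return True if the agent response appears safe to release."""
    return not any(p.search(text) for p in SENSITIVE_PATTERNS)
```

Placing this check at the egress point gives the defense-in-depth layer described above: even if an upstream control fails, the matching response is blocked before it leaves the system.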