5. Data security and governance for agentic AI systems on AWS
Data protection in agentic AI systems extends beyond traditional access controls to cover model training data and inference outputs. Classifying and handling this data properly reduces the likelihood of inadvertent exposure through model interactions or training processes.
This section contains the following best practices:
5.1 Implement pipelines for fine-tuning data (AI-specific)
Fine-tuning data shapes system behavior and responses. Understanding the lineage, bias, and quality of the data supports informed decisions about which data to include in fine-tuning. Including data from other systems, or data generated by the in-scope system, without appropriate data quality measures can expose the system to poisoning or misinformation attacks. For these reasons, we strongly recommend that you implement a data pipeline to deliver data into the production system and remove human access to this data. For more information, see Machine Learning Lens - AWS Well-Architected Framework.
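As an illustration only, one stage of such a pipeline could be an automated admission gate that checks lineage and basic quality before a record reaches the fine-tuning dataset. The following Python sketch assumes a hypothetical allowlist of reviewed sources (APPROVED_SOURCES), a hypothetical minimum length (MIN_LENGTH), and a hypothetical Record type. None of these names come from AWS guidance, and a real pipeline would need far richer bias and quality checks.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    text: str
    source: str  # lineage: the system that produced the record


# Hypothetical allowlist of sources whose lineage has been reviewed.
APPROVED_SOURCES = {"curated-support-tickets", "reviewed-product-docs"}
# Hypothetical quality threshold, in characters.
MIN_LENGTH = 20


def admit_records(records):
    """Split records into accepted and rejected; rejected holds (record, reason)."""
    accepted, rejected, seen = [], [], set()
    for record in records:
        normalized = record.text.strip().lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if record.source not in APPROVED_SOURCES:
            # Data from unreviewed systems, including agent output, is excluded.
            rejected.append((record, "unapproved source"))
        elif len(normalized) < MIN_LENGTH:
            rejected.append((record, "too short"))
        elif digest in seen:
            rejected.append((record, "duplicate"))
        else:
            seen.add(digest)
            accepted.append(record)
    return accepted, rejected
```

Because the gate runs as code inside the pipeline, no person has to handle the records directly, and the rejection reasons give an audit trail for later review.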
Fine-tuning data might be exposed through some attack methods, such as prompt manipulation. For this reason, we recommend that you perform a risk evaluation of the data that you use. For more information, see Implement data purification filters for model training workflows in the AWS Well-Architected Framework. For some organizations, the risk of sensitive information disclosure makes using sensitive data for fine-tuning unacceptable. Excluding such data is a conservative approach that can help protect your organization's data.
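A purification filter can be as simple as a redaction pass that runs before data enters the training set. The following Python sketch is a minimal example under stated assumptions: the two regular expressions (email addresses and US Social Security number formats) and the purify function are hypothetical and illustrative. Pattern matching alone misses many kinds of sensitive data, so a production filter would likely rely on a vetted detection service or library.

```python
import re

# Hypothetical patterns for illustration; not an exhaustive detector.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def purify(text):
    """Replace each match with a typed placeholder and list the types found."""
    found = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        if count:
            found.append(label)
    return text, found
```

The list of detected types can feed the risk evaluation: a record that triggers any pattern could be redacted, quarantined for review, or dropped, depending on the organization's risk tolerance.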
5.2 Restrict AI operations against sensitive systems (AI-specific)
Restrict AI operations against critical data sources by implementing validation and approval workflows before data access or modification. When downstream systems lack adequate controls, deploy a deterministic broker tool to mediate between agents and data sources.
The following image shows how you can use a deterministic broker tool to inspect user prompts for malicious attacks. On the left side of the image, without a deterministic broker tool, the agent might allow a malicious action on the data system. On the right side of the image, a deterministic broker tool provides an additional control layer. It implements adaptive authentication to inspect and govern high-risk data modifications. For more information about adaptive authentication, see Working with adaptive authentication in the Amazon Cognito documentation.
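The broker pattern described above can be sketched in a few lines of Python. This is a simplified illustration, not an AWS implementation: the policy table (ALLOWED), the set of high-risk operations, the Request type, and the broker_decision function are all hypothetical, and a recorded human approver stands in for the adaptive authentication step. The point is that the decision uses fixed rules only, so a manipulated prompt cannot change the outcome.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy: operations that agents may request, per table.
ALLOWED = {
    ("read", "orders"),
    ("update", "orders"),
    ("read", "customers"),
}
# Hypothetical set of operations that modify data and need extra verification.
HIGH_RISK_OPERATIONS = {"update", "delete"}


@dataclass(frozen=True)
class Request:
    operation: str
    table: str
    approved_by: Optional[str] = None  # identity of a human approver, if any


def broker_decision(request):
    """Return 'allow', 'needs_approval', or 'deny' using fixed rules only."""
    if (request.operation, request.table) not in ALLOWED:
        return "deny"
    if request.operation in HIGH_RISK_OPERATIONS and request.approved_by is None:
        return "needs_approval"
    return "allow"
```

In this sketch, a read of the orders table is allowed, an update to it returns needs_approval until an approver is recorded, and any operation outside the policy table is denied regardless of what the agent was prompted to do.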
5.3 Establish a data governance framework (General)
Authorization systems control access to organizational assets, and data is one of the most important resources that require protection. Mature data governance
5.4 Prevent data loss (General)
Data loss prevention (DLP)