MLREL01-BP01 Use APIs to abstract change from model consuming applications
APIs abstract changes from model-consuming applications, keeping machine learning solutions flexible and resilient. Establishing an abstraction layer between ML models and consuming applications enables model updates, replacements, or enhancements without disrupting existing workloads.
Desired outcome: You have a flexible application and API design that isolates machine learning model implementations from consuming applications. You make changes to ML models with minimal or no disruption to existing applications. Your ML endpoints are well-documented and accessible, and changes to underlying models do not require extensive modifications to downstream applications.
Common anti-patterns:
- Directly embedding model code within applications.
- Hardcoding model versions or parameter specifications in client applications.
- Lacking proper API documentation and version control.
- Designing rigid interfaces that break when model inputs or outputs change.
- Creating tight coupling between ML models and consuming applications.
Benefits of establishing this best practice:
- Reduces downtime when updating or replacing ML models.
- Simplifies model deployment and versioning processes.
- Increases agility and flexibility when evolving ML capabilities.
- Lowers maintenance costs for applications using ML models.
- Enhances ability to A/B test different model versions.
Level of risk exposed if this best practice is not established: High
Implementation guidance
Abstracting changes from model-consuming applications requires thoughtful API design and implementation. Create a well-designed API layer between ML models and applications so that you can make modifications to models without disrupting services. This approach involves developing stable interfaces that hide underlying complexity and implementation details of ML models.
When designing these APIs, focus on creating contracts that are flexible enough to accommodate model evolution while maintaining backward compatibility. Document APIs thoroughly so developers consuming models understand how to interact with them correctly. Consider implementing versioning strategies that allow introducing new model capabilities while supporting existing clients.
Implementation steps
- Adopt best practices in API design. Expose ML endpoints through APIs so changes to the model can be introduced without disrupting upstream communications. Create a well-designed API contract that focuses on business capabilities rather than technical implementation details. Document your API in a central repository or documentation site so calling services can understand API routes and flags. Communicate API changes to calling services.
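As a minimal sketch of such a contract, the request and response shapes below expose business-level fields rather than raw model inputs. All field names (`features`, `api_version`, `prediction`, `confidence`) are illustrative assumptions, not part of any AWS-defined schema.

```python
from dataclasses import dataclass

# Hypothetical prediction API contract: clients see business capabilities
# (features in, a prediction out), never model internals.

@dataclass(frozen=True)
class PredictRequest:
    features: dict           # business-level inputs, not raw model tensors
    api_version: str = "v1"  # contract version the client targets

@dataclass(frozen=True)
class PredictResponse:
    prediction: str
    confidence: float
    api_version: str = "v1"

def validate_request(req: PredictRequest) -> None:
    """Reject requests that violate the published contract."""
    if not req.features:
        raise ValueError("features must be non-empty")
    if not req.api_version.startswith("v"):
        raise ValueError("api_version must look like 'v1', 'v2', ...")
```

Because clients depend only on this contract, the model behind it can be retrained or replaced without any client-side change.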
- Implement API versioning. Use versioning strategies for APIs to enable backward compatibility while supporting new features. Consider using URL path versioning (for example, /v1/predict), header-based versioning, or query parameter versioning depending on organizational standards. This allows introducing new model versions without breaking existing client applications.
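The URL-path strategy can be sketched as a routing table that maps each API version to a backend model endpoint. The endpoint names here are hypothetical; the point is that v1 clients keep resolving to the frozen model while v2 clients reach the new one.

```python
# Illustrative version-to-endpoint routing for URL path versioning.
# Endpoint names are made up for this sketch.
VERSION_ROUTES = {
    "v1": "churn-model-2023-10",   # frozen for existing clients
    "v2": "churn-model-2024-06",   # new model, new capabilities
}

def resolve_endpoint(path: str) -> str:
    """Map a request path like '/v1/predict' to a backend endpoint name."""
    parts = [p for p in path.split("/") if p]
    if not parts or parts[0] not in VERSION_ROUTES:
        raise LookupError(f"unsupported API version in path: {path!r}")
    return VERSION_ROUTES[parts[0]]
```

Retiring v1 later becomes a routing change plus a deprecation notice, not a breaking client migration.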
- Deploy models in Amazon SageMaker AI. After training your model, deploy it using Amazon SageMaker AI to get predictions. To establish a persistent endpoint for one prediction at a time, use SageMaker AI hosting services. For predictions on entire datasets, use SageMaker AI batch transform. SageMaker AI provides flexibility in deployment options, including multi-model endpoints, serverless inference, and asynchronous inference.
- Use Amazon API Gateway to create APIs. Amazon API Gateway is a fully managed service that enables developers to create, publish, maintain, monitor, and secure APIs. Using API Gateway, you can create RESTful APIs and WebSocket APIs that enable real-time two-way communication applications. API Gateway supports containerized and serverless workloads, as well as web applications.
- Implement request and response transformations. Use API Gateway's mapping templates to transform client requests to match your model's input format and transform model responses to maintain a consistent API contract. This allows changing model implementations without requiring client applications to change how they format requests or interpret responses.
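The transformation step above can be sketched in plain Python (API Gateway itself expresses this logic in mapping templates; this sketch only illustrates the mechanics). The feature names and CSV input format are assumptions for the example.

```python
# Sketch of the transformation layer: the public API contract stays fixed
# while the model's native payload format can change independently.

def to_model_input(client_request: dict) -> str:
    """Transform a business-level request into the model's CSV-style input."""
    f = client_request["features"]
    # The current (hypothetical) model expects an ordered CSV row; a future
    # model could expect JSON -- only this function changes, not the clients.
    return f"{f['age']},{f['tenure_months']},{f['monthly_spend']}"

def to_client_response(model_output: str) -> dict:
    """Wrap the model's raw score in the stable public response shape."""
    score = float(model_output)
    return {
        "prediction": "churn" if score >= 0.5 else "retain",
        "confidence": score if score >= 0.5 else 1.0 - score,
    }
```

Keeping both directions of the transformation in one place is what lets a model swap stay invisible to consumers.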
- Add caching and throttling. Configure API Gateway's caching capability to improve performance and reduce costs for frequently accessed predictions. Implement throttling to protect ML endpoints from traffic spikes and provide consistent performance. Use SageMaker AI Inference Recommender to optimize endpoint configurations for latency and cost.
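API Gateway provides both protections natively; the client-side sketch below only illustrates the mechanics of a TTL cache plus a fixed-window throttle in front of a prediction call. The class and parameter names are invented for this example.

```python
import time

class CachedThrottledClient:
    """Illustrative TTL cache + fixed-window throttle around a predict call."""

    def __init__(self, predict_fn, cache_ttl=60.0, max_per_window=100, window=1.0):
        self._predict = predict_fn
        self._cache, self._ttl = {}, cache_ttl
        self._max, self._window = max_per_window, window
        self._count, self._window_start = 0, time.monotonic()

    def predict(self, key):
        now = time.monotonic()
        # Throttle: reset the window if it has elapsed, then enforce the limit.
        if now - self._window_start >= self._window:
            self._count, self._window_start = 0, now
        if self._count >= self._max:
            raise RuntimeError("throttled: too many requests this window")
        self._count += 1
        # Cache: reuse a fresh-enough previous answer without calling the model.
        hit = self._cache.get(key)
        if hit and now - hit[1] < self._ttl:
            return hit[0]
        result = self._predict(key)
        self._cache[key] = (result, now)
        return result
```

A design note on the sketch: cache hits still count against the throttle window, which protects the API layer itself, not just the model endpoint behind it.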
- Monitor and analyze API usage. Set up monitoring and logging for APIs to understand how they are being used and to identify potential issues. Use Amazon CloudWatch metrics and logs to track API performance, errors, and usage patterns. This data helps you optimize ML endpoints and identify opportunities for improvement.
- Consider inference components for shared endpoints. Use SageMaker AI inference components to deploy multiple models on shared endpoints, improving resource utilization and reducing costs while maintaining API abstraction.
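As a sketch of the usage analysis described in the monitoring step, the function below reduces per-request API logs to the kind of summary metrics (request count, error rate, tail latency) you would track in CloudWatch. The log record shape is an assumption for this example; CloudWatch computes such percentiles server-side.

```python
def summarize(logs: list) -> dict:
    """Reduce illustrative request logs to usage metrics for an ML endpoint."""
    total = len(logs)
    errors = sum(1 for r in logs if r["status"] >= 500)
    latencies = sorted(r["latency_ms"] for r in logs)
    return {
        "requests": total,
        "error_rate": errors / total if total else 0.0,
        # Nearest-rank p99 as a stand-in for CloudWatch's percentile metrics.
        "p99_latency_ms": latencies[min(total - 1, int(total * 0.99))] if total else None,
    }
```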
Resources
Related documents:
Related videos:
Related examples: