Best practices arranged by pillar
The best practices outlined in this paper are listed below, organized by the pillars of the AWS Well-Architected Framework.
Operational excellence pillar
- MLOE-01: Develop the right skills with accountability and empowerment 
- MLOE-02: Discuss and agree on the level of model explainability 
- MLOE-08: Establish feedback loops across ML lifecycle phases 
- MLOE-13: Establish reliable packaging patterns to access approved public libraries 
- MLOE-16: Synchronize architecture and configuration, and check for skew across environments 
Security pillar
Reliability pillar
Performance efficiency pillar
Cost optimization pillar
- MLCOST-01: Define overall return on investment (ROI) and opportunity cost 
- MLCOST-02: Use managed services to reduce total cost of ownership (TCO) 
- MLCOST-03: Identify if machine learning is the right solution 
- MLCOST-04: Tradeoff analysis on custom versus pre-trained models 
- MLCOST-11: Select local training for small scale experiments 
- MLCOST-18: Use warm-start and checkpointing hyperparameter tuning 
- MLCOST-20: Set up a budget and use resource tagging to track costs
- MLCOST-29: Monitor endpoint usage and right-size the instance fleet