MLOE-08: Establish feedback loops across ML lifecycle phases
Establish a feedback mechanism to share and communicate successful development experiments, analyses of failures, and operational activities. This facilitates continuous improvement on future iterations of the ML workload. ML feedback loops are driven by model drift and require ML practitioners to analyze and revisit monitoring and retraining strategies over time. ML feedback loops allow experimentation with data augmentation and with different algorithms and training approaches until an optimal outcome is achieved. Document your findings to identify key learnings and improve processes over time.
Implementation plan
- Establish SageMaker AI Model Monitoring - The accuracy of ML models can deteriorate over time, a phenomenon known as model drift. Many factors can cause model drift, such as changes in model features. The accuracy of ML models can also be affected by concept drift, the difference between data used to train models and data used during inference. Amazon SageMaker AI Model Monitor continually monitors machine learning models for concept drift and model drift, and alerts you to deviations so that you can take remedial action. A minimal monitoring-setup sketch follows this list.
- Use Amazon CloudWatch - Configure Amazon CloudWatch to receive notifications if drift in model quality is observed. Monitoring jobs can be scheduled to run at a regular cadence (for example, hourly or daily) and push reports and metrics to Amazon CloudWatch and Amazon S3. A drift-alarm sketch follows this list.
- Use Amazon SageMaker AI Model Dashboard - Use the dashboard as the central interface to track models, monitor performance, and review historical behavior.
- Automate retraining pipelines - Create a CloudWatch Events (Amazon EventBridge) rule that alerts on events emitted by the SageMaker AI Model Monitoring system. The rule can detect drift or anomalies and start a retraining pipeline; an example rule sketch follows this list.
- Use Amazon Augmented AI (A2I) - Check accuracy by using human reviews to establish ground truth, with tools such as Amazon A2I, against which model performance can be compared. A human-review sketch follows this list.
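The following is a minimal sketch of the Model Monitor setup described in the first item above, using the SageMaker Python SDK. It assumes data capture is already enabled on the endpoint; the role ARN, bucket paths, endpoint name, and schedule name are placeholders.

```python
# Minimal Model Monitor setup sketch (SageMaker Python SDK).
# Role, bucket, endpoint, and schedule names are placeholders.
from sagemaker.model_monitor import CronExpressionGenerator, DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder role
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Compute baseline statistics and constraints from the training data.
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/train/train.csv",  # placeholder dataset
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/monitoring/baseline",
)

# Run an hourly data-quality check against the endpoint's captured traffic.
monitor.create_monitoring_schedule(
    monitor_schedule_name="my-data-quality-schedule",
    endpoint_input="my-endpoint",  # placeholder endpoint name
    output_s3_uri="s3://my-bucket/monitoring/reports",
    statistics=monitor.baseline_statistics(),
    constraints=monitor.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
)
```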
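Model Monitor publishes per-feature metrics that CloudWatch alarms can watch, as described in the CloudWatch item above. This sketch assumes the data-quality monitor's metric convention (feature_baseline_drift_<feature> in the aws/sagemaker/Endpoints/data-metrics namespace); the feature, endpoint, schedule, threshold, and SNS topic are placeholders.

```python
# Sketch of a CloudWatch alarm on a Model Monitor drift metric (boto3).
import boto3

cloudwatch = boto3.client("cloudwatch")
cloudwatch.put_metric_alarm(
    AlarmName="my-endpoint-feature-drift",
    Namespace="aws/sagemaker/Endpoints/data-metrics",
    MetricName="feature_baseline_drift_my_feature",  # placeholder feature name
    Dimensions=[
        {"Name": "Endpoint", "Value": "my-endpoint"},
        {"Name": "MonitoringSchedule", "Value": "my-data-quality-schedule"},
    ],
    Statistic="Average",
    Period=3600,           # evaluate once per hourly monitoring run
    EvaluationPeriods=1,
    Threshold=0.2,         # tune to your tolerance for distribution drift
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:drift-alerts"],  # placeholder SNS topic
)
```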
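For the retraining item above, one possible approach: monitoring executions run as SageMaker Processing jobs, whose state changes are emitted to EventBridge. The rule below matches those events and hands them to a hypothetical Lambda function that would inspect the monitoring result and, on a constraint violation, start a retraining pipeline (for example, via the SageMaker start_pipeline_execution API).

```python
# Sketch of an EventBridge (CloudWatch Events) rule reacting to monitoring runs.
import json

import boto3

events = boto3.client("events")
events.put_rule(
    Name="model-monitor-completed",
    EventPattern=json.dumps({
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Processing Job State Change"],
        "detail": {"ProcessingJobStatus": ["Completed", "Failed"]},
    }),
    State="ENABLED",
)
events.put_targets(
    Rule="model-monitor-completed",
    Targets=[{
        "Id": "start-retraining",
        # Placeholder Lambda that checks the monitoring report for constraint
        # violations and calls start_pipeline_execution() when warranted.
        "Arn": "arn:aws:lambda:us-east-1:123456789012:function:start-retraining",
    }],
)
```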
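For the A2I item above, a sketch of routing a low-confidence prediction to a human review loop. The flow definition ARN, loop name, confidence threshold, and InputContent schema are placeholders; the schema must match the fields your worker task template expects.

```python
# Sketch of sending a low-confidence prediction to human review with A2I.
import json

import boto3

a2i = boto3.client("sagemaker-a2i-runtime")
prediction = {"label": "positive", "confidence": 0.62}

if prediction["confidence"] < 0.8:  # review threshold is a design choice
    a2i.start_human_loop(
        HumanLoopName="review-prediction-001",  # must be unique per loop
        FlowDefinitionArn=(
            "arn:aws:sagemaker:us-east-1:123456789012:"
            "flow-definition/my-review-flow"  # placeholder flow definition
        ),
        HumanLoopInput={"InputContent": json.dumps({"taskObject": prediction})},
    )
```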
Blogs
- Automating model retraining and deployment using the AWS Step Functions Data Science SDK for Amazon SageMaker AI
- Monitoring in-production ML models at large scale using Amazon SageMaker AI Model Monitor
- Human-in-the-loop review of model explanations with Amazon SageMaker AI Clarify and Amazon A2I
- Amazon SageMaker AI Model Monitor now supports new capabilities to maintain model quality in production