Receiving model logs and metrics
To receive logs and metrics from custom model training or inference, each member must have created an ML Configuration with a valid role that grants the necessary CloudWatch permissions (see Create a service role for custom ML modeling - ML Configuration).
System metrics
System metrics for both training and inference, such as CPU and memory
utilization, are published to all members in the collaboration with valid ML
Configurations. These metrics can be viewed as the job progresses via CloudWatch
Metrics in the /aws/cleanroomsml/TrainedModels or
/aws/cleanroomsml/TrainedModelInferenceJobs namespaces, respectively.
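As a rough illustration of checking which metrics have arrived in one of these namespaces, the sketch below lists metric names through a CloudWatch client. The helper name list_metric_names is hypothetical, and the code assumes the client passed in behaves like the one returned by boto3.client("cloudwatch"), whose list_metrics operation supports pagination. The names of the system metrics themselves are not assumed.

```python
from typing import Any, List

# Namespaces named in the text above.
TRAINING_NAMESPACE = "/aws/cleanroomsml/TrainedModels"
INFERENCE_NAMESPACE = "/aws/cleanroomsml/TrainedModelInferenceJobs"


def list_metric_names(cloudwatch: Any, namespace: str) -> List[str]:
    """Return the sorted, de-duplicated metric names found in a namespace.

    `cloudwatch` is assumed to be a boto3 CloudWatch client (hypothetical
    helper; not part of any AWS SDK).
    """
    names = set()
    paginator = cloudwatch.get_paginator("list_metrics")
    for page in paginator.paginate(Namespace=namespace):
        for metric in page.get("Metrics", []):
            names.add(metric["MetricName"])
    return sorted(names)
```

With boto3 installed and credentials for the member account configured, a call might look like list_metric_names(boto3.client("cloudwatch"), TRAINING_NAMESPACE).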
Model logs
Access to model logs is controlled by the privacy configuration policy of each
configured model algorithm. The model author sets this policy when associating a
configured model algorithm with a collaboration, either via the console or the
CreateConfiguredModelAlgorithmAssociation API. The policy determines which
members can receive the model logs.
Additionally, the model author can set a filter pattern in the privacy
configuration policy to filter log events. All logs that a model container sends to
stdout or stderr and that match the filter pattern (if set) are sent to Amazon
CloudWatch Logs. Model logs are available in the CloudWatch log groups
/aws/cleanroomsml/TrainedModels for training and
/aws/cleanroomsml/TrainedModelInferenceJobs for inference.
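Because only output written to stdout or stderr is collected, a training or inference script needs its logger pointed at one of those streams. The following is a minimal sketch of one way to do that with Python's standard logging module. The helper name build_logger, the logger name, and the "METRIC" line prefix are illustrative assumptions. A consistent prefix like this is one way to give a filter pattern something to match, if the model author chooses to set one.

```python
import logging
import sys


def build_logger(name: str = "training") -> logging.Logger:
    """Return a logger that writes one plain-text line per event to stdout."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate lines if called more than once
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.propagate = False  # keep the root logger from re-emitting the line
    return logger


logger = build_logger()
# Hypothetical log line; the "METRIC" prefix is only a convention for this sketch.
logger.info("METRIC epoch=1 loss=0.4200")
```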
Custom-defined metrics
When configuring a model algorithm (either via the console or the
CreateConfiguredModelAlgorithm API), the model author can provide specific
metric names and regex statements to search for in the output logs. These custom
metrics can be viewed as the job progresses via CloudWatch Metrics in the
/aws/cleanroomsml/TrainedModels namespace. When associating a configured model
algorithm, the model author can set an optional noise level in the metrics
privacy configuration to avoid outputting raw data while still providing
visibility into custom metric trends. If a noise level is set, the metrics are
published at the end of the job rather than in real time.
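Before registering metric names and regex statements, it can help to check locally that each regex picks the intended value out of a sample log line. The sketch below does this with Python's re module. The metric names, patterns, and log lines are hypothetical, and the code assumes the first capture group holds the numeric value. It is a local check only and does not reproduce how the service itself scans the logs.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical metric definitions: metric name -> regex whose first capture
# group holds the numeric value.
METRIC_DEFINITIONS: Dict[str, str] = {
    "train_loss": r"train_loss=([0-9]+(?:\.[0-9]+)?)",
    "val_accuracy": r"val_accuracy=([0-9]+(?:\.[0-9]+)?)",
}


def extract_metrics(
    log_lines: Iterable[str], definitions: Dict[str, str]
) -> Dict[str, List[float]]:
    """Collect, per metric name, every value its regex captures from the logs."""
    found: Dict[str, List[float]] = {name: [] for name in definitions}
    for line in log_lines:
        for name, pattern in definitions.items():
            match = re.search(pattern, line)
            if match:
                found[name].append(float(match.group(1)))
    return found


# Made-up sample output from a training container.
sample_logs = [
    "epoch=1 train_loss=0.91 val_accuracy=0.62",
    "epoch=2 train_loss=0.47 val_accuracy=0.81",
    "checkpoint saved",
]
print(extract_metrics(sample_logs, METRIC_DEFINITIONS))
```

A line that matches no pattern, such as "checkpoint saved" here, contributes nothing, so unrelated log output does not disturb the extracted series.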