Track inference recommendation and benchmark results with MLflow
You can track the results of your recommendation and benchmark jobs with fully managed MLflow on Amazon SageMaker AI. When you provide an MLflow configuration, SageMaker AI logs the performance metrics and artifacts that the job produces to an MLflow App. You can then use the MLflow UI to view individual runs, compare configurations across jobs, and share results with your team. For more information about MLflow on SageMaker AI, see Accelerate generative AI development using managed MLflow on Amazon SageMaker AI.
MLflow tracking is optional. To enable it, provide an MLflow configuration that references an MLflow App when you create the job.
Prerequisites
Before you track results with MLflow, you need the following in addition to the Prerequisites for a recommendation or benchmark job:
- An MLflow App in an operational state, in the same AWS Region as your job. For information about creating an MLflow App, see MLflow App Setup.
- An IAM execution role (the RoleArn that you pass to the job) with permission to describe the MLflow App and call the MLflow REST APIs. Add the sagemaker:DescribeMlflowApp action and the sagemaker-mlflow actions that the job uses to log runs. For more information about MLflow permissions, see MLflow App Setup.
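The permissions described above can be sketched as an IAM policy document. The following is a minimal illustration, not a definitive policy: the helper name build_mlflow_tracking_policy is hypothetical, the sagemaker-mlflow:* wildcard is an assumption that stands in for the specific logging actions (which this page does not enumerate), and scoping both statements to the MLflow App ARN is also an assumption. Narrow the actions to the ones listed in MLflow App Setup before you use it.

```python
import json


def build_mlflow_tracking_policy(mlflow_app_arn: str) -> dict:
    """Return an IAM policy document for logging job results to one MLflow App."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DescribeMlflowApp",
                "Effect": "Allow",
                # Action named in the prerequisites above.
                "Action": ["sagemaker:DescribeMlflowApp"],
                # Assumption: the action can be scoped to the app ARN.
                "Resource": mlflow_app_arn,
            },
            {
                "Sid": "CallMlflowRestApis",
                "Effect": "Allow",
                # Assumption: a wildcard over the sagemaker-mlflow namespace.
                # Replace it with the specific actions that the job needs.
                "Action": ["sagemaker-mlflow:*"],
                "Resource": mlflow_app_arn,
            },
        ],
    }


policy = build_mlflow_tracking_policy(
    "arn:aws:sagemaker:us-west-2:111122223333:mlflow-app/app-EXAMPLE"
)
print(json.dumps(policy, indent=2))
```

You can attach the resulting JSON to the execution role as an inline or managed policy.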
MLflow configuration parameters
To track results, add an MlflowConfig object to the
OutputConfig of your CreateAIRecommendationJob or
CreateAIBenchmarkJob request. MlflowConfig contains the
following fields:
MlflowResourceArn
Required. The Amazon Resource Name (ARN) of the MLflow App to log results to, in the format arn:aws:sagemaker:region:account-id:mlflow-app/app-id.
MlflowExperimentName
Optional. The name of the MLflow experiment to log the run under. If the experiment does not exist, SageMaker AI creates it; if it already exists, SageMaker AI reuses it, so you can group runs from multiple jobs under one experiment by passing the same name.
MlflowRunName
Optional. The name of the top-level MLflow run that the job's results are nested under. To group the results of multiple jobs under one top-level run, pass the same MlflowRunName to each job.
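To catch a malformed ARN before the request is sent, you might assemble MlflowConfig with a small helper. The sketch below is illustrative only: build_mlflow_config is a hypothetical name, and the regular expression assumes the aws partition, a 12-digit account ID, and an app ID made of letters, digits, and hyphens. The service performs its own validation, which may differ.

```python
import re
from typing import Optional

# Assumption: "aws" partition, 12-digit account ID, and an app ID made of
# letters, digits, and hyphens. Adjust for other partitions.
_MLFLOW_APP_ARN = re.compile(
    r"^arn:aws:sagemaker:[a-z0-9-]+:\d{12}:mlflow-app/[A-Za-z0-9-]+$"
)


def build_mlflow_config(
    resource_arn: str,
    experiment_name: Optional[str] = None,
    run_name: Optional[str] = None,
) -> dict:
    """Build an MlflowConfig dict, omitting optional fields that are not set."""
    if not _MLFLOW_APP_ARN.match(resource_arn):
        raise ValueError(f"Not an MLflow App ARN: {resource_arn!r}")
    config = {"MlflowResourceArn": resource_arn}
    if experiment_name:
        config["MlflowExperimentName"] = experiment_name
    if run_name:
        config["MlflowRunName"] = run_name
    return config


# Two jobs that share an experiment and a top-level run name.
shared = build_mlflow_config(
    "arn:aws:sagemaker:us-west-2:111122223333:mlflow-app/app-EXAMPLE",
    experiment_name="my-benchmark-experiment",
    run_name="nightly-comparison",
)
print(shared)
```

Passing the same dictionary in the OutputConfig of several jobs groups their results under one experiment and one top-level run, as described above.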
Each job creates a top-level MLflow run, with nested child runs that break the results down by dimensions such as the benchmark target or instance type, the deployment configuration, and the concurrency level. This structure lets you drill into individual measurements and compare configurations in the MLflow UI.
The MlflowConfig that you supply on the create request is echoed back in
the OutputConfig of the corresponding Describe response.
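Because the configuration is echoed back, you can confirm that tracking was enabled by reading it from the Describe response. The following minimal sketch works on a response dictionary only; the helper name echoed_mlflow_config is hypothetical, and the sample response is a hand-written stand-in that shows just the fields this page mentions, not a full Describe response.

```python
def echoed_mlflow_config(describe_response: dict) -> dict:
    """Return the MlflowConfig from a Describe response, or {} if tracking is off."""
    return describe_response.get("OutputConfig", {}).get("MlflowConfig", {})


# Hand-written stand-in for a Describe response (other fields omitted).
sample_response = {
    "OutputConfig": {
        "S3OutputLocation": "s3://DOC-EXAMPLE-BUCKET/benchmark-results/",
        "MlflowConfig": {
            "MlflowResourceArn": "arn:aws:sagemaker:us-west-2:111122223333:mlflow-app/app-EXAMPLE",
            "MlflowExperimentName": "my-benchmark-experiment",
        },
    }
}

config = echoed_mlflow_config(sample_response)
print(config.get("MlflowExperimentName", "(no experiment name set)"))
```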
Track a recommendation job
Add the MlflowConfig object to OutputConfig when you create a
recommendation job.
Python (boto3)
import boto3

# Create a SageMaker AI client in the same Region as the MLflow App.
client = boto3.client("sagemaker", region_name="us-west-2")

response = client.create_ai_recommendation_job(
    AIRecommendationJobName="my-recommendation-job",
    ModelSource={
        "S3": {
            "S3Uri": "s3://DOC-EXAMPLE-BUCKET/models/my-model/",
        }
    },
    OutputConfig={
        "S3OutputLocation": "s3://DOC-EXAMPLE-BUCKET/recommendations/",
        # Optional: log metrics and artifacts to this MLflow App.
        "MlflowConfig": {
            "MlflowResourceArn": "arn:aws:sagemaker:us-west-2:111122223333:mlflow-app/app-EXAMPLE",
            "MlflowExperimentName": "my-recommendation-experiment",
            "MlflowRunName": "my-recommendation-job-run",
        },
    },
    PerformanceTarget={
        "Constraints": [
            {"Metric": "ttft-ms"}
        ]
    },
    AIWorkloadConfigIdentifier="my-recommendation-workload",
    RoleArn="arn:aws:iam::111122223333:role/ExampleRole",
)
print(response["AIRecommendationJobArn"])
AWS CLI
aws sagemaker create-ai-recommendation-job \
    --ai-recommendation-job-name "my-recommendation-job" \
    --model-source '{"S3": {"S3Uri": "s3://DOC-EXAMPLE-BUCKET/models/my-model/"}}' \
    --output-config '{
        "S3OutputLocation": "s3://DOC-EXAMPLE-BUCKET/recommendations/",
        "MlflowConfig": {
            "MlflowResourceArn": "arn:aws:sagemaker:us-west-2:111122223333:mlflow-app/app-EXAMPLE",
            "MlflowExperimentName": "my-recommendation-experiment",
            "MlflowRunName": "my-recommendation-job-run"
        }
    }' \
    --performance-target '{"Constraints": [{"Metric": "ttft-ms"}]}' \
    --ai-workload-config-identifier "my-recommendation-workload" \
    --role-arn "arn:aws:iam::111122223333:role/ExampleRole" \
    --region us-west-2
Track a benchmark job
Add the MlflowConfig object to OutputConfig when you create a
benchmark job.
Python (boto3)
import boto3

# Create a SageMaker AI client in the same Region as the MLflow App.
client = boto3.client("sagemaker", region_name="us-west-2")

response = client.create_ai_benchmark_job(
    AIBenchmarkJobName="my-benchmark-job",
    BenchmarkTarget={
        "Endpoint": {
            "Identifier": "my-sagemaker-endpoint"
        }
    },
    OutputConfig={
        "S3OutputLocation": "s3://DOC-EXAMPLE-BUCKET/benchmark-results/",
        # Optional: log metrics and artifacts to this MLflow App.
        "MlflowConfig": {
            "MlflowResourceArn": "arn:aws:sagemaker:us-west-2:111122223333:mlflow-app/app-EXAMPLE",
            "MlflowExperimentName": "my-benchmark-experiment",
            "MlflowRunName": "my-benchmark-job-run",
        },
    },
    AIWorkloadConfigIdentifier="my-benchmark-config",
    RoleArn="arn:aws:iam::111122223333:role/ExampleRole",
)
print(response["AIBenchmarkJobArn"])
AWS CLI
aws sagemaker create-ai-benchmark-job \
    --ai-benchmark-job-name "my-benchmark-job" \
    --benchmark-target '{"Endpoint": {"Identifier": "my-sagemaker-endpoint"}}' \
    --output-config '{
        "S3OutputLocation": "s3://DOC-EXAMPLE-BUCKET/benchmark-results/",
        "MlflowConfig": {
            "MlflowResourceArn": "arn:aws:sagemaker:us-west-2:111122223333:mlflow-app/app-EXAMPLE",
            "MlflowExperimentName": "my-benchmark-experiment",
            "MlflowRunName": "my-benchmark-job-run"
        }
    }' \
    --ai-workload-config-identifier "my-benchmark-config" \
    --role-arn "arn:aws:iam::111122223333:role/ExampleRole" \
    --region us-west-2
View results in MLflow
After the job reaches the Completed status, open the MLflow UI from the
MLflow App to inspect the logged runs. For more information about launching the MLflow
UI, see Launch the MLflow UI using a presigned URL.
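If you script the workflow, you may want to wait for the Completed status before opening the UI. The sketch below is a generic polling loop under stated assumptions: wait_for_job is a hypothetical helper, get_status is a callable that you supply (for example, one that wraps the job's Describe call and returns its status string), and the Failed and Stopped terminal statuses are assumptions, because this page names only Completed.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    get_status: Callable[[], str],
    terminal_statuses: Iterable[str] = ("Completed", "Failed", "Stopped"),
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns a terminal status, then return it."""
    terminal = set(terminal_statuses)
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still in status {status!r} after timeout")
        sleep(poll_seconds)


# Demonstration with a fake status source instead of a real Describe call.
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
final = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final)
```

Injecting get_status and sleep keeps the loop independent of any particular API call and makes it easy to test without waiting.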
On the Experiments page, select the experiment that you specified
in MlflowExperimentName.
The experiment's Runs view shows a top-level run that contains a per-job run, which expands into child runs for each instance type, deployment configuration, and concurrency level. Select a run to view its parameters, metrics, and artifacts.
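The same hierarchy can be reconstructed outside the UI. MLflow records the parent of a nested run in the mlflow.parentRunId tag, so a list of runs can be folded into a tree. The following sketch works on plain dictionaries rather than on a live tracking server; the helper names and the run names in the sample data are hypothetical, and the names that SageMaker AI gives to child runs may differ.

```python
from collections import defaultdict

PARENT_TAG = "mlflow.parentRunId"  # tag that MLflow sets on nested runs


def format_run_tree(runs) -> list:
    """Return indented run names, one line per run, parents before children."""
    runs = list(runs)
    known_ids = {run["run_id"] for run in runs}
    children = defaultdict(list)
    roots = []
    for run in runs:
        parent_id = run["tags"].get(PARENT_TAG)
        # Treat a run as a root if it has no parent or the parent is not listed.
        if parent_id in known_ids:
            children[parent_id].append(run)
        else:
            roots.append(run)

    lines = []

    def walk(run, depth):
        lines.append("  " * depth + run["name"])
        for child in children[run["run_id"]]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


# Hypothetical sample data that mirrors the structure described above.
sample_runs = [
    {"run_id": "r1", "name": "my-benchmark-job-run", "tags": {}},
    {"run_id": "r2", "name": "my-benchmark-job", "tags": {PARENT_TAG: "r1"}},
    {"run_id": "r3", "name": "concurrency-1", "tags": {PARENT_TAG: "r2"}},
    {"run_id": "r4", "name": "concurrency-8", "tags": {PARENT_TAG: "r2"}},
]
print("\n".join(format_run_tree(sample_runs)))
```

To use this with real data, map each run returned by your MLflow client to a dictionary with run_id, name, and tags keys.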