Track experiments using MLflow
Amazon SageMaker Unified Studio supports two options for tracking experiments with MLflow: MLflow Apps and MLflow Tracking Servers. MLflow Apps are the latest offering with faster startup times and cross-account sharing, while MLflow Tracking Servers provide traditional MLflow functionality.
For more information about MLflow, see Machine learning experiments using MLflow in the Amazon SageMaker AI Developer Guide.
Use MLflow Apps for experiment tracking
Use MLflow Apps in Amazon SageMaker Unified Studio to track, manage, analyze, and compare machine learning experiments. MLflow Apps are the latest managed MLflow offering and provide faster startup times, cross-account sharing, and integration with other SageMaker AI features.
We recommend MLflow Apps over existing MLflow Tracking Servers.
MLflow Apps provide capabilities for experiment tracking, model registry, and tracing generative AI applications. Each MLflow App includes compute resources, backend metadata storage, and artifact storage in Amazon S3.
Note
MLflow Apps are different from MLflow Tracking Servers. MLflow Apps offer additional features such as faster startup time, cross-account sharing, and automatic model registration. For information about MLflow Tracking Servers, see Use MLflow Tracking Servers for experiment tracking.
For more information about MLflow Apps, see MLflow App Setup in the Amazon SageMaker AI Developer Guide.
MLflow Apps overview
An MLflow App is a stand-alone HTTP server that serves multiple REST API endpoints for tracking runs and experiments. MLflow Apps provide the following capabilities:
-
Experiment tracking: Track parameters, metrics, and artifacts across multiple training runs to identify the best performing models.
-
Model registry: Manage model versions and catalog models for production deployment.
-
Tracing: Record inputs, outputs, and metadata at every step of a generative AI application to identify issues and maintain traceability.
-
Automatic model registration: Automatically register models from the MLflow Model Registry in the SageMaker AI Model Registry.
You can create MLflow Apps for your project. Your domain administrator can configure the project profile to automatically create an MLflow App during project creation, though this is not recommended due to default quota limits and increased project creation latency. We recommend creating MLflow Apps on demand after project creation.
When you delete a project, Amazon SageMaker Unified Studio automatically deletes associated MLflow Apps.
Prerequisites
Before you create an MLflow App, ensure you have the following:
-
An Amazon S3 bucket in the same AWS Region as your project for artifact storage. The MLflow App uses this bucket to store model artifacts, images, and data files.
-
Appropriate IAM permissions to create and manage MLflow Apps. Your domain administrator configures these permissions through the following IAM policies:
-
SageMakerStudioProjectRoleMachineLearningPolicy
-
SageMakerStudioProjectProvisioningRolePolicy
-
SageMakerStudioUserIAMDefaultExecutionPolicy
-
SageMakerStudioAdminIAMDefaultExecutionPolicy
For more information about IAM permissions, see the Security chapter.
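The first prerequisite, an artifact bucket in the same AWS Region as the project, can be checked before you create the MLflow App. The following is a minimal sketch, assuming you have a boto3 S3 client and permission to call `get_bucket_location`. The function names and the idea of passing the project Region as a string are illustrative and not part of Amazon SageMaker Unified Studio.

```python
from typing import Optional


def bucket_region(location_constraint: Optional[str]) -> str:
    """Map an S3 LocationConstraint value to a Region name."""
    # S3 reports no constraint for buckets in us-east-1 and the
    # legacy value "EU" for buckets in eu-west-1.
    if not location_constraint:
        return "us-east-1"
    if location_constraint == "EU":
        return "eu-west-1"
    return location_constraint


def bucket_matches_project_region(s3_client, bucket: str, project_region: str) -> bool:
    """Return True if the bucket is in the same Region as the project."""
    response = s3_client.get_bucket_location(Bucket=bucket)
    return bucket_region(response.get("LocationConstraint")) == project_region


# Example usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   bucket_matches_project_region(boto3.client("s3"), "my-artifact-bucket", "us-west-2")
```

The client is passed in as a parameter so that the check can be exercised without AWS credentials.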
Create an MLflow App
After you create a project, you can create an MLflow App for the project if it was not created automatically during project creation.
To create an MLflow App, perform the following steps:
-
Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.
-
From the left menu, choose Compute.
-
From the tabs in the top banner, choose MLflow.
-
Choose Create MLflow App.
-
For Name, enter a name for the MLflow App. The name must start with a letter or number and can contain letters, numbers, and hyphens.
-
Choose Create to create the MLflow App.
Note
MLflow App creation can take 2-3 minutes. After the MLflow App is created, it starts automatically.
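The naming rule in the Name step can be checked locally before you submit the form. The following sketch encodes only the rule stated above (first character is a letter or number; the rest are letters, numbers, or hyphens). It assumes ASCII characters, and it does not cover any length limit that the service might enforce. The function name is illustrative.

```python
import re

# First character: letter or number. Remaining characters: letters,
# numbers, or hyphens. Length limits are not checked here.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]*")


def is_valid_app_name(name: str) -> bool:
    """Return True if the name satisfies the documented character rule."""
    return _NAME_PATTERN.fullmatch(name) is not None
```

For example, `is_valid_app_name("fraud-model-1")` returns `True`, while names that start with a hyphen or contain an underscore are rejected.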
Edit an MLflow App
After you create an MLflow App, you can change the artifact storage location. To edit an MLflow App, perform the following steps:
-
From the left menu, choose Compute.
-
From the tabs in the top banner, choose MLflow.
-
From the Actions drop-down menu, choose Edit.
-
For Artifact storage S3 path, enter a new path to the artifact storage.
-
Choose Save changes to update the MLflow App.
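Before you enter a new artifact storage path, it can help to confirm that the value is well formed. The following sketch assumes that the field expects an `s3://bucket/prefix` style path, which this page does not state explicitly. The function name is illustrative.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_path(path: str) -> Tuple[str, str]:
    """Split an s3:// path into (bucket, key_prefix)."""
    parsed = urlparse(path)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an s3:// path: {path!r}")
    # The key prefix is the URL path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")
```

Splitting the path also gives you the bucket name, which you can use to confirm that the bucket exists in the project's Region.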
Delete an MLflow App
You can delete an MLflow App when you no longer need it. Deleting an MLflow App removes the compute resources but does not delete the artifacts stored in Amazon S3.
To delete an MLflow App, perform the following steps:
-
From the left menu, choose Compute.
-
From the tabs in the top banner, choose MLflow.
-
From the Actions drop-down menu, choose Delete.
-
In the confirmation dialog, enter the MLflow App name to confirm deletion.
Note
An MLflow App is not available for use while it is in certain states, such as creating, pending deletion, or other transitional states.
-
Choose Delete to delete the MLflow App.
Important
Deleting an MLflow App is permanent and cannot be undone. Ensure you have backed up any important experiment data before deleting the MLflow App.
Launch the MLflow UI
You can launch the MLflow UI to view and manage your experiments, models, and traces. To launch the MLflow UI, perform the following steps:
-
From the left menu, choose Compute.
-
From the tabs in the top banner, choose MLflow.
-
Choose the Open button next to the MLflow App. This action uses a presigned URL to launch the MLflow UI in a new tab in your current browser.
For more information about using the MLflow UI, see Launch the MLflow UI using a presigned URL in the Amazon SageMaker AI Developer Guide.
Integrate MLflow with your environment
After you create an MLflow App, you can integrate it with your development environment to track experiments and log metrics.
To integrate MLflow with your environment, you need the MLflow App ARN. You can find the ARN on the MLflow App details page.
For detailed information about integrating MLflow with your environment, including code examples for Python notebooks, see Integrate MLflow with your environment in the Amazon SageMaker AI Developer Guide.
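As a rough illustration of what the integration looks like, the following sketch logs one training run by using the MLflow App ARN as the tracking URI. It assumes that the `mlflow` package and the `sagemaker-mlflow` plugin are installed in your environment and that the ARN is accepted as a tracking URI, as described in the Developer Guide. The function name, experiment name, parameters, and metrics are placeholders.

```python
from typing import Mapping


def log_training_run(
    app_arn: str,
    experiment_name: str,
    params: Mapping[str, object],
    metrics: Mapping[str, float],
) -> str:
    """Log one run to the MLflow App and return the MLflow run ID."""
    if not app_arn.startswith("arn:"):
        raise ValueError("Expected the MLflow App ARN from the details page")

    # Imported here so that the ARN check above works without MLflow installed.
    import mlflow

    # Assumption: the sagemaker-mlflow plugin resolves the ARN to the endpoint.
    mlflow.set_tracking_uri(app_arn)
    mlflow.set_experiment(experiment_name)
    with mlflow.start_run() as run:
        mlflow.log_params(dict(params))
        mlflow.log_metrics(dict(metrics))
        return run.info.run_id


# Example usage (copy the ARN from the MLflow App details page):
#   log_training_run(app_arn, "churn-baseline",
#                    {"learning_rate": 0.01}, {"accuracy": 0.93})
```

After the run completes, it appears under the named experiment in the MLflow UI.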
Use MLflow Tracking Servers for experiment tracking
Use MLflow Tracking Servers in Amazon SageMaker Unified Studio to track, manage, analyze, and compare machine learning experiments. MLflow Tracking Servers provide compute and storage resources for experiment tracking. Each project can have an MLflow Tracking Server. Your domain administrator can configure the project defaults to automatically create the MLflow Tracking Server during project creation. Otherwise, you can create an MLflow Tracking Server on demand for the project.
Note
MLflow Tracking Servers are different from MLflow Apps. MLflow Apps offer additional features such as faster startup time, cross-account sharing, and automatic model registration. For information about MLflow Apps, see Use MLflow Apps for experiment tracking.
When you delete a project, Amazon SageMaker Unified Studio automatically deletes the tracking server.
For more information about MLflow Tracking Servers, see MLflow Tracking Servers in the Amazon SageMaker AI Developer Guide.
For more information about project profiles for AI-ML projects, see Project profiles in the Amazon SageMaker Unified Studio Admin Guide.
Create an MLflow Tracking Server
After you create a project, you can create an MLflow Tracking Server for the project if it wasn't created automatically during project creation.
To create an MLflow Tracking Server, perform the following steps:
-
Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.
-
From the top banner, choose your project from the projects drop-down menu, and choose Project overview.
-
From the left menu, choose Compute.
-
From the tabs in the top banner, choose MLflow.
-
Choose Create MLflow Tracking Server.
-
(Optional) Provide values to override the default values for the following fields:
-
Name – enter a name for the server.
-
Size – select a size for the server.
-
Choose Create to create the server.
Edit an MLflow Tracking Server
After you create a tracking server, you can change its size if the current size isn't sufficient for the project. You can also change the artifact storage location.
To edit a tracking server, perform the following steps from your project's MLflow tab under Compute:
-
From the Actions drop-down menu, choose Edit. You can change the following values:
-
Size – select a new size for the server.
-
Artifact storage S3 path – enter a new path to the artifact storage.
-
Choose Save changes to update the tracking server.
Start or stop an MLflow Tracking Server
You can stop a running server or start a stopped server. While the tracking server is starting or stopping, it isn't available for use.
To start or stop an MLflow Tracking Server, perform the following steps from your project's Project details page:
-
From the left menu, choose Compute.
-
From the tabs in the top banner, choose MLflow.
-
From the Actions drop-down menu, choose Stop to stop a running server. Choose Start to start a stopped server.
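The same start and stop actions are also exposed by the SageMaker API. The following sketch wraps the boto3 `start_mlflow_tracking_server` and `stop_mlflow_tracking_server` calls. It assumes that your role is permitted to call these operations directly and that you know the tracking server name. Neither assumption is confirmed by this page, and the function name is illustrative.

```python
def set_tracking_server_running(sagemaker_client, server_name: str, running: bool) -> str:
    """Request a start (running=True) or a stop (running=False)."""
    if running:
        sagemaker_client.start_mlflow_tracking_server(TrackingServerName=server_name)
        return "start requested"
    sagemaker_client.stop_mlflow_tracking_server(TrackingServerName=server_name)
    return "stop requested"


# Example usage (requires boto3 and AWS credentials; the name is a placeholder):
#   import boto3
#   set_tracking_server_running(boto3.client("sagemaker"), "my-tracking-server", False)
```

Both calls only request the transition. The server remains unavailable until it finishes starting or stopping.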
Integrate MLflow with your environment
For information about how to integrate MLflow with your environment, see Integrate MLflow with your environment in the Amazon SageMaker AI Developer Guide.
Launch the MLflow UI
To launch the MLflow Tracking Server UI from the MLflow tab under Compute, perform the following steps:
-
Navigate to the project details page for your project.
-
From the left menu, choose Compute.
-
From the tabs in the top banner, choose MLflow.
-
From the Actions drop-down menu, choose Open MLflow. This action uses a presigned URL to launch the MLflow UI in a new tab in your current browser.
For more information, see Launch the MLflow UI using a presigned URL in the Amazon SageMaker AI Developer Guide.
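If you need the presigned URL outside the console, the SageMaker API can generate one. The following sketch uses the boto3 `create_presigned_mlflow_tracking_server_url` call and returns the `AuthorizedUrl` field of the response. It assumes that your role is permitted to call this operation directly, which this page does not confirm. The function name is illustrative.

```python
def tracking_server_ui_url(sagemaker_client, server_name: str) -> str:
    """Return a presigned URL that opens the MLflow UI for the server."""
    response = sagemaker_client.create_presigned_mlflow_tracking_server_url(
        TrackingServerName=server_name
    )
    return response["AuthorizedUrl"]


# Example usage (requires boto3 and AWS credentials; the name is a placeholder):
#   import boto3
#   url = tracking_server_ui_url(boto3.client("sagemaker"), "my-tracking-server")
```

Presigned URLs expire, so generate one shortly before you open it instead of storing it.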