

# Machine learning
<a name="sagemaker"></a>

Amazon SageMaker Unified Studio is a unified development experience for building analytics, AI/ML, and generative AI applications at scale. This chapter describes the Amazon SageMaker AI capabilities that you can use in Amazon SageMaker Unified Studio.

**Note**  
When you add a custom tag to a SageMaker AI resource (such as a training job, inference endpoint, model, or pipeline), add the prefix `ProjectUserTag` to the tag name. For example:  

```
ProjectUserTagMyCustomTag
```
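
As a minimal sketch of applying such a prefixed tag programmatically (the `add_tags` call is a real SageMaker API; the training job ARN is a placeholder):

```
def project_user_tag(name: str) -> str:
    """Apply the required ProjectUserTag prefix to a custom tag name."""
    return f"ProjectUserTag{name}"

def tag_training_job(training_job_arn: str, name: str, value: str) -> None:
    import boto3  # requires AWS credentials at call time
    sagemaker = boto3.client("sagemaker")
    sagemaker.add_tags(
        ResourceArn=training_job_arn,
        Tags=[{"Key": project_user_tag(name), "Value": value}],
    )

# Example call (the ARN is a placeholder):
# tag_training_job(
#     "arn:aws:sagemaker:us-east-1:111122223333:training-job/my-job",
#     "MyCustomTag",
#     "my-value",
# )
```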

**Note**  
ECR repositories must be created with the `AmazonDataZoneProject` tag with the project ID (which can be found under project details in the project overview page or from the page URL) as the tag value. If you want to add your own tags, they must be prefixed with `ProjectUserTag`.  
For example, with AWS CLI:  

```
aws ecr create-repository \
    --repository-name my-repo \
    --tags Key=AmazonDataZoneProject,Value=5blxelum5cmckb \
           Key=ProjectUserTagMyTag,Value=MyTagValue
```
Example using a JupyterLab notebook:  

```
import boto3

# Create ECR client
ecr_client = boto3.client('ecr')

# Define repository name
repository_name = 'my-ecr-repo'

# Define tags
tags = [
    {
        'Key': 'AmazonDataZoneProject',
        'Value': '5blxelum5cmckb'
    },
    {
        'Key': 'ProjectUserTagMyTag',
        'Value': 'MyTagValue'
    },
]

try:
    # Create the repository with tags
    response = ecr_client.create_repository(
        repositoryName=repository_name,
        imageScanningConfiguration={
            'scanOnPush': True
        },
        encryptionConfiguration={
            'encryptionType': 'AES256'
        },
        tags=tags
    )

    repository_uri = response['repository']['repositoryUri']
    print("Repository created successfully!")
    print(f"Repository URI: {repository_uri}")

except ecr_client.exceptions.RepositoryAlreadyExistsException:
    print(f"Repository {repository_name} already exists")
    # Add tags to existing repository
    ecr_client.tag_resource(
        resourceArn=f"arn:aws:ecr:{ecr_client.meta.region_name}:{boto3.client('sts').get_caller_identity()['Account']}:repository/{repository_name}",
        tags=tags
    )
    # Get the repository URI
    response = ecr_client.describe_repositories(repositoryNames=[repository_name])
    repository_uri = response['repositories'][0]['repositoryUri']
    print("Added tags to existing repository")
    print(f"Repository URI: {repository_uri}")
except Exception as e:
    print(f"Error creating repository: {str(e)}")
```
ECR repositories without the `AmazonDataZoneProject` tag cannot be used; you must create new ECR repositories with the `AmazonDataZoneProject` tag. After a repository is tagged with `AmazonDataZoneProject`, that tag cannot be modified or removed. For more information about ECR repositories, see [https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html](https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html).

**Topics**
+ [

# Machine learning in Identity Center-based domains
](sagemaker-identity-center-based-domains.md)
+ [

# Machine Learning Workflows in IAM-based domains
](sagemaker-iam-based-domains.md)

# Machine learning in Identity Center-based domains
<a name="sagemaker-identity-center-based-domains"></a>

**Topics**
+ [

# Discover Jumpstart models
](sagemaker-discover-models.md)
+ [

# Build models in JupyterLab
](sagemaker-build-models.md)
+ [

# Train models
](sagemaker-train-models.md)
+ [

# Use inference endpoints to deploy models
](sagemaker-deploy-models.md)
+ [

# Pipelines
](sagemaker-pipelines.md)
+ [

# Model registry
](sagemaker-register-models.xml.md)
+ [

# Track experiments using MLflow
](sagemaker-experiments.xml.md)
+ [

# HyperPod clusters
](sagemaker-hyperpods.md)
+ [

# Partner AI apps
](sagemaker-partner-apps.md)

# Discover Jumpstart models
<a name="sagemaker-discover-models"></a>

Amazon SageMaker Unified Studio maintains publicly available foundation models for you to access, customize, and integrate into your machine learning lifecycles. A foundation model is a large pre-trained model that's adaptable to a variety of downstream tasks and often serves as the starting point for developing more specialized models.

To explore the available models from our model providers, follow these steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the main menu, choose **Build**.

1. From the drop-down menu, choose **Jumpstart Models**.

   The system opens a page that lists the model providers.

1. Choose a provider.

1. From the provider's list of models, choose a model to view its details.

For more information about discovering models, see [JumpStart foundation models](https://docs.aws.amazon.com/sagemaker/latest/dg/jumpstart-foundation-models.html) in the *Amazon SageMaker AI Developer Guide*.

# Build models in JupyterLab
<a name="sagemaker-build-models"></a>

Use the JupyterLab space within Amazon SageMaker Unified Studio to run JupyterLab applications. A JupyterLab space is a private or shared space that manages the storage and compute resources needed to run the JupyterLab application. 

The JupyterLab application is a web-based interactive development environment (IDE) for notebooks, code, and data. Use the JupyterLab application's flexible and extensive interface to configure and arrange machine learning (ML) workflows.

Your project contains a configured JupyterLab space.

To open the JupyterLab space in Amazon SageMaker Unified Studio, follow these steps: 

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the main menu, choose **Build**.

1. Under **IDE & Applications**, choose **JupyterLab**.

Amazon SageMaker Unified Studio opens the JupyterLab space associated with your project. Choose **Configure space** to tailor the configuration to your needs.

For more information about using JupyterLab, see [SageMaker JupyterLab](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl.html) in the *Amazon SageMaker AI Developer Guide*.

# Train models
<a name="sagemaker-train-models"></a>

Using Amazon SageMaker Unified Studio, you can train foundation models or custom models. 

Follow these steps to train a foundation model:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. Choose a model to train.

   1. From the main menu, choose **Build**.

   1. From the drop-down menu, choose **Jumpstart Models**.

      The JumpStart page lists the model providers.

   1. Choose a model provider. The page displays the models for that provider.

   1. Under **Action**, choose **Trainable**. The page displays the trainable models for that provider.

   1. From the provider's list of models, choose the model you want to train.

1. From the model details page, choose **Train** to create a training job.

   If the model is pretrained, you can fine-tune the model by adjusting the model parameters.

1. On the **Fine-tuning model** page, update the hyperparameters you want to change.

1. Choose **Submit** to submit the training job. You can view the training job from the **Training jobs** page.

You can also train the model in a JupyterLab notebook using the SageMaker AI Python SDK.
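
As a minimal sketch (assuming the SageMaker Python SDK is installed; the model ID, S3 path, and hyperparameter names are placeholders), a notebook-based fine-tuning job might look like this:

```
def build_hyperparameters(defaults: dict, overrides: dict) -> dict:
    """Merge your overrides into a model's default hyperparameters."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged

def fine_tune(model_id: str, training_data_s3: str, overrides: dict):
    # Requires the sagemaker SDK and AWS credentials at call time.
    from sagemaker.jumpstart.estimator import JumpStartEstimator

    estimator = JumpStartEstimator(model_id=model_id)
    estimator.set_hyperparameters(**overrides)
    estimator.fit({"training": training_data_s3})
    return estimator

# Example call (placeholder model ID and bucket):
# fine_tune("huggingface-llm-falcon-7b-bf16",
#           "s3://amzn-s3-demo-bucket/training-data/",
#           {"epochs": "3"})
```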

For more information about training models in JumpStart, see [JumpStart pretrained models](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-jumpstart.html) in the *Amazon SageMaker AI Developer Guide*.

# Use inference endpoints to deploy models
<a name="sagemaker-deploy-models"></a>

Endpoints are the locations where you send inference requests to your deployed machine learning models. After you create an endpoint, you can add models to it, test it, and change its settings as needed. By using endpoints, you don't have to manage the underlying infrastructure for configuring and deploying a model.

For more information about using endpoints for real-time inference, see [Deploy models for real-time inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-deploy-models.html) in the *Amazon SageMaker AI Developer Guide*. Also see the [ Getting started with deploying real time models on SageMaker AI](https://aws.amazon.com/blogs/machine-learning/part-2-model-hosting-patterns-in-amazon-sagemaker-getting-started-with-deploying-real-time-models-on-sagemaker/) blog post.

**Topics**
+ [

## Create an endpoint and deploy a model
](#sagemaker-create-endpoint)
+ [

## View your endpoints
](#sagemaker-view-endpoints)

## Create an endpoint and deploy a model
<a name="sagemaker-create-endpoint"></a>

To create an endpoint, follow these steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the main menu, choose **Build**.

1. From the drop-down menu, choose **Inference endpoints**.

1. From the **Endpoints** page, choose **Create endpoint**.

1. From the **Create endpoint** page, configure these values:
   + For **Endpoint name**, enter a name for the endpoint.
   + For **Instance type**, choose an instance for the endpoint.
   + For **Initial instance count**, enter the number of instances for the endpoint to provision initially.
   + For **Maximum instance count**, enter the maximum number of instances that the endpoint can provision when it scales up.

1. Under **Models**, choose **Add model**. In the **Add model** modal form, follow these steps:

   1. Select the model type (JumpStart foundation models or Deployable models that you created).

      The form lists the models that are compatible with the instance type you selected.

   1. Choose one of the models.

   1. Under **Model settings**, enter these values:
      + Number of CPU cores – The number of CPU cores to allocate.
      + Minimum number of copies – The minimum number of model copies to deploy.
      + Min CPU memory – The minimum amount of CPU memory.
      + Max CPU memory – The maximum amount of CPU memory.

   1. Choose **Add model**.

1. Choose **Deploy** to deploy the endpoint.
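
The console steps above can be sketched with the corresponding boto3 APIs (`create_endpoint_config` and `create_endpoint` are real SageMaker calls; the names and instance values here are placeholder assumptions):

```
def production_variant(model_name: str, instance_type: str, initial_count: int) -> dict:
    """Build one production variant entry for an endpoint configuration."""
    return {
        "VariantName": "AllTraffic",
        "ModelName": model_name,
        "InstanceType": instance_type,
        "InitialInstanceCount": initial_count,
    }

def deploy_endpoint(endpoint_name: str, model_name: str,
                    instance_type: str, initial_count: int) -> None:
    import boto3  # requires AWS credentials at call time
    sagemaker = boto3.client("sagemaker")
    config_name = f"{endpoint_name}-config"
    sagemaker.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[production_variant(model_name, instance_type, initial_count)],
    )
    sagemaker.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)

# Example call (placeholder names):
# deploy_endpoint("my-endpoint", "my-model", "ml.m5.xlarge", 1)
```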

## View your endpoints
<a name="sagemaker-view-endpoints"></a>

To view your endpoints in the **Endpoints** table, follow these steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the main menu, choose **Build**.

1. From the drop-down menu, choose **Inference endpoints**.

1. (Optional) To search for specific endpoints, enter text in **Search by endpoint name**.

# Pipelines
<a name="sagemaker-pipelines"></a>

Amazon SageMaker Unified Studio supports SageMaker AI Pipelines, a workflow orchestration service for automating machine learning (ML) development.

A pipeline defines a series of interconnected steps in a directed acyclic graph (DAG). You can define the steps using the Amazon SageMaker Unified Studio visual pipeline designer, or by creating a pipeline definition JSON schema. This DAG JSON definition gives information on the requirements and relationships between each step of your pipeline. 

The structure of a pipeline's DAG is determined by the data dependencies between steps. These data dependencies are created when the properties of a step's output are passed as the input to another step. 

**Note**  
To add a custom tag to a pipeline, add the prefix `ProjectUserTag` to the tag name. For example:  

```
ProjectUserTagMyCustomTag
```

For an overview of pipelines, see [Pipelines overview](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-overview.html) in the *Amazon SageMaker AI Developer Guide*.
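
As a minimal sketch of creating a pipeline from a definition JSON programmatically (boto3's `create_pipeline` is a real SageMaker API; the pipeline name, role ARN, definition, and tag values are placeholders), with the `ProjectUserTag` prefix applied to the custom tag:

```
import json

def custom_pipeline_tags(custom: dict) -> list:
    """Build a tag list with the required ProjectUserTag prefix."""
    return [{"Key": f"ProjectUserTag{k}", "Value": v} for k, v in custom.items()]

def create_pipeline(name: str, definition: dict, role_arn: str):
    import boto3  # requires AWS credentials at call time
    sagemaker = boto3.client("sagemaker")
    return sagemaker.create_pipeline(
        PipelineName=name,
        PipelineDefinition=json.dumps(definition),
        RoleArn=role_arn,
        Tags=custom_pipeline_tags({"MyCustomTag": "my-value"}),
    )
```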

## Pipeline actions
<a name="sagemaker-pipeline-actions"></a>

The following sections describe the actions available in Amazon SageMaker Unified Studio to create and manage pipelines:

**Topics**
+ [

### Define a pipeline
](#sagemaker-pipeline-define)
+ [

### Edit a pipeline
](#sagemaker-pipeline-edit)
+ [

### Run a pipeline
](#sagemaker-pipeline-run)
+ [

### Stop a pipeline execution
](#sagemaker-pipeline-stop)
+ [

### View the details of a pipeline
](#sagemaker-pipeline-details)
+ [

### View the details of a pipeline run
](#sagemaker-pipeline-run-details)
+ [

### Download a pipeline definition file
](#sagemaker-pipeline-download)

### Define a pipeline
<a name="sagemaker-pipeline-define"></a>

You define a pipeline using the visual pipeline designer. You can also create a [pipeline definition JSON schema](https://aws-sagemaker-mlops.github.io/sagemaker-model-building-pipeline-definition-JSON-schema/).

To define a pipeline using the visual pipeline designer, complete the following steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the **Build** drop-down menu, choose **ML Pipelines**. The system displays the pipelines for your project.

1. Choose **Create in visual editor**. The system opens the visual editor for creating a pipeline. You can also import a pipeline definition file from your computer.

1. Use the visual editor to add and connect pipeline steps.

1. Choose **Save** to save your changes.

For more details, see [Define a pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/define-pipeline.html) in the *Amazon SageMaker AI Developer Guide*.

### Edit a pipeline
<a name="sagemaker-pipeline-edit"></a>

You can make changes to a pipeline before running it. To edit a pipeline, complete the following steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the **Build** drop-down menu, choose **ML Pipelines**. The system displays the pipelines for your project.

1. Choose the pipeline to edit.

1. Choose the **Executions** tab.

1. Choose the pipeline execution to edit.

1. Choose **Visual editor** to edit the pipeline.

1. Choose **Save** to save your changes.

For more details, see [Edit a pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/edit-pipeline-before-execution.html) in the *Amazon SageMaker AI Developer Guide*.

### Run a pipeline
<a name="sagemaker-pipeline-run"></a>

After defining the steps of your pipeline as a directed acyclic graph (DAG), you can run your pipeline, which executes the steps defined in your DAG. 

To run a pipeline, complete the following steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the **Build** drop-down menu, choose **ML Pipelines**. The system displays the pipelines for your project.

1. Choose the pipeline to run.

1. Choose **Execute**.

   1. For **Execution name**, enter a name for this run.

   1. (Optional) For **Description**, enter a description for this run.

1. Choose **Execute** to start the run.

For more details, see [Run a pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/run-pipeline.html) in the *Amazon SageMaker AI Developer Guide*.
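
A run can also be started programmatically; this sketch uses boto3's `start_pipeline_execution` (a real SageMaker API; the pipeline and run names are placeholders) and includes the optional description only when one is provided:

```
def execution_request(pipeline_name: str, run_name: str, description: str = "") -> dict:
    """Build kwargs for start_pipeline_execution; the description is optional."""
    request = {
        "PipelineName": pipeline_name,
        "PipelineExecutionDisplayName": run_name,
    }
    if description:
        request["PipelineExecutionDescription"] = description
    return request

def run_pipeline(pipeline_name: str, run_name: str, description: str = "") -> str:
    import boto3  # requires AWS credentials at call time
    sagemaker = boto3.client("sagemaker")
    response = sagemaker.start_pipeline_execution(
        **execution_request(pipeline_name, run_name, description))
    return response["PipelineExecutionArn"]
```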

### Stop a pipeline execution
<a name="sagemaker-pipeline-stop"></a>

To stop a pipeline, complete the following steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the **Build** drop-down menu, choose **ML Pipelines**. The system displays the pipelines for your project.

1. Choose the pipeline to stop.

1. Choose the **Executions** tab.

1. Choose the execution to stop.

1. Choose **Stop** to stop the execution. To resume the execution from where it was stopped, choose **Resume**. 
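
Programmatically, an execution can be stopped with boto3's `stop_pipeline_execution`. This sketch adds a cheap client-side sanity check, assuming the execution ARN format `arn:aws:sagemaker:region:account:pipeline/name/execution/id`:

```
def is_pipeline_execution_arn(arn: str) -> bool:
    """Sanity-check that a string looks like a pipeline execution ARN."""
    return (arn.startswith("arn:aws:sagemaker:")
            and ":pipeline/" in arn
            and "/execution/" in arn)

def stop_execution(execution_arn: str) -> None:
    import boto3  # requires AWS credentials at call time
    if not is_pipeline_execution_arn(execution_arn):
        raise ValueError(f"Not a pipeline execution ARN: {execution_arn}")
    boto3.client("sagemaker").stop_pipeline_execution(PipelineExecutionArn=execution_arn)
```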

### View the details of a pipeline
<a name="sagemaker-pipeline-details"></a>

You can view the details of a pipeline to understand its parameters, the dependencies of its steps, or monitor its progress and status. 

To access the details of a given pipeline using Amazon SageMaker Unified Studio, complete the following steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the **Build** drop-down menu, choose **ML Pipelines**. The system displays the pipelines for your project.

1. Choose the pipeline to view its details.

1. Choose any of the following tabs to view these details:
   + **Executions** – Details about the executions.
   + **Graph** – The pipeline graph, including all steps.
   + **Parameters** – The run parameters and metrics related to the pipeline.
   + **Information** – The metadata associated with the pipeline, such as tags, the pipeline Amazon Resource Name (ARN), and role ARN. You can also edit the pipeline description from this location.

### View the details of a pipeline run
<a name="sagemaker-pipeline-run-details"></a>

You can view the details of a pipeline run, which can help you:
+ Identify and resolve problems that may have occurred during the run, such as failed steps or unexpected errors.
+ Compare the results of different pipeline executions to understand how changes in input data or parameters impact the overall workflow.
+ Identify bottlenecks and opportunities for optimization.

To view the details of a pipeline run, complete the following steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the **Build** drop-down menu, choose **ML Pipelines**. The system displays the pipelines for your project.

1. Choose the pipeline to view its details.

1. Choose the **Executions** tab.

1. Choose the pipeline execution to view. The pipeline graph for that execution appears.

1. Choose any of the pipeline steps in the graph to see step settings in the right sidebar.

1. Choose any of the following tabs to view these details:
   + **Graph** – The pipeline graph, including all steps.
   + **Parameters** – The run parameters and metrics related to the pipeline.
   + **Information** – The metadata associated with the pipeline, such as tags, the pipeline Amazon Resource Name (ARN), and role ARN. You can also edit the pipeline description from this location.

### Download a pipeline definition file
<a name="sagemaker-pipeline-download"></a>

You can download the definition file for your pipeline. You can use this pipeline definition file for:
+ Backup and restoration: Use the downloaded file to create a backup of your pipeline configuration, which you can restore in case of infrastructure failures or accidental changes.
+ Version control: Store the pipeline definition file in a source control system to track changes to the pipeline and revert to previous versions if needed.
+ Programmatic interactions: Use the pipeline definition file as input to the SDK or AWS CLI.
+ Integration with automation processes: Integrate the pipeline definition into your CI/CD workflows or other automation processes.

To download the definition file of a pipeline, complete the following steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the **Build** drop-down menu, choose **ML Pipelines**. The system displays the pipelines for your project.

1. Choose the pipeline. You can download the pipeline definition from this page or any of the execution pages.

1. At the top right of the page, choose the vertical ellipsis and choose **Download pipeline definition (JSON)**.

For more information about the pipeline actions, see [Pipelines actions](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-build.html) in the *Amazon SageMaker AI Developer Guide*.
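
The definition can also be retrieved programmatically; this sketch uses boto3's `describe_pipeline`, whose response includes the definition as a JSON string in the `PipelineDefinition` field (the pipeline name is a placeholder):

```
import json

def save_definition(definition_json: str, path: str) -> None:
    """Pretty-print a pipeline definition string to a local JSON file."""
    with open(path, "w") as f:
        json.dump(json.loads(definition_json), f, indent=2)

def download_pipeline_definition(pipeline_name: str, path: str) -> None:
    import boto3  # requires AWS credentials at call time
    sagemaker = boto3.client("sagemaker")
    definition = sagemaker.describe_pipeline(PipelineName=pipeline_name)["PipelineDefinition"]
    save_definition(definition, path)
```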

# Model registry
<a name="sagemaker-register-models.xml"></a>

Use the model registry to catalog your models and manage model deployment to production.

You catalog models by creating model (package) groups that contain different versions of a model. You can create a model group that tracks all the models that you train to solve a particular problem. You can then register each model you train and the model registry adds it to the model group as a new model version. 

You can create categories of model groups by organizing them into collections. A typical workflow might include the following tasks:

1. Create a model group.

1. Create an ML pipeline that trains a model. For information about pipelines, see [Pipelines](sagemaker-pipelines.md). 

1. For each run of the ML pipeline, create a model version that you register in the model group you created in the first step.

1. Add your model group to one or more collections. 

For details about how to work with the model registry, see [ Model Registry, Model Versions, and Model Groups](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry-models.html) in the *Amazon SageMaker AI Developer Guide*.

## Create a model group
<a name="sagemaker-create-model-groups"></a>

A model group contains different versions of a model. Follow these steps to create a model group:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the **Build** drop-down menu, choose **Model Registry**. The model registry page displays the models that are registered to your project.

1. Choose **Model Groups**. The page displays the model groups that are defined for your project.

1. From the actions menu, choose **Create model group**. 

1. Provide a name for the model group. Optionally, you can add keys to the model group.

1. Choose **Register model group** to create the model group.

1. Confirm that your newly created model group appears in the list of model groups.
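
A model group can also be created programmatically; a minimal sketch with boto3's `create_model_package_group` (a real SageMaker API; the group name and description are placeholders), including the description only when one is provided:

```
def model_group_request(name: str, description: str = "") -> dict:
    """Build kwargs for create_model_package_group; the description is optional."""
    request = {"ModelPackageGroupName": name}
    if description:
        request["ModelPackageGroupDescription"] = description
    return request

def create_model_group(name: str, description: str = "") -> None:
    import boto3  # requires AWS credentials at call time
    boto3.client("sagemaker").create_model_package_group(**model_group_request(name, description))
```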

## Create a collection
<a name="sagemaker-create-collections"></a>

Follow these steps to create a collection:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the **Build** drop-down menu, choose **Model Registry**. The model registry page displays the models that are registered to your project.

1. Choose **Collections**. The collections page displays the collections that are defined for your project.

1. In the **Actions** drop-down menu, choose **Create collection**. 

1. Provide a name for the collection.

1. (Optional) To add model groups to the collection, complete these steps:

   1. Choose **Select model groups**.

   1. Select up to 10 model groups that you want to add. 

1. Choose **Create** to create the collection.

## Register a model version
<a name="sagemaker-register-models"></a>

The model registry is structured as several model (package) groups with model packages in each group. Each model package in a model group corresponds to a trained model. The version of each model package is a numerical value that starts at 1 and is incremented with each new model package added to a model group. The model packages used in the model registry are versioned, and **must** be associated with a model group.

Follow these steps to register a model version:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the **Build** drop-down menu, choose **Model Registry**. The model registry page displays the models that are registered to your project.

1. Choose **Register**.

1. On the **Register Model** page, choose the type of model artifact:
   + JumpStart – Choose from the list of models.
   + Jobs – Choose a training job.
   + Bring-your-model – Enter the location of your models.

1. Choose **Register**.

1. Create a model group, or find an existing model group.

1. Choose **Register**.

For more information, see [ Model Registry](https://docs.aws.amazon.com/sagemaker/latest/dg/model-registry.html) in the *Amazon SageMaker AI Developer Guide*.
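
Registration can also be sketched with boto3's `create_model_package` (a real SageMaker API; the image URI, model data location, and content types are placeholder assumptions). The small helper reflects the versioning rule described above:

```
def next_version(existing_versions: list) -> int:
    """Versions start at 1 and increment with each new package in a group."""
    return max(existing_versions, default=0) + 1

def register_model_version(group_name: str, image_uri: str, model_data_s3: str) -> str:
    import boto3  # requires AWS credentials at call time
    sagemaker = boto3.client("sagemaker")
    response = sagemaker.create_model_package(
        ModelPackageGroupName=group_name,  # versioned packages must belong to a group
        InferenceSpecification={
            "Containers": [{"Image": image_uri, "ModelDataUrl": model_data_s3}],
            "SupportedContentTypes": ["text/csv"],
            "SupportedResponseMIMETypes": ["text/csv"],
        },
        ModelApprovalStatus="PendingManualApproval",
    )
    return response["ModelPackageArn"]
```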

# Track experiments using MLflow
<a name="sagemaker-experiments.xml"></a>

Amazon SageMaker Unified Studio supports two options for tracking experiments with MLflow: MLflow Apps and MLflow Tracking Servers. MLflow Apps are the latest offering with faster startup times and cross-account sharing, while MLflow Tracking Servers provide traditional MLflow functionality.

For more information about MLflow, see [Machine learning experiments using MLflow](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow.html) in the *Amazon SageMaker AI Developer Guide*.

**Topics**
+ [

## Use MLflow Apps for experiment tracking
](#sagemaker-experiments-mlflow-apps)
+ [

## Use MLflow Tracking Servers for experiment tracking
](#sagemaker-experiments-tracking-servers)

## Use MLflow Apps for experiment tracking
<a name="sagemaker-experiments-mlflow-apps"></a>

Use MLflow Apps in Amazon SageMaker Unified Studio to track, manage, analyze, and compare machine learning experiments. MLflow Apps are the latest managed MLflow offering and provide faster startup times, cross-account sharing, and integration with other SageMaker AI features.

We recommend using MLflow Apps rather than MLflow Tracking Servers.

MLflow Apps provide capabilities for experiment tracking, model registry, and tracing generative AI applications. Each MLflow App includes compute resources, backend metadata storage, and artifact storage in Amazon S3.

**Note**  
MLflow Apps are different from MLflow Tracking Servers. MLflow Apps offer additional features such as faster startup time, cross-account sharing, and automatic model registration. For information about MLflow Tracking Servers, see [Use MLflow Tracking Servers for experiment tracking](#sagemaker-experiments-tracking-servers).

For more information about MLflow Apps, see [MLflow App Setup](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow-app-setup.html) in the *Amazon SageMaker AI Developer Guide*.

**Topics**
+ [

### MLflow Apps overview
](#sagemaker-experiments-mlflow-apps-overview)
+ [

### Prerequisites
](#sagemaker-experiments-mlflow-apps-prerequisites)
+ [

### Create an MLflow App
](#sagemaker-experiments-mlflow-apps-create)
+ [

### Edit an MLflow App
](#sagemaker-experiments-mlflow-apps-edit)
+ [

### Delete an MLflow App
](#sagemaker-experiments-mlflow-apps-delete)
+ [

### Launch the MLflow UI
](#sagemaker-experiments-mlflow-apps-launch-ui)
+ [

### Integrate MLflow with your environment
](#sagemaker-experiments-mlflow-apps-integrate)

### MLflow Apps overview
<a name="sagemaker-experiments-mlflow-apps-overview"></a>

An MLflow App is a stand-alone HTTP server that serves multiple REST API endpoints for tracking runs and experiments. MLflow Apps provide the following capabilities:
+ Experiment tracking: Track parameters, metrics, and artifacts across multiple training runs to identify the best performing models.
+ Model registry: Manage model versions and catalog models for production deployment.
+ Tracing: Record inputs, outputs, and metadata at every step of a generative AI application to identify issues and maintain traceability.
+ Automatic model registration: Automatically register models from MLflow Model Registry to SageMaker AI Model Registry.

You can create MLflow Apps for your project. Your domain administrator can configure the project profile to automatically create an MLflow App during project creation, though this is not recommended due to default quota limits and increased project creation latency. We recommend creating MLflow Apps on demand after project creation.

When you delete a project, Amazon SageMaker Unified Studio automatically deletes associated MLflow Apps.

### Prerequisites
<a name="sagemaker-experiments-mlflow-apps-prerequisites"></a>

Before you create an MLflow App, ensure you have the following:
+ An Amazon S3 bucket in the same AWS Region as your project for artifact storage. The MLflow App uses this bucket to store model artifacts, images, and data files.
+ Appropriate IAM permissions to create and manage MLflow Apps. Your domain administrator configures these permissions through the following IAM policies:
  + SageMakerStudioProjectRoleMachineLearningPolicy
  + SageMakerStudioProjectProvisioningRolePolicy
  + SageMakerStudioUserIAMDefaultExecutionPolicy
  + SageMakerStudioAdminIAMDefaultExecutionPolicy

For more information about IAM permissions, see the Security chapter.

### Create an MLflow App
<a name="sagemaker-experiments-mlflow-apps-create"></a>

After you create a project, you can create an MLflow App for the project if it was not created automatically during project creation.

To create an MLflow App, perform the following steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the left menu, choose **Compute**.

1. From the tabs in the top banner, choose **MLflow**.

1. Choose **Create MLflow App**.

1. For **Name**, enter a name for the MLflow App. The name must start with a letter or number and can contain letters, numbers, and hyphens.

1. Choose **Create** to create the MLflow App.

**Note**  
It may take 2-3 minutes to complete MLflow App creation. When you successfully create an MLflow App, it automatically starts.

### Edit an MLflow App
<a name="sagemaker-experiments-mlflow-apps-edit"></a>

After you create an MLflow App, you can change the artifact storage location. To edit an MLflow App, perform the following steps:

1. From the left menu, choose **Compute**.

1. From the tabs in the top banner, choose **MLflow**.

1. From the **Actions** drop-down menu, choose **Edit**.

1. For **Artifact storage S3 path**, enter a new path to the artifact storage.

1. Choose **Save changes** to update the MLflow App.

### Delete an MLflow App
<a name="sagemaker-experiments-mlflow-apps-delete"></a>

You can delete an MLflow App when you no longer need it. Deleting an MLflow App removes the compute resources but does not delete the artifacts stored in Amazon S3.

To delete an MLflow App, perform the following steps:

1. From the left menu, choose **Compute**.

1. From the tabs in the top banner, choose **MLflow**.

1. From the **Actions** drop-down menu, choose **Delete**.

1. In the confirmation dialog, enter the MLflow App name to confirm deletion.

1. Choose **Delete** to delete the MLflow App.

**Note**  
An MLflow App is not available for use while it is in a transitional state, such as creating or pending deletion.

**Important**  
Deleting an MLflow App is permanent and cannot be undone. Ensure you have backed up any important experiment data before deleting the MLflow App.

### Launch the MLflow UI
<a name="sagemaker-experiments-mlflow-apps-launch-ui"></a>

You can launch the MLflow UI to view and manage your experiments, models, and traces. To launch the MLflow UI, perform the following steps:

1. From the left menu, choose **Compute**.

1. From the tabs in the top banner, choose **MLflow**.

1. Choose the **Open** button next to the MLflow App. This action uses a presigned URL to launch the MLflow UI in a new tab in your current browser.

For more information about using the MLflow UI, see [Launch the MLflow UI using a presigned URL](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow-launch-ui.html) in the *Amazon SageMaker AI Developer Guide*.

### Integrate MLflow with your environment
<a name="sagemaker-experiments-mlflow-apps-integrate"></a>

After you create an MLflow App, you can integrate it with your development environment to track experiments and log metrics.

To integrate MLflow with your environment, you need the MLflow App ARN. You can find the ARN on the MLflow App details page.

For detailed information about integrating MLflow with your environment, including code examples for Python notebooks, see [Integrate MLflow with your environment](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow-track-experiments.html) in the *Amazon SageMaker AI Developer Guide*.
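
As a minimal sketch (assuming the `mlflow` and `sagemaker-mlflow` packages are installed; the App ARN is a placeholder), logging a run against an MLflow App might look like this. The helper reflects MLflow's requirement that metric values be numeric:

```
def numeric_metrics(metrics: dict) -> dict:
    """MLflow metric values must be numeric; drop anything that is not."""
    return {k: float(v) for k, v in metrics.items() if isinstance(v, (int, float))}

def log_training_run(tracking_arn: str, params: dict, metrics: dict) -> None:
    # Requires the mlflow and sagemaker-mlflow packages plus AWS credentials.
    import mlflow

    mlflow.set_tracking_uri(tracking_arn)  # the MLflow App ARN from its details page
    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(numeric_metrics(metrics))

# Example call (placeholder ARN):
# log_training_run(
#     "arn:aws:sagemaker:us-east-1:111122223333:mlflow-app/my-app",
#     {"epochs": 3},
#     {"accuracy": 0.91},
# )
```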

## Use MLflow Tracking Servers for experiment tracking
<a name="sagemaker-experiments-tracking-servers"></a>

Use MLflow Tracking Servers in Amazon SageMaker Unified Studio to track, manage, analyze, and compare machine learning experiments. MLflow Tracking Servers provide compute and storage resources for experiment tracking. Each project can have an MLflow Tracking Server. Your domain administrator can configure the project defaults to automatically create the MLflow Tracking Server during project creation. Otherwise, you can create an MLflow Tracking Server on demand for the project.

**Note**  
MLflow Tracking Servers are different from MLflow Apps. MLflow Apps offer additional features such as faster startup time, cross-account sharing, and automatic model registration. For information about MLflow Apps, see [Use MLflow Apps for experiment tracking](#sagemaker-experiments-mlflow-apps).

When you delete a project, Amazon SageMaker Unified Studio automatically deletes the tracking server.

For more information about MLflow Tracking Servers, see [MLflow Tracking Servers](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow-create-tracking-server.html) in the *Amazon SageMaker AI Developer Guide*.

For more information about project profiles for AI-ML projects, see [Project profiles](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/project-profiles.html) in the *Amazon SageMaker Unified Studio Admin Guide*.

### Create an MLflow Tracking Server
<a name="sagemaker-experiments-tracking-servers-create"></a>

After you create a project, you can create an MLflow Tracking Server for the project, if it wasn't created automatically during project creation.

To create an MLflow Tracking Server, perform the following steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the top banner, choose your project from the projects drop-down menu, and choose **Project overview**.

1. From the left menu, choose **Compute**.

1. From the tabs in the top banner, choose **MLflow**.

1. Choose **Create MLflow Tracking Server**.

1. (Optional) Provide values to override the default values for the following fields:

   1. Name – enter a name for the server.

   1. Size – select a size for the server.

1. Choose **Create** to create the server.

### Edit an MLflow Tracking Server
<a name="sagemaker-experiments-tracking-servers-edit"></a>

After you create a tracking server, you can change the configured server size, if the current size isn't sufficient for the project.

To edit a tracking server, perform the following steps from your project's **MLflow** tab under **Compute**:

1. From the **Actions** drop-down menu, choose **Edit**. You can change the following values:

   1. Size – select a new size for the server.

   1. Artifact storage S3 path – enter a new path to the artifact storage.

1. Choose **Save changes** to update the tracking server.

### Start or stop an MLflow Tracking Server
<a name="sagemaker-experiments-tracking-servers-start-stop"></a>

You can stop a running server or start a stopped server. While the tracking server is starting or stopping, it's not available for MLflow to use.

To start or stop an MLflow tracking server, perform the following steps from your project's **Project details** page:

1. From the left menu, choose **Compute**.

1. From the tabs in the top banner, choose **MLflow**.

1. From the **Actions** drop-down menu, choose **Stop** to stop a running server. Choose **Start** to start a stopped server.

### Integrate MLflow with your environment
<a name="sagemaker-experiments-tracking-servers-integrate"></a>

For information about how to integrate MLflow with your environment, see [Integrate MLflow with your environment](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow-track-experiments.html) in the *Amazon SageMaker AI Developer Guide*.

### Launch the MLflow UI
<a name="sagemaker-experiments-tracking-servers-launch-ui"></a>

You can launch the MLflow Tracking Server UI from the **MLflow** tab under **Compute**, by performing the following steps:

1. Navigate to the project details page for your project.

1. From the left menu, choose **Compute**.

1. From the tabs in the top banner, choose **MLflow**.

1. From the **Actions** drop-down menu, choose **Open MLflow**. This action uses a presigned URL to launch the MLflow UI in a new tab in your current browser.

For more information, see [Launch the MLflow UI using a presigned URL](https://docs.aws.amazon.com/sagemaker/latest/dg/mlflow-launch-ui.html) in the *Amazon SageMaker AI Developer Guide*.

# HyperPod clusters
<a name="sagemaker-hyperpods"></a>

Use Amazon SageMaker AI HyperPod to help you provision resilient compute clusters for running model training or fine-tuning workloads. Amazon SageMaker AI HyperPod integrates with Slurm or Amazon EKS for orchestration.

You can create HyperPod clusters using the Amazon SageMaker AI HyperPod console UI or SageMaker AI Studio. For more information, see [Orchestrating SageMaker AI HyperPod clusters with Slurm](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod-slurm.html) or [Orchestrating SageMaker AI HyperPod clusters with Amazon EKS](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod-eks.html) in the *Amazon SageMaker AI Developer Guide*.

In Amazon SageMaker Unified Studio, you can launch machine learning workloads on Amazon SageMaker AI HyperPod clusters. You can also view details about the HyperPod clusters. 

**Topics**
+ [Connect to a HyperPod cluster](#sagemaker-hyperpods-add-connection)
+ [View the HyperPod clusters](#sagemaker-hyperpods-view)
+ [View details about a HyperPod cluster](#sagemaker-hyperpods-view-details)
+ [HyperPod task governance](#sagemaker-hyperpods-task-gov)
+ [Open the HyperPod in JupyterLab](#sagemaker-hyperpods-jupyterlab)

## Connect to a HyperPod cluster
<a name="sagemaker-hyperpods-add-connection"></a>

To use a HyperPod cluster in Amazon SageMaker Unified Studio, you create a connection to the cluster by following these steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the **Build** drop-down menu, choose **HyperPod**. The compute page displays the HyperPod clusters for your project.

1. Choose **Add compute**.

1. In the **Add compute** form, configure the following fields:

   1. For **Connection name**, enter a name for this connection.

   1. For **HyperPod cluster name**, enter the name of the HyperPod cluster.

   1. For **Access role ARN**, enter the ARN of the IAM role that the project assumes to access the cluster.

   1. For **Account ID**, enter the AWS account where the runtime role exists.

   1. For **AWS Region**, enter the Region where the HyperPod cluster was created.

## View the HyperPod clusters
<a name="sagemaker-hyperpods-view"></a>

To view the HyperPod clusters in your project, follow these steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the **Build** drop-down menu, choose **HyperPods**.

   The portal opens the **HyperPod clusters** tab of the **Compute** page. The HyperPod clusters table provides a summary view of each cluster, including the ARN, status, and creation time.

## View details about a HyperPod cluster
<a name="sagemaker-hyperpods-view-details"></a>

To view the details page for a HyperPod cluster, choose the cluster from the table of HyperPod clusters. The page displays tabs for tasks, metrics, settings, and metadata details.

For more information about HyperPod cluster details that you can view in Amazon SageMaker Unified Studio, see [HyperPod tabs in Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod-studio-tabs.html) in the *Amazon SageMaker AI Developer Guide*.

## HyperPod task governance
<a name="sagemaker-hyperpods-task-gov"></a>

For Amazon EKS clusters, you can use HyperPod task governance to streamline resource allocation and utilization of compute resources in the cluster.

HyperPod task governance provides a comprehensive dashboard view of your Amazon EKS cluster utilization metrics, including hardware, team, and task metrics. 

For more information about the HyperPod dashboard view, see [Dashboard](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-hyperpod-eks-operate-console-ui-governance-metrics.html) in the *Amazon SageMaker AI Developer Guide*.

## Open the HyperPod in JupyterLab
<a name="sagemaker-hyperpods-jupyterlab"></a>

To open your HyperPod in JupyterLab, follow these steps:

1. From the cluster details page, choose **Open in JupyterLab**.

   The **Starting space** page opens and the space initialization starts.

   After the JupyterLab space is ready, it opens the HyperPod sample notebook.

1. The HyperPod sample notebook shows the end-to-end flow of how to use the HyperPod cluster, including sample commands for:
   + Connecting to the cluster
   + Submitting jobs to the cluster
   + Viewing job or cluster status
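For Slurm-orchestrated clusters, shell access to individual nodes typically goes through AWS Systems Manager (SSM). As a hedged sketch, the helper below builds the SSM target string in the `sagemaker-cluster:` format from a cluster ID, instance group name, and instance ID; the values shown are hypothetical:

```
def ssm_target_for_node(cluster_id: str, instance_group: str, instance_id: str) -> str:
    """Build the SSM target for a HyperPod node, for use with:
    aws ssm start-session --target <target>
    """
    return f"sagemaker-cluster:{cluster_id}_{instance_group}-{instance_id}"

# Hypothetical values -- substitute your cluster's details
print(ssm_target_for_node("abc123de45f6", "worker-group-1", "i-0123456789abcdef0"))
# prints sagemaker-cluster:abc123de45f6_worker-group-1-i-0123456789abcdef0
```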

# Partner AI apps
<a name="sagemaker-partner-apps"></a>

Amazon SageMaker Unified Studio provides access to Amazon SageMaker AI Partner AI apps. Partner AI apps include generative AI and machine learning (ML) development applications that are built, published, and distributed by industry-leading application providers. 

 Amazon SageMaker AI Partner AI apps are full application stacks that include an Amazon EKS cluster and accompanying services such as Application Load Balancer, Amazon RDS, Amazon S3 buckets, or Amazon SQS queues.

To view the partner AI apps, complete these steps:

1. Sign in to Amazon SageMaker Unified Studio using the link that your administrator gave you.

1. From the **Explore** drop-down menu, choose **Partner AI apps**.

 For more information about SageMaker AI partner AI apps, see [Partner AI Apps overview](https://docs.aws.amazon.com/sagemaker/latest/dg/partner-apps.html) in the *Amazon SageMaker AI Developer Guide*.

# Machine Learning Workflows in IAM-based domains
<a name="sagemaker-iam-based-domains"></a>

Amazon SageMaker Unified Studio provides a comprehensive machine learning environment within IAM-based domains that enables you to discover, deploy, and manage machine learning models through a unified interface. You can:
+ Discover foundation models and registered models from multiple model providers, and deploy them easily using sample notebooks
+ Customize foundation models using the JupyterLab IDE or Data Notebooks, a serverless notebook experience for ML practitioners
+ Create and track experiments and identify the best model for your use case using MLflow
+ Use agentic AI in notebooks to create and train models
+ Monitor training jobs and model performance metrics
+ Manage model lifecycle through the integrated model registry

The machine learning capabilities in Amazon SageMaker Unified Studio integrate seamlessly with your project's IAM permissions and compute resources, providing secure access to models and deployment infrastructure within your domain's governance framework.

# Discover Foundation models
<a name="discover-foundation-models"></a>

1. Navigate to your Amazon SageMaker Unified Studio project using the URL provided by your administrator.

1. From the left navigation menu, choose **Models**.

1. Use the Model finder to search for models by task type:
   + Choose **Text** for natural language processing models
   + Choose **Image** for computer vision models
   + Choose **Audio** for speech and audio processing models
   + Choose **Multimodal** for models that handle multiple input types
   + Choose **Video** for video processing models

1. Browse Featured providers to explore models from specific providers.

1. Switch between the **Foundation Models** and **Registered Models** tabs:
   + **Foundation Models** displays pre-trained models available for immediate use
   + **Registered Models** shows models you have registered in your project's model registry

1. Use the search functionality to find specific models by name or capability.
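Model discovery can also be scripted outside the UI. The following is a hedged sketch using the SageMaker Python SDK's JumpStart utilities; it assumes the `sagemaker` package is installed, and the task filter value is an example:

```
def list_text2text_models() -> list:
    """List JumpStart foundation model IDs for an example task filter."""
    from sagemaker.jumpstart.notebook_utils import list_jumpstart_models

    # "text2text" is an example task filter; see the SDK docs for valid values
    return list(list_jumpstart_models(filter="task == text2text"))
```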

# Deploy Foundation models
<a name="deploy-foundation-models"></a>

1. From the Models page, select a Foundation Model you want to deploy.

1. On the model details page, review the model information including:
   + Model architecture and capabilities
   + Supported languages and input modalities
   + Training data information
   + License requirements

1. Choose **Deploy** to access deployment options. This creates a sample notebook that you can review and run to deploy the model.

1. Review and run the sample notebook, which demonstrates:
   + Model selection and configuration
   + Endpoint deployment procedures
   + Example inference code

1. Wait for the deployment to complete. The deployment process may take several minutes.

1. After successful model deployment, navigate to **Endpoints** from the left navigation menu.

1. View your deployed endpoints in the endpoints list, which displays:
   + Endpoint name and status
   + Creation and modification timestamps
   + Endpoint configuration details

1. Monitor endpoint status:
   + In Service indicates the endpoint is ready for inference requests
   + Creating shows the endpoint is being deployed
   + Failed indicates deployment issues that need attention

# Track experiments using MLflow
<a name="use-mlflow-experiments"></a>

Amazon SageMaker Unified Studio supports two options for tracking experiments with MLflow: MLflow Apps and MLflow Tracking Servers. MLflow Apps are the latest offering with faster startup times and cross-account sharing, while MLflow Tracking Servers provide traditional MLflow functionality.

**Topics**
+ [Use MLflow Apps for experiment tracking](#use-mlflow-apps)
+ [Use MLflow Tracking Servers to track experiments](#use-mlflow-tracking-servers)

## Use MLflow Apps for experiment tracking
<a name="use-mlflow-apps"></a>

**Note**  
MLflow Apps are different from MLflow Tracking Servers. MLflow Apps offer additional features such as faster startup time and cross-account sharing. For information about connecting to existing MLflow Tracking Servers, see [Use MLflow Tracking Servers to track experiments](#use-mlflow-tracking-servers).

MLflow Apps are the latest managed MLflow offering in Amazon SageMaker Unified Studio and provide faster startup times, cross-account sharing, and integration with SageMaker AI features. MLflow Apps use MLflow 3.0 and support experiment tracking, model registry, and tracing for generative AI applications.

### Connect to an MLflow App
<a name="connect-mlflow-app"></a>

You can connect to an existing MLflow App created in SageMaker AI Studio to track experiments and manage model versions. Note that you can't create a new MLflow App in Amazon SageMaker Unified Studio; you must create it using the SageMaker AI APIs or in SageMaker AI Studio.

To connect to an MLflow App, perform the following steps:

1. From your project's main page, choose **MLflow** from the left navigation menu.

1. Choose **Connect MLflow App**.

1. Enter an MLflow App Name.

1. Provide a Connection name for identification.

1. Enter the MLflow App ARN for your project.

1. Choose **Connect to app**.

### Manage MLflow Apps
<a name="manage-mlflow-apps"></a>

After you connect to an MLflow App, you can perform the following actions from the **MLflow** page:
+ Open MLflow – Choose the **Open** button next to the MLflow App to launch the MLflow UI and view experiments, models, and traces.
+ Edit – Update the connection with a new ARN.
+ Delete – Remove the connection to the MLflow App.

To access the **MLflow** page, choose **MLflow** from the left navigation menu.

## Use MLflow Tracking Servers to track experiments
<a name="use-mlflow-tracking-servers"></a>

To get started, you need an existing MLflow Tracking Server created in SageMaker AI Studio, along with its ARN.

1. From your project's main page, choose **MLflow** from the left navigation menu.

1. Connect to an existing MLflow Tracking Server. You can't create a new MLflow Tracking Server in Amazon SageMaker Unified Studio; you must create it using the SageMaker AI APIs or in SageMaker AI Studio.

1. Choose **Connect Tracking Server**.

1. Enter a Tracking Server Name.

1. Provide a Connection name for identification.

1. Enter the MLflow Tracking Server ARN for your project.

1. Choose **Connect to server**.

1. After you connect, choose **Open MLflow** to launch the MLflow UI.

1. In the MLflow interface, view your experiments:
   + The **Experiments** tab shows all tracked experiments
   + The **Models** tab displays registered model versions
   + The **Prompts** tab contains prompt templates and versions

1. You can perform additional actions, such as:
   + Stop ML Server – stops the running Tracking Server
   + Use server to train model – launches a sample notebook that shows how to use MLflow to train a linear regression model
   + Edit – update the connection with a new ARN
   + Delete – remove the connection
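The "Use server to train model" notebook flow can be approximated with a short sketch like the following. It assumes `mlflow`, `scikit-learn`, and `numpy` are installed, and that you pass the ARN of your connected Tracking Server; the experiment name is arbitrary:

```
def train_with_mlflow(tracking_server_arn: str) -> float:
    """Train a small linear regression and log it to the connected MLflow
    Tracking Server. Returns the R^2 score on the training data."""
    import mlflow
    import numpy as np
    from sklearn.linear_model import LinearRegression

    mlflow.set_tracking_uri(tracking_server_arn)
    mlflow.set_experiment("linear-regression-demo")

    # Synthetic data: y is roughly 3x plus noise
    rng = np.random.default_rng(0)
    X = rng.random((100, 1))
    y = 3.0 * X.ravel() + rng.normal(scale=0.1, size=100)

    with mlflow.start_run():
        model = LinearRegression().fit(X, y)
        r2 = model.score(X, y)
        mlflow.log_metric("r2", r2)
        mlflow.sklearn.log_model(model, "model")
    return r2
```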

# Create and monitor custom training jobs
<a name="create-monitor-training-jobs"></a>

You can use JupyterLab notebooks to create and run SageMaker AI training jobs. For details, see the SageMaker AI documentation on training models. The following example uses the `ModelTrainer` class from the SageMaker Python SDK; the `image_uri`, `source_code`, `job_name`, `compute_configs`, `output_path`, and `data` variables are defined earlier in the notebook.

```
from sagemaker.modules.train import ModelTrainer
from sagemaker.modules.configs import OutputDataConfig, StoppingCondition
from sagemaker.modules.distributed import Torchrun

model_trainer = ModelTrainer(
    training_image=image_uri,
    source_code=source_code,
    base_job_name=job_name,
    compute=compute_configs,
    distributed=Torchrun(),
    stopping_condition=StoppingCondition(
        max_runtime_in_seconds=7200
    ),
    hyperparameters={
        "config": "/opt/ml/input/data/config/args.yaml" # sample path
    },
    output_data_config=OutputDataConfig(
        s3_output_path=output_path
    ),
)
# Start the training job with the uploaded datasets as input
model_trainer.train(input_data_config=data, wait=True)
```

You can monitor the results of the training job by selecting **Training Jobs** in the left panel.

You can stop jobs and monitor the artifacts, hyperparameters, security configurations, and tags that you set up during training.

# Manage Registered Models
<a name="manage-registered-models"></a>

Your registered models can be accessed and managed in Amazon SageMaker Unified Studio.

1. Navigate to **AI/ML** > **Models** and select the **Registered Models** tab to view the models.

1. You can review the following information:
   + Model group name - Logical grouping for related model versions
   + Model version - Specific version identifier
   + Model artifacts - Location of model files in Amazon S3
   + Description - Optional description of the model and its purpose

1. Select specific models to review key model information such as:
   + Framework - Machine learning framework used (e.g., PyTorch, TensorFlow)
   + Algorithm - Algorithm or approach used for training
   + Performance metrics - Accuracy, precision, recall, or other relevant metrics

## Model lifecycle management
<a name="model-lifecycle-management"></a>

The model registry provides version control and lifecycle management:
+ Version tracking - Each model registration creates a new version with unique metadata
+ Approval workflows - Models can have approval status (Approved, Pending, Rejected)
+ Deployment status - Track which versions are deployed to endpoints
+ Model comparison - Compare metrics and metadata across versions

To manage model versions:

1. Choose a model group name to view all versions.

1. Review version details including:
   + Version number - Sequential version identifier
   + Description - Version-specific notes and changes
   + Deployment Status - Current deployment state
   + Approval Status - Workflow approval state
   + Modified Date - Last update timestamp

1. Use the **Actions** dropdown to:
   + Deploy using Notebooks - Use a pre-created notebook to deploy the model to SageMaker AI Endpoint
   + Deploy using Jupyter Notebook - Use a pre-created JupyterLab notebook to deploy the model to SageMaker AI Endpoint
   + Approve/Reject - Update approval status
   + Delete - Delete the selected version