

# Data import
<a name="canvas-importing-data"></a>

Amazon SageMaker Canvas supports importing tabular, image, and document data. You can import datasets from your local machine, AWS services such as Amazon S3 and Amazon Redshift, and external data sources. When importing datasets from Amazon S3, you can bring a dataset of any size. Use the datasets that you import to build models and make predictions for other datasets.

Each use case for which you can build a custom model accepts different types of input. For example, if you want to build a single-label image classification model, then you should import image data. For more information about the different model types and the data they accept, see [How custom models work](canvas-build-model.md). You can import data and build custom models in SageMaker Canvas for the following data types:
+ **Tabular** (CSV files, Parquet files, or database tables)
  + Categorical – Use categorical data to build custom categorical prediction models for 2 category and 3+ category prediction.
  + Numeric – Use numeric data to build custom numeric prediction models.
  + Text – Use text data to build custom multi-category text prediction models.
  + Time series – Use time series data to build custom time series forecasting models.
+ **Image** (JPG or PNG) – Use image data to build custom single-label image prediction models.
+ **Document** (PDF, JPG, PNG, TIFF) – Document data is only supported for SageMaker Canvas Ready-to-use models. To learn more about Ready-to-use models that can make predictions for document data, see [Ready-to-use models](canvas-ready-to-use-models.md).

You can import data into Canvas from the following data sources:
+ Local files on your computer
+ Amazon S3 buckets
+ Amazon Redshift provisioned clusters (not Amazon Redshift Serverless)
+ AWS Glue Data Catalog through Amazon Athena
+ Amazon Aurora
+ Amazon Relational Database Service (Amazon RDS)
+ Salesforce Data Cloud
+ Snowflake
+ Databricks, SQL Server, MariaDB, and other popular databases through JDBC connectors
+ Over 40 external SaaS platforms, such as SAP OData

For a full list of data sources from which you can import, see the following table:


| Source | Type | Supported data types | 
| --- | --- | --- | 
| Local file upload | Local | Tabular, Image, Document | 
| Amazon Aurora | Amazon internal | Tabular | 
| Amazon S3 bucket | Amazon internal | Tabular, Image, Document | 
| Amazon RDS | Amazon internal | Tabular | 
| Amazon Redshift provisioned clusters (not Redshift Serverless) | Amazon internal | Tabular | 
| AWS Glue Data Catalog (through Amazon Athena) | Amazon internal | Tabular | 
| [Databricks](https://www.databricks.com/) | External | Tabular | 
| Snowflake | External | Tabular | 
| [Salesforce Data Cloud](https://www.salesforce.com/products/genie/overview/) | External | Tabular | 
| SQL Server | External | Tabular | 
| MySQL | External | Tabular | 
| PostgreSQL | External | Tabular | 
| MariaDB | External | Tabular | 
| [Amplitude](https://docs.aws.amazon.com/appflow/latest/userguide/amplitude.html) | External SaaS platform | Tabular | 
| [CircleCI](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-circleci.html) | External SaaS platform | Tabular | 
| [DocuSign Monitor](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-docusign-monitor.html) | External SaaS platform | Tabular | 
| [Domo](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-domo.html) | External SaaS platform | Tabular | 
| [Datadog](https://docs.aws.amazon.com/appflow/latest/userguide/datadog.html) | External SaaS platform | Tabular | 
| [Dynatrace](https://docs.aws.amazon.com/appflow/latest/userguide/dynatrace.html) | External SaaS platform | Tabular | 
| [Facebook Ads](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-facebook-ads.html) | External SaaS platform | Tabular | 
| [Facebook Page Insights](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-facebook-page-insights.html) | External SaaS platform | Tabular | 
| [Google Ads](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-google-ads.html) | External SaaS platform | Tabular | 
| [Google Analytics 4](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-google-analytics-4.html) | External SaaS platform | Tabular | 
| [Google Search Console](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-google-search-console.html) | External SaaS platform | Tabular | 
| [GitHub](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-github.html) | External SaaS platform | Tabular | 
| [GitLab](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-gitlab.html) | External SaaS platform | Tabular | 
| [Infor Nexus](https://docs.aws.amazon.com/appflow/latest/userguide/infor-nexus.html) | External SaaS platform | Tabular | 
| [Instagram Ads](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-instagram-ads.html) | External SaaS platform | Tabular | 
| [Jira Cloud](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-jira-cloud.html) | External SaaS platform | Tabular | 
| [LinkedIn Ads](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-linkedin-ads.html) | External SaaS platform | Tabular | 
| [Mailchimp](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-mailchimp.html) | External SaaS platform | Tabular | 
| [Marketo](https://docs.aws.amazon.com/appflow/latest/userguide/marketo.html) | External SaaS platform | Tabular | 
| [Microsoft Teams](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-microsoft-teams.html) | External SaaS platform | Tabular | 
| [Mixpanel](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-mixpanel.html) | External SaaS platform | Tabular | 
| [Okta](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-okta.html) | External SaaS platform | Tabular | 
| [Salesforce](https://docs.aws.amazon.com/appflow/latest/userguide/salesforce.html) | External SaaS platform | Tabular | 
| [Salesforce Marketing Cloud](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-salesforce-marketing-cloud.html) | External SaaS platform | Tabular | 
| [Salesforce Pardot](https://docs.aws.amazon.com/appflow/latest/userguide/pardot.html) | External SaaS platform | Tabular | 
| [SAP OData](https://docs.aws.amazon.com/appflow/latest/userguide/sapodata.html) | External SaaS platform | Tabular | 
| [SendGrid](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-sendgrid.html) | External SaaS platform | Tabular | 
| [ServiceNow](https://docs.aws.amazon.com/appflow/latest/userguide/servicenow.html) | External SaaS platform | Tabular | 
| [Singular](https://docs.aws.amazon.com/appflow/latest/userguide/singular.html) | External SaaS platform | Tabular | 
| [Slack](https://docs.aws.amazon.com/appflow/latest/userguide/slack.html) | External SaaS platform | Tabular | 
| [Stripe](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-stripe.html) | External SaaS platform | Tabular | 
| [Trend Micro](https://docs.aws.amazon.com/appflow/latest/userguide/trend-micro.html) | External SaaS platform | Tabular | 
| [Typeform](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-typeform.html) | External SaaS platform | Tabular | 
| [Veeva](https://docs.aws.amazon.com/appflow/latest/userguide/veeva.html) | External SaaS platform | Tabular | 
| [Zendesk](https://docs.aws.amazon.com/appflow/latest/userguide/zendesk.html) | External SaaS platform | Tabular | 
| [Zendesk Chat](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-zendesk-chat.html) | External SaaS platform | Tabular | 
| [Zendesk Sell](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-zendesk-sell.html) | External SaaS platform | Tabular | 
| [Zendesk Sunshine](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-zendesk-sunshine.html) | External SaaS platform | Tabular | 
| [Zoom Meetings](https://docs.aws.amazon.com/appflow/latest/userguide/connectors-zoom.html) | External SaaS platform | Tabular | 

For instructions on how to import data and information regarding input data requirements, such as the maximum file size for images, see [Create a dataset](canvas-import-dataset.md).

Canvas also provides several sample datasets in your application to help you get started. To learn more about the SageMaker AI-provided sample datasets you can experiment with, see [Use sample datasets](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-sample-datasets.html).

After you import a dataset into Canvas, you can update the dataset at any time. You can do a manual update or you can set up a schedule for automatic dataset updates. For more information, see [Update a dataset](canvas-update-dataset.md).

For more information specific to each dataset type, see the following sections:

**Tabular**

To import data from an external data source (such as a Snowflake database or a SaaS platform), you must authenticate and connect to the data source in the Canvas application. For more information, see [Connect to data sources](canvas-connecting-external.md).

If you want to import datasets larger than 5 GB from Amazon S3 into Canvas, you can achieve faster sampling by using Amazon Athena to query and sample the data from Amazon S3.
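
Canvas performs this sampling for you, but if you want to prepare a smaller sample yourself, the following is a minimal sketch using the AWS SDK for Python (Boto3) to run a sampling query with Athena. The database name, table name, and output location are hypothetical placeholders.

```python
import time

import boto3

athena = boto3.client("athena")

# Sample a large AWS Glue Data Catalog table down to a size that's quick to
# import. "sales_db", "transactions", and the output bucket are placeholders.
response = athena.start_query_execution(
    QueryString="SELECT * FROM transactions LIMIT 200000",
    QueryExecutionContext={"Database": "sales_db"},
    ResultConfiguration={"OutputLocation": "s3://amzn-s3-demo-bucket/athena-results/"},
)
query_id = response["QueryExecutionId"]

# Poll until the query finishes; the sampled CSV output lands in the S3
# results location, from which it can be imported into Canvas.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)
print(f"Query {query_id} finished with state {state}")
```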

After creating datasets in Canvas, you can prepare and transform your data using the data preparation functionality of Data Wrangler. You can use Data Wrangler to handle missing values, transform your features, join multiple datasets into a single dataset, and more. For more information, see [Data preparation](canvas-data-prep.md).

**Tip**  
As long as your data is arranged into tables, you can join datasets from various sources, such as Amazon Redshift, Amazon Athena, or Snowflake.

**Image**

For information about how to edit an image dataset and perform tasks such as assigning or reassigning labels, adding images, or deleting images, see [Edit an image dataset](canvas-edit-image.md).

# Create a dataset
<a name="canvas-import-dataset"></a>

**Note**  
If you're importing datasets larger than 5 GB into Amazon SageMaker Canvas, we recommend that you use the [Data Wrangler feature](canvas-data-prep.md) in Canvas to create a data flow. Data Wrangler supports advanced data preparation features such as [joining](canvas-transform.md#canvas-transform-join) and [concatenating](canvas-transform.md#canvas-transform-concatenate) data. After you create a data flow, you can export your data flow as a Canvas dataset and begin building a model. For more information, see [Export to create a model](canvas-processing-export-model.md).

The following sections describe how to create a dataset in Amazon SageMaker Canvas. For custom models, you can create datasets for tabular and image data. For Ready-to-use models, you can use tabular and image datasets as well as document datasets. Choose your workflow based on the following information:
+ For categorical, numeric, text, and timeseries data, see [Import tabular data](#canvas-import-dataset-tabular).
+ For image data, see [Import image data](#canvas-import-dataset-image).
+ For document data, see [Import document data](#canvas-ready-to-use-import-document).

A dataset can consist of multiple files. For example, you might have multiple files of inventory data in CSV format. You can upload these files together as a dataset as long as the schema (or column names and data types) of the files match.
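
Because all files in a dataset must share a schema, it can save an import round trip to check your files locally first. The following is a minimal pandas sketch, assuming a hypothetical set of local CSV inventory files:

```python
import pandas as pd

# Hypothetical files intended to be uploaded together as one Canvas dataset.
files = ["inventory_week1.csv", "inventory_week2.csv", "inventory_week3.csv"]

# Read a small sample of each file to infer column names and data types.
schemas = {name: pd.read_csv(name, nrows=100).dtypes for name in files}

reference_name, reference_schema = next(iter(schemas.items()))
for name, schema in schemas.items():
    if not schema.equals(reference_schema):
        print(f"{name} doesn't match the schema of {reference_name}:")
        print(f"  {name}: {list(schema.index)}")
        print(f"  {reference_name}: {list(reference_schema.index)}")
```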

Canvas also supports managing multiple versions of your dataset. When you create a dataset, the first version is labeled as `V1`. You can create a new version of your dataset by updating your dataset. You can do a manual update, or you can set up an automated schedule for updating your dataset with new data. For more information, see [Update a dataset](canvas-update-dataset.md).

When you import your data into Canvas, make sure that it meets the requirements in the following table. The limitations are specific to the type of model you’re building.


| Limit | 2 category, 3+ category, numeric, and time series models | Text prediction models | Image prediction models | \*Document data for Ready-to-use models | 
| --- | --- | --- | --- | --- | 
| Supported file types | CSV and Parquet (local upload, Amazon S3, or databases); JSON (databases) | CSV and Parquet (local upload, Amazon S3, or databases); JSON (databases) | JPG, PNG | PDF, JPG, PNG, TIFF | 
| Maximum file size | Local upload: 5 GB; data sources: petabytes | Local upload: 5 GB; data sources: petabytes | 30 MB per image | 5 MB per document | 
| Maximum number of files you can upload at a time | 30 | 30 | N/A | N/A | 
| Maximum number of columns | 1,000 | 1,000 | N/A | N/A | 
| Maximum number of entries (rows, images, or documents) for **Quick builds** | N/A | 7,500 rows | 5,000 images | N/A | 
| Maximum number of entries (rows, images, or documents) for **Standard builds** | N/A | 150,000 rows | 180,000 images | N/A | 
| Minimum number of entries (rows) for **Quick builds** | 2 category: 500 rows; 3+ category, numeric, time series: N/A | N/A | N/A | N/A | 
| Minimum number of entries (rows, images, or documents) for **Standard builds** | 250 rows | 50 rows | 50 images | N/A | 
| Minimum number of entries (rows or images) per label | N/A | 25 rows | 25 images | N/A | 
| Minimum number of labels | 2 category: 2; 3+ category: 3; numeric, time series: N/A | 2 | 2 | N/A | 
| Minimum sample size for random sampling | 500 | N/A | N/A | N/A | 
| Maximum sample size for random sampling | 200,000 | N/A | N/A | N/A | 
| Maximum number of labels | 2 category: 2; 3+ category, numeric, time series: N/A | 1,000 | 1,000 | N/A | 

\*Document data is currently only supported for [Ready-to-use models](canvas-ready-to-use-models.md) that accept document data. You can't build a custom model with document data.

Also note the following restrictions:
+ When importing data from an Amazon S3 bucket, make sure that your Amazon S3 bucket name doesn't contain a `.`. If your bucket name contains a `.`, you might experience errors when trying to import data into Canvas.
+ For tabular data, you can only select files with the .csv, .parquet, .parq, or .pqt extensions, for both local upload and Amazon S3 import. CSV files can use any common or custom delimiter, and they must not contain newline characters except to denote a new row.
+ For tabular data using Parquet files, note the following (a sketch of writing a compliant file follows this list):
  + Parquet files can't include complex types like maps and lists.
  + The column names of Parquet files can't contain spaces.
  + If using compression, Parquet files must use either gzip or snappy compression types. For more information about the preceding compression types, see the [gzip documentation](https://www.gzip.org/) and the [snappy documentation](https://github.com/google/snappy).
+ For image data, if you have any unlabeled images, you must label them before building your model. For information about how to assign labels to images within the Canvas application, see [Edit an image dataset](canvas-edit-image.md).
+ If you set up automatic dataset updates or automatic batch prediction configurations, you can only create a total of 20 configurations in your Canvas application. For more information, see [How to manage automations](canvas-manage-automations.md).
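
For the Parquet requirements in the preceding list, the following is a minimal sketch (using pandas with the PyArrow engine) of writing a file that Canvas can import; the data and file names are placeholders:

```python
import pandas as pd

# Placeholder data with a column name that contains a space.
df = pd.DataFrame(
    {
        "product id": [101, 102],
        "unit_price": [9.99, 24.50],
    }
)

# Canvas can't import Parquet columns whose names contain spaces, so
# replace the spaces before writing.
df.columns = [name.replace(" ", "_") for name in df.columns]

# Use snappy compression (gzip is the other supported option), and keep the
# columns flat: complex types such as maps and lists aren't supported.
df.to_parquet("inventory.parquet", compression="snappy")
```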

After you import a dataset, you can view your datasets on the **Datasets** page at any time.

## Import tabular data
<a name="canvas-import-dataset-tabular"></a>

With tabular datasets, you can build categorical, numeric, time series forecasting, and text prediction models. Review the limitations table in the [Create a dataset](#canvas-import-dataset) section to ensure that your data meets the requirements for tabular data.

Use the following procedure to import a tabular dataset into Canvas:

1. Open your SageMaker Canvas application.

1. In the left navigation pane, choose **Datasets**.

1. Choose **Import data**.

1. From the dropdown menu, choose **Tabular**.

1. In the popup dialog box, in the **Dataset name** field, enter a name for the dataset and choose **Create**.

1. On the **Create tabular dataset** page, open the **Data Source** dropdown menu.

1. Choose your data source:
   + To upload files from your computer, choose **Local upload**.
   + To import data from another source, such as an Amazon S3 bucket or a Snowflake database, use the search bar to find your data source. Then, choose the tile for your desired data source.
**Note**  
You can only import data from the tiles that have an active connection. If you want to connect to a data source that is unavailable to you, contact your administrator. If you’re an administrator, see [Connect to data sources](canvas-connecting-external.md).

   The following screenshot shows the **Data Source** dropdown menu.  
![\[Screenshot showing the Data Source dropdown menu and a search for a data source in the search bar.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/import-data-choose-source.png)

1. (Optional) If you’re connecting to an Amazon Redshift or Snowflake database for the first time, a dialog box appears to create a connection. Fill out the dialog box with your credentials and choose **Create connection**. If you already have a connection, choose your connection.

1. From your data source, select your files to import. For local upload and importing from Amazon S3, you can select files. For Amazon S3 only, you also have the option to directly enter the S3 URI, alias, or ARN of your bucket or S3 access point in the **Input S3 endpoint** field, and then choose files to import. For database sources, you can drag-and-drop data tables from the left navigation pane.

1. (Optional) For tabular data sources that support SQL querying (such as Amazon Redshift, Amazon Athena, or Snowflake), you can choose **Edit in SQL** to write SQL queries that select the data you want before importing it.

   The following screenshot shows the **Edit SQL** view for an Amazon Athena data source.  
![\[Screenshot showing a SQL query in the Edit SQL view for Amazon Athena data.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/import-data-edit-sql.png)

1. Choose **Preview dataset** to preview your data before importing it.

1. In the **Import settings**, enter a **Dataset name** or use the default dataset name.

1. (Optional) For data that you import from Amazon S3, you are shown the **Advanced** settings and can fill out the following fields:

   1. Toggle the **Use first row as header** option on if you want to use the first row of your dataset as the column names. If you selected multiple files, this applies to each file.

   1. If you're importing a CSV file, for the **File encoding (CSV)** dropdown, select your dataset file’s encoding. `UTF-8` is the default.

   1. For the **Delimiter** dropdown, select the delimiter that separates each cell in your data. The default delimiter is `,`. You can also specify a custom delimiter.

   1. Select **Multi-line detection** if you’d like Canvas to parse your entire dataset for multi-line cells. By default, this option is not selected, and Canvas decides whether to use multi-line support by taking a sample of your data. However, Canvas might not detect any multi-line cells in the sample. If you have multi-line cells, we recommend that you select the **Multi-line detection** option to force Canvas to check your entire dataset for multi-line cells. (For reference, a pandas sketch of these settings follows this procedure.)

1. When you’re ready to import your data, choose **Create dataset**.
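
For reference only, the **Advanced** settings correspond to common CSV parsing options. The following pandas sketch shows roughly equivalent choices for a hypothetical file; Canvas performs this parsing for you during import.

```python
import pandas as pd

# Roughly equivalent, in pandas terms, to the Advanced settings for a
# hypothetical "orders.csv": first row as header, UTF-8 encoding, and a
# custom "|" delimiter. Properly quoted multi-line cells are parsed as
# single values.
df = pd.read_csv(
    "orders.csv",
    header=0,          # Use first row as header
    encoding="utf-8",  # File encoding (CSV)
    sep="|",           # Delimiter
)
print(df.head())
```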

While your dataset is importing into Canvas, you can see your datasets listed on the **Datasets** page. From this page, you can [View your dataset details](#canvas-view-dataset-details).

When the **Status** of your dataset shows as `Ready`, Canvas successfully imported your data and you can proceed with [building a model](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-build-model.html).

If you have an existing connection to a data source, such as an Amazon Redshift database or a SaaS connector, you can reuse that connection. For Amazon Redshift and Snowflake, you can add another connection by creating another dataset, returning to the **Import data** page, and choosing the **Data Source** tile for that connection. From the dropdown menu, you can open the previous connection or choose **Add connection**.

**Note**  
For SaaS platforms, you can only have one connection per data source.

## Import image data
<a name="canvas-import-dataset-image"></a>

With image datasets, you can build custom single-label image prediction models, which predict a label for an image. Review the limitations table in the [Create a dataset](#canvas-import-dataset) section to ensure that your image dataset meets the requirements for image data.

**Note**  
You can only import image datasets from local file upload or an Amazon S3 bucket. Also, for image datasets, you must have at least 25 images per label.

Use the following procedure to import an image dataset into Canvas:

1. Open your SageMaker Canvas application.

1. In the left navigation pane, choose **Datasets**.

1. Choose **Import data**.

1. From the dropdown menu, choose **Image**.

1. In the popup dialog box, in the **Dataset name** field, enter a name for the dataset and choose **Create**.

1. On the **Import** page, open the **Data Source** dropdown menu.

1. Choose your data source. To upload files from your computer, choose **Local upload**. To import files from Amazon S3, choose **Amazon S3**.

1. From your computer or Amazon S3 bucket, select the images or folders of images that you want to upload.

1. When you’re ready to import your data, choose **Import data**.

While your dataset is importing into Canvas, you can see your datasets listed on the **Datasets** page. From this page, you can [View your dataset details](#canvas-view-dataset-details).

When the **Status** of your dataset shows as `Ready`, Canvas successfully imported your data and you can proceed with [building a model](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-build-model.html).

While you are building your model, you can edit your image dataset to assign or reassign labels, add images, or delete images. For more information about how to edit your image dataset, see [Edit an image dataset](canvas-edit-image.md).

## Import document data
<a name="canvas-ready-to-use-import-document"></a>

Document data is supported only by the Ready-to-use models for expense analysis, identity document analysis, document analysis, and document queries. You can’t build a custom model with document data.

With document datasets, you can generate predictions with these Ready-to-use models. Review the limitations table in the [Create a dataset](#canvas-import-dataset) section to ensure that your document dataset meets the requirements for document data.

**Note**  
You can only import document datasets from local file upload or an Amazon S3 bucket.

Use the following procedure to import a document dataset into Canvas:

1. Open your SageMaker Canvas application.

1. In the left navigation pane, choose **Datasets**.

1. Choose **Import data**.

1. From the dropdown menu, choose **Document**.

1. In the popup dialog box, in the **Dataset name** field, enter a name for the dataset and choose **Create**.

1. On the **Import** page, open the **Data Source** dropdown menu.

1. Choose your data source. To upload files from your computer, choose **Local upload**. To import files from Amazon S3, choose **Amazon S3**.

1. From your computer or Amazon S3 bucket, select the document files that you want to upload.

1. When you’re ready to import your data, choose **Import data**.

While your dataset is importing into Canvas, you can see your datasets listed on the **Datasets** page. From this page, you can [View your dataset details](#canvas-view-dataset-details).

When the **Status** of your dataset shows as `Ready`, Canvas has successfully imported your data.

On the **Datasets** page, you can choose your dataset to preview it, which shows you up to the first 100 documents of your dataset.

## View your dataset details
<a name="canvas-view-dataset-details"></a>



For each of your datasets, you can view all of the files in a dataset, the dataset’s version history, and any auto update configurations for the dataset. From the **Datasets** page, you can also start actions such as [updating a dataset](canvas-update-dataset.md) or [building a model](canvas-build-model.md).

To view the details for a dataset, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **Datasets**.

1. From the list of datasets, choose your dataset.

On the **Data** tab, you can see a preview of your data. If you choose **Dataset details**, you can see all of the files that are part of your dataset. Choose a file to see only the data from that file in the preview. For image datasets, the preview only shows you the first 100 images of your dataset.

On the **Version history** tab, you can see a list of all of the versions of your dataset. A new version is made whenever you update a dataset. To learn more about updating a dataset, see [Update a dataset](canvas-update-dataset.md). The following screenshot shows the **Version history** tab in the Canvas application.

![\[Screenshot of the Version history tab for a dataset, with a list of dataset versions.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-version-history.png)


On the **Auto updates** tab, you can enable auto updates for the dataset and set up a configuration to update your dataset on a regular schedule. To learn more about setting up auto updates for a dataset, see [Configure automatic updates for a dataset](canvas-update-dataset-auto.md). The following screenshot shows the **Auto updates** tab with auto updates turned on and a list of auto update jobs that have been performed on the dataset.

![\[The Auto updates tab for dataset showing the auto updates turned on and a list of auto update jobs.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-auto-updates.png)


# Update a dataset
<a name="canvas-update-dataset"></a>

After importing your initial dataset into Amazon SageMaker Canvas, you might have additional data that you want to add to your dataset. For example, you might get inventory data at the end of every week that you want to add to your dataset. Instead of importing your data multiple times, you can update your existing dataset and add or remove files from it.

**Note**  
You can only update datasets that you have imported through local upload or Amazon S3.

You can update your dataset either manually or automatically. For more information about automatic dataset updates, see [Configure automatic updates for a dataset](canvas-update-dataset-auto.md).

Every time you update your dataset, Canvas creates a new version of your dataset. You can only use the latest version of your dataset to build a model or generate predictions. For more information about viewing the version history of your dataset, see [View your dataset details](canvas-import-dataset.md#canvas-view-dataset-details).

You can also use dataset updates with automated batch predictions, which starts a batch prediction job whenever you update your dataset. For more information, see [Batch predictions in SageMaker Canvas](canvas-make-predictions-batch.md).

The following section describes how to do manual updates to your dataset.

## Manually update a dataset
<a name="canvas-update-dataset-manual"></a>

To do a manual update, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **Datasets**.

1. From the list of datasets, choose the dataset you want to update.

1. Choose the **Update dataset** dropdown menu and choose **Manual update**. You are taken to the import data workflow.

1. From the **Data source** dropdown menu, choose either **Local upload** or **Amazon S3**.

1. The page shows you a preview of your data. From here, you can add or remove files from the dataset. If you’re importing tabular data, the schema of the new files (column names and data types) must match the schema of the existing files. Additionally, your new files must not exceed the maximum dataset size or file size. For more information about these limitations, see [Create a dataset](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-import-dataset.html).
**Note**  
If you add a file with the same name as an existing file in your dataset, the new file overwrites the old version of the file.

1. When you’re ready to save your changes, choose **Update dataset**.

You should now have a new version of your dataset.

On the **Datasets** page, you can choose the **Version history** tab to see all of the versions of your dataset and the history of both manual and automatic updates you’ve made.

# Configure automatic updates for a dataset
<a name="canvas-update-dataset-auto"></a>

After importing your initial dataset into Amazon SageMaker Canvas, you might have additional data that you want to add to your dataset. For example, you might get inventory data at the end of every week that you want to add to your dataset. Instead of importing your data multiple times, you can update your existing dataset and add or remove files from it.

**Note**  
You can only update datasets that you have imported through local upload or Amazon S3.

With automatic dataset updates, you specify a location where Canvas checks for files at a frequency you specify. If you import new files during the update, the schema of the files must match the existing dataset exactly.

Every time you update your dataset, Canvas creates a new version of your dataset. You can only use the latest version of your dataset to build a model or generate predictions. For more information about viewing the version history of your dataset, see [View your dataset details](canvas-import-dataset.md#canvas-view-dataset-details).

You can also use dataset updates with automated batch predictions, which starts a batch prediction job whenever you update your dataset. For more information, see [Batch predictions in SageMaker Canvas](canvas-make-predictions-batch.md).

The following section describes how to do automatic updates to your dataset.

An automatic update is when you set up a configuration for Canvas to update your dataset at a given frequency. We recommend that you use this option if you regularly receive new files of data that you want to add to your dataset.

When you set up the auto update configuration, you specify an Amazon S3 location where you upload your files and a frequency at which Canvas checks the location and imports files. Each instance of Canvas updating your dataset is referred to as a *job*. For each job, Canvas imports all of the files in the Amazon S3 location. If you have new files with the same names as existing files in your dataset, Canvas overwrites the old files with the new files.
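
For example, if your auto update configuration watches a hypothetical `s3://amzn-s3-demo-bucket/canvas-inventory/` folder, you could stage each week's file there with a short Boto3 sketch like the following; the next scheduled job then imports everything in the folder:

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical bucket and watched folder from the auto update configuration.
bucket = "amzn-s3-demo-bucket"
prefix = "canvas-inventory/"

# Upload this week's file. Reusing an existing file name would overwrite the
# older file in the dataset on the next job, as described above.
s3.upload_file("inventory_week4.csv", bucket, prefix + "inventory_week4.csv")
```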

For automatic dataset updates, Canvas doesn’t perform schema validation. If the schema of the files imported during an automatic update doesn’t match the schema of the existing files, or the files exceed the size limitations (see [Create a dataset](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-import-dataset.html) for a table of file size limits), then you get errors when your jobs run.

**Note**  
You can only set up a maximum of 20 automatic configurations in your Canvas application. Additionally, Canvas only does automatic updates while you’re logged in to your Canvas application. If you log out of your Canvas application, automatic updates pause until you log back in.

To configure automatic updates for your dataset, do the following:

1. Open the SageMaker Canvas application.

1. In the left navigation pane, choose **Datasets**.

1. From the list of datasets, choose the dataset you want to update.

1. Choose the **Update dataset** dropdown menu and choose **Automatic update**. You are taken to the **Auto updates** tab for the dataset.

1. Turn on the **Auto update enabled** toggle.

1. For **Specify a data source**, enter the Amazon S3 path to a folder where you plan to regularly upload files.

1. For **Choose a frequency**, select **Hourly**, **Weekly**, or **Daily**.

1. For **Specify a starting time**, use the calendar and time picker to select when you want the first auto update job to start.

1. When you’re ready to create the auto update configuration, choose **Save**.

Canvas begins the first job of your auto update cadence at the specified starting time.

# View your automatic dataset update jobs
<a name="canvas-update-dataset-auto-view"></a>

To view the job history for your automatic dataset updates in Amazon SageMaker Canvas, on your dataset details page, choose the **Auto updates** tab.

Each automatic update to a dataset shows as a job in the **Auto updates** tab under the **Job history** section. For each job, you can see the following:
+ **Job created** – The timestamp for when Canvas started updating the dataset.
+ **Files** – The number of files in the dataset.
+ **Cells (Columns x Rows)** – The number of columns and rows in the dataset.
+ **Status** – The status of the dataset after the update. If the job was successful, the status is **Ready**. If the job failed for any reason, the status is **Failed**, and you can hover over the status for more details.

# Edit your automatic dataset update configuration
<a name="canvas-update-dataset-auto-edit"></a>

You might want to make changes to your auto update configuration for a dataset, such as changing the frequency of the updates. You might also want to turn off your automatic update configuration to pause the updates to your dataset.

To make changes to your auto update configuration for a dataset, go to the **Auto updates** tab of your dataset and choose **Edit** to make changes to the configuration.

To pause your dataset updates, turn off your automatic configuration. You can turn off auto updates by going to the **Auto updates** tab of your dataset and turning the **Enable auto updates** toggle off. You can turn this toggle back on at any time to resume the update schedule.

To learn how to delete your configuration, see [Delete an automatic configuration](canvas-manage-automations-delete.md).

# Connect to data sources
<a name="canvas-connecting-external"></a>

In Amazon SageMaker Canvas, you can import data from a location outside of your local file system through an AWS service, a SaaS platform, or other databases using JDBC connectors. For example, you might want to import tables from a data warehouse in Amazon Redshift, or you might want to import Google Analytics data.

When you go through the **Import** workflow to import data in the Canvas application, you can choose your data source and then select the data that you want to import. For certain data sources, like Snowflake and Amazon Redshift, you must specify your credentials and add a connection to the data source.

The following screenshot shows the data sources toolbar in the **Import** workflow, with all of the available data sources highlighted. You can only import data from the data sources that are available to you. Contact your administrator if your desired data source isn’t available.

![\[The Data Source dropdown menu on the Import data page in Canvas.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/data-sources.png)


The following sections provide information about establishing connections to external data sources and importing data from them. Review the following section first to determine what permissions you need to import data from your data source.

## Permissions
<a name="canvas-connecting-external-permissions"></a>

Review the following information to ensure that you have the necessary permissions to import data from your data source:
+ **Amazon S3:** You can import data from any Amazon S3 bucket as long as your user has permissions to access the bucket. For more information about using AWS IAM to control access to Amazon S3 buckets, see [Identity and access management in Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-access-control.html) in the *Amazon S3 User Guide*.
+ **Amazon Athena:** If you have the [AmazonSageMakerFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerFullAccess.html) policy and the [AmazonSageMakerCanvasFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerCanvasFullAccess.html) policy attached to your user’s execution role, then you can query your AWS Glue Data Catalog with Amazon Athena. If you’re part of an Athena workgroup, make sure that the Canvas user has permissions to run Athena queries on the data. For more information, see [Using workgroups for running queries](https://docs.aws.amazon.com/athena/latest/ug/workgroups.html) in the *Amazon Athena User Guide*.
+ **Amazon DocumentDB:** You can import data from any Amazon DocumentDB database as long as you have the credentials (username and password) to connect to the database and have the minimum base Canvas permissions attached to your user’s execution role. For more information about Canvas permissions, see the [Prerequisites for setting up Amazon SageMaker Canvas](canvas-getting-started.md#canvas-prerequisites).
+ **Amazon Redshift:** To give yourself the necessary permissions to import data from Amazon Redshift, see [Grant Users Permissions to Import Amazon Redshift Data](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-redshift-permissions.html).
+ **Amazon RDS:** If you have the [AmazonSageMakerCanvasFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerCanvasFullAccess.html) policy attached to your user’s execution role, then you’ll be able to access your Amazon RDS databases from Canvas.
+ **SaaS platforms:** If you have the [AmazonSageMakerFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerFullAccess.html) policy and the [AmazonSageMakerCanvasFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerCanvasFullAccess.html) policy attached to your user’s execution role, then you have the necessary permissions to import data from SaaS platforms. See [Use SaaS connectors with Canvas](#canvas-connecting-external-appflow) for more information about connecting to a specific SaaS connector.
+ **JDBC connectors:** For database sources such as Databricks, MySQL, or MariaDB, you must enable username and password authentication on the source database before attempting to connect from Canvas. If you’re connecting to a Databricks database, you must have the JDBC URL that contains the necessary credentials. (A sketch of attaching one of the preceding managed policies to an execution role follows this list.)
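
The following is a minimal Boto3 sketch of attaching one of the preceding managed policies to an execution role; the role name is a hypothetical placeholder, and your administrator might manage permissions differently:

```python
import boto3

iam = boto3.client("iam")

# Attach the AmazonSageMakerCanvasFullAccess managed policy to a
# hypothetical Canvas execution role.
iam.attach_role_policy(
    RoleName="MyCanvasExecutionRole",
    PolicyArn="arn:aws:iam::aws:policy/AmazonSageMakerCanvasFullAccess",
)
```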

## Connect to a database stored in AWS
<a name="canvas-connecting-internal-database"></a>

You might want to import data that you’ve stored in AWS. You can import data from Amazon S3, use Amazon Athena to query a database in the AWS Glue Data Catalog, import data from [Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Welcome.html), or make a connection to a provisioned Amazon Redshift database (not Redshift Serverless).

You can create multiple connections to Amazon Redshift. For Amazon Athena, you can access any databases that you have in your [AWS Glue Data Catalog](https://docs.aws.amazon.com/prescriptive-guidance/latest/serverless-etl-aws-glue/aws-glue-data-catalog.html). For Amazon S3, you can import data from a bucket as long as you have the necessary permissions.

Review the following sections for more detailed information.

### Connect to data in Amazon S3, Amazon Athena, or Amazon RDS
<a name="canvas-connecting-internal-database-s3-athena"></a>

For Amazon S3, you can import data from an Amazon S3 bucket as long as you have permissions to access the bucket.

For Amazon Athena, you can access databases in your AWS Glue Data Catalog as long as you have permissions through your [Amazon Athena workgroup](https://docs.aws.amazon.com/athena/latest/ug/manage-queries-control-costs-with-workgroups.html).

For Amazon RDS, if you have the [AmazonSageMakerCanvasFullAccess](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/AmazonSageMakerCanvasFullAccess.html) policy attached to your user’s role, then you’ll be able to import data from your Amazon RDS databases into Canvas.

To import data from an Amazon S3 bucket, or to run queries and import data tables with Amazon Athena, see [Create a dataset](canvas-import-dataset.md). You can only import tabular data from Amazon Athena, and you can import tabular and image data from Amazon S3.

### Connect to an Amazon DocumentDB database
<a name="canvas-connecting-docdb"></a>

Amazon DocumentDB is a fully managed, serverless, document database service. You can import unstructured document data stored in an Amazon DocumentDB database into SageMaker Canvas as a tabular dataset, and then you can build machine learning models with the data.

**Important**  
Your SageMaker AI domain must be configured in **VPC only** mode to add connections to Amazon DocumentDB. You can only access Amazon DocumentDB clusters in the same Amazon VPC as your Canvas application. Additionally, Canvas can only connect to TLS-enabled Amazon DocumentDB clusters. For more information about how to set up Canvas in **VPC only** mode, see [Configure Amazon SageMaker Canvas in a VPC without internet access](canvas-vpc.md).

To import data from Amazon DocumentDB databases, you must have credentials to access the Amazon DocumentDB database and specify the username and password when creating a database connection. You can configure more granular permissions and restrict access by modifying the Amazon DocumentDB user permissions. To learn more about access control in Amazon DocumentDB, see [Database Access Using Role-Based Access Control](https://docs.aws.amazon.com/documentdb/latest/developerguide/role_based_access_control.html) in the *Amazon DocumentDB Developer Guide*.

When you import from Amazon DocumentDB, Canvas converts your unstructured data into a tabular dataset by mapping the fields to columns in a table. Additional tables are created for each complex field (or nested structure) in the data, where the columns correspond to the sub-fields of the complex field. For more detailed information about this process and examples of schema conversion, see the [Amazon DocumentDB JDBC Driver Schema Discovery](https://github.com/aws/amazon-documentdb-jdbc-driver/blob/develop/src/markdown/schema/schema-discovery.md) GitHub page.
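
To illustrate the kind of mapping the driver performs (this is a simplified illustration with placeholder data, not the driver's actual behavior), a nested document splits into a parent table and one child table per complex field:

```python
# A hypothetical document with one complex (list) field.
order = {
    "_id": "order-1",
    "customer": "Ana",
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

# Parent table: one row per document, with scalar fields as columns.
orders_table = [{"_id": order["_id"], "customer": order["customer"]}]

# Child table: one row per list element, keyed back to the parent document.
items_table = [{"order__id": order["_id"], **item} for item in order["items"]]

print(orders_table)
print(items_table)
```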

Canvas can only make a connection to a single database in Amazon DocumentDB. To import data from a different database, you must create a new connection.

You can import data from Amazon DocumentDB into Canvas by using the following methods:
+ [Create a dataset](canvas-import-dataset.md). You can import your Amazon DocumentDB data and create a tabular dataset in Canvas. If you choose this method, make sure that you follow the [Import tabular data](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-import-dataset.html#canvas-import-dataset-tabular) procedure.
+ [Create a data flow](canvas-data-flow.md). You can create a data preparation pipeline in Canvas and add your Amazon DocumentDB database as a data source.

To proceed with importing your data, follow the procedure for one of the methods linked in the preceding list.

When you reach the step in either workflow to choose a data source (Step 6 for creating a dataset, or Step 8 for creating a data flow), do the following:

1. For **Data Source**, open the dropdown menu and choose **DocumentDB**.

1. Choose **Add connection**.

1. In the dialog box, specify your Amazon DocumentDB credentials:

   1. Enter a **Connection name**. This is a name used by Canvas to identify this connection.

   1. For **Cluster**, select the cluster in Amazon DocumentDB that stores your data. Canvas automatically populates the dropdown menu with Amazon DocumentDB clusters in the same VPC as your Canvas application.

   1. Enter the **Username** for your Amazon DocumentDB cluster.

   1. Enter the **Password** for your Amazon DocumentDB cluster.

   1. Enter the name of the **Database** to which you want to connect.

   1. The **Read preference** option determines which types of instances on your cluster Canvas reads the data from. Select one of the following:
      + **Secondary preferred** – Canvas defaults to reading from the cluster’s secondary instances, but if a secondary instance isn’t available, then Canvas reads from a primary instance.
      + **Secondary** – Canvas only reads from the cluster’s secondary instances, which prevents the read operations from interfering with the cluster’s regular read and write operations.

   1. Choose **Add connection**. The following image shows the dialog box with the preceding fields for an Amazon DocumentDB connection.  
![\[Screenshot of the Add a new DocumentDB connection dialog box in Canvas.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/add-docdb-connection.png)

You should now have an Amazon DocumentDB connection, and you can use your Amazon DocumentDB data in Canvas to create either a dataset or a data flow.

### Connect to an Amazon Redshift database
<a name="canvas-connecting-redshift"></a>

You can import data from Amazon Redshift, a data warehouse where your organization keeps its data. Before you can import data from Amazon Redshift, the AWS IAM role you use must have the `AmazonRedshiftFullAccess` managed policy attached. For instructions on how to attach this policy, see [Grant Users Permissions to Import Amazon Redshift Data](canvas-redshift-permissions.md). 

To import data from Amazon Redshift, you do the following:

1. Create a connection to an Amazon Redshift database.

1. Choose the data that you're importing.

1. Import the data.

You can use the Amazon Redshift editor to drag datasets onto the import pane and import them into SageMaker Canvas. For more control over the values returned in the dataset, you can use the following:
+ SQL queries
+ Joins

With SQL queries, you can customize how you import the values in the dataset. For example, you can specify the columns returned in the dataset or the range of values for a column.

You can use joins to combine multiple datasets from Amazon Redshift into a single dataset. You can drag your datasets from Amazon Redshift into the panel where you can join them.

You can use the SQL editor to edit a join and convert the joined datasets into a single node, join another dataset to that node, and then import the data that you've selected into SageMaker Canvas.
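
The following is a sketch of the kind of join query you might write in the **Edit in SQL** view, shown here running through the Amazon Redshift Data API for illustration; the cluster, database, user, and table names are hypothetical placeholders:

```python
import boto3

client = boto3.client("redshift-data")

# A join across two hypothetical tables, "orders" and "customers".
sql = """
    SELECT o.order_id, o.order_date, o.total, c.customer_name, c.region
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
    WHERE o.order_date >= '2023-01-01'
"""

# Run the query against a hypothetical provisioned cluster.
response = client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql=sql,
)
print(response["Id"])  # statement ID for tracking the query
```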

Use the following procedure to import data from Amazon Redshift.

1. In the SageMaker Canvas application, go to the **Datasets** page.

1. Choose **Import data**, and from the dropdown menu, choose **Tabular**.

1. Enter a name for the dataset and choose **Create**.

1. For **Data Source**, open the dropdown menu and choose **Redshift**.

1. Choose **Add connection**.

1. In the dialog box, specify your Amazon Redshift credentials:

   1. For **Authentication method**, choose **IAM**.

   1. Enter the **Cluster identifier** to specify to which cluster you want to connect. Enter only the cluster identifier and not the full endpoint of the Amazon Redshift cluster.

   1. Enter the **Database name** of the database to which you want to connect.

   1. Enter a **Database user** to identify the user you want to use to connect to the database.

   1. For **ARN**, enter the IAM role ARN of the role that the Amazon Redshift cluster should assume to move and write data to Amazon S3. For more information about this role, see [Authorizing Amazon Redshift to access other AWS services on your behalf](https://docs.aws.amazon.com/redshift/latest/mgmt/authorizing-redshift-service.html) in the *Amazon Redshift Management Guide*.

   1. Enter a **Connection name**. This is a name used by Canvas to identify this connection.

1. From the tab that has the name of your connection, drag the table that you're importing to the **Drag and drop table to import** pane.

1. Optional: Drag additional tables to the import pane. You can use the GUI to join the tables. For more specificity in your joins, choose **Edit in SQL**.

1. Optional: If you're using SQL to query the data, you can choose **Context** to add context to the connection by specifying values for the following:
   + **Warehouse**
   + **Database**
   + **Schema**

1. Choose **Import data**.

The following image shows an example of fields specified for an Amazon Redshift connection.

![\[Screenshot of the Add a new Redshift connection dialog box in Canvas.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-redshift-add-connection.png)


The following image shows the page used to join datasets in Amazon Redshift.

![\[Screenshot of the Import page in Canvas, showing two datasets being joined.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-redshift-join.png)


The following image shows an SQL query being used to edit a join in Amazon Redshift.

![\[Screenshot of a SQL query in the Edit SQL editor on the Import page in Canvas.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-redshift-edit-sql.png)


## Connect to your data with JDBC connectors
<a name="canvas-connecting-jdbc"></a>

With JDBC, you can connect to your databases from sources such as Databricks, SQL Server, MySQL, PostgreSQL, MariaDB, Amazon RDS, and Amazon Aurora.

You must make sure that you have the necessary credentials and permissions to create the connection from Canvas.
+ For Databricks, you must provide a JDBC URL. The URL formatting can vary between Databricks instances. For information about finding the URL and specifying the parameters within it, see [JDBC configuration and connection parameters](https://docs.databricks.com/integrations/bi/jdbc-odbc-bi.html#jdbc-configuration-and-connection-parameters) in the Databricks documentation. The following is an example of how a URL can be formatted: `jdbc:spark://aws-sagemaker-datawrangler.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/3122619508517275/0909-200301-cut318;AuthMech=3;UID=token;PWD=personal-access-token`
+ For other database sources, you must set up username and password authentication, and then specify those credentials when connecting to the database from Canvas. 

Additionally, your data source must either be accessible through the public internet, or, if your Canvas application is running in **VPC only** mode, the data source must run in the same VPC. For more information about configuring an Amazon RDS database in a VPC, see [Amazon Virtual Private Cloud VPCs and Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_VPC.html) in the *Amazon RDS User Guide*.
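
Before creating the connection in Canvas, you can confirm that the database endpoint is reachable from your network. The following is a minimal sketch with a hypothetical host and port:

```python
import socket

# Placeholder endpoint and port (3306 is the MySQL/MariaDB default).
host, port = "mydb.example.com", 3306

try:
    with socket.create_connection((host, port), timeout=5):
        print(f"{host}:{port} is reachable")
except OSError as err:
    print(f"Can't reach {host}:{port}: {err}")
```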

After you’ve configured your data source credentials, you can sign in to the Canvas application and create a connection to the data source. Specify your credentials (or, for Databricks, the URL) when creating the connection.

## Connect to data sources with OAuth
<a name="canvas-connecting-oauth"></a>

Canvas supports using OAuth as an authentication method for connecting to your data in Snowflake and Salesforce Data Cloud. [OAuth](https://oauth.net/2/) is a common authentication platform for granting access to resources without sharing passwords.

**Note**  
You can only establish one OAuth connection for each data source.

To authorize the connection, you must follow the initial setup described in [Set up connections to data sources with OAuth](canvas-setting-up-oauth.md).

After setting up the OAuth credentials, you can do the following to add a Snowflake or Salesforce Data Cloud connection with OAuth:

1. Sign in to the Canvas application.

1. Create a tabular dataset. When prompted to upload data, choose Snowflake or Salesforce Data Cloud as your data source.

1. Create a new connection to your Snowflake or Salesforce Data Cloud data source. Specify OAuth as the authentication method and enter your connection details.

You should now be able to import data from your databases in Snowflake or Salesforce Data Cloud.

## Connect to a SaaS platform
<a name="canvas-connecting-saas"></a>

You can import data from Snowflake and over 40 other external SaaS platforms. For a full list of the connectors, see the table on [Data import](canvas-importing-data.md).

**Note**  
You can only import tabular data, such as data tables, from SaaS platforms.

### Use Snowflake with Canvas
<a name="canvas-using-snowflake"></a>

Snowflake is a data storage and analytics service, and you can import your data from Snowflake into SageMaker Canvas. For more information about Snowflake, see the [Snowflake documentation](https://www.snowflake.com/en/).

You can import data from your Snowflake account by doing the following:

1. Create a connection to the Snowflake database.

1. Choose the data that you're importing by dragging and dropping the table from the left navigation menu into the editor.

1. Import the data.

You can use the Snowflake editor to drag datasets onto the import pane and import them into SageMaker Canvas. For more control over the values returned in the dataset, you can use the following:
+ SQL queries
+ Joins

With SQL queries, you can customize how you import the values in the dataset. For example, you can specify the columns returned in the dataset or the range of values for a column.

You can join multiple Snowflake datasets into a single dataset before you import them into Canvas, using either SQL or the Canvas interface. You can drag your datasets from Snowflake into the panel where you can join them, or you can edit the joins in SQL and convert the SQL into a single node. You can then join other nodes to the converted node, combine the datasets that you've joined into a single node, and join that node to a different Snowflake dataset. Finally, you can import the data that you've selected into Canvas.

Use the following procedure to import data from Snowflake to Amazon SageMaker Canvas.

1. In the SageMaker Canvas application, go to the **Datasets** page.

1. Choose **Import data**, and from the dropdown menu, choose **Tabular**.

1. Enter a name for the dataset and choose **Create**.

1. For **Data Source**, open the dropdown menu and choose **Snowflake**.

1. Choose **Add connection**.

1. In the **Add a new Snowflake connection** dialog box, specify your Snowflake credentials. For the **Authentication method**, choose one of the following:
   + **Basic - username password** – Provide your Snowflake account ID, username, and password.
   + **ARN** – For improved protection of your Snowflake credentials, provide the ARN of an AWS Secrets Manager secret that contains your credentials. For more information, see [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html) in the *AWS Secrets Manager User Guide*.

     Your secret should have your Snowflake credentials stored in the following JSON format:

     ```
     {
       "accountid": "ID",
       "username": "username",
       "password": "password"
     }
     ```
   + **OAuth** – OAuth lets you authenticate without providing a password but requires additional setup. For more information about setting up OAuth credentials for Snowflake, see [Set up connections to data sources with OAuth](canvas-setting-up-oauth.md).

1. Choose **Add connection**.

1. From the tab that has the name of your connection, drag the table that you're importing to the **Drag and drop table to import** pane.

1. Optional: Drag additional tables to the import pane. You can use the user interface to join the tables. For more specificity in your joins, choose **Edit in SQL**.

1. Optional: If you're using SQL to query the data, you can choose **Context** to add context to the connection by specifying values for the following:
   + **Warehouse**
   + **Database**
   + **Schema**

   Adding context to a connection makes it easier to specify future queries. For example, if you specify a database and schema, you can reference a table in your queries as `my_table` instead of the fully qualified `my_database.my_schema.my_table`.

1. Choose **Import data**.
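If you chose **ARN** as the authentication method, you first need a Secrets Manager secret that stores your credentials in the JSON format shown earlier. The following is a minimal sketch of creating one with boto3; it assumes that you have permission to call Secrets Manager, and the secret name and credential values are placeholders.

```
import json

import boto3

# Create a secret in the JSON format that the Canvas Snowflake connection
# expects; replace the placeholder values with your own credentials.
secretsmanager = boto3.client("secretsmanager")

response = secretsmanager.create_secret(
    Name="canvas-snowflake-credentials",  # hypothetical secret name
    SecretString=json.dumps(
        {
            "accountid": "ID",
            "username": "username",
            "password": "password",
        }
    ),
)

# Paste this ARN into the Canvas connection dialog box.
print(response["ARN"])
```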

The following image shows an example of fields specified for a Snowflake connection.

![\[Screenshot of the Add a new Snowflake connection dialog box in Canvas.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-snowflake-connection.png)


The following image shows the page used to add context to a connection.

![\[Screenshot of the Import page in Canvas, showing the Context dialog box.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-connection-context.png)


The following image shows the page used to join datasets in Snowflake.

![\[Screenshot of the Import page in Canvas, showing datasets being joined.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-snowflake-join.png)


The following image shows a SQL query being used to edit a join in Snowflake.

![\[Screenshot of a SQL query in the Edit SQL editor on the Import page in Canvas.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/canvas/canvas-snowflake-edit-sql.png)


### Use SaaS connectors with Canvas
<a name="canvas-connecting-external-appflow"></a>

**Note**  
For SaaS platforms besides Snowflake, you can only have one connection per data source.

Before you can import data from a SaaS platform, your administrator must authenticate and create a connection to the data source. For more information about how administrators can create a connection with a SaaS platform, see [Managing Amazon AppFlow connections](https://docs.aws.amazon.com/appflow/latest/userguide/connections.html) in the *Amazon AppFlow User Guide*.

If you’re an administrator getting started with Amazon AppFlow for the first time, see [Getting started](https://docs.aws.amazon.com/appflow/latest/userguide/getting-started.html) in the *Amazon AppFlow User Guide*.

To import data from a SaaS platform, you can follow the standard [Import tabular data](canvas-import-dataset.md#canvas-import-dataset-tabular) procedure, which shows you how to import tabular datasets into Canvas.

# Sample datasets in Canvas
<a name="canvas-sample-datasets"></a>

SageMaker Canvas provides sample datasets addressing unique use cases so you can start building, training, and validating models quickly without writing any code. The use cases associated with these datasets highlight the capabilities of SageMaker Canvas, and you can use these datasets to get started with building models. You can find the sample datasets on the **Datasets** page of your SageMaker Canvas application.

The following datasets are the samples that SageMaker Canvas provides by default. These datasets cover use cases such as predicting house prices, loan defaults, and readmission for diabetic patients; forecasting sales; predicting machine failures to streamline predictive maintenance in manufacturing units; and generating supply chain predictions for transportation and logistics. The datasets are stored in the `sample_dataset` folder in the default Amazon S3 bucket that SageMaker AI creates for your account in a Region.
+ **canvas-sample-diabetic-readmission.csv:** This dataset contains historical data including over fifteen features with patient and hospital outcomes. You can use this dataset to predict whether high-risk diabetic patients are likely to get readmitted to the hospital within 30 days of discharge, after 30 days, or not at all. Use the **readmitted** column as the target column, and use the 3+ category prediction model type with this dataset. To learn more about how to build a model with this dataset, see the [SageMaker Canvas workshop page](https://catalog.us-east-1.prod.workshops.aws/workshops/80ba0ea5-7cf9-4b8c-9d3f-1cd988b6c071/en-US/zzz-legacy/1-use-cases/5-hcls). This dataset was obtained from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/diabetes+130-us+hospitals+for+years+1999-2008).
+ **canvas-sample-housing.csv:** This dataset contains data on the characteristics tied to a given housing price. You can use this dataset to predict housing prices. Use the **median\_house\_value** column as the target column, and use the numeric prediction model type with this dataset. To learn more about building a model with this dataset, see the [SageMaker Canvas workshop page](https://catalog.us-east-1.prod.workshops.aws/workshops/80ba0ea5-7cf9-4b8c-9d3f-1cd988b6c071/en-US/zzz-legacy/1-use-cases/2-real-estate). This is the California housing dataset obtained from the [StatLib repository](https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html).
+ **canvas-sample-loans.csv:** This dataset contains complete loan data for all loans issued from 2007–2011, including the current loan status and latest payment information. You can use this dataset to predict whether a customer will repay a loan. Use the **loan\_status** column as the target column, and use the 3+ category prediction model type with this dataset. To learn more about how to build a model with this dataset, see the [SageMaker Canvas workshop page](https://catalog.us-east-1.prod.workshops.aws/workshops/80ba0ea5-7cf9-4b8c-9d3f-1cd988b6c071/en-US/zzz-legacy/1-use-cases/4-finserv). This data uses the LendingClub data obtained from [Kaggle](https://www.kaggle.com/datasets/wordsforthewise/lending-club).
+ **canvas-sample-maintenance.csv:** This dataset contains data on the characteristics tied to a given maintenance failure type. You can use this dataset to predict which failure will occur in the future. Use the **Failure Type** column as the target column, and use the 3+ category prediction model type with this dataset. To learn more about how to build a model with this dataset, see the [SageMaker Canvas workshop page](https://catalog.us-east-1.prod.workshops.aws/workshops/80ba0ea5-7cf9-4b8c-9d3f-1cd988b6c071/en-US/zzz-legacy/1-use-cases/6-manufacturing). This dataset was obtained from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/AI4I+2020+Predictive+Maintenance+Dataset).
+ **canvas-sample-shipping-logs.csv:** This dataset contains complete shipping data for all products delivered, including estimated time, shipping priority, carrier, and origin. You can use this dataset to predict the estimated time of arrival of the shipment in number of days. Use the **ActualShippingDays** column as the target column, and use the numeric prediction model type with this dataset. To learn more about how to build a model with this data, see the [SageMaker Canvas workshop page](https://catalog.us-east-1.prod.workshops.aws/workshops/80ba0ea5-7cf9-4b8c-9d3f-1cd988b6c071/en-US/zzz-legacy/1-use-cases/7-supply-chain). This is a synthetic dataset created by Amazon.
+ **canvas-sample-sales-forecasting.csv:** This dataset contains historical time series sales data for retail stores. You can use this dataset to forecast sales for a particular retail store. Use the **sales** column as the target column, and use the time series forecasting model type with this dataset. To learn more about how to build a model with this dataset, see the [SageMaker Canvas workshop page](https://catalog.us-east-1.prod.workshops.aws/workshops/80ba0ea5-7cf9-4b8c-9d3f-1cd988b6c071/en-US/zzz-legacy/1-use-cases/3-retail). This is a synthetic dataset created by Amazon.
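If you want to inspect one of these sample files outside of the Canvas application, you can read it directly from the Amazon S3 location described above. The following is a minimal sketch using pandas with the s3fs package installed; it assumes the default bucket naming pattern, and the Region and account ID are placeholders.

```
import pandas as pd

region = "us-east-1"         # replace with your Region
account_id = "111122223333"  # replace with your account ID

# The sample datasets live under Canvas/sample_dataset in the default bucket.
path = (
    f"s3://sagemaker-{region}-{account_id}"
    "/Canvas/sample_dataset/canvas-sample-housing.csv"
)

df = pd.read_csv(path)
print(df.head())  # preview the first rows, including median_house_value
```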

# Re-import a deleted sample dataset
<a name="canvas-sample-datasets-reimport"></a>

Amazon SageMaker Canvas provides you with sample datasets for various use cases that highlight the capabilities of Canvas. To learn more about the sample datasets that are available, see [Sample datasets in Canvas](canvas-sample-datasets.md). If you no longer wish to use the sample datasets, you can delete them from the **Datasets** page of your SageMaker Canvas application. However, these datasets are still stored in the Amazon S3 bucket that you specified as the [Canvas storage location](https://docs.aws.amazon.com/sagemaker/latest/dg/canvas-storage-configuration.html), so you can always access them later. 

If you used the default Amazon S3 bucket, the bucket name follows the pattern `sagemaker-{region}-{account ID}`. You can find the sample datasets in the directory path `Canvas/sample_dataset`.
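To confirm which sample files are still in your bucket before re-importing, you can list that prefix with boto3. The following is a minimal sketch; it assumes that you're using the default bucket, and the Region and account ID are placeholders.

```
import boto3

region = "us-east-1"         # replace with your Region
account_id = "111122223333"  # replace with your account ID

s3 = boto3.client("s3", region_name=region)

# List everything under the sample dataset prefix in the default bucket.
response = s3.list_objects_v2(
    Bucket=f"sagemaker-{region}-{account_id}",
    Prefix="Canvas/sample_dataset/",
)

for obj in response.get("Contents", []):
    print(obj["Key"])
```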

If you delete a sample dataset from your SageMaker Canvas application and want to access the sample dataset again, use the following procedure.

1. Navigate to the **Datasets** page in your SageMaker Canvas application.

1. Choose **Import data**.

1. From the list of Amazon S3 buckets, select the bucket that is your Canvas storage location. If you're using the default Amazon S3 bucket that SageMaker AI creates, the bucket name follows the pattern `sagemaker-{region}-{account ID}`.

1. Select the **Canvas** folder.

1. Select the **sample\_dataset** folder, which contains all of the sample datasets for SageMaker Canvas.

1. Select the dataset you want to import, and then choose **Import data**.