

We are no longer updating the Amazon Machine Learning service or accepting new users for it. This documentation is available for existing users, but we are no longer updating it. For more information, see [What is Amazon Machine Learning](https://docs.aws.amazon.com/machine-learning/latest/dg/what-is-amazon-machine-learning.html).

# What is Amazon Machine Learning?
<a name="what-is-amazon-machine-learning"></a>

We are no longer updating the Amazon Machine Learning (Amazon ML) service or accepting new users for it. This documentation is available for existing users, but we are no longer updating it. 

AWS now provides a robust, cloud-based service — Amazon SageMaker AI — so that developers of all skill levels can use machine learning technology. SageMaker AI is a fully managed machine learning service that helps you create powerful machine learning models. With SageMaker AI, data scientists and developers can build and train machine learning models, and then directly deploy them into a production-ready hosted environment. 

For more information, see the [SageMaker AI documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html).

**Topics**
+ [Amazon Machine Learning Key Concepts](amazon-machine-learning-key-concepts.md)
+ [Accessing Amazon Machine Learning](accessing-amazon-machine-learning.md)
+ [Regions and Endpoints](regions-and-endpoints.md)
+ [Pricing for Amazon ML](pricing.md)

# Amazon Machine Learning Key Concepts
<a name="amazon-machine-learning-key-concepts"></a>

 This section summarizes the following key concepts and describes in greater detail how they are used within Amazon ML: 
+  [Datasources](#datasources) contain metadata associated with data inputs to Amazon ML 
+  [ML Models](#ml-models) generate predictions using the patterns extracted from the input data 
+  [Evaluations](#evaluations) measure the quality of ML models 
+  [Batch Predictions](#batch-predictions) *asynchronously* generate predictions for multiple input data observations 
+  [Real-time Predictions](#real-time-predictions) *synchronously* generate predictions for individual data observations 

## Datasources
<a name="datasources"></a>

 A datasource is an object that contains metadata about your input data. Amazon ML reads your input data, computes descriptive statistics on its attributes, and stores the statistics—along with a schema and other information—as part of the datasource object. Next, Amazon ML uses the datasource to train and evaluate an ML model and generate batch predictions. 

**Important**  
 A datasource does not store a copy of your input data. Instead, it stores a reference to the Amazon S3 location where your input data resides. If you move or change the Amazon S3 file, Amazon ML cannot access or use it to create an ML model, generate evaluations, or generate predictions. 

 The following table defines terms that are related to datasources. 


|  **Term**  |  **Definition**  | 
| --- | --- | 
|  Attribute  |   A unique, named property within an observation. In tabular-formatted data such as spreadsheets or comma-separated values (CSV) files, the column headings represent the attributes, and the rows contain values for each attribute.   Synonyms: variable, variable name, field, column   | 
|  Datasource Name  |  (Optional) Allows you to define a human-readable name for a datasource. These names enable you to find and manage your datasources in the Amazon ML console.  | 
|  Input Data  |  Collective name for all the observations that are referred to by a datasource.  | 
|  Location  |  Location of input data. Currently, Amazon ML can use data that is stored within Amazon S3 buckets, Amazon Redshift databases, or MySQL databases in Amazon Relational Database Service (RDS).  | 
|  Observation  |   A single input data unit. For example, if you are creating an ML model to detect fraudulent transactions, your input data will consist of many observations, each representing an individual transaction.   Synonyms: record, example, instance, row   | 
|  Row ID  |   (Optional) A flag that, if specified, identifies an attribute in the input data to be included in the prediction output. This attribute makes it easier to associate which prediction corresponds with which observation.   Synonyms: row identifier   | 
|  Schema  |  The information needed to interpret the input data, including attribute names and their assigned data types, and names of special attributes.  | 
|  Statistics  |  Summary statistics for each attribute in the input data. These statistics serve two purposes: the Amazon ML console displays them in graphs to help you understand your data at a glance and identify irregularities or errors, and Amazon ML uses them during the training process to improve the quality of the resulting ML model.  | 
|  Status  |  Indicates the current state of the datasource, such as In Progress, Completed, or Failed.  | 
|  Target Attribute  |   In the context of training an ML model, the target attribute identifies the name of the attribute in the input data that contains the "correct" answers. Amazon ML uses this to discover patterns in the input data and generate an ML model. In the context of evaluating and generating predictions, the target attribute is the attribute whose value will be predicted by a trained ML model.   Synonyms: target   | 
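
The schema described above is a JSON document that you store alongside your data or pass in when you create a datasource. The following is a minimal sketch for a hypothetical fraud-detection CSV; the attribute names are invented for illustration, while the top-level fields and the attribute types (`BINARY`, `CATEGORICAL`, `NUMERIC`, `TEXT`) follow the Amazon ML schema format:

```json
{
  "version": "1.0",
  "targetAttributeName": "isFraud",
  "rowId": "transactionId",
  "dataFormat": "CSV",
  "dataFileContainsHeader": true,
  "attributes": [
    { "attributeName": "transactionId", "attributeType": "CATEGORICAL" },
    { "attributeName": "amount", "attributeType": "NUMERIC" },
    { "attributeName": "merchantDescription", "attributeType": "TEXT" },
    { "attributeName": "isFraud", "attributeType": "BINARY" }
  ]
}
```

Here `isFraud` is the target attribute that the model learns to predict, and `transactionId` is the row ID that is passed through to the prediction output.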

## ML Models
<a name="ml-models"></a>

 An ML model is a mathematical model that generates predictions by finding patterns in your data. Amazon ML supports three types of ML models: binary classification, multiclass classification, and regression. 

 The following table defines terms that are related to ML models. 


|  **Term**  |  **Definition**  | 
| --- | --- | 
|  Regression  |  The goal of training a regression ML model is to predict a numeric value.  | 
|  Multiclass  |  The goal of training a multiclass ML model is to predict values that belong to a limited, pre-defined set of permissible values.  | 
|  Binary  |  The goal of training a binary ML model is to predict values that can only have one of two states, such as true or false.  | 
|  Model Size  |  ML models capture and store patterns. The more patterns an ML model stores, the bigger it will be. ML model size is measured in megabytes (MB).  | 
|  Number of Passes  |  When you train an ML model, you use data from a datasource. It is sometimes beneficial to use each data record in the learning process more than once. The number of times that you let Amazon ML use the same data records is called the number of passes.  | 
|  Regularization  |  Regularization is a machine learning technique that you can use to obtain higher-quality models. Amazon ML offers a default setting that works well for most cases.  | 
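
When you create an ML model programmatically, the number of passes and the regularization amount are supplied as training parameters. The following is a sketch, not a definitive recipe: the model and datasource IDs are hypothetical, while the parameter names (`sgd.maxPasses`, `sgd.l2RegularizationAmount`) and the `CreateMLModel` fields follow the Amazon ML API.

```python
# Sketch: assembling the arguments for a CreateMLModel call with boto3.
# IDs below are invented; parameter names follow the Amazon ML API, where
# training parameter values are passed as strings.

def build_create_ml_model_params(model_id, datasource_id, model_type,
                                 num_passes=10, l2_amount="1e-6"):
    """Keyword arguments for MachineLearning.Client.create_ml_model."""
    return {
        "MLModelId": model_id,
        "MLModelType": model_type,  # "REGRESSION", "BINARY", or "MULTICLASS"
        "TrainingDataSourceId": datasource_id,
        "Parameters": {
            "sgd.maxPasses": str(num_passes),          # passes over the data
            "sgd.l2RegularizationAmount": l2_amount,   # mild L2 regularization
        },
    }

params = build_create_ml_model_params("ml-fraud-v1", "ds-transactions", "BINARY")
# import boto3
# boto3.client("machinelearning").create_ml_model(**params)
```

Separating parameter assembly from the API call keeps the sketch runnable without AWS credentials; in practice you would pass `params` directly to the client.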

## Evaluations
<a name="evaluations"></a>

 An evaluation measures the quality of your ML model and determines if it is performing well. 

 The following table defines terms that are related to evaluations. 


|  **Term**  |  **Definition**  | 
| --- | --- | 
|  Model Insights  |  Amazon ML provides you with a metric and a number of insights that you can use to evaluate the predictive performance of your model.  | 
|  AUC  |  Area Under the ROC Curve (AUC) measures the ability of a binary ML model to predict a higher score for positive examples as compared to negative examples.  | 
|  Macro-averaged F1-score  |  The macro-averaged F1-score is used to evaluate the predictive performance of multiclass ML models.  | 
|  RMSE  |  The Root Mean Square Error (RMSE) is a metric used to evaluate the predictive performance of regression ML models.  | 
|  Cut-off  |  ML models work by generating numeric prediction scores. By applying a cut-off value, the system converts these scores into 0 and 1 labels.  | 
|  Accuracy  |  Accuracy measures the percentage of correct predictions.  | 
|  Precision  |  Precision shows the percentage of actual positive instances (as opposed to false positives) among those instances that have been retrieved (those predicted to be positive). In other words, how many selected items are positive?  | 
|  Recall  |  Recall shows the percentage of actual positives among the total number of relevant instances (actual positives). In other words, how many positive items are selected?  | 
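
The relationship between the cut-off and the accuracy, precision, and recall metrics in the table above can be made concrete with a small, illustrative-only calculation. The scores and true labels below are made-up toy data, not output from any real model:

```python
# Illustrative-only: a cut-off turns numeric prediction scores into 0/1
# labels, and accuracy, precision, and recall are computed from the results.

def apply_cutoff(scores, cutoff=0.5):
    """Convert numeric prediction scores into 0/1 labels."""
    return [1 if s >= cutoff else 0 for s in scores]

def binary_metrics(predicted, actual):
    tp = sum(p == 1 and a == 1 for p, a in zip(predicted, actual))
    fp = sum(p == 1 and a == 0 for p, a in zip(predicted, actual))
    fn = sum(p == 0 and a == 1 for p, a in zip(predicted, actual))
    correct = sum(p == a for p, a in zip(predicted, actual))
    return {
        "accuracy": correct / len(actual),
        "precision": tp / (tp + fp),  # of predicted positives, how many were right?
        "recall": tp / (tp + fn),     # of actual positives, how many were found?
    }

scores = [0.9, 0.8, 0.6, 0.3, 0.2]   # toy prediction scores
actual = [1, 1, 0, 0, 0]             # toy ground-truth labels
metrics = binary_metrics(apply_cutoff(scores), actual)
print(metrics)  # accuracy 0.8, precision 2/3, recall 1.0
```

Raising the cut-off tends to increase precision at the expense of recall, which is why Amazon ML lets you adjust it after reviewing an evaluation.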

## Batch Predictions
<a name="batch-predictions"></a>

 Batch predictions generate predictions asynchronously for a set of observations all at once. This approach is ideal for predictive analyses that do not have a real-time requirement. 

 The following table defines terms that are related to batch predictions. 


|  **Term**  |  **Definition**  | 
| --- | --- | 
|  Output Location  |  The results of a batch prediction are stored in an S3 bucket output location.  | 
|  Manifest File  |  This file relates each input data file with its associated batch prediction results. It is stored in the S3 bucket output location.  | 
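
A batch prediction request ties together an ML model, a datasource of observations, and an output location. The following sketch assembles such a request; the IDs and S3 bucket are hypothetical, while the parameter names follow the Amazon ML `CreateBatchPrediction` API.

```python
# Sketch: assembling a CreateBatchPrediction request with boto3.
# IDs and the bucket name are invented for illustration.

def build_batch_prediction_params(prediction_id, model_id, datasource_id,
                                  output_uri):
    """Keyword arguments for MachineLearning.Client.create_batch_prediction."""
    return {
        "BatchPredictionId": prediction_id,
        "MLModelId": model_id,
        "BatchPredictionDataSourceId": datasource_id,
        "OutputUri": output_uri,  # S3 location for the results and manifest file
    }

params = build_batch_prediction_params(
    "bp-fraud-october", "ml-fraud-v1", "ds-october-transactions",
    "s3://example-bucket/batch-output/")
# import boto3
# boto3.client("machinelearning").create_batch_prediction(**params)
```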

## Real-time Predictions
<a name="real-time-predictions"></a>

 Real-time predictions are for applications with a low-latency requirement, such as interactive web, mobile, or desktop applications. Any ML model can be queried for predictions by using the low-latency Real-time Prediction API. 

 The following table defines terms that are related to real-time predictions. 


|  **Term**  |  **Definition**  | 
| --- | --- | 
|  Real-time Prediction API  |  The Real-time Prediction API accepts a single input observation in the request payload and returns the prediction in the response.  | 
|  Real-time Prediction Endpoint  |  To use an ML model with the real-time prediction API, you need to create a real-time prediction endpoint. Once created, the endpoint contains the URL that you can use to request real-time predictions.  | 
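
A real-time prediction request supplies a single observation as a record of string values along with the model ID and endpoint URL. The sketch below builds such a request; the model ID, endpoint URL, and record attributes are hypothetical, while the `Predict` parameter names follow the Amazon ML API.

```python
# Sketch: a real-time prediction request with boto3. The Amazon ML Predict
# API expects every Record value as a string, so values are converted here.

def build_predict_params(model_id, endpoint_url, record):
    """Keyword arguments for MachineLearning.Client.predict."""
    return {
        "MLModelId": model_id,
        "PredictEndpoint": endpoint_url,  # URL from your real-time endpoint
        "Record": {k: str(v) for k, v in record.items()},
    }

params = build_predict_params(
    "ml-fraud-v1",
    "https://realtime.machinelearning.us-east-1.amazonaws.com",  # hypothetical
    {"amount": 129.99, "merchantDescription": "electronics"})
# import boto3
# response = boto3.client("machinelearning").predict(**params)
# response["Prediction"] holds the predicted label and/or score.
```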

# Accessing Amazon Machine Learning
<a name="accessing-amazon-machine-learning"></a>

You can access Amazon ML by using any of the following:

**Amazon ML console**  
 You can access the Amazon ML console by signing in to the AWS Management Console and opening the Amazon ML console at [https://console.aws.amazon.com/machinelearning/](https://console.aws.amazon.com/machinelearning/). 

**AWS CLI**  
 For information about how to install and configure the AWS CLI, see Getting Set Up with the AWS Command Line Interface in the [AWS Command Line Interface User Guide](https://docs.aws.amazon.com/cli/latest/userguide/). 

**Amazon ML API**  
 For more information about the Amazon ML API, see [Amazon ML API Reference](http://docs.aws.amazon.com/machine-learning/latest/APIReference/API_Operations.html). 

**AWS SDKs**  
 For more information about the AWS SDKs, see [Tools for Amazon Web Services](https://aws.amazon.com/tools/). 

# Regions and Endpoints
<a name="regions-and-endpoints"></a>

Amazon Machine Learning (Amazon ML) supports real-time prediction endpoints in the following two regions: 


| Region name | Region | Endpoint | Protocol | 
| --- | --- | --- | --- | 
| US East (N. Virginia) | us-east-1 | machinelearning.us-east-1.amazonaws.com | HTTPS | 
| Europe (Ireland) | eu-west-1 | machinelearning.eu-west-1.amazonaws.com | HTTPS | 

You can host data sets, train and evaluate models, and trigger predictions in any region.

We recommend that you keep all of your resources in the same region. If your input data is in a different region than your Amazon ML resources, you accrue cross-region data transfer fees. You can call a real-time prediction endpoint from any region, but calling an endpoint from a region other than the one that hosts it can increase real-time prediction latency.

# Pricing for Amazon ML
<a name="pricing"></a>

With AWS services, you pay only for what you use. There are no minimum fees and no upfront commitments.

Amazon Machine Learning (Amazon ML) charges an hourly rate for the compute time used to compute data statistics and train and evaluate models, and then you pay for the number of predictions generated for your application. For real-time predictions, you also pay an hourly reserved capacity charge based on the size of your model.

Amazon ML provides cost estimates for predictions only in the [Amazon ML console](https://console.aws.amazon.com/machinelearning/).

For more information about Amazon ML pricing, see [https://aws.amazon.com/machine-learning/pricing/](https://aws.amazon.com/machine-learning/pricing/).

**Topics**
+ [Estimating Batch Prediction Cost](#w2aab7c20c14)
+ [Estimating Real-Time Prediction Cost](#w2aab7c20c16)

## Estimating Batch Prediction Cost
<a name="w2aab7c20c14"></a>

When you request batch predictions from an Amazon ML model using the Create Batch Prediction wizard, Amazon ML estimates the cost of these predictions. The method to compute the estimate varies based on the type of data that is available.

### Estimating Batch Prediction Cost When Data Statistics Are Available
<a name="w2aab7c20c14b4"></a>

The most accurate cost estimate is obtained when Amazon ML has already computed summary statistics on the datasource used to request predictions. These statistics are always computed for datasources that have been created using the Amazon ML console. API users must set the `ComputeStatistics` flag to `True` when creating datasources programmatically using the [CreateDataSourceFromS3](http://docs.aws.amazon.com/machine-learning/latest/APIReference/API_CreateDataSourceFromS3.html), [CreateDataSourceFromRedshift](http://docs.aws.amazon.com/machine-learning/latest/APIReference/API_CreateDataSourceFromRedshift.html), or [CreateDataSourceFromRDS](http://docs.aws.amazon.com/machine-learning/latest/APIReference/API_CreateDataSourceFromRDS.html) API. The datasource must be in the `READY` state for the statistics to be available.

One of the statistics that Amazon ML computes is the number of data records. When the number of data records is available, the Amazon ML Create Batch Prediction wizard estimates the cost of the predictions by multiplying the number of data records by the [fee for batch predictions](https://aws.amazon.com/machine-learning/pricing/).

Your actual cost may vary from this estimate for the following reasons:
+ Some of the data records might fail processing. You are not billed for predictions from failed data records.
+ The estimate doesn't take into account pre-existing credits or other adjustments that are applied by AWS.
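
The arithmetic behind this estimate is straightforward. The following illustrative-only sketch uses a made-up per-prediction fee, not the actual Amazon ML price:

```python
# Illustrative-only: the batch cost estimate is the record count times the
# per-prediction batch fee. The fee below is a placeholder, not a real price.

def estimate_batch_cost(num_records, fee_per_prediction):
    """Estimated batch prediction cost, before failed records or credits."""
    return num_records * fee_per_prediction

cost = estimate_batch_cost(1_000_000, 0.0001)  # hypothetical $0.0001/prediction
print(f"${cost:.2f}")  # → $100.00
```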

 ![\[Batch prediction results page showing estimated cost, ML fee, and S3 destination input.\]](http://docs.aws.amazon.com/machine-learning/latest/dg/images/image59b.png) 

### Estimating Batch Prediction Cost When Only Data Size Is Available
<a name="w2aab7c20c14b6"></a>

When you request a batch prediction and the data statistics for the request datasource are not available, Amazon ML estimates the cost based on the following:
+ The total data size that is computed and persisted during datasource validation
+ The average data record size, which Amazon ML estimates by reading and parsing the first 100 MB of your data file

To estimate the cost of your batch prediction, Amazon ML divides the total data size by the average data record size to estimate the number of data records, and then multiplies that number by the fee for batch predictions. This method of cost prediction is less precise than the method used when the number of data records is available because the first records of your data file might not accurately represent the average record size.
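
The size-based estimate can be sketched as follows; all numbers are made up for illustration:

```python
# Illustrative-only: when only sizes are known, the record count is estimated
# as total data size divided by the average record size (itself estimated
# from the first 100 MB of the file).

def estimate_record_count(total_size_bytes, avg_record_size_bytes):
    """Approximate number of records implied by the data and record sizes."""
    return total_size_bytes // avg_record_size_bytes

records = estimate_record_count(5_000_000_000, 250)  # ~5 GB of ~250-byte records
print(records)  # → 20000000
```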

### Estimating Batch Prediction Cost When Neither Data Statistics nor Data Size Are Available
<a name="w2aab7c20c14b8"></a>

When neither data statistics nor the data size is available, Amazon ML cannot estimate the cost of your batch predictions. This is commonly the case when the datasource you are using to request batch predictions has not yet been validated by Amazon ML. This can happen when you have created a datasource that is based on an Amazon Redshift or Amazon Relational Database Service (Amazon RDS) query and the data transfer has not yet completed, or when datasource creation is queued behind other operations in your account. In this case, the Amazon ML console informs you about the fees for batch prediction. You can choose to proceed with the batch prediction request without an estimate, or to cancel the wizard and return after the datasource used for predictions is in the `INPROGRESS` or `READY` state.

## Estimating Real-Time Prediction Cost
<a name="w2aab7c20c16"></a>

When you create a real-time prediction endpoint using the Amazon ML console, you are shown the estimated reserved capacity charge, which is an ongoing charge for reserving the endpoint for prediction processing. This charge varies based on the size of the model, as explained on the [service pricing page](https://aws.amazon.com/machine-learning/pricing/). You are also informed about the standard Amazon ML real-time prediction charge. 

 ![\[Dialog box for creating a real-time endpoint with model details and pricing information.\]](http://docs.aws.amazon.com/machine-learning/latest/dg/images/image60b.png) 