

Amazon Fraud Detector is no longer open to new customers as of November 7, 2025. For capabilities similar to Amazon Fraud Detector, explore Amazon SageMaker, AutoGluon, and AWS WAF.

# Model


Amazon Fraud Detector uses machine learning models for generating fraud predictions. Each model is trained using a *model type*. The model type specifies the algorithms and transformations used for training the model. Model training is the process of using a dataset that you provide to create a model that can predict fraudulent events.

To create a model, you must first choose a model type and then prepare and provide data that will be used to train the model. 

# Choose a model type


The following model types are available in Amazon Fraud Detector. Choose a model type that works for your use case. 
+ **Online Fraud Insights**

  The *Online Fraud Insights* model type is optimized to detect fraud when little historical data is available about the entity being evaluated, for example, a new customer registering online for a new account.
+ **Transaction Fraud Insights**

  The *Transaction Fraud Insights* model type is best suited for detecting fraud use cases where the entity that is being evaluated might have a history of interactions that the model can analyze to improve prediction accuracy (for example, an existing customer with history of past purchases).
+ **Account Takeover Insights**

  The *Account Takeover Insights* model type detects if an account was compromised by phishing or another type of attack. The login data of a compromised account, such as the browser and device used at login, is different from the historical login data that’s associated with the account. 

# Online fraud insights


Online Fraud Insights is a supervised machine learning model, which means that it uses historical examples of fraudulent and legitimate transactions to train the model. The Online Fraud Insights model can detect fraud based on little historical data. The model’s inputs are flexible, so you can adapt it to detect a variety of fraud risks including fake reviews, promotion abuse, and guest checkout fraud. 

The Online Fraud Insights model uses an ensemble of machine learning algorithms for data enrichment, transformation, and fraud classification. As part of the model training process, Online Fraud Insights enriches raw data elements like IP address and BIN number with third-party data such as the geolocation of the IP address or the issuing bank for a credit card. In addition to third-party data, Online Fraud Insights uses deep learning algorithms that take into account fraud patterns that have been seen at Amazon and AWS. These fraud patterns become input features to your model using a gradient tree boosting algorithm.

To increase performance, Online Fraud Insights optimizes the hyperparameters of the gradient tree boosting algorithm via a Bayesian optimization process. It sequentially trains dozens of models with varying model parameters (such as the number of trees, depth of trees, and number of samples per leaf). It also uses different optimization strategies, such as upweighting the minority fraud population, to address very low fraud rates.

## Selecting data source


When training an Online Fraud Insights model, you can choose to train the model on event data that is stored either externally (outside of Amazon Fraud Detector) or internally within Amazon Fraud Detector. The external storage that Amazon Fraud Detector currently supports is Amazon Simple Storage Service (Amazon S3). If you are using external storage, your event dataset must be uploaded in comma-separated values (CSV) format to an Amazon S3 bucket. These data storage options are referred to within the model training configuration as EXTERNAL\_EVENTS (for external storage) and INGESTED\_EVENTS (for internal storage). For more information about the available data sources and how to store data in them, see [Event data storage](event-data-storage.md).

## Preparing data


Regardless of where you choose to store your event data (Amazon S3 or Amazon Fraud Detector), the requirements for the Online Fraud Insights model type are the same.

Your dataset must contain the column header EVENT\_LABEL. This variable classifies an event as fraudulent or legitimate. When using a CSV file (external storage), you must include an EVENT\_LABEL value for each event in the file. For internal storage, the EVENT\_LABEL field is optional, but all events must be labeled to be included in a training dataset. When configuring your model training, you can choose whether to ignore unlabeled events, assume a legitimate label for unlabeled events, or assume a fraudulent label for unlabeled events. 
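As an illustration, a minimal externally stored training file can be assembled and checked with standard tooling. The sketch below is hypothetical: the variable columns (`ip_address`, `email_address`) and the label values shown are examples, not required names.

```python
import csv
import io

# Hypothetical rows for an Online Fraud Insights training CSV.
# EVENT_TIMESTAMP and EVENT_LABEL are required metadata headers; the
# remaining columns are example event variables, not required names.
header = ["EVENT_TIMESTAMP", "EVENT_LABEL", "ip_address", "email_address"]
rows = [
    ["2021-11-13T12:18:19Z", "legit", "192.0.2.10", "user1@example.com"],
    ["2021-11-13T12:22:51Z", "fraud", "198.51.100.7", "user2@example.com"],
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)

# Every event in an externally stored CSV must carry an EVENT_LABEL value.
parsed = list(csv.DictReader(io.StringIO(buf.getvalue())))
assert all(r["EVENT_LABEL"] in ("fraud", "legit") for r in parsed)
```

A file built this way would then be uploaded to an Amazon S3 bucket before training.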

## Selecting data


See [Gather event data](https://docs.aws.amazon.com//frauddetector/latest/ug/create-event-dataset.html#gather-event-data) for information on selecting data for training your Online Fraud Insights model.

The Online Fraud Insights training process samples and partitions historical data based on EVENT\_TIMESTAMP. There is no need to manually sample the data, and doing so may negatively impact your model results.

## Event variables


The Online Fraud Insights model requires at least two variables, apart from the required event metadata, that have passed [data validation](https://docs.aws.amazon.com//frauddetector/latest/ug/create-event-dataset.html#dataset-validation) for model training, and allows up to 100 variables per model. Generally, the more variables you provide, the better the model can differentiate between fraudulent and legitimate events. While the Online Fraud Insights model can support dozens of variables, including custom variables, we recommend including IP address and email address because these variables are typically the most effective at identifying the entity being evaluated. 

## Validating data


As part of the training process, Online Fraud Insights will validate the dataset for data quality issues that may impact model training. After validating the data, Amazon Fraud Detector will take appropriate action to build the best possible model. This includes issuing warnings for potential data quality issues, automatically removing variables that have data quality issues, or issuing an error and stopping the model training process. For more information, see [dataset validation](https://docs.aws.amazon.com//frauddetector/latest/ug/create-event-dataset.html#dataset-validation). 

# Transaction fraud insights


The Transaction Fraud Insights model type is designed to detect online, or card-not-present, transaction fraud. Transaction Fraud Insights is a supervised machine learning model, which means that it uses historical examples of fraudulent and legitimate transactions to train the model.

The Transaction Fraud Insights model uses an ensemble of machine learning algorithms for data enrichment, transformation, and fraud classification. It leverages a feature engineering engine to create entity-level and event-level aggregates. As part of the model training process, Transaction Fraud Insights enriches raw data elements like IP address and BIN number with third-party data such as the geolocation of the IP address or the issuing bank for a credit card. In addition to third-party data, Transaction Fraud Insights uses deep learning algorithms that take into account fraud patterns that have been seen at Amazon and AWS. These fraud patterns become input features to your model using a gradient tree boosting algorithm.

To increase performance, Transaction Fraud Insights optimizes the hyperparameters of the gradient tree boosting algorithm via a Bayesian optimization process, sequentially training dozens of models with varying model parameters (such as the number of trees, depth of trees, and number of samples per leaf), as well as different optimization strategies, such as upweighting the minority fraud population, to address very low fraud rates.

As part of the model training process, the Transaction Fraud model’s feature engineering engine calculates values for each unique entity within your training dataset to help improve fraud predictions. For example, during the training process, Amazon Fraud Detector computes and stores the last time an entity made a purchase and dynamically updates this value each time you call the `GetEventPrediction` or `SendEvent` API. During a fraud prediction, the event variables are combined with other entity and event metadata to predict whether the transaction is fraudulent.
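Because these stored aggregates are updated on every `GetEventPrediction` or `SendEvent` call, new transactions should be sent to Amazon Fraud Detector as they occur. The following is a minimal sketch of a `SendEvent` request payload; the event type name, entity, and variable names are hypothetical.

```python
# Sketch of a SendEvent request payload. 'sample_transaction' and the
# entity/variable names are hypothetical examples, not required names.
payload = {
    "eventId": "802454d3-f7d8-482d-97e8-c4b6db9a0428",
    "eventTypeName": "sample_transaction",
    "eventTimestamp": "2023-03-29T18:21:33Z",
    "entities": [{"entityType": "customer", "entityId": "12345"}],
    # All event variable values are passed as strings.
    "eventVariables": {
        "order_price": "125.30",
        "ip_address": "192.0.2.10",
    },
}

assert payload["entities"][0]["entityType"] == "customer"
```

In a real setup, you would pass this payload to `boto3.client('frauddetector').send_event(**payload)`.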

## Selecting data source


Transaction Fraud Insights models are trained only on datasets stored internally with Amazon Fraud Detector (INGESTED\_EVENTS). This allows Amazon Fraud Detector to continuously update calculated values about the entities you are evaluating. For more information about the available data sources, see [Event data storage](event-data-storage.md).

## Preparing data


Before you train a Transaction Fraud Insights model, ensure that your data file contains all headers as mentioned in [Prepare event dataset](https://docs.aws.amazon.com//frauddetector/latest/ug/create-event-dataset.html#prepare-event-dataset). The Transaction Fraud Insights model compares new entities that are received with the examples of fraudulent and legitimate entities in the dataset, so it is helpful to provide many examples for each entity. 

Amazon Fraud Detector automatically transforms the stored event dataset into the correct format for training. After the model has completed training, you can review the performance metrics and determine whether you should add entities to your training dataset. 

## Selecting data


By default, Transaction Fraud Insights trains on your entire stored dataset for the Event Type that you select. You can optionally set a time range to reduce the events that are used to train your model. When setting a time range, ensure that the records that are used to train the model have had sufficient time to mature. That is, enough time has passed to ensure legitimate and fraud records have been correctly identified. For example, for chargeback fraud, it often takes 60 days or more to correctly identify fraudulent events. For the best model performance, ensure that all records in your training dataset are mature. 

There is no need to select a time range that represents an ideal fraud rate. Amazon Fraud Detector automatically samples your data to achieve balance between fraud rates, time range, and entity counts. 

Amazon Fraud Detector returns a validation error during model training if you select a time range for which there are not enough events to successfully train a model. For stored datasets, the EVENT\_LABEL field is optional, but events must be labeled to be included in your training dataset. When configuring your model training, you can choose whether to ignore unlabeled events, assume a legitimate label for unlabeled events, or assume a fraudulent label for unlabeled events. 
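Putting the pieces together, a `CreateModelVersion` request for a Transaction Fraud Insights model trained on stored events might look like the sketch below. The model ID, variable names, and time window are hypothetical; the optional `ingestedEventsDetail` block restricts training to a window of mature events.

```python
# Sketch of a CreateModelVersion request for a Transaction Fraud Insights
# model trained on ingested events. The model ID, variable names, and time
# window are hypothetical examples.
request = {
    "modelId": "sample_transaction_model",
    "modelType": "TRANSACTION_FRAUD_INSIGHTS",
    "trainingDataSource": "INGESTED_EVENTS",
    "trainingDataSchema": {
        "modelVariables": ["ip_address", "email_address", "order_price"],
        "labelSchema": {
            "labelMapper": {"FRAUD": ["fraud"], "LEGIT": ["legit"]},
            "unlabeledEventsTreatment": "IGNORE",
        },
    },
    # Optional: train only on a time range of mature, correctly
    # labeled events.
    "ingestedEventsDetail": {
        "ingestedEventsTimeWindow": {
            "startTime": "2022-01-01T00:00:00Z",
            "endTime": "2022-11-01T00:00:00Z",
        }
    },
}

assert request["trainingDataSource"] == "INGESTED_EVENTS"
```

These fields map onto a `create_model_version(**request)` call with the boto3 `frauddetector` client.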

## Event variables


The event type used to train the model must contain at least two variables, apart from the required event metadata, that have passed [data validation](https://docs.aws.amazon.com//frauddetector/latest/ug/create-event-dataset.html#dataset-validation), and can contain up to 100 variables. Generally, the more variables you provide, the better the model can differentiate between fraudulent and legitimate events. Although the Transaction Fraud Insights model can support dozens of variables, including custom variables, we recommend that you include IP address, email address, payment instrument type, order price, and card BIN. 

## Validating data


As part of the training process, Transaction Fraud Insights validates the training dataset for data quality issues that might impact model training. After validating the data, Amazon Fraud Detector takes appropriate action to build the best possible model. This includes issuing warnings for potential data quality issues, automatically removing variables that have data quality issues, or issuing an error and stopping the model training process. For more information, see [Dataset validation](https://docs.aws.amazon.com//frauddetector/latest/ug/create-event-dataset.html#dataset-validation). 

Amazon Fraud Detector will issue a warning but continue training a model if the number of unique entities is less than 1,500, because a small number of entities can impact the quality of the training data. If you receive this warning, review the [performance metrics](training-performance-metrics.md).

# Account takeover insights


The Account Takeover Insights (ATI) model type identifies fraudulent online activity by detecting whether accounts were compromised through malicious takeovers, phishing, or stolen credentials. Account Takeover Insights is a machine learning model that uses login events from your online business to train the model. 

You can embed a trained Account Takeover Insights model within your real-time login flow to detect if an account is compromised. The model assesses a variety of authentication and login types, including web application logins, API-based authentications, and single sign-on (SSO). To use the Account Takeover Insights model, call the [GetEventPrediction](https://docs.aws.amazon.com/frauddetector/latest/api/API_GetEventPrediction.html) API after valid login credentials are presented. The API generates a score that quantifies the risk of the account being compromised. Amazon Fraud Detector uses the score and the rules that you defined to return one or more of the outcomes that you configured for the login event. Based on the outcomes you receive, you can take appropriate action for each login: either approve or challenge the credentials presented. For example, you can challenge the credentials by asking for an account PIN as additional verification.

You can also use the Account Takeover Insights model to evaluate account logins asynchronously and take actions on high-risk accounts. For example, a high-risk account can be added to an investigation queue, where a human reviewer determines whether further action, such as suspending the account, needs to be taken.

The Account Takeover Insights model is trained using a dataset that contains the historical login events of your business. You provide this data. You can optionally label the accounts as legitimate or fraudulent, but labels aren’t required to train the model. The Account Takeover Insights model detects anomalies based on the history of successful logins of an account, and it learns to detect anomalies in a user’s behavior that suggest an increased risk of a malicious account takeover. For example, a user typically logs in from the same set of devices and IP addresses, whereas a fraudster typically logs in from a different device and geolocation. This technique produces a risk score for an activity being anomalous, which is typically a primary characteristic of malicious account takeovers.

Before training an Account Takeover Insights model, Amazon Fraud Detector uses a combination of machine learning techniques to perform data enrichment, data aggregation, and data transformation. Then, during the training process, Amazon Fraud Detector enriches the raw data elements that you provide. Examples of raw data elements include IP address and user agent. Amazon Fraud Detector uses these elements to create additional inputs that describe the login data. These inputs include the device, browser, and geolocation inputs. Amazon Fraud Detector also uses the login data that you provide to continuously compute aggregated variables that describe the past user behavior. Examples of user behavior include the number of times that the user signed in from a specific IP address. Using these additional enrichments and aggregates, Amazon Fraud Detector can generate strong model performance from a small set of inputs from your login events.

The Account Takeover Insights model detects instances where a legitimate account is accessed by a bad actor, regardless of whether the bad actor is human or a robot. The model produces a single score that indicates the relative risk of account compromise. Accounts that might have been compromised are flagged as high-risk accounts. You can process high-risk accounts in one of two ways: either enforce an additional identity verification, or send the account to a queue for manual investigation. 

## Selecting data source


Account Takeover Insights models are trained on a dataset that’s stored internally, in Amazon Fraud Detector. To store your login events data with Amazon Fraud Detector, create a CSV file with login events of users. For each event, include login data such as the event timestamp, user ID, IP address, user agent, and whether the login data is valid. After creating the CSV file, first upload the file to Amazon Fraud Detector, and then use the import feature to store the data. You can then train your model using the stored data. For more information about storing your event dataset with Amazon Fraud Detector, see [Store your event data internally with Amazon Fraud Detector](storing-event-data-afd.md).

## Preparing data


Amazon Fraud Detector requires that you provide your user account login data in a comma-separated values (CSV) file that’s encoded in the UTF-8 format. The first line of your CSV file must contain a file header. The file header consists of event metadata and event variables that describe each data element. Event data follows the header. Each line in the event data consists of data from a single login event.

For the Account Takeover Insights model, you must provide the following event metadata and event variables in the header line of your CSV file. 

**Event metadata**

We recommend that you provide the following metadata in your CSV file header. The event metadata must be in uppercase letters.
+ EVENT\_ID - A unique identifier for the login event.
+ ENTITY\_TYPE - The entity that performs the login event, such as a merchant or a customer.
+ ENTITY\_ID - An identifier for the entity performing the login event. 
+ EVENT\_TIMESTAMP - The timestamp when the login event occurred. The timestamp must be in ISO 8601 standard in UTC.
+ EVENT\_LABEL (recommended) - A label that classifies the event as fraudulent or legitimate. You can use any labels, such as "fraud", "legit", "1", or "0".

**Note**  
Event metadata must be in uppercase letters and is case sensitive.
Labels aren’t required for login events. However, we recommend that you include the EVENT\_LABEL metadata and provide labels for your login events. It’s fine if the labels are incomplete or sporadic. If you provide labels, Amazon Fraud Detector uses them to automatically calculate an Account Takeover Discovery Rate and displays it in the model performance chart and table.
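Putting the header requirements together, the sketch below assembles a tiny, hypothetical ATI training file in memory and checks the required metadata columns; the lowercase variable names (`ip_address`, `user_agent`, `valid_cred`) are illustrative, not required names.

```python
import csv
import io

# Hypothetical Account Takeover Insights training rows. Metadata headers
# are uppercase; event variable names are lowercase and illustrative.
header = ["EVENT_ID", "ENTITY_TYPE", "ENTITY_ID", "EVENT_TIMESTAMP",
          "EVENT_LABEL", "ip_address", "user_agent", "valid_cred"]
rows = [
    ["e-001", "customer", "u-1001", "2023-02-18T09:13:22Z",
     "legit", "192.0.2.10", "Mozilla/5.0", "true"],
    ["e-002", "customer", "u-1001", "2023-02-19T21:47:05Z",
     "fraud", "198.51.100.7", "curl/7.88", "false"],
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)

records = list(csv.DictReader(io.StringIO(buf.getvalue())))
# The required metadata columns must all be present and uppercase.
assert all(k in records[0] for k in
           ("EVENT_ID", "ENTITY_TYPE", "ENTITY_ID", "EVENT_TIMESTAMP"))
```

A file in this shape is what you would upload to Amazon Fraud Detector for import.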

**Event variables**

For the Account Takeover Insights model, there are both required (mandatory) variables that you must provide and optional variables. When you create your variables, make sure to assign each variable to the right variable type. As part of the model training process, Amazon Fraud Detector uses the variable type that’s associated with a variable to perform variable enrichment and feature engineering.

**Note**  
Event variable names must be in lowercase letters. They’re case sensitive.

**Mandatory variables**

The following variables are required for training an Account Takeover Insights model.


| Category | Variable type | Description | 
| --- | --- | --- | 
| IP address | IP\_ADDRESS | The IP address used in the login event | 
| Browser and device | USERAGENT | The browser, device, and OS used in the login event | 
| Valid credentials | VALIDCRED | Indicates if the credentials that were used for login are valid | 

**Optional variables**

The following variables are optional for training an Account Takeover Insights model.


| Category | Type | Description | 
| --- | --- | --- | 
| Browser and device | FINGERPRINT | The unique identifier for a browser or device fingerprint | 
| Session ID | SESSION\_ID | The identifier for an authentication session | 
| Label | EVENT\_LABEL | A label that classifies the event as fraudulent or legitimate. You can use any labels, such as "fraud", "legit", "1", or "0". | 
| Timestamp | LABEL\_TIMESTAMP | The timestamp when the label was last updated. This is required if EVENT\_LABEL is provided. | 

**Note**  
You can use any variable names for both mandatory and optional variables. What matters is that each mandatory and optional variable is assigned to the right variable type.
You can provide additional variables. However, Amazon Fraud Detector won’t include them when training an Account Takeover Insights model. 
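Before training, each event variable must exist with the right variable type. The sketch below lists hypothetical `CreateVariable` request payloads for the three mandatory inputs; only the `variableType` values are fixed by the model, while the names are your choice.

```python
# Sketch of CreateVariable request payloads for the three mandatory
# Account Takeover Insights inputs. The variable names are hypothetical;
# the variableType assignments are what the model requires.
variables = [
    {"name": "ip_address", "variableType": "IP_ADDRESS",
     "dataType": "STRING", "dataSource": "EVENT", "defaultValue": "unknown"},
    {"name": "user_agent", "variableType": "USERAGENT",
     "dataType": "STRING", "dataSource": "EVENT", "defaultValue": "unknown"},
    {"name": "valid_cred", "variableType": "VALIDCRED",
     "dataType": "STRING", "dataSource": "EVENT", "defaultValue": "unknown"},
]

# All three mandatory variable types are covered.
assert {v["variableType"] for v in variables} == {
    "IP_ADDRESS", "USERAGENT", "VALIDCRED"}
```

Each dict maps onto one `create_variable(**v)` call with the boto3 `frauddetector` client.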

## Selecting data


Gathering data is an important step to creating your Account Takeover Insights model. As you start to gather your login data, consider the following requirements and recommendations:

**Required**
+ Provide at least 1,500 user account examples, each with at least two associated login events.
+ Your dataset must cover at least 30 days of login events. You can later specify the specific time range of the events to use to train the model.

**Recommended**
+ Include examples of unsuccessful login events in your dataset. You can optionally label these unsuccessful logins as “fraudulent” or “legitimate.”
+ Prepare historical data with login events spanning more than six months and including 100,000 entities.

If you don’t have a dataset that already meets the minimum requirements, consider streaming event data to Amazon Fraud Detector by calling the [SendEvent](https://docs.aws.amazon.com/frauddetector/latest/api/API_SendEvent.html) API operation.

## Validating data


Before creating your Account Takeover Insights model, Amazon Fraud Detector checks whether the metadata and variables that you included in your dataset for training the model meet size and format requirements. For more information, see [Dataset validation](create-event-dataset.md#dataset-validation). It also checks for other requirements. If the dataset doesn’t pass validation, the model isn’t created. For the model to be created successfully, fix the data that didn’t pass validation before you train again.

**Common dataset errors**

When validating a dataset for training an Account Takeover Insights model, Amazon Fraud Detector scans for these and other issues and throws an error if it encounters one or more of the issues.
+ CSV file isn’t in the UTF-8 format.
+ The CSV file header doesn’t contain at least one of the following metadata: `EVENT_ID`, `ENTITY_ID`, or `EVENT_TIMESTAMP`.
+ The CSV file header doesn’t contain at least one variable of the following variable types: `IP_ADDRESS`, `USERAGENT`, or `VALIDCRED`. 
+ There’s more than one variable that’s associated with the same variable type. 
+ More than 0.1% of values in `EVENT_TIMESTAMP` contains nulls or values other than the supported date and timestamp formats.
+ The number of days between the first and last event is fewer than 30 days.
+ More than 10% of variables of the `IP_ADDRESS` variable type are either invalid or null.
+ More than 50% of variables of the `USERAGENT` variable type contain nulls.
+ All of the variables of the `VALIDCRED` variable type are set to `false`.
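Two of these checks (the 30-day span and the 10% null IP limit) can be approximated locally before upload. The helper below is an illustrative sketch, not part of the Amazon Fraud Detector API, and the sample inputs are hypothetical.

```python
from datetime import datetime

# Local pre-checks mirroring two of the validations listed above. The
# thresholds come from the list; the helper itself is an illustrative
# sketch, not an Amazon Fraud Detector API.
def dataset_warnings(timestamps, ip_values):
    problems = []
    parsed = [datetime.fromisoformat(t.replace("Z", "+00:00"))
              for t in timestamps]
    if (max(parsed) - min(parsed)).days < 30:
        problems.append("events span fewer than 30 days")
    null_ratio = sum(1 for v in ip_values if not v) / len(ip_values)
    if null_ratio > 0.10:
        problems.append("more than 10% of IP_ADDRESS values are null")
    return problems

# Hypothetical sample: a 14-day span and 3 of 4 IP values missing, so
# both checks should fire.
issues = dataset_warnings(
    ["2023-01-01T00:00:00Z", "2023-01-15T00:00:00Z"],
    ["192.0.2.1", None, None, None],
)
```

Running such checks locally can save a failed training run, but the service's own validation remains authoritative.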

# Build a model


Amazon Fraud Detector models learn to detect fraud for a specific event type. In Amazon Fraud Detector, you first create a model, which acts as a container for your model versions. Each time you train a model, a new version is created. For details on how to create and train a model using the AWS console, see [Step 3: Create model](part-a.md#step-3-create-new-ml-model).

Each model has a corresponding model score variable. Amazon Fraud Detector creates this variable on your behalf when you create a model. You can use this variable in your rule expressions to interpret your model scores during a fraud evaluation.
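For example, assuming a model named `sample_fraud_detection_model`, its score variable can be referenced in a rule expression as `$sample_fraud_detection_model_insightscore`. The sketch below builds a hypothetical `CreateRule` request payload; the detector, rule, and outcome names are placeholders.

```python
# Sketch of a CreateRule request that references a model score variable
# in its expression. The detector ID, rule ID, and outcome name are
# hypothetical placeholders.
rule = {
    "detectorId": "sample_detector",
    "ruleId": "high_fraud_risk",
    # The score variable follows the <modelId>_insightscore naming
    # used in rule expressions.
    "expression": "$sample_fraud_detection_model_insightscore > 900",
    "language": "DETECTORLANG",
    "outcomes": ["review"],
}

assert "_insightscore" in rule["expression"]
```

These fields map onto a `create_rule(**rule)` call with the boto3 `frauddetector` client.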

## Train and deploy a model using the AWS SDK for Python (Boto3)


A model version is created by calling the `CreateModel` and `CreateModelVersion` operations. `CreateModel` initiates the model, which acts as a container for your model versions. `CreateModelVersion` starts the training process, which results in a specific version of the model. A new version of the model is created each time you call `CreateModelVersion`.

The following example shows a sample request for the `CreateModel` API. This example creates an *Online Fraud Insights* model type and assumes that you have created an event type `sample_registration`. For additional details about creating an event type, see [Create an event type](create-event-type.md).

```
import boto3
fraudDetector = boto3.client('frauddetector')

fraudDetector.create_model(
    modelId='sample_fraud_detection_model',
    eventTypeName='sample_registration',
    modelType='ONLINE_FRAUD_INSIGHTS')
```

Train your first version using the [CreateModelVersion](https://docs.aws.amazon.com//frauddetector/latest/api/API_CreateModelVersion.html) API. For `TrainingDataSource` and `ExternalEventsDetail`, specify the source and Amazon S3 location of the training dataset. For `TrainingDataSchema`, specify how Amazon Fraud Detector should interpret the training data; specifically, which event variables to include and how to classify the event labels. By default, Amazon Fraud Detector ignores unlabeled events. This example code uses `AUTO` for `unlabeledEventsTreatment` to specify that Amazon Fraud Detector decides how to use the unlabeled events.

```
import boto3
fraudDetector = boto3.client('frauddetector')

fraudDetector.create_model_version(
    modelId='sample_fraud_detection_model',
    modelType='ONLINE_FRAUD_INSIGHTS',
    trainingDataSource='EXTERNAL_EVENTS',
    trainingDataSchema={
        'modelVariables': ['ip_address', 'email_address'],
        'labelSchema': {
            'labelMapper': {
                'FRAUD': ['fraud'],
                'LEGIT': ['legit']
            },
            'unlabeledEventsTreatment': 'AUTO'
        }
    },
    externalEventsDetail={
        'dataLocation': 's3://bucket/file.csv',
        'dataAccessRoleArn': 'role_arn'
    }
)
```

A successful request will result in a new model version with status `TRAINING_IN_PROGRESS`. At any point during the training, you can cancel the training by calling `UpdateModelVersionStatus` and updating the status to `TRAINING_CANCELLED`. Once training is complete, the model version status will update to `TRAINING_COMPLETE`. You can review model performance using the Amazon Fraud Detector console or by calling `DescribeModelVersions`. For more information on how to interpret model scores and performance, see [Model scores](model-scores.md) and [Model performance metrics](training-performance-metrics.md).

After reviewing the model performance, activate the model to make it available for use by detectors in real-time fraud predictions. Amazon Fraud Detector deploys the model in multiple Availability Zones for redundancy, with auto scaling turned on, to ensure that the model scales with the number of fraud predictions you are making. To activate the model, call the `UpdateModelVersionStatus` API and update the status to `ACTIVE`.

```
import boto3
fraudDetector = boto3.client('frauddetector')

fraudDetector.update_model_version_status(
    modelId='sample_fraud_detection_model',
    modelType='ONLINE_FRAUD_INSIGHTS',
    modelVersionNumber='1.00',
    status='ACTIVE'
)
```

# Model scores


Amazon Fraud Detector generates model scores differently for different model types. 

For **Account Takeover Insights (ATI)** models, Amazon Fraud Detector uses only aggregated values (values calculated by combining a set of raw variables) to generate the model score. A score of -1 is generated for the first event of a new entity, indicating *unknown risk*. This is because, for a new entity, the values used for calculating the aggregates are zero or null. The Account Takeover Insights (ATI) model generates model scores between 0 and 1000 for all subsequent events for the same entity and for existing entities, where 0 indicates *low fraud risk* and 1000 indicates *high fraud risk*. For ATI models, the model scores are directly related to the challenge rate (CR). For example, a score of 500 corresponds to an estimated 5% challenge rate, whereas a score of 900 corresponds to an estimated 0.1% challenge rate. 

For **Online Fraud Insights (OFI)** and **Transaction Fraud Insights (TFI)** models, Amazon Fraud Detector uses both aggregated values (values calculated by combining a set of raw variables) and raw values (the values provided for the variables) to generate the model scores. The model scores can be between 0 and 1000, where 0 indicates *low fraud risk* and 1000 indicates *high fraud risk*. For OFI and TFI models, the model scores are directly related to the false positive rate (FPR). For example, a score of 600 corresponds to an estimated 10% false positive rate, whereas a score of 900 corresponds to an estimated 2% false positive rate. The following table provides details of how certain model scores correlate to estimated false positive rates.


| Model score | Estimated FPR | 
| --- | --- | 
|  975  |  0.50%  | 
|  950  |  1%  | 
|  900  |  2%  | 
|  860  |  3%  | 
|  775  |  5%  | 
|  700  |  7%  | 
|  600  |  10%  | 

# Model performance metrics


After model training is complete, Amazon Fraud Detector validates model performance using 15% of your data that was not used to train the model. You can expect your trained Amazon Fraud Detector model to have real-world fraud detection performance that is similar to the validation performance metrics.

As a business, you must balance between detecting more fraud, and adding more friction to legitimate customers. To assist in choosing the right balance, Amazon Fraud Detector provides the following tools to assess model performance:
+ **Score distribution chart** – A histogram of model score distributions assumes an example population of 100,000 events. The left Y axis represents the legitimate events and the right Y axis represents the fraud events. You can select a specific model threshold by clicking on the chart area. This will update the corresponding views in the confusion matrix and ROC chart.
+ **Confusion matrix** – Summarizes the model accuracy for a given score threshold by comparing model predictions against actual results. Amazon Fraud Detector assumes an example population of 100,000 events. The distribution of fraud and legitimate events simulates the fraud rate in your business.
  + **True positives** – The model predicts fraud and the event is actually fraud.
  + **False positives** – The model predicts fraud but the event is actually legitimate.
  + **True negatives** – The model predicts legitimate and the event is actually legitimate.
  + **False negatives** – The model predicts legitimate but the event is actually fraud.
  + **True positive rate (TPR)** – Percentage of total fraud the model detects. Also known as capture rate. 
  + **False positive rate (FPR)** – Percentage of total legitimate events that are incorrectly predicted as fraud.
+ **Receiver operating characteristic (ROC) curve** – Plots the true positive rate as a function of the false positive rate over all possible model score thresholds. View this chart by choosing **Advanced Metrics**.
+ **Area under the curve (AUC)** – Summarizes TPR and FPR across all possible model score thresholds. A model with no predictive power has an AUC of 0.5, whereas a perfect model has a score of 1.0.
+ **Uncertainty range** – Shows the range of AUC expected from the model. A larger range (a difference between the upper and lower bounds of AUC greater than 0.1) means higher model uncertainty. If the uncertainty range is large (>0.1), consider providing more labeled events and retraining the model.
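The rates above follow directly from the four confusion-matrix counts; the sketch below computes them for a hypothetical sample of 100,000 events.

```python
# Computing the confusion-matrix rates defined above from raw counts.
# The counts are hypothetical and sum to the example population of
# 100,000 events.
def rates(tp, fp, tn, fn):
    tpr = tp / (tp + fn)        # share of all fraud that is caught
    fpr = fp / (fp + tn)        # share of legitimate events flagged
    precision = tp / (tp + fp)  # share of flagged events that are fraud
    return tpr, fpr, precision

tpr, fpr, precision = rates(tp=80, fp=190, tn=99700, fn=30)
assert round(tpr, 3) == 0.727
```

Raising the score threshold moves events from the "flagged" to the "not flagged" side, trading a lower TPR for a lower FPR.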

**To use the model performance metrics**

1. Start with the **Score distribution** chart to review the distribution of model scores for your fraud and legitimate events. Ideally, you will have a clear separation between the fraud and legitimate events. This indicates the model can accurately identify which events are fraudulent and which are legitimate. Select a model threshold by clicking on the chart area. You can see how adjusting the model score threshold impacts your true positive and false positive rates.
**Note**  
The score distribution chart plots the fraud and legitimate events on two different Y axes. The left Y axis represents the legitimate events and the right Y axis represents the fraud events.

1. Review the **Confusion matrix**. Depending on your selected model score threshold, you can see the simulated impact based on a sample of 100,000 events. The distribution of fraud and legitimate events simulates the fraud rate in your business. Use this information to find the right balance between true positive rate and false positive rate.

1. For additional details, choose **Advanced Metrics**. Use the ROC chart to understand the relationship between true positive rate and false positive rate for any model score threshold. The ROC curve can help you fine-tune the tradeoff between true positive rate and false positive rate.
**Note**  
You can also review metrics in table form by choosing **Table**.  
The table view also shows the metric **Precision**. **Precision** is the percentage of fraud events correctly predicted as fraudulent as compared to all events predicted as fraudulent.

1. Use the performance metrics to determine the optimal model thresholds for your business based on your goals and fraud-detection use case. For example, if you plan to use the model to classify new account registrations as high, medium, or low risk, you need to identify two threshold scores so you can draft three rule conditions as follows:
   + Scores > X are high risk
   + Scores < X but > Y are medium risk
   + Scores < Y are low risk
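As an illustration of the three-band scheme above, the following sketch maps a score to a risk band using two thresholds (the values `900` and `700` are hypothetical stand-ins for X and Y; in practice you would encode this logic as detector rules rather than application code):

```python
HIGH_RISK_THRESHOLD = 900    # hypothetical value for X
MEDIUM_RISK_THRESHOLD = 700  # hypothetical value for Y

def classify_risk(score: float) -> str:
    """Map a model score to a risk band using two thresholds."""
    if score > HIGH_RISK_THRESHOLD:
        return "high"
    if score > MEDIUM_RISK_THRESHOLD:
        return "medium"
    return "low"

print(classify_risk(950))  # high
print(classify_risk(800))  # medium
print(classify_risk(100))  # low
```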

# Model variable importance


*Model variable importance* is a feature of Amazon Fraud Detector that ranks model variables within a model version. Each model variable is provided a value based on its relative importance to the overall performance of your model. The model variable with the highest value is more important to the model than the other model variables in the dataset for that model version, and is listed at the top by default. Likewise, the model variable with the lowest value is listed at the bottom by default and is least important compared to the other model variables. Using model variable importance values, you can gain insight into what inputs are driving your model’s performance. 

You can view model variable importance values for your trained model version in the Amazon Fraud Detector console or by using the [DescribeModelVersion](https://docs.aws.amazon.com/frauddetector/latest/api/API_DescribeModelVersions.html) API.

Model variable importance provides the following set of values for each [Variable](https://docs.aws.amazon.com/frauddetector/latest/api/API_TrainingDataSchema.html#FraudDetector-Type-TrainingDataSchema-modelVariables) used to train the [Model Version](https://docs.aws.amazon.com/frauddetector/latest/api/API_CreateModelVersion.html).
+ **Variable Type**: Type of variable (for example, IP address or Email). For more information, see [Variable types](variables.md#variable-types). For Account Takeover Insights (ATI) models, Amazon Fraud Detector provides variable importance value for both raw and aggregate variable type. Raw variable types are assigned to the variables that you provide. Aggregate variable type is assigned to a set of raw variables that Amazon Fraud Detector has combined to calculate an aggregated importance value.
+ **Variable Name**: Name of the event variable that was used to train the model version (for example, `ip_address`, `email_address`, `are_credentials_valid`). For the aggregated variable type, the names of all variables that were used to calculate the aggregated variable importance value are listed. 
+ **Variable Importance Value**: A number that represents the relative importance of the raw or aggregated variable to the model's performance. The typical range is 0–10.
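For example, given a set of variable importance values for a model version, ranking them from most to least important (as the console chart does) is a simple sort. The variable names and values below are made up for illustration:

```python
# Hypothetical variable importance values for one model version.
importance = {
    "ip_address": 5.2,
    "email_address": 3.1,
    "billing_postal": 1.4,
    "user_agent": 0.9,
}

# Rank variables from most to least important, as the console chart does.
ranking = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
for name, value in ranking:
    print(f"{name}: {value}")
```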

In the Amazon Fraud Detector console, the model variable importance values are displayed as follows for either an Online Fraud Insights (OFI) or a Transaction Fraud Insights (TFI) model. An Account Takeover Insights (ATI) model provides aggregated variable importance values in addition to the raw variables' importance values. The visual chart makes it easy to see the relative importance of variables, with the vertical dotted line providing a reference to the importance value of the highest ranked variable. 

![\[Model variable importance chart.\]](http://docs.aws.amazon.com/frauddetector/latest/ug/images/hawksnest-console-mvi-pane.png)


Amazon Fraud Detector generates variable importance values for every Fraud Detector model version at no additional cost. 

**Important**  
Model versions that were created before *July 9, 2021* do not have variable importance values. You must train a new version of your model to generate the model variable importance values.

## Using model variable importance values


You can use model variable importance values to gain insight into what is driving your model's performance up or down and which variables contribute the most, and then tweak your model to improve overall performance. 

More specifically, to improve your model performance, examine the variable importance values against your domain knowledge and debug issues in the training data. For example, if Account Id was used as an input to the model and it is listed at the top, take a look at its variable importance value. If the variable importance value is significantly higher than the rest of the values, then your model might be overfitting to a specific fraud pattern (for example, all the fraud events are from the same Account Id). However, it might also be the case that there is label leakage if the variable depends on the fraud labels. Depending on the outcome of your analysis based on your domain knowledge, you might want to remove the variable and train with a more diverse dataset, or keep the model as it is. 

Similarly, take a look at the variables ranked last. If the variable importance value is significantly lower than the rest of the values, then this model variable might not have any importance in training your model. You could consider removing the variable to train a simpler model version. If your model has few variables, such as only two variables, Amazon Fraud Detector still provides the variable importance values and ranks the variables. However, the insights in this case will be limited. 

**Important**  
If you notice variables missing from the **Model variable importance** chart, it might be due to one of the following reasons. Consider modifying the variable in your dataset and retraining your model.  
+ The count of unique values for the variable in the training dataset is lower than 100.  
+ More than 90% of the values for the variable are missing from the training dataset.  

You need to train a new model version every time that you want to adjust your model’s input variables.

## Evaluating model variable importance values


We recommend that you consider the following when you evaluate model variable importance values: 
+ Variable importance values must always be evaluated in combination with the domain knowledge.
+ Examine variable importance value of a variable relative to the variable importance value of the other variables within the model version. Do not consider variable importance value for a single variable independently.
+ Compare variable importance values of the variables within the same model version. Do not compare variable importance values of the same variables across model versions because the variable importance value of a variable in a model version might differ from the value of the same variable in a different model version. If you use the same variables and dataset to train different model versions, this does not necessarily generate the same variable importance values. 

## Viewing model variable importance ranking


After model training is complete, you can view model variable importance ranking of your trained model version in the Amazon Fraud Detector console or by using the [DescribeModelVersion](https://docs.aws.amazon.com/frauddetector/latest/api/API_DescribeModelVersions.html) API.

**To view the model variable importance ranking using the console**

1. Sign in to the AWS Management Console and open the Amazon Fraud Detector console at [https://console.aws.amazon.com/frauddetector](https://console.aws.amazon.com/frauddetector).

1. In the left navigation pane, choose **Models**.

1. Choose your model and then your model version.

1. Make sure that the **Overview** tab is selected.

1. Scroll down to view the **Model variable importance** pane.

## Understanding how the model variable importance value is calculated


After each model version finishes training, Amazon Fraud Detector automatically generates model variable importance values and model performance metrics. To do this, Amazon Fraud Detector uses SHapley Additive exPlanations ([SHAP](https://papers.nips.cc/paper/2017/file/8a20a8621978632d76c43dfd28b67767-Paper.pdf)). SHAP is essentially the average expected contribution of a model variable after all possible combinations of all model variables have been considered. 

SHAP first assigns a contribution to each model variable for the prediction of an event. Then, it aggregates these contributions across events to create a ranking of the variables at the model level. To assign the contribution of each model variable for a prediction, SHAP considers differences in model outputs among all possible variable combinations. By including all possibilities of including or removing a specific set of variables to generate a model output, SHAP can accurately assess the importance of each model variable. This is particularly important when the model variables are highly correlated with one another.

ML models, in most cases, do not allow you to remove variables. You can instead replace a removed or missing variable in the model with the corresponding variable values from one or more baselines (for example, non-fraud events). Choosing proper baseline instances can be difficult, but Amazon Fraud Detector makes this easy by setting this baseline as the population average for you.
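To make the idea concrete, the following toy sketch computes exact Shapley values for a tiny model by averaging each variable's marginal contribution over all orderings, substituting a baseline value for variables that are "removed". This illustrates the SHAP idea only; it is not Amazon Fraud Detector's implementation, and the model and values are made up:

```python
from itertools import permutations
from math import factorial

def model(x):
    # Toy scoring model over three variables (purely illustrative).
    return 3 * x[0] + 2 * x[1] + x[2]

baseline = [0.0, 0.0, 0.0]  # e.g. the population average of each variable
event = [1.0, 1.0, 1.0]     # the event being explained

n = len(event)
shap_values = [0.0] * n
for order in permutations(range(n)):
    x = list(baseline)       # start with every variable "removed"
    for i in order:
        before = model(x)
        x[i] = event[i]      # add variable i back in
        shap_values[i] += model(x) - before
# Average the marginal contributions over all orderings.
shap_values = [v / factorial(n) for v in shap_values]

# The Shapley values sum to the model output minus the baseline output.
print(shap_values)  # [3.0, 2.0, 1.0] for this linear model
```

For this linear model each variable's Shapley value equals its weight; with correlated or interacting variables, the averaging over orderings is what makes the attribution fair.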

# Import a SageMaker AI model


You can optionally import SageMaker AI-hosted models to Amazon Fraud Detector. Similar to Amazon Fraud Detector models, SageMaker AI models can be added to detectors and generate fraud predictions using the `GetEventPrediction` API. As part of the `GetEventPrediction` request, Amazon Fraud Detector will invoke your SageMaker AI endpoint and pass the results to your rules.

You can configure Amazon Fraud Detector to use the event variables sent as part of the `GetEventPrediction` request. If you choose to use event variables, you must provide an input template. Amazon Fraud Detector will use this template to transform your event variables into the required input payload to invoke the SageMaker AI endpoint. Alternatively, you can configure your SageMaker AI model to use a byteBuffer that is sent as part of the `GetEventPrediction` request.
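As a sketch of what the input template does, the snippet below fills a `csvInputTemplate`-style string with event variables to produce a CSV payload like the one sent to the endpoint. This is a simplified illustration of the transformation, not Amazon Fraud Detector's internal code, and the variable names are examples:

```python
import re

def render_csv_template(template: str, event_variables: dict) -> str:
    """Replace {{variable}} placeholders with event variable values."""
    return re.sub(r"\{\{(\w+)\}\}",
                  lambda m: str(event_variables[m.group(1)]),
                  template)

template = "{{order_amt}}, {{prev_amt}}, {{payment_type}}"
payload = render_csv_template(template, {
    "order_amt": 120.50,
    "prev_amt": 80.00,
    "payment_type": "credit_card",
})
print(payload)  # 120.5, 80.0, credit_card
```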

Amazon Fraud Detector supports importing SageMaker AI algorithms that use JSON or CSV input formats and JSON or CSV output formats. Examples of supported SageMaker AI algorithms include XGBoost, Linear Learner, and Random Cut Forest.

## Import a SageMaker AI model using the AWS SDK for Python (Boto3)


To import a SageMaker AI model, use the `PutExternalModel` API. The following example assumes that the SageMaker AI endpoint `sagemaker-transaction-model` has been deployed, is in the `InService` status, and uses the XGBoost algorithm.

The input configuration specifies that the model will use event variables to construct the model input (`useEventVariables` is set to `TRUE`). The input format is `TEXT_CSV`, because XGBoost requires CSV input. The `csvInputTemplate` specifies how to construct the CSV input from the variables sent as part of the `GetEventPrediction` request. This example assumes that you have created the variables `order_amt`, `prev_amt`, `hist_amt`, and `payment_type`.

The output configuration specifies the response format of the SageMaker AI model, and maps the appropriate CSV index to the Amazon Fraud Detector variable `sagemaker_output_score`. Once configured, you can use the output variable in rules. 

**Note**  
The output from a SageMaker AI model must be mapped to a variable with source `EXTERNAL_MODEL_SCORE`. You cannot create these variables in the console using **Variables**. You must instead create them when you configure your model import.

```
import boto3

fraudDetector = boto3.client('frauddetector')

fraudDetector.put_external_model(
    modelSource='SAGEMAKER',
    modelEndpoint='sagemaker-transaction-model',
    invokeModelEndpointRoleArn='your_SagemakerExecutionRole_arn',
    inputConfiguration={
        'useEventVariables': True,
        'eventTypeName': 'sample_transaction',
        'format': 'TEXT_CSV',
        'csvInputTemplate': '{{order_amt}}, {{prev_amt}}, {{hist_amt}}, {{payment_type}}'
    },
    outputConfiguration={
        'format': 'TEXT_CSV',
        'csvIndexToVariableMap': {
            '0': 'sagemaker_output_score'
        }
    },
    modelEndpointStatus='ASSOCIATED'
)
```
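After the model is imported and its score is mapped to `sagemaker_output_score`, a detector rule can reference that variable like any other. For example, a rule expression might look like the following (the threshold `0.5` is hypothetical and depends on your model's output range):

```
$sagemaker_output_score > 0.5
```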

# Delete a model or model version


You can delete models and model versions in Amazon Fraud Detector, provided that they are not associated with a detector version. When you delete a model, Amazon Fraud Detector permanently deletes that model and the data is no longer stored in Amazon Fraud Detector.

 You can also remove Amazon SageMaker AI models if they are not associated with a detector version. Removing a SageMaker AI model disconnects it from Amazon Fraud Detector, but the model remains available in SageMaker AI.

**To delete a model version**

You can only delete model versions that are in the `Ready to deploy` status. To change a model version from `ACTIVE` to `Ready to deploy` status, undeploy the model version.

1. Sign in to the AWS Management Console and open the Amazon Fraud Detector console at [https://console.aws.amazon.com/frauddetector](https://console.aws.amazon.com/frauddetector).

1. In the left navigation pane of the Amazon Fraud Detector console, choose **Models**.

1. Choose the model that contains the model version you want to delete.

1. Choose the model version that you want to delete.

1. Choose **Actions**, and then choose **Delete**.

1. Enter the model version name, and then choose **Delete model version**.

**To undeploy a model version**

You can't undeploy a model version that is in use by any detector version (`ACTIVE`, `INACTIVE`, `DRAFT`). Therefore, to undeploy a model version that is in use by a detector version, first remove the model version from the detector version. 

1. In the left navigation pane of the Amazon Fraud Detector console, choose **Models**.

1. Choose the model that contains the model version you want to undeploy.

1. Choose the model version that you want to delete.

1. Choose **Actions**, and then choose **Undeploy model version**.

**To delete a model**

Before deleting a model, you must first delete all model versions that are associated with the model.

1. In the left navigation pane of the Amazon Fraud Detector console, choose **Models**.

1. Choose the model that you want to delete.

1. Choose **Actions**, and then choose **Delete**.

1. Enter the model name, and then choose **Delete model**.

**To remove an Amazon SageMaker AI model**

1. In the left navigation pane of the Amazon Fraud Detector console, choose **Models**.

1. Choose the SageMaker AI model that you want to remove.

1. Choose **Actions**, and then choose **Remove model**.

1. Enter the model name and then choose **Remove SageMaker AI model**.