

# IP Insights Data Formats
<a name="ip-insights-data-formats"></a>

This section provides examples of the available input and output data formats used by the IP Insights algorithm during training and inference.

**Topics**
+ [IP Insights Training Data Formats](ip-insights-training-data-formats.md)
+ [IP Insights Inference Data Formats](ip-insights-inference-data-formats.md)

# IP Insights Training Data Formats
<a name="ip-insights-training-data-formats"></a>

The following are the available data input formats for the IP Insights algorithm. Amazon SageMaker AI built-in algorithms adhere to the common input training format described in [Common Data Formats for Training](cdf-training.md). However, the SageMaker AI IP Insights algorithm currently supports only the CSV data input format.

## IP Insights Training Data Input Formats
<a name="ip-insights-training-input-format-requests"></a>

### INPUT: CSV
<a name="ip-insights-input-csv"></a>

The CSV file must have two columns. The first column is an opaque string that corresponds to an entity's unique identifier. The second column is the IPv4 address of the entity's access event in decimal-dot notation. 

content-type: text/csv

```
entity_id_1, 192.168.1.2
entity_id_2, 10.10.1.2
```

# IP Insights Inference Data Formats
<a name="ip-insights-inference-data-formats"></a>

The following are the available input and output formats for the IP Insights algorithm. Amazon SageMaker AI built-in algorithms adhere to the common input inference format described in [Common data formats for inference](cdf-inference.md). However, the SageMaker AI IP Insights algorithm does not currently support RecordIO format.

## IP Insights Input Request Formats
<a name="ip-insights-input-format-requests"></a>

### INPUT: CSV Format
<a name="ip-insights-input-csv-format"></a>

The CSV file must have two columns. The first column is an opaque string that corresponds to an entity's unique identifier. The second column is the IPv4 address of the entity's access event in decimal-dot notation. 

content-type: text/csv

```
entity_id_1, 192.168.1.2
entity_id_2, 10.10.1.2
```

### INPUT: JSON Format
<a name="ip-insights-input-json"></a>

JSON data can be provided in different formats. IP Insights follows the common SageMaker AI formats. For more information about inference formats, see [Common data formats for inference](cdf-inference.md).

content-type: application/json

```
{
  "instances": [
    {"data": {"features": {"values": ["entity_id_1", "192.168.1.2"]}}},
    {"features": ["entity_id_2", "10.10.1.2"]}
  ]
}
```

### INPUT: JSONLINES Format
<a name="ip-insights-input-jsonlines"></a>

The JSON Lines content type is useful for running batch transform jobs. For more information on SageMaker AI inference formats, see [Common data formats for inference](cdf-inference.md). For more information on running batch transform jobs, see [Batch transform for inference with Amazon SageMaker AI](batch-transform.md).

content-type: application/jsonlines

```
{"data": {"features": {"values": ["entity_id_1", "192.168.1.2"]}}},
{"features": ["entity_id_2", "10.10.1.2"]}]
```

## IP Insights Output Response Formats
<a name="ip-insights-ouput-format-response"></a>

### OUTPUT: JSON Response Format
<a name="ip-insights-output-json"></a>

The default output of the SageMaker AI IP Insights algorithm is the `dot_product` between the input entity and IP address. The dot\$1product signifies how compatible the model considers the entity and IP address. The `dot_product` is unbounded. To make predictions about whether an event is anomalous, you need to set a threshold based on your defined distribution. For information about how to use the `dot_product` for anomaly detection, see the [An Introduction to the SageMaker AIIP Insights Algorithm](https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/ipinsights_login/ipinsights-tutorial.html).

accept: application/json

```
{
  "predictions": [
    {"dot_product": 0.0},
    {"dot_product": 2.0}
  ]
}
```

Advanced users can access the model's learned entity and IP embeddings by providing the additional content-type parameter `verbose=True` to the Accept heading. You can use the `entity_embedding` and `ip_embedding` for debugging, visualizing, and understanding the model. Additionally, you can use these embeddings in other machine learning techniques, such as classification or clustering.

accept: application/json;verbose=True

```
{
  "predictions": [
    {
        "dot_product": 0.0,
        "entity_embedding": [1.0, 0.0, 0.0],
        "ip_embedding": [0.0, 1.0, 0.0]
    },
    {
        "dot_product": 2.0,
        "entity_embedding": [1.0, 0.0, 1.0],
        "ip_embedding": [1.0, 0.0, 1.0]
    }
  ]
}
```

### OUTPUT: JSONLINES Response Format
<a name="ip-insights-jsonlines"></a>

accept: application/jsonlines 

```
{"dot_product": 0.0}
{"dot_product": 2.0}
```

accept: application/jsonlines; verbose=True 

```
{"dot_product": 0.0, "entity_embedding": [1.0, 0.0, 0.0], "ip_embedding": [0.0, 1.0, 0.0]}
{"dot_product": 2.0, "entity_embedding": [1.0, 0.0, 1.0], "ip_embedding": [1.0, 0.0, 1.0]}
```