

# Custom models in Neptune ML

**Note**  
Neptune ML custom model support relies on an older version of Python 3. To create and run custom GNN models with up-to-date dependencies, use [GraphStorm on SageMaker](https://graphstorm.readthedocs.io/en/v0.3.1/cli/model-training-inference/distributed/sagemaker.html).  
[Real-time inductive inference](machine-learning-overview-evolving-data.md#inductive-vs-transductive-inference) is not currently supported for custom models.

Neptune ML lets you define your own custom model implementations using Python. You can train and deploy custom models using Neptune ML infrastructure very much as you do for the built-in models, and use them to obtain predictions through graph queries.

You can start implementing a custom model of your own in Python by following the [Neptune ML toolkit examples](https://github.com/awslabs/neptuneml-toolkit/tree/main/examples/custom-models/), and by using the model components provided in the Neptune ML toolkit. The following sections provide more details.

**Contents**
+ [Overview of custom models in Neptune ML](machine-learning-custom-model-overview.md)
  + [When to use a custom model in Neptune ML](machine-learning-custom-model-overview.md#machine-learning-custom-models-when-to-use)
  + [Workflow for developing and using a custom model in Neptune ML](machine-learning-custom-model-overview.md#machine-learning-custom-model-workflow)
+ [Custom model development in Neptune ML](machine-learning-custom-model-development.md)
  + [Custom model training script development in Neptune ML](machine-learning-custom-model-development.md#machine-learning-custom-model-training-script)
  + [Custom model transform script development in Neptune ML](machine-learning-custom-model-development.md#machine-learning-custom-model-transform-script)
  + [Custom `model-hpo-configuration.json` file in Neptune ML](machine-learning-custom-model-development.md#machine-learning-custom-model-hpo-configuration-file)
  + [Local testing of your custom model implementation in Neptune ML](machine-learning-custom-model-development.md#machine-learning-custom-model-testing)

# Overview of custom models in Neptune ML

## When to use a custom model in Neptune ML

Neptune ML's built-in models handle all the standard tasks supported by Neptune ML, but there may be cases where you want to have more granular control over the model for a particular task, or need to customize the model training process. For example, a custom model is appropriate in the following situations:
+ Feature encoding for text features needs to use very large text models that must run on a GPU.
+ You want to use your own custom Graph Neural Network (GNN) model developed in Deep Graph Library (DGL).
+ You want to use tabular models or ensemble models for node classification and regression.

## Workflow for developing and using a custom model in Neptune ML

Custom model support in Neptune ML is designed to integrate seamlessly into existing Neptune ML workflows. It works by running custom code in your source module on Neptune ML's infrastructure to train the model. Just as it does for a built-in model, Neptune ML automatically launches a SageMaker AI HyperParameter tuning job and selects the best model according to the evaluation metric. It then uses the implementation provided in your source module to generate model artifacts for deployment.

Data export, training configuration, and data preprocessing are the same for a custom model as for a built-in one.

After data preprocessing, you can iteratively and interactively develop and test your custom model implementation using Python. When your model is production-ready, you can upload the resulting Python module to Amazon S3 like this:

```
aws s3 cp --recursive (source path to module) s3://(bucket name)/(destination path for your module)
```

Then, you can use the normal [default](machine-learning-overview.md#machine-learning-overview-starting-workflow) or the [incremental](machine-learning-overview-evolving-data-incremental.md#machine-learning-overview-incremental) data workflow to deploy the model to production, with a few differences.

For model training using a custom model, you must provide a `customModelTrainingParameters` JSON object to the Neptune ML model training API to ensure that your custom code is used. The fields in the `customModelTrainingParameters` object are as follows:
+ **`sourceS3DirectoryPath`**   –   (*Required*) The path to the Amazon S3 location where the Python module implementing your model is located. This must point to a valid existing Amazon S3 location that contains, at a minimum, a training script, a transform script, and a `model-hpo-configuration.json` file.
+ **`trainingEntryPointScript`**   –   (*Optional*) The name of the entry point in your module of a script that performs model training and takes hyperparameters as command-line arguments, including fixed hyperparameters.

  *Default*: `training.py`.
+ **`transformEntryPointScript`**   –   (*Optional*) The name of the entry point in your module of a script that should be run after the best model from the hyperparameter search has been identified, to compute the model artifacts necessary for model deployment. It should be able to run with no command-line arguments.

  *Default*: `transform.py`.

For example:

------
#### [ AWS CLI ]

```
aws neptunedata start-ml-model-training-job \
  --endpoint-url https://your-neptune-endpoint:port \
  --id "(a unique model-training job ID)" \
  --data-processing-job-id "(the data-processing job-id of a completed job)" \
  --train-model-s3-location "s3://(your Amazon S3 bucket)/neptune-model-graph-autotrainer" \
  --model-name "custom" \
  --custom-model-training-parameters '{
    "sourceS3DirectoryPath": "s3://(your Amazon S3 bucket)/(path to your Python module)",
    "trainingEntryPointScript": "(your training script entry-point name in the Python module)",
    "transformEntryPointScript": "(your transform script entry-point name in the Python module)"
  }'
```

For more information, see [start-ml-model-training-job](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/start-ml-model-training-job.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.start_ml_model_training_job(
    id='(a unique model-training job ID)',
    dataProcessingJobId='(the data-processing job-id of a completed job)',
    trainModelS3Location='s3://(your Amazon S3 bucket)/neptune-model-graph-autotrainer',
    modelName='custom',
    customModelTrainingParameters={
        'sourceS3DirectoryPath': 's3://(your Amazon S3 bucket)/(path to your Python module)',
        'trainingEntryPointScript': '(your training script entry-point name in the Python module)',
        'transformEntryPointScript': '(your transform script entry-point name in the Python module)'
    }
)

print(response)
```

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/ml/modeltraining \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "id" : "(a unique model-training job ID)",
        "dataProcessingJobId" : "(the data-processing job-id of a completed job)",
        "trainModelS3Location" : "s3://(your Amazon S3 bucket)/neptune-model-graph-autotrainer",
        "modelName": "custom",
        "customModelTrainingParameters" : {
          "sourceS3DirectoryPath": "s3://(your Amazon S3 bucket)/(path to your Python module)",
          "trainingEntryPointScript": "(your training script entry-point name in the Python module)",
          "transformEntryPointScript": "(your transform script entry-point name in the Python module)"
        }
      }'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl \
  -X POST https://your-neptune-endpoint:port/ml/modeltraining \
  -H 'Content-Type: application/json' \
  -d '{
        "id" : "(a unique model-training job ID)",
        "dataProcessingJobId" : "(the data-processing job-id of a completed job)",
        "trainModelS3Location" : "s3://(your Amazon S3 bucket)/neptune-model-graph-autotrainer",
        "modelName": "custom",
        "customModelTrainingParameters" : {
          "sourceS3DirectoryPath": "s3://(your Amazon S3 bucket)/(path to your Python module)",
          "trainingEntryPointScript": "(your training script entry-point name in the Python module)",
          "transformEntryPointScript": "(your transform script entry-point name in the Python module)"
        }
      }'
```

------

Similarly, to enable a custom model transform, you must provide a `customModelTransformParameters` JSON object to the Neptune ML model transform API, with field values that are compatible with the saved model parameters from the training job. The `customModelTransformParameters` object contains these fields:
+ **`sourceS3DirectoryPath`**   –   (*Required*) The path to the Amazon S3 location where the Python module implementing your model is located. This must point to a valid existing Amazon S3 location that contains, at a minimum, a training script, a transform script, and a `model-hpo-configuration.json` file.
+ **`transformEntryPointScript`**   –   (*Optional*) The name of the entry point in your module of a script that should be run after the best model from the hyperparameter search has been identified, to compute the model artifacts necessary for model deployment. It should be able to run with no command-line arguments.

  *Default*: `transform.py`.

For example:

------
#### [ AWS CLI ]

```
aws neptunedata start-ml-model-transform-job \
  --endpoint-url https://your-neptune-endpoint:port \
  --id "(a unique model-transform job ID)" \
  --training-job-name "(name of a completed SageMaker training job)" \
  --model-transform-output-s3-location "s3://(your Amazon S3 bucket)/neptune-model-transform/" \
  --custom-model-transform-parameters '{
    "sourceS3DirectoryPath": "s3://(your Amazon S3 bucket)/(path to your Python module)",
    "transformEntryPointScript": "(your transform script entry-point name in the Python module)"
  }'
```

For more information, see [start-ml-model-transform-job](https://docs.aws.amazon.com/cli/latest/reference/neptunedata/start-ml-model-transform-job.html) in the AWS CLI Command Reference.

------
#### [ SDK ]

```
import boto3
from botocore.config import Config

client = boto3.client(
    'neptunedata',
    endpoint_url='https://your-neptune-endpoint:port',
    config=Config(read_timeout=None, retries={'total_max_attempts': 1})
)

response = client.start_ml_model_transform_job(
    id='(a unique model-transform job ID)',
    trainingJobName='(name of a completed SageMaker training job)',
    modelTransformOutputS3Location='s3://(your Amazon S3 bucket)/neptune-model-transform/',
    customModelTransformParameters={
        'sourceS3DirectoryPath': 's3://(your Amazon S3 bucket)/(path to your Python module)',
        'transformEntryPointScript': '(your transform script entry-point name in the Python module)'
    }
)

print(response)
```

------
#### [ awscurl ]

```
awscurl https://your-neptune-endpoint:port/ml/modeltransform \
  --region us-east-1 \
  --service neptune-db \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{
        "id" : "(a unique model-transform job ID)",
        "trainingJobName" : "(name of a completed SageMaker training job)",
        "modelTransformOutputS3Location" : "s3://(your Amazon S3 bucket)/neptune-model-transform/",
        "customModelTransformParameters" : {
          "sourceS3DirectoryPath": "s3://(your Amazon S3 bucket)/(path to your Python module)",
          "transformEntryPointScript": "(your transform script entry-point name in the Python module)"
        }
      }'
```

**Note**  
This example assumes that your AWS credentials are configured in your environment. Replace *us-east-1* with the Region of your Neptune cluster.

------
#### [ curl ]

```
curl \
  -X POST https://your-neptune-endpoint:port/ml/modeltransform \
  -H 'Content-Type: application/json' \
  -d '{
        "id" : "(a unique model-transform job ID)",
        "trainingJobName" : "(name of a completed SageMaker training job)",
        "modelTransformOutputS3Location" : "s3://(your Amazon S3 bucket)/neptune-model-transform/",
        "customModelTransformParameters" : {
          "sourceS3DirectoryPath": "s3://(your Amazon S3 bucket)/(path to your Python module)",
          "transformEntryPointScript": "(your transform script entry-point name in the Python module)"
        }
      }'
```

------

# Custom model development in Neptune ML

A good way to start custom model development is by following [Neptune ML toolkit examples](https://github.com/awslabs/neptuneml-toolkit/tree/main/examples/custom-models/introduction) to structure and write your training module. The Neptune ML toolkit also implements modularized graph ML model components in the [modelzoo](https://github.com/awslabs/neptuneml-toolkit/tree/main/src/neptuneml_toolkit/modelzoo) that you can stack and use to create your custom model.

In addition, the toolkit provides utility functions that help you generate the necessary artifacts during model training and model transform. You can import this Python package in your custom implementation. Any functions or modules provided in the toolkit are also available in the Neptune ML training environment.

If your Python module has additional external dependencies, you can include these additional dependencies by creating a `requirements.txt` file in your module's directory. The packages listed in the `requirements.txt` file will then be installed before your training script is run.

At a minimum, the Python module that implements your custom model needs to contain the following:
+ A training script entry point
+ A transform script entry point
+ A `model-hpo-configuration.json` file
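
For example, a minimal custom model module uploaded to Amazon S3 might be laid out like this (the script names shown are the default entry-point names; `requirements.txt` is optional):

```
(your module directory)/
├── training.py                   # training script entry point (default name)
├── transform.py                  # transform script entry point (default name)
├── model-hpo-configuration.json  # hyperparameter definitions
└── requirements.txt              # optional additional dependencies
```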

## Custom model training script development in Neptune ML

Your custom model training script should be an executable Python script like the Neptune ML toolkit's [train.py](https://github.com/awslabs/neptuneml-toolkit/blob/main/examples/custom-models/introduction/movie-lens-rgcn/node-class/src/train.py) example. It must accept hyperparameter names and values as command-line arguments. During model training, the hyperparameter names are obtained from the `model-hpo-configuration.json` file. The hyperparameter values either fall within the valid hyperparameter range if the hyperparameter is tunable, or take the default hyperparameter value if it is not tunable.

Your training script is run on a SageMaker AI training instance using a syntax like this:

```
python3 (script entry point) --(1st parameter) (1st value) --(2nd parameter) (2nd value) (...)
```

For all tasks, the Neptune ML AutoTrainer sends several required parameters to your training script in addition to the hyperparameters that you specify, and your script must be able to handle these additional parameters in order to work properly.

These additional required parameters vary somewhat by task:

**For node classification or node regression**
+ **`task`**   –   The task type used internally by Neptune ML. For node classification this is `node_class`, and for node regression it is `node_regression`.
+ **`model`**   –   The model name used internally by Neptune ML, which is `custom` in this case.
+ **`name`**   –   The name of the task used internally by Neptune ML, which is `node_class-custom` for node classification in this case, and `node_regression-custom` for node regression.
+ **`target_ntype`**   –   The name of the node type for classification or regression.
+ **`property`**   –   The name of the node property for classification or regression.

**For link prediction**
+ **`task`**   –   The task type used internally by Neptune ML. For link prediction, this is `link_predict`.
+ **`model`**   –   The model name used internally by Neptune ML, which is `custom` in this case.
+ **`name`**   –   The name of the task used internally by Neptune ML, which is `link_predict-custom` in this case.

**For edge classification or edge regression**
+ **`task`**   –   The task type used internally by Neptune ML. For edge classification this is `edge_class`, and for edge regression it is `edge_regression`.
+ **`model`**   –   The model name used internally by Neptune ML, which is `custom` in this case.
+ **`name`**   –   The name of the task used internally by Neptune ML, which is `edge_class-custom` for edge classification in this case, and `edge_regression-custom` for edge regression.
+ **`target_etype`**   –   The name of the edge type for classification or regression.
+ **`property`**   –   The name of the edge property for classification or regression.
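
The parameter handling above can be sketched with plain `argparse` (a minimal, hypothetical example for node classification; `num-hidden` and `lr` are placeholder hyperparameters, and the defaults are illustrative):

```
import argparse

def parse_args(args=None):
    """Parse the parameters that Neptune ML passes to the training script."""
    parser = argparse.ArgumentParser()

    # Required parameters that the Neptune ML AutoTrainer always sends
    # for node classification
    parser.add_argument("--task", type=str, default="node_class")
    parser.add_argument("--model", type=str, default="custom")
    parser.add_argument("--name", type=str, default="node_class-custom")
    parser.add_argument("--target_ntype", type=str, default="(your node type)")
    parser.add_argument("--property", type=str, default="(your node property)")

    # Your own hyperparameters, matching the entries in
    # model-hpo-configuration.json (these two are placeholders)
    parser.add_argument("--num-hidden", type=int, default=64)
    parser.add_argument("--lr", type=float, default=0.001)

    return parser.parse_args(args)

# hparams = parse_args()  # parses sys.argv when run as a script
# ... build your DGL model, train it, and save the model parameters ...
```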

Your script should save the model parameters, as well as any other artifacts that will be needed, at the end of training.

You can use Neptune ML toolkit utility functions to determine the location of the processed graph data, the location where the model parameters should be saved, and what GPU devices are available on the training instance. See the [train.py](https://github.com/awslabs/neptuneml-toolkit/blob/main/examples/custom-models/introduction/movie-lens-rgcn/node-class/src/train.py) sample training script for examples of how to use these utility functions.

## Custom model transform script development in Neptune ML

A transform script is needed to take advantage of the Neptune ML [incremental workflow](machine-learning-overview-evolving-data-incremental.md#machine-learning-overview-incremental) for model inference on evolving graphs without retraining the model. Even if all the artifacts necessary for model deployment are generated by the training script, you still need to provide a transform script if you want to generate updated models without retraining the model.

**Note**  
[Real-time inductive inference](machine-learning-overview-evolving-data.md#inductive-vs-transductive-inference) is not currently supported for custom models.

Your custom model transform script should be an executable Python script like the Neptune ML toolkit's [transform.py](https://github.com/awslabs/neptuneml-toolkit/blob/main/examples/custom-models/introduction/movie-lens-rgcn/node-class/src/transform.py) example script. Because this script is invoked during model training with no command-line arguments, any command-line arguments that the script does accept must have defaults.

The script runs on a SageMaker AI training instance with a syntax like this:

```
python3 (your transform script entry point)
```

Your transform script will need various pieces of information, such as:
+ The location of the processed graph data.
+ The location where the model parameters are saved and where new model artifacts should be saved.
+ The devices available on the instance.
+ The hyperparameters that generated the best model.

These inputs are obtained using Neptune ML utility functions that your script can call. See the toolkit's sample [transform.py](https://github.com/awslabs/neptuneml-toolkit/blob/main/examples/custom-models/introduction/movie-lens-rgcn/node-class/src/transform.py) script for examples of how to do that.
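
Because the script runs with no command-line arguments, one way to structure its entry point is to give every argument a default, as in this hypothetical sketch (the placeholder paths are for local debugging only; in the training environment you would look these locations up with the toolkit's utility functions instead):

```
import argparse

def parse_args(args=None):
    """Every argument needs a default, because Neptune ML invokes the
    transform script with no command-line arguments."""
    parser = argparse.ArgumentParser()
    # Placeholder paths for local debugging; in the Neptune ML training
    # environment, obtain these with the toolkit's utility functions.
    parser.add_argument("--data-path", type=str, default="./processed-graph-data")
    parser.add_argument("--model-path", type=str, default="./best-model")
    parser.add_argument("--output-path", type=str, default="./model-artifacts")
    return parser.parse_args(args)

def transform(config):
    """Load the best model and regenerate the deployment artifacts."""
    # 1. Load the processed graph data from config.data_path.
    # 2. Load the saved parameters of the best model from config.model_path.
    # 3. Recompute node embeddings and save them, along with the node ID
    #    mappings, to config.output_path.
    pass

# transform(parse_args())  # runs correctly even with no arguments supplied
```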

The script should save the node embeddings, node ID mappings, and any other artifacts necessary for model deployment for each task. See the [model artifacts documentation](machine-learning-model-artifacts.md) for more information about the model artifacts required for different Neptune ML tasks.

## Custom `model-hpo-configuration.json` file in Neptune ML

The `model-hpo-configuration.json` file defines hyperparameters for your custom model. It is in the same [format](machine-learning-customizing-hyperparams.md) as the `model-hpo-configuration.json` file used with the Neptune ML built-in models, and takes precedence over the version that is auto-generated by Neptune ML and uploaded to the location of your processed data.

When you add a new hyperparameter to your model, you must also add an entry for the hyperparameter in this file so that the hyperparameter is passed to your training script.

You must provide a range for a hyperparameter if you want it to be tunable, and set it as a `tier-1`, `tier-2`, or `tier-3` parameter. The hyperparameter is tuned if the total number of training jobs configured allows for tuning hyperparameters in its tier. For a non-tunable parameter, you must provide a default value and add the hyperparameter to the `fixed-param` section of the file. See the toolkit's [sample `model-hpo-configuration.json` file](https://github.com/awslabs/neptuneml-toolkit/blob/main/examples/custom-models/introduction/movie-lens-rgcn/node-class/src/model-hpo-configuration.json) for an example of how to do that.
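
As an illustrative sketch (following the format of the toolkit's sample file, which remains the authoritative reference), a tunable `tier-1` hyperparameter and a fixed hyperparameter might be declared like this, where `num-hidden` and `num-epochs` are placeholder names:

```
"1-tier-param": [
  {
    "param": "num-hidden",
    "range": [16, 256],
    "type": "int",
    "inc_strategy": "power2"
  }
],
"fixed-param": [
  {
    "param": "num-epochs",
    "default": 100,
    "type": "int"
  }
]
```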

You must also provide the metric definition that the SageMaker AI HyperParameter Optimization job will use to evaluate the candidate models trained. To do this, you add an `eval_metric` JSON object to the `model-hpo-configuration.json` file like this:

```
"eval_metric": {
  "tuning_objective": {
      "MetricName": "(metric_name)",
      "Type": "Maximize"
  },
  "metric_definitions": [
    {
      "Name": "(metric_name)",
      "Regex": "(metric regular expression)"
    }
  ]
},
```

The `metric_definitions` array in the `eval_metric` object lists a metric definition object for each metric that you want SageMaker AI to extract from the training instance. Each metric definition object has a `Name` key that lets you provide a name for the metric (such as "accuracy", "f1", and so on). The `Regex` key lets you provide a regular expression string that matches how that particular metric is printed in the training logs. See the [SageMaker AI HyperParameter Tuning page](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-define-metrics.html) for more details on how to define metrics.
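
For example, if your training script printed a log line like `validation accuracy: 0.8123`, a `Regex` of `validation accuracy: ([0-9\.]+)` would let SageMaker AI capture the value from the first capture group. You can check such a pattern locally before putting it in the file (the log line and pattern here are illustrative):

```
import re

# Illustrative log line that your training script might print each epoch
log_line = "Epoch 10 : validation accuracy: 0.8123"

# The same pattern would go in the "Regex" field of the metric definition;
# SageMaker AI extracts the first capture group as the metric value.
pattern = r"validation accuracy: ([0-9\.]+)"

match = re.search(pattern, log_line)
metric_value = float(match.group(1))
print(metric_value)  # 0.8123
```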

The `tuning_objective` object in `eval_metric` then allows you to specify which of the metrics in `metric_definitions` should be used as the evaluation metric that serves as the objective metric for hyperparameter optimization. The value for `MetricName` must match the value of a `Name` in one of the definitions in `metric_definitions`. The value for `Type` should be either "Maximize" or "Minimize", depending on whether the metric should be interpreted as greater-is-better (like "accuracy") or less-is-better (like "mean-squared-error").

Errors in this section of the `model-hpo-configuration.json` file can result in failures of the Neptune ML model training API job, because the SageMaker AI HyperParameter Tuning job will not be able to select the best model.

## Local testing of your custom model implementation in Neptune ML

You can use the Neptune ML toolkit Conda environment to run your code locally in order to test and validate your model. If you're developing on a Neptune Notebook instance, then this Conda environment will be pre-installed on the Neptune Notebook instance. If you’re developing on a different instance, then you need to follow the [local setup instructions](https://github.com/awslabs/neptuneml-toolkit#local-installation) in the Neptune ML toolkit.

The Conda environment accurately reproduces the environment where your model will run when you call the [model training API](machine-learning-api-modeltraining.md). All of the example training scripts and transform scripts allow you to pass a command line `--local` flag to run the scripts in a local environment for easy debugging. This is a good practice while developing your own model because it allows you to interactively and iteratively test your model implementation. During model training in the Neptune ML production training environment, this parameter is omitted.
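
A sketch of how a script can accept such a flag (the toolkit examples may wire it up differently):

```
import argparse

def parse_args(args=None):
    parser = argparse.ArgumentParser()
    # When --local is passed, the script can read data from, and write
    # artifacts to, local paths instead of the SageMaker AI instance paths.
    # Neptune ML omits the flag in the production training environment,
    # so it defaults to False there.
    parser.add_argument("--local", action="store_true", default=False)
    return parser.parse_args(args)
```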