Examples of using parameters within additionalParams for tuning model-training configuration
The following examples show how to use additionalParams with property-graph and RDF data models to configure model training for a Neptune ML application. They cover setting a default split rate for training, validation, and test data; defining node classification, node regression, edge classification, edge regression, and link prediction tasks; and configuring feature types such as numerical buckets, text embeddings, datetime, and categorical data.
Contents
Property-graph examples using additionalParams
Specifying a default split rate for model-training configuration
Specifying a node-classification task for model-training configuration
Specifying a multi-class node classification task for model-training configuration
Specifying a node regression task for model-training configuration
Specifying an edge-classification task for model-training configuration
Specifying a multi-class edge classification task for model-training configuration
Specifying an edge regression for model-training configuration
Specifying a link prediction task for model-training configuration
Property-graph examples using additionalParams
Specifying a default split rate for model-training configuration
The split_rate parameter at the top level of neptune_ml sets the default
split rate for model training. If no default split rate is specified, the training uses
a value of [0.9, 0.1, 0.0]. You can override the default value on a per-target basis
by specifying a split_rate for each target.
In the following example, the default split_rate field indicates
that a split rate of [0.7,0.1,0.2] should be used unless overridden on
a per-target basis:
"additionalParams": { "neptune_ml": { "version": "v2.0", "split_rate": [0.7,0.1,0.2], "targets": [(...)], "features": [(...)] } }
Specifying a node-classification task for model-training configuration
To indicate which node property contains labeled examples for training purposes,
add a node classification element to the targets array, using "type" :
"classification". Add a split_rate field if you want to override
the default split rate.
In the following example, the node target indicates that the
genre property of each Movie node should be treated
as a node class label. The split_rate value overrides the default
split rate:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "Movie", "property": "genre", "type": "classification", "split_rate": [0.7,0.1,0.2] } ], "features": [(...)] } }
Specifying a multi-class node classification task for model-training configuration
To indicate which node property contains multiple labeled examples for training
purposes, add a node classification element to the targets array, using "type" :
"classification", and separator to specify a character that can be
used to split a target property value into multiple categorical values. Add a
split_rate field if you want to override the default split rate.
In the following example, the node target indicates that the
genre property of each Movie node should be treated
as a node class label. The separator field indicates that each
genre property contains multiple semicolon-separated values:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "Movie", "property": "genre", "type": "classification", "separator": ";" } ], "features": [(...)] } }
Specifying a node regression task for model-training configuration
To indicate which node property contains labeled regression values for training purposes,
add a node regression element to the targets array, using "type" : "regression".
Add a split_rate field if you want to override the default split rate.
The following node target indicates that the rating
property of each Movie node should be treated as a node regression label:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "Movie", "property": "rating", "type" : "regression", "split_rate": [0.7,0.1,0.2] } ], "features": [...] } }
Specifying an edge-classification task for model-training configuration
To indicate which edge property contains labeled examples for training purposes,
add an edge element to the targets array, using "type" : "classification".
Add a split_rate field if you want to override the default split rate.
The following edge target indicates that the metAtLocation
property of each knows edge should be treated as an edge class label:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "edge": ["Person", "knows", "Person"], "property": "metAtLocation", "type": "classification" } ], "features": [(...)] } }
Specifying a multi-class edge classification task for model-training configuration
To indicate which edge property contains multiple labeled examples for training purposes,
add an edge element to the targets array, using "type" : "classification",
and a separator field to specify a character used to split a target property
value into multiple categorical values. Add a split_rate field if you want to
override the default split rate.
The following edge target indicates that the sentiment
property of each repliedTo edge should be treated as an edge class label.
The separator field indicates that each sentiment property contains multiple
comma-separated values:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "edge": ["Person", "repliedTo", "Message"], "property": "sentiment", "type": "classification", "separator": "," } ], "features": [(...)] } }
Specifying an edge regression for model-training configuration
To indicate which edge property contains labeled regression examples for training
purposes, add an edge element to the targets array, using
"type" : "regression". Add a split_rate field if you want
to override the default split rate.
The following edge target indicates that the rating
property of each reviewed edge should be treated as an edge regression label:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "edge": ["Person", "reviewed", "Movie"], "property": "rating", "type" : "regression" } ], "features": [(...)] } }
Specifying a link prediction task for model-training configuration
To indicate which edges should be used for link prediction training purposes, add
an edge element to the targets array using "type" : "link_prediction".
Add a split_rate field if you want to override the default split rate.
The following edge target indicates that cites edges
should be used for link prediction:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "edge": ["Article", "cites", "Article"], "type" : "link_prediction" } ], "features": [(...)] } }
Specifying a numerical bucket feature
You can specify a numerical data feature for a node property by adding
"type": "bucket_numerical" to the features array.
The following node feature indicates that the age
property of each Person node should be treated as a numerical
bucket feature:
"additionalParams": { "neptune_ml": { "targets": [...], "features": [ { "node": "Person", "property": "age", "type": "bucket_numerical", "range": [1, 100], "bucket_cnt": 5, "slide_window_size": 3, "imputer": "median" } ] } }
Specifying a Word2Vec feature
You can specify a Word2Vec feature for a node property by adding
"type": "text_word2vec" to the features array.
The following node feature indicates that the description
property of each Movie node should be treated as a Word2Vec
feature:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [...], "features": [ { "node": "Movie", "property": "description", "type": "text_word2vec", "language": "en_core_web_lg" } ] } }
Specifying a FastText feature
You can specify a FastText feature for a node property by adding
"type": "text_fasttext" to the features array. The
language field is required, and must specify one of the following
language codes:
en (English)
zh (Chinese)
hi (Hindi)
es (Spanish)
fr (French)
Note that the text_fasttext encoding cannot handle more than
one language at a time in a feature.
The following node feature indicates that the French description
property of each Movie node should be treated as a FastText
feature:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [...], "features": [ { "node": "Movie", "property": "description", "type": "text_fasttext", "language": "fr", "max_length": 1024 } ] } }
Specifying a Sentence BERT feature
You can specify a Sentence BERT feature for a node property by adding
"type": "text_sbert" to the features array. You don't need
to specify the language, since the method automatically encodes text features using
a multilingual language model.
The following node feature indicates that the description
property of each Movie node should be treated as a Sentence BERT
feature, here using the text_sbert128 variant of the type:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [...], "features": [ { "node": "Movie", "property": "description", "type": "text_sbert128", } ] } }
Specifying a TF-IDF feature
You can specify a TF-IDF feature for a node property by adding
"type": "text_tfidf" to the features array.
The following node feature indicates that the bio
property of each Person node should be treated as a TF-IDF
feature:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [...], "features": [ { "node": "Movie", "property": "bio", "type": "text_tfidf", "ngram_range": [1, 2], "min_df": 5, "max_features": 1000 } ] } }
Specifying a datetime feature
The export process automatically infers datetime features for date
properties. However, if you want to limit the datetime_parts used for
a datetime feature, or override a feature specification so that a property
that would normally be treated as an auto feature is explicitly treated as a
datetime feature, you can do so by adding "type": "datetime"
to the features array.
The following node feature indicates that the createdAt
property of each Post node should be treated as a datetime
feature:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [...], "features": [ { "node": "Post", "property": "createdAt", "type": "datetime", "datetime_parts": ["month", "weekday", "hour"] } ] } }
Specifying a category feature
The export process automatically infers auto features for string
properties and numeric properties containing multiple values. For numeric properties
containing single values, it infers numerical features. For date
properties it infers datetime features.
If you want to override a feature specification so that a property is treated
as a categorical feature, add "type": "category" to the features array.
If the property contains multiple values, include a separator field.
For example:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [...], "features": [ { "node": "Post", "property": "tag", "type": "category", "separator": "|" } ] } }
Specifying a numerical feature
The export process automatically infers auto features for string
properties and numeric properties containing multiple values. For numeric properties
containing single values, it infers numerical features. For date
properties it infers datetime features.
If you want to override a feature specification so that a property is treated as a
numerical feature, add "type": "numerical" to the features array.
If the property contains multiple values, include a separator field.
For example:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [...], "features": [ { "node": "Recording", "property": "duration", "type": "numerical", "separator": "," } ] } }
Specifying an auto feature
The export process automatically infers auto features for string
properties and numeric properties containing multiple values. For numeric properties
containing single values, it infers numerical features. For date
properties it infers datetime features.
If you want to override a feature specification so that a property is treated
as an auto feature, add "type": "auto" to the features array.
If the property contains multiple values, include a separator field.
For example:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [...], "features": [ { "node": "User", "property": "role", "type": "auto", "separator": "," } ] } }
RDF examples using additionalParams
Specifying a default split rate for model-training configuration
The split_rate parameter at the top level of neptune_ml sets the default
split rate for model training. If no default split rate is specified, the training uses
a value of [0.9, 0.1, 0.0]. You can override the default value on a per-target basis
by specifying a split_rate for each target.
In the following example, the default split_rate field indicates
that a split rate of [0.7,0.1,0.2] should be used unless overridden on
a per-target basis:
"additionalParams": { "neptune_ml": { "version": "v2.0", "split_rate": [0.7,0.1,0.2], "targets": [(...)] } }
Specifying a node-classification task for model-training configuration
To indicate which node property contains labeled examples for training purposes,
add a node classification element to the targets array, using "type" :
"classification". Add a node field to indicate the node type of the target nodes.
Add a predicate field to define which literal data is used as the target
feature of the target node. Add a split_rate field if you want to
override the default split rate.
In the following example, the node target indicates that the
genre property of each Movie node should be treated
as a node class label. The split_rate value overrides the default
split rate:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "http://aws.amazon.com/neptune/csv2rdf/class/Movie", "predicate": "http://aws.amazon.com/neptune/csv2rdf/datatypeProperty/genre", "type": "classification", "split_rate": [0.7,0.1,0.2] } ] } }
Specifying a node regression task for model-training configuration
To indicate which node property contains labeled regression values for training purposes,
add a node regression element to the targets array, using "type" : "regression".
Add a node field to indicate the node type of the target nodes. Add a
predicate field to define which literal data is used as the target
feature of the target node. Add a split_rate field if you want to override
the default split rate.
The following node target indicates that the rating
property of each Movie node should be treated as a node regression label:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "node": "http://aws.amazon.com/neptune/csv2rdf/class/Movie", "predicate": "http://aws.amazon.com/neptune/csv2rdf/datatypeProperty/rating", "type": "regression", "split_rate": [0.7,0.1,0.2] } ] } }
Specifying a link prediction task for particular edges
To indicate which edges should be used for link prediction training purposes, add
an edge element to the targets array using "type" : "link_prediction".
Add subject, predicate and object fields to
specify the edge type. Add a split_rate field if you want to override
the default split rate.
The following edge target indicates that directed edges
that connect Directors to Movies should be used for link
prediction:
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "subject": "http://aws.amazon.com/neptune/csv2rdf/class/Director", "predicate": "http://aws.amazon.com/neptune/csv2rdf/datatypeProperty/directed", "object": "http://aws.amazon.com/neptune/csv2rdf/class/Movie", "type" : "link_prediction" } ] } }
Specifying a link prediction task for all edges
To indicate that all edges should be used for link prediction training purposes,
add an edge element to the targets array using "type" :
"link_prediction". Do not add subject, predicate, or
object fields. Add a split_rate field if you want to override
the default split rate.
"additionalParams": { "neptune_ml": { "version": "v2.0", "targets": [ { "type" : "link_prediction" } ] } }