

# Storage
<a name="storage"></a>

 Neptune supports dictionary garbage collection (GC) for property graph data. It can be enabled through the `neptune_lab_mode` [parameter](parameters.md) when `neptune_streams` is not active. When activated, a background job cleans up unused dictionary entries, which can slow the rate of storage growth. The feature runs in one of two modes: `soft_delete` marks entries as deleted without explicitly removing them, and `enabled` explicitly deletes them. GC can affect system performance because it contends with query threads for resources such as CPU and buffer cache. It can run with a maximum concurrency of 16 threads. 

 Neptune also supports inline server-generated edge IDs, which can be enabled through a configuration [parameter](parameters.md) when `neptune_streams` is not active. When this feature is enabled, the server generates unique inlined IDs for edges that do not have a user-defined ID. These IDs use the reserved prefix `neptune_reserved`. Inlined IDs are not stored in the dictionary, which can improve storage efficiency. 

**Topics**
+ [Neptune dictionary garbage collection](storage-gc.md)
+ [Neptune inlined server-generated edge ID](storage-edge-id.md)

# Neptune dictionary garbage collection
<a name="storage-gc"></a>

 Neptune supports dictionary garbage collection (GC) for property graph data, which is enabled through the `neptune_lab_mode` [parameter](parameters.md). It can be enabled only on clusters that contain property graph data exclusively and that do not have `neptune_streams` enabled. The feature is automatically disabled if `neptune_streams` is enabled or if any unexpired `neptune_streams` data exists. Activating the feature requires a reboot of the writer instance. It is available starting with engine release [1.4.3.0](https://docs.aws.amazon.com/releases/release-1.4.3.0.xml). 
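
 The preconditions above can be collected into a single check. The following is a minimal Python sketch, not a Neptune API. `ClusterFacts` and `dictionary_gc_can_activate` are hypothetical names, and you would gather the input facts yourself. The sketch assumes engine versions are dotted strings whose first four parts are numeric.

```python
from dataclasses import dataclass

# First engine release with dictionary GC, per this page.
MIN_ENGINE_VERSION = (1, 4, 3, 0)


@dataclass
class ClusterFacts:
    """Hypothetical summary of facts you gather about a cluster yourself."""
    engine_version: str          # for example "1.4.3.0"
    property_graph_only: bool    # True if the cluster holds no RDF data
    streams_enabled: bool        # True if neptune_streams is on
    unexpired_stream_data: bool  # True if any stream records have not expired


def parse_engine_version(version: str) -> tuple:
    """Compare on the first four numeric parts, ignoring suffixes like '.R1'."""
    return tuple(int(part) for part in version.split(".")[:4])


def dictionary_gc_can_activate(facts: ClusterFacts) -> bool:
    """Return True only if every documented precondition holds."""
    return (
        parse_engine_version(facts.engine_version) >= MIN_ENGINE_VERSION
        and facts.property_graph_only
        and not facts.streams_enabled
        and not facts.unexpired_stream_data
    )
```

 A `True` result means only that the preconditions are met. The writer instance still has to be rebooted after the parameter change.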

 When the feature is enabled, a background job cleans up unused dictionary entries. This does not reduce `VolumeBytesUsed`. Instead, it frees space in the index for new inserts, so `VolumeBytesUsed` is likely to grow more slowly with dictionary GC enabled than without it. 

 Dictionary garbage collection runs in the background and scans all graph and dictionary data to find terms that are not in use. After startup, a new run is triggered once approximately 6% of the data has changed. GC competes with query threads for server resources, which can reduce query throughput. These resources include CPU, buffer cache, undo log generation, and write I/O operations. Because GC scans data that queries are not actively using, it can displace frequently accessed data from the buffer cache on the writer instance. GC also performs new deletes. As a result, the cluster may see additional write I/O operations and more undo logs to purge, which can raise the `UndoLogListSize` metric. 
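
 The run trigger can be modeled as a simple threshold check. This minimal Python sketch is an illustration only. The function name is hypothetical. The sketch also assumes that changed data and total data are counted in the same unit, which this page does not specify.

```python
# "Approximately 6%" from the documentation; the exact threshold and the
# unit in which changes are counted are assumptions of this sketch.
GC_TRIGGER_FRACTION = 0.06


def should_start_new_gc_run(changed_since_last_run: int, total_size: int) -> bool:
    """Simplified model of when a new dictionary GC run is triggered."""
    if total_size <= 0:
        return False  # no data yet, so there is nothing to collect
    return changed_since_last_run / total_size >= GC_TRIGGER_FRACTION
```

 Under this model, a cluster with 1,000,000 units of data would start a new run after roughly 60,000 units had changed.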

 GC can run in two modes, `soft_delete` and `enabled`. In `soft_delete` mode, unused dictionary entries are marked as deleted but are not explicitly deleted. You can also use this mode to observe the performance characteristics of the background operation after it is turned on. In `enabled` mode, entries are explicitly deleted (a "hard" delete). It is recommended to run GC in `soft_delete` mode for a period of time before switching to `enabled` mode. 

 Dictionary GC supports a maximum concurrency of 16 threads, which is reachable only on instances with 16 or more cores. By default it runs with a single thread, but it can run with higher concurrency when it is enabled for the first time. Dictionary GC threads run at the same priority as query threads and contend equally with them for resources on the writer. 
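
 The resulting thread count combines three documented rules. The default is one thread, the maximum is 16, and the count is capped at the smaller of 16 and the number of cores, as described for the `DictionaryGCConcurrency` key below. The following Python sketch estimates that count. The function name is hypothetical, and rejecting out-of-range values is a choice of the sketch, not documented server behavior.

```python
from typing import Optional

MAX_GC_THREADS = 16     # documented maximum concurrency
DEFAULT_GC_THREADS = 1  # documented default when no concurrency is set


def effective_gc_concurrency(requested: Optional[int], cores: int) -> int:
    """Estimate the number of dictionary GC threads on the writer.

    requested: the DictionaryGCConcurrency value, or None if it is not set.
    cores: the number of cores on the writer instance.
    """
    if requested is None:
        return DEFAULT_GC_THREADS
    if not 1 <= requested <= MAX_GC_THREADS:
        # The documented range is 1 to 16; this sketch rejects other values
        # rather than guessing how the server would treat them.
        raise ValueError("DictionaryGCConcurrency must be between 1 and 16")
    # Capped at the smaller of 16 and the core count.
    return min(requested, MAX_GC_THREADS, cores)
```

 For example, requesting 16 threads on an 8-core writer would yield 8 GC threads under this model.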

 Dictionary GC is enabled through the `DictionaryGCMode` key of the `neptune_lab_mode` [parameter](parameters.md). The key accepts three values: `disabled` (default), `soft_delete`, or `enabled`. For example, the following setting sets `DictionaryGCMode` to `soft_delete`: 

```
neptune_lab_mode = 'DictionaryGCMode=soft_delete'
```

 The optional `DictionaryGCConcurrency` key of the `neptune_lab_mode` [parameter](parameters.md) accepts a value from 1 to 16. If it is set higher than the smaller of 16 and the number of cores, the concurrency is capped at that smaller value. The following example sets the mode to `soft_delete` with a concurrency of 2: 

```
neptune_lab_mode = 'DictionaryGCMode=soft_delete,DictionaryGCConcurrency=2'
```
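
 If you manage parameter groups from code, you could build the `neptune_lab_mode` value programmatically. The following Python sketch checks the documented values and produces the same strings as the examples above. The helper names are hypothetical. The dictionary returned by `lab_mode_parameter` follows the shape of one `Parameters` entry for the `modify_db_cluster_parameter_group` call on a boto3 `neptune` client.

```python
from typing import Optional

VALID_GC_MODES = ("disabled", "soft_delete", "enabled")


def dictionary_gc_lab_mode(mode: str, concurrency: Optional[int] = None) -> str:
    """Build the neptune_lab_mode value for the dictionary GC keys."""
    if mode not in VALID_GC_MODES:
        raise ValueError(f"DictionaryGCMode must be one of {VALID_GC_MODES}")
    settings = [f"DictionaryGCMode={mode}"]
    if concurrency is not None:
        if not 1 <= concurrency <= 16:
            raise ValueError("DictionaryGCConcurrency must be between 1 and 16")
        settings.append(f"DictionaryGCConcurrency={concurrency}")
    # Multiple lab-mode keys are separated by commas.
    return ",".join(settings)


def lab_mode_parameter(value: str) -> dict:
    """One Parameters entry for modify_db_cluster_parameter_group,
    applied when the instance is next rebooted."""
    return {
        "ParameterName": "neptune_lab_mode",
        "ParameterValue": value,
        "ApplyMethod": "pending-reboot",
    }
```

 Pass the result as `Parameters=[lab_mode_parameter(...)]`, together with your `DBClusterParameterGroupName`. The parameter holds all lab-mode keys in a single string, so include any other lab-mode keys you already use in the same value.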

 The dictionary GC job starts in the background after the server starts, once some data is available. The engine status displays the current state of dictionary GC. The following example output shows dictionary GC in `enabled` mode, running with a concurrency of 2. While the background job is running, it may be actively scanning for and deleting unused dictionary entries. It may also be waiting for a new set of deletes to trigger the next round of GC. 

```
{"status":"healthy",...,"labMode":{"ObjectIndex":"disabled","DictionaryGC":"{Mode=enabled,Concurrency=2}"},...}
```
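
 To check the mode from a script, you could fetch the engine status, for example with `curl https://<neptune-cluster-endpoint>:8182/status`, and parse the `labMode` entry. The following Python sketch assumes that the `DictionaryGC` value keeps the `{Mode=...,Concurrency=...}` format shown above. The function name is hypothetical. Treating a plain string as the mode alone is also an assumption.

```python
import json


def parse_dictionary_gc_status(status_json: str) -> dict:
    """Return the DictionaryGC settings from an engine status response."""
    status = json.loads(status_json)
    raw = status.get("labMode", {}).get("DictionaryGC")
    if raw is None:
        return {}
    if not raw.startswith("{"):
        # Assumption: a plain value such as "disabled" carries no settings map.
        return {"Mode": raw}
    settings = {}
    # raw looks like "{Mode=enabled,Concurrency=2}"
    for pair in raw.strip("{}").split(","):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        settings[key.strip()] = value.strip()
    return settings
```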

 Dictionary GC is paused when any of the following conditions is met: 
+  A bulk load is active. 
+  Freeable memory is less than 15 GB. 
+  `UndoLogListSize` is higher than 1,000,000. 
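
 When diagnosing why GC is not making progress, you could compare your own monitoring data against these conditions. Useful sources include the `FreeableMemory` and `UndoLogListSize` CloudWatch metrics and the bulk loader status. The following Python sketch is hypothetical. It treats "15 GB" as 15 GiB, which is an assumption.

```python
# Thresholds from the list above. Treating "15 GB" as 15 GiB is an
# assumption of this sketch.
FREEABLE_MEMORY_FLOOR_BYTES = 15 * 1024**3
UNDO_LOG_LIST_SIZE_LIMIT = 1_000_000


def gc_pause_reasons(bulk_load_active: bool,
                     freeable_memory_bytes: int,
                     undo_log_list_size: int) -> list:
    """Return the documented reasons dictionary GC would pause (empty if none)."""
    reasons = []
    if bulk_load_active:
        reasons.append("bulk load in progress")
    if freeable_memory_bytes < FREEABLE_MEMORY_FLOOR_BYTES:
        reasons.append("freeable memory below 15 GB")
    if undo_log_list_size > UNDO_LOG_LIST_SIZE_LIMIT:
        reasons.append("UndoLogListSize above 1,000,000")
    return reasons
```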

# Neptune inlined server-generated edge ID
<a name="storage-edge-id"></a>

 Neptune supports inline server-generated edge IDs. The feature can be enabled through the Neptune configuration [parameter](parameters.md) `neptune_enable_inline_server_generated_edge_id` when `neptune_streams` is not enabled. It is available for Gremlin queries starting with engine release [1.4.3.0](https://docs.aws.amazon.com/releases/release-1.4.3.0.xml), and will be available for openCypher queries in a future release. 

 An edge ID is a unique identifier for an edge. You can provide an edge ID when inserting an edge. If no ID is provided, by default the server generates a UUID-based ID and assigns it to the edge. Like user-defined IDs, UUID-based server-generated IDs are stored in the dictionary. 

 When `neptune_enable_inline_server_generated_edge_id` is enabled, the server instead generates a unique inlined ID whenever the query does not provide one. Inlined edge IDs are not stored in the dictionary, which improves storage efficiency. Server-generated inlined IDs begin with the reserved prefix `neptune_reserved`. 

**Warning**  
 Neptune reserves the `neptune_reserved` prefix for server-generated inlined IDs. A query that attempts to insert data with a user-defined ID beginning with this prefix returns an error. 
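
 To catch this before a query reaches the server, a client could validate its IDs up front. The following is a minimal Python sketch. The helper name is hypothetical, and the sketch assumes the server's check is a case-sensitive string prefix match.

```python
RESERVED_PREFIX = "neptune_reserved"


def check_user_defined_id(element_id: str) -> str:
    """Reject IDs that Neptune would refuse because of the reserved prefix."""
    if element_id.startswith(RESERVED_PREFIX):
        raise ValueError(
            f"ID {element_id!r} uses the reserved prefix {RESERVED_PREFIX!r}"
        )
    return element_id
```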

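 The inlined server-generated edge ID feature can be enabled by setting the cluster-level parameter `neptune_enable_inline_server_generated_edge_id` to `1`. The instance must be rebooted for the change to take effect. The following AWS CLI command enables the feature in a DB cluster parameter group. Replace the placeholder with your parameter group name: 

```
aws neptune modify-db-cluster-parameter-group --db-cluster-parameter-group-name <your-cluster-parameter-group> \
    --parameters "ParameterName=neptune_enable_inline_server_generated_edge_id,ParameterValue=1,ApplyMethod=pending-reboot"
```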

 To verify that the feature is enabled, check the `features` section of the engine status. The feature is automatically disabled if `neptune_streams` is enabled. The following example output shows the engine status with the feature enabled: 

```
"features":{"InlineServerGeneratedEdgeId":"enabled"}
```
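
 A deployment script could perform the same check automatically. The following Python sketch parses the engine status JSON. It assumes that `features` is a top-level key of the status response, as the fragment above suggests. The function name is hypothetical.

```python
import json


def inline_edge_id_enabled(status_json: str) -> bool:
    """True if the engine status reports the inline edge ID feature as enabled."""
    status = json.loads(status_json)
    features = status.get("features", {})
    return features.get("InlineServerGeneratedEdgeId") == "enabled"
```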

 The following Gremlin example adds an edge without a user-defined ID when the inline server-generated edge ID feature is enabled: 

```
curl -X POST --url https://<neptune-cluster-endpoint>:8182/gremlin/ --data '{"gremlin":"g.withSideEffect(\"Neptune#disablePushdownOptimization\", true).addV().property(id, \"a\").addV().property(id, \"b\").addE(\"el\").to(V(\"a\"))"}'
{
    "requestId": "b6b84605-53ad-4c04-baf1-7f0f31a3aeaf",
    "status": {
        "message": "",
        "code": 200,
        "attributes": {
            "@type": "g:Map",
            "@value": []
        }
    },
    "result": {
        "data": {
            "@type": "g:List",
            "@value": [{
                "@type": "g:Edge",
                "@value": {
                    "id": "neptune_reserved_231850767",
                    "label": "el",
                    "inVLabel": "vertex",
                    "outVLabel": "vertex",
                    "inV": "a",
                    "outV": "b"
                }
            }]
        },
        "meta": {
            "@type": "g:Map",
            "@value": []
        }
    }
}
```
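
 User-defined IDs cannot start with `neptune_reserved`, so the prefix alone identifies a server-generated inlined ID in a response. The following Python sketch extracts edge IDs from a GraphSON response shaped like the one above and flags the inlined ones. The function name is hypothetical, and the sketch assumes the response structure shown in this example.

```python
import json

RESERVED_PREFIX = "neptune_reserved"


def edge_ids_from_response(response_json: str) -> list:
    """Return (edge_id, is_inline_server_generated) for each g:Edge result."""
    response = json.loads(response_json)
    items = response["result"]["data"]["@value"]
    found = []
    for item in items:
        if item.get("@type") == "g:Edge":
            edge_id = item["@value"]["id"]
            # User-defined IDs cannot start with the reserved prefix, so the
            # prefix alone identifies a server-generated inlined ID.
            found.append((edge_id, str(edge_id).startswith(RESERVED_PREFIX)))
    return found
```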