Iceberg metadata management
When you create a feature group with the Iceberg table format, Amazon SageMaker Feature Store creates and manages the underlying Iceberg table on your behalf using default configuration values. You can configure Iceberg table properties at feature group creation, update properties on an existing feature group, and view the properties currently set on the table. These settings give you control over configuration parameters such as snapshot retention, metadata file management, and write behavior to manage the overall size and performance of your offline store table.
Only a subset of Iceberg table properties has been validated for compatibility with Feature Store. Feature Store does not guarantee correct behavior for properties outside this supported set. For the full list of supported properties, see Allowed Iceberg properties.
Prerequisite: Your feature group must have an offline store using the Iceberg table format.
Important
If you change Iceberg properties that are not in the allowed set, Feature Store cannot guarantee continued compatibility, and the change may prevent writes to the offline store.
IcebergProperties type
The IcebergProperties type provides a validated wrapper for Iceberg property
configurations, ensuring all keys belong to the allowed set and preventing duplicate
entries.
    class IcebergProperties(Base):
        """Configuration for Iceberg table properties in a Feature Group offline store."""

        properties: Optional[Dict[str, str]] = None
Validating properties
Invalid and duplicate keys result in an error when passed to the create or update
function. You can optionally validate keys using the
validate_property_keys() method. This is helpful when adding or removing
properties from an existing IcebergProperties object.
    iceberg_properties = IcebergProperties(
        # Validates on creation
        properties={
            "write.target-file-size-bytes": "268435456",
            "write.delete.mode": "merge-on-read",
        }
    )

    # Add a non-allowed property
    iceberg_properties.properties.update({"write.delete.isolation-level": "Snapshot"})

    # Validate again; this raises an error because of the non-allowed property
    iceberg_properties.validate_property_keys()
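The allowed-key check can be pictured with a standalone sketch. This is not the SDK's implementation: `ALLOWED_ICEBERG_PROPERTIES`, `find_invalid_keys`, and `check_property_keys` are hypothetical names, and the key set is copied from the Allowed Iceberg properties table on this page.

```python
from typing import Dict, List

# Hypothetical constant: keys copied from the "Allowed Iceberg properties" table.
ALLOWED_ICEBERG_PROPERTIES = frozenset({
    "write.metadata.delete-after-commit.enabled",
    "write.metadata.previous-versions-max",
    "history.expire.max-snapshot-age-ms",
    "history.expire.min-snapshots-to-keep",
    "history.expire.max-ref-age-ms",
    "write.target-file-size-bytes",
    "write.delete.target-file-size-bytes",
    "write.delete.mode",
    "write.update.mode",
    "write.delete.granularity",
    "write.parquet.row-group-size-bytes",
    "read.split.target-size",
    "read.split.metadata-target-size",
    "read.split.open-file-cost",
})


def find_invalid_keys(properties: Dict[str, str]) -> List[str]:
    """Return the keys that are not in the allowed set, sorted for stable output."""
    return sorted(key for key in properties if key not in ALLOWED_ICEBERG_PROPERTIES)


def check_property_keys(properties: Dict[str, str]) -> None:
    """Raise ValueError if any key falls outside the allowed set."""
    invalid = find_invalid_keys(properties)
    if invalid:
        raise ValueError(f"Non-allowed Iceberg properties: {', '.join(invalid)}")
```

A pre-check like this can flag a mistyped or unsupported key locally, before the configuration is passed to a create or update call.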
Create a feature group with Iceberg properties
The FeatureGroupManager.create function accepts an
iceberg_properties parameter of type IcebergProperties. It creates
a feature group and waits for creation to complete before updating the Iceberg properties on
the underlying AWS Glue table.
Alternatively, you can create a FeatureGroup, call create, then
pass the feature group object to the FeatureGroupManager class and call
update to avoid blocking while the feature group finishes creating.
    fg = FeatureGroupManager.create(
        # ...other parameters...
        offline_store_config=OfflineStoreConfig(
            s3_storage_config=S3StorageConfig(s3_uri="s3://my-bucket/features/"),
            table_format="Iceberg",  # Must use Iceberg table format
        ),
        iceberg_properties=IcebergProperties(
            properties={
                "write.target-file-size-bytes": "536870912",
                "history.expire.min-snapshots-to-keep": "3",
            }
        ),
    )
Update Iceberg properties on an existing feature group
The update function accepts an iceberg_properties parameter of
type IcebergProperties. It takes an already-created feature group, retrieves the
offline store's table from the AWS Glue Data Catalog, and sets the specified Iceberg properties.
    fg = FeatureGroupManager.get(feature_group_name="my-feature-group")
    fg.update(
        iceberg_properties=IcebergProperties(
            properties={
                "write.target-file-size-bytes": "268435456",
                "write.delete.mode": "merge-on-read",
            }
        ),
    )
View Iceberg properties on a feature group
The FeatureGroupManager.get function accepts an
include_iceberg_properties parameter. When set to True, it
retrieves the Iceberg properties that have been manually set and are part of the allowed list,
and adds them to the iceberg_properties field in the returned object.
The result includes only properties that are both explicitly set and in the allowed list. To get all AWS Glue table properties, use the AWS Glue API directly. If an allowed Iceberg property does not appear, it has not been explicitly set and the table uses its default value.
    fg = FeatureGroupManager.get(
        feature_group_name="my-feature-group",
        include_iceberg_properties=True,
    )
    print(fg.iceberg_properties.properties)
    # e.g. {"write.target-file-size-bytes": "536870912"}
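To see every parameter on the table rather than only the allowed Iceberg properties, you can call the AWS Glue `GetTable` API directly. The following is a minimal sketch, assuming a boto3 Glue client and that you already know the Data Catalog database and table names of the offline store. `get_glue_table_parameters` is a hypothetical helper, not part of the Feature Store SDK, and the names in the usage comment are placeholders.

```python
from typing import Any, Dict


def get_glue_table_parameters(
    glue_client: Any, database_name: str, table_name: str
) -> Dict[str, str]:
    """Return every parameter stored on the Glue table, allowed or not.

    glue_client is expected to behave like boto3.client("glue").
    """
    response = glue_client.get_table(DatabaseName=database_name, Name=table_name)
    # "Parameters" may be absent when no table parameters are set.
    return dict(response["Table"].get("Parameters", {}))


# Usage (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   params = get_glue_table_parameters(
#       boto3.client("glue"), "my_database", "my_table"
#   )
#   print(params)
```

The helper takes the client as an argument so that it can be exercised with a stub in tests. The database and table names are usually found in the Data Catalog configuration of the offline store in the feature group description.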
Required permissions
Ensure that both the AmazonSageMakerFeatureStoreAccess and the AmazonSageMakerFullAccess managed policies are attached to the IAM role that you use. Adjust these permissions to match your access pattern.
Allowed Iceberg properties
The following table lists the Iceberg table properties that have been validated for use
with Feature Store. For more information about these properties, see Table configuration in the Apache Iceberg documentation.
| Property | Default value | Description |
|---|---|---|
| write.metadata.delete-after-commit.enabled | false | Controls whether to delete the oldest tracked version metadata files after each table commit. |
| write.metadata.previous-versions-max | 100 | The max number of previous version metadata files to track. |
| history.expire.max-snapshot-age-ms | 432000000 (5 days) | Default max age of snapshots to keep on the table and all of its branches while expiring snapshots. |
| history.expire.min-snapshots-to-keep | 1 | Default min number of snapshots to keep on the table and all of its branches while expiring snapshots. |
| history.expire.max-ref-age-ms | Long.MAX_VALUE (forever) | For snapshot references except the main branch, default max age of snapshot references to keep while expiring snapshots. The main branch never expires. |
| write.target-file-size-bytes | 536870912 (512 MB) | Controls the size of files generated to target about this many bytes. |
| write.delete.target-file-size-bytes | 67108864 (64 MB) | Controls the size of delete files generated to target about this many bytes. |
| write.delete.mode | copy-on-write | Mode used for delete commands: copy-on-write or merge-on-read (v2 and above). |
| write.update.mode | copy-on-write | Mode used for update commands: copy-on-write or merge-on-read (v2 and above). |
| write.delete.granularity | partition | Controls the granularity of generated delete files: partition or file. |
| write.parquet.row-group-size-bytes | 134217728 (128 MB) | Parquet row group size. |
| read.split.target-size | 134217728 (128 MB) | Target size when combining data input splits. |
| read.split.metadata-target-size | 33554432 (32 MB) | Target size when combining metadata input splits. |
| read.split.open-file-cost | 4194304 (4 MB) | The estimated cost to open a file, used as a minimum weight when combining splits. |
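
Property values are passed as strings that hold raw byte or millisecond counts, so small conversion helpers can make a configuration easier to read and review. The following is a minimal sketch. `mib_to_bytes` and `days_to_ms` are hypothetical helpers, not part of any SDK, and the sizes are treated as binary units (1 MB = 1024 × 1024 bytes), which matches the defaults listed in the table.

```python
def mib_to_bytes(mebibytes: int) -> str:
    """Convert mebibytes to a byte count, in the string form used for property values."""
    return str(mebibytes * 1024 * 1024)


def days_to_ms(days: int) -> str:
    """Convert days to milliseconds, in the string form used for property values."""
    return str(days * 24 * 60 * 60 * 1000)


# Build a properties dictionary from human-readable quantities.
properties = {
    "write.target-file-size-bytes": mib_to_bytes(256),
    "history.expire.max-snapshot-age-ms": days_to_ms(5),
}
print(properties)
```

For example, `mib_to_bytes(512)` yields "536870912" and `days_to_ms(5)` yields "432000000", the default values shown in the table for write.target-file-size-bytes and history.expire.max-snapshot-age-ms.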