Delete records from your feature groups
You can use the Amazon SageMaker Feature Store API to delete records from your feature groups. A feature group is an object that contains your machine learning (ML) data, where the columns of your data are described by features and your data are contained in records. A record contains values for features that are associated with a specific record identifier.
There are two storage configurations for your feature groups: online store and offline store. The online store only keeps the record with the latest event time and is typically used for real-time lookup for ML inference. The offline store keeps all records, acts as a historical database, and is typically used for feature exploration, ML training, and batch inference.
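As a rough mental model only, the difference between the two stores can be sketched with a toy in-memory class. This is not how Feature Store is implemented; the class `ToyFeatureGroup`, its method, and the way ties between equal event times are resolved are all assumptions made for illustration.

```python
from datetime import datetime


class ToyFeatureGroup:
    """Toy model: the online store keeps the latest record per identifier,
    the offline store keeps every record that was written."""

    def __init__(self):
        self.online = {}   # record identifier -> (event_time, features)
        self.offline = []  # append-only history of (identifier, event_time, features)

    def put_record(self, record_id, event_time, features):
        # The offline store always appends, so history is preserved.
        self.offline.append((record_id, event_time, features))
        # The online store keeps only the record with the latest event time.
        # Treating an equal event time as "newer" is an assumption of this sketch.
        current = self.online.get(record_id)
        if current is None or event_time >= current[0]:
            self.online[record_id] = (event_time, features)


fg = ToyFeatureGroup()
fg.put_record("customer-1", datetime(2024, 1, 1), {"age": 41})
fg.put_record("customer-1", datetime(2024, 6, 1), {"age": 42})
fg.put_record("customer-1", datetime(2023, 1, 1), {"age": 40})  # late-arriving, older data
```

After these three writes, the toy online store holds only the June 2024 record, while the toy offline store holds all three.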
For more information on Feature Store concepts, see Ingestion diagrams.
There are two ways to delete records from your feature groups, soft delete and hard delete, and the behavior differs depending on the storage configuration. The following topics describe how to soft and hard delete records from the online and offline stores and provide examples.
Delete records from the online store
You can soft or hard delete a record from the online store using the DeleteRecord API by setting the DeletionMode request parameter to SoftDelete (default) or HardDelete. For more information on the DeleteRecord API, see DeleteRecord in the Amazon SageMaker API Reference.
With the online store:
- When you soft delete (default), the record is no longer retrievable by GetRecord or BatchGetRecord and the feature column values are set to null, except for the RecordIdentifier and EventTime feature values.
- When you hard delete, the record is completely removed from the online store.
In both cases, Feature Store appends the deleted record marker to the OfflineStore. The deleted record marker is a record with the same RecordIdentifier as the original, but with the is_deleted value set to True, EventTime set to the delete input EventTime, and other feature values set to null.
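To make the shape of the deleted record marker concrete, here is a minimal sketch that builds one from an original record held as a Python dict. The function name, the dict representation, and the feature names `customer_id` and `event_time` are hypothetical; Feature Store writes the marker itself, so this only illustrates the resulting values.

```python
def build_deleted_record_marker(record, record_id_feature, event_time_feature,
                                deletion_event_time):
    """Return a dict shaped like the deleted record marker described above."""
    # All other feature values are set to null (None in Python).
    marker = {name: None for name in record}
    # The marker keeps the same record identifier as the original record.
    marker[record_id_feature] = record[record_id_feature]
    # EventTime is taken from the delete request, not from the original record.
    marker[event_time_feature] = deletion_event_time
    marker["is_deleted"] = True
    return marker


original = {
    "customer_id": "c-1",
    "event_time": "2024-01-01T00:00:00Z",
    "age": 42,
    "city": "Oslo",
}
marker = build_deleted_record_marker(
    original, "customer_id", "event_time", "2024-02-01T00:00:00Z"
)
```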
Note that the EventTime specified in DeleteRecord should be later than the EventTime of the existing record in the OnlineStore for that same RecordIdentifier. If it is not, the deletion does not occur:
- For SoftDelete, the existing (not deleted) record remains in the OnlineStore, though the delete record marker is still written to the OfflineStore.
- HardDelete returns EventTime: 400 ValidationException to indicate that the delete operation failed. No delete record marker is written to the OfflineStore.
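Because a deletion with an EventTime that is not later than the existing record's does not take effect, it can help to compare the two timestamps before calling DeleteRecord. The following is a minimal sketch that assumes both event times are ISO 8601 strings with a UTC offset or a trailing `Z`; the helper names are hypothetical, and the existing event time would have to be obtained separately (for example, from a GetRecord response).

```python
from datetime import datetime


def parse_event_time(value):
    """Parse an ISO 8601 timestamp such as '2024-01-01T00:00:00Z'."""
    # Older Python versions do not accept a trailing 'Z' in fromisoformat,
    # so it is replaced with an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def deletion_time_is_later(existing_event_time, deletion_event_time):
    """Return True only if the deletion time is strictly later than the
    event time of the existing record."""
    return parse_event_time(deletion_event_time) > parse_event_time(existing_event_time)
```

A caller could skip or adjust the delete request when `deletion_time_is_later` returns False.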
The following examples use the SDK for Python (Boto3) delete_record operation. The examples use these inputs:
- Feature group name (feature-group-name)
- Record identifier value as a string (record-identifier-value)
- Deletion event time (deletion-event-time). The deletion event time should be later than the event time of the record you wish to delete.
Online store soft delete example
For a soft delete, use the DeleteRecord API and either leave DeletionMode at its default or set it to SoftDelete.

    import boto3

    # Runtime client for Feature Store record operations
    client = boto3.client('sagemaker-featurestore-runtime')

    # Soft delete the record from the online store
    client.delete_record(
        FeatureGroupName='feature-group-name',
        RecordIdentifierValueAsString='record-identifier-value',
        EventTime='deletion-event-time',
        TargetStores=[
            'OnlineStore',
        ],
        DeletionMode='SoftDelete'
    )
Online store hard delete example
For a hard delete, use the DeleteRecord API and set DeletionMode to HardDelete.

    import boto3

    # Runtime client for Feature Store record operations
    client = boto3.client('sagemaker-featurestore-runtime')

    # Hard delete the record from the online store
    client.delete_record(
        FeatureGroupName='feature-group-name',
        RecordIdentifierValueAsString='record-identifier-value',
        EventTime='deletion-event-time',
        TargetStores=[
            'OnlineStore',
        ],
        DeletionMode='HardDelete'
    )
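The two examples differ only in the DeletionMode value, so they can be folded into one small helper. This is a sketch, not part of the SDK: the function name `delete_online_record` and its `hard` flag are hypothetical, and the client is passed in so that the helper itself does not need AWS credentials to be defined.

```python
def delete_online_record(client, feature_group_name, record_id, event_time, hard=False):
    """Soft delete (default) or hard delete one record from the online store.

    `client` is expected to be a boto3 'sagemaker-featurestore-runtime' client.
    """
    return client.delete_record(
        FeatureGroupName=feature_group_name,
        RecordIdentifierValueAsString=record_id,
        EventTime=event_time,
        TargetStores=["OnlineStore"],
        DeletionMode="HardDelete" if hard else "SoftDelete",
    )


# Example usage (requires AWS credentials and an existing feature group):
# import boto3
# client = boto3.client("sagemaker-featurestore-runtime")
# delete_online_record(client, "feature-group-name", "record-identifier-value",
#                      "deletion-event-time", hard=True)
```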
Delete records from the offline store
With Amazon SageMaker Feature Store, you can soft and hard delete a record from the OfflineStore Iceberg table format. With the OfflineStore Iceberg table format:
- When you soft delete a record, the latest version of the Iceberg table file will not contain the record, but previous versions will still contain the record and can be accessed using time travel. For information on time travel, see Querying Iceberg table data and performing time travel in the Athena user guide.
- When you hard delete a record, you are removing previous versions of the Iceberg table that contain the record. In this case you should specify which versions of the Iceberg table you wish to delete.
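To illustrate what reading a previous version with time travel can look like, the sketch below builds an Athena query string that reads the table as of a given UTC timestamp. The helper name is hypothetical, and the `FOR TIMESTAMP AS OF` form is quoted from memory of the Athena user guide, so treat it as an assumption and confirm the exact syntax there before relying on it.

```python
def time_travel_query(table_name, as_of_utc):
    """Build an Athena query that reads the Iceberg table as it was at `as_of_utc`.

    `as_of_utc` is a timestamp string such as '2024-01-01 10:00:00'.
    """
    return (
        f"SELECT * FROM {table_name} "
        f"FOR TIMESTAMP AS OF TIMESTAMP '{as_of_utc} UTC'"
    )


query = time_travel_query("iceberg_table_name", "2024-01-01 10:00:00")
```

A soft deleted record would still appear in the result of such a query if the timestamp predates the deletion, and would stop appearing once the older snapshots are removed by a hard delete.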
Obtain your Iceberg table name
To soft and hard delete from your OfflineStore Iceberg table, you will need to obtain your Iceberg table name, iceberg-table-name. This assumes that the feature group was created with the offline store in Iceberg table format and with DisableGlueTableCreation = False (default). For more information on creating feature groups, see Get started with Amazon SageMaker Feature Store.
To obtain your iceberg-table-name, use the DescribeFeatureGroup API to obtain DataCatalogConfig. This contains the metadata of the Glue table which serves as the data catalog for the OfflineStore. The TableName within the DataCatalogConfig is your iceberg-table-name.
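With the SDK for Python (Boto3), this lookup can be sketched as follows. The helper name `get_iceberg_table_name` is hypothetical; it reads DataCatalogConfig from the OfflineStoreConfig section of the describe_feature_group response and assumes the feature group has an offline store with a Glue table.

```python
def get_iceberg_table_name(sagemaker_client, feature_group_name):
    """Return the Glue table name that backs the feature group's offline store.

    `sagemaker_client` is expected to be a boto3 'sagemaker' client.
    """
    response = sagemaker_client.describe_feature_group(
        FeatureGroupName=feature_group_name
    )
    # DataCatalogConfig holds the Glue catalog, database, and table name.
    data_catalog = response["OfflineStoreConfig"]["DataCatalogConfig"]
    return data_catalog["TableName"]


# Example usage (requires AWS credentials and an existing feature group):
# import boto3
# table_name = get_iceberg_table_name(boto3.client("sagemaker"), "feature-group-name")
```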
Amazon Athena offline store soft and hard delete example
The following instructions use Amazon Athena to soft delete and then hard delete a record from the OfflineStore Iceberg table. This assumes that the record you intend to delete in your OfflineStore is a deleted record marker. For information on the deleted record marker in your OfflineStore, see Delete records from the online store.
- Obtain your Iceberg table name, iceberg-table-name, as described in Obtain your Iceberg table name.
- Run the DELETE command to soft delete the records on the OfflineStore, such that the latest version (or snapshot) of the Iceberg table will not contain the records. The following example deletes the records where is_deleted is 'True' and the previous event-time versions of those records. You may add additional conditions based on other features to restrict the deletion. For more information on using DELETE with Athena, see DELETE in the Athena user guide.

      DELETE FROM iceberg-table-name
      WHERE record-id-feature-name IN (
          SELECT record-id-feature-name FROM iceberg-table-name WHERE is_deleted = 'True'
      )

  The soft deleted records are still viewable on previous file versions by performing time travel. For information on performing time travel, see Querying Iceberg table data and performing time travel in the Athena user guide.
- Remove the record from previous versions of your Iceberg tables to hard delete the record from OfflineStore:
  - Run the OPTIMIZE command to rewrite the data files into a more optimized layout, based on their size and number of associated delete files. For more information on optimizing Iceberg tables and the syntax, see Optimizing Iceberg tables in the Athena user guide.

        OPTIMIZE iceberg-table-name REWRITE DATA USING BIN_PACK

  - (Optional, only need to run once) Run the ALTER TABLE command to alter the Iceberg table set values, and set when previous file versions are to be hard deleted according to your specifications. This can be done by assigning values to the vacuum_min_snapshots_to_keep and vacuum_max_snapshot_age_seconds properties. For more information on altering your Iceberg table set properties, see ALTER TABLE SET PROPERTIES in the Athena user guide. For more information on Iceberg table property key-value pairs, see Table properties in the Athena user guide.

        ALTER TABLE iceberg-table-name SET TBLPROPERTIES (
            'vacuum_min_snapshots_to_keep'='your-specified-value',
            'vacuum_max_snapshot_age_seconds'='your-specified-value'
        )

  - Run the VACUUM command to remove data files that are no longer needed for your Iceberg tables and are not referenced by the current version. The VACUUM command should run after the deleted record is no longer referenced in the current snapshot, for example, vacuum_max_snapshot_age_seconds after the deletion. For more information on VACUUM with Athena and the syntax, see VACUUM.

        VACUUM iceberg-table-name
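The Athena statements for the soft delete and hard delete steps can also be generated programmatically. The following sketch only builds the SQL strings in the order described above; the function name, its parameters, and the identifier check are hypothetical, and the `is_deleted = 'True'` condition is copied from the example in this section.

```python
import re

# Only plain identifiers are accepted, to avoid building unintended SQL.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_athena_delete_statements(table_name, record_id_feature,
                                   min_snapshots_to_keep=None,
                                   max_snapshot_age_seconds=None):
    """Return the Athena statements for the soft-then-hard delete, in order."""
    for name in (table_name, record_id_feature):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")

    statements = [
        # Soft delete: remove the marked records from the latest snapshot.
        f"DELETE FROM {table_name} WHERE {record_id_feature} IN "
        f"(SELECT {record_id_feature} FROM {table_name} WHERE is_deleted = 'True')",
        # Rewrite data files before older snapshots are removed.
        f"OPTIMIZE {table_name} REWRITE DATA USING BIN_PACK",
    ]
    # Optional, one-time: control when previous snapshots become removable.
    if min_snapshots_to_keep is not None and max_snapshot_age_seconds is not None:
        statements.append(
            f"ALTER TABLE {table_name} SET TBLPROPERTIES ("
            f"'vacuum_min_snapshots_to_keep'='{min_snapshots_to_keep}', "
            f"'vacuum_max_snapshot_age_seconds'='{max_snapshot_age_seconds}')"
        )
    # Hard delete: remove data files no longer referenced by retained snapshots.
    statements.append(f"VACUUM {table_name}")
    return statements
```

Each statement would then be submitted to Athena separately, and VACUUM would be run only once the deleted record is no longer referenced by a retained snapshot.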
 
Apache Spark offline store soft and hard delete example
To soft and then hard delete a record from the OfflineStore Iceberg table using Apache Spark, you can follow the same instructions as in the Amazon Athena offline store soft and hard delete example above, but using Spark procedures. For a full list of procedures, see Spark Procedures.
- When soft deleting from the OfflineStore: instead of using the DELETE command in Athena, use the DELETE FROM command in Apache Spark.
- To remove the record from previous versions of your Iceberg tables to hard delete the record from OfflineStore:
  - When changing your Iceberg table configuration: instead of using the ALTER TABLE command from Athena, use the expire_snapshots procedure.
  - To remove no longer needed data files from your Iceberg tables: instead of using the VACUUM command in Athena, use the remove_orphan_files procedure.
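The Spark variant can be sketched as a list of Spark SQL statements. This is an illustration under several assumptions: the function name and parameters are hypothetical, the Spark session is assumed to be configured with an Iceberg catalog and the Iceberg SQL extensions, and the procedure arguments shown (`table`, `older_than`, `retain_last`) should be checked against the Spark Procedures documentation for your Iceberg version.

```python
def build_spark_delete_statements(catalog, database, table, record_id_feature,
                                  older_than, retain_last=1):
    """Return Spark SQL statements for the soft-then-hard delete, in order.

    `older_than` is a timestamp string such as '2024-02-01 00:00:00'.
    """
    full_name = f"{catalog}.{database}.{table}"
    return [
        # Soft delete: remove the marked records from the current snapshot.
        f"DELETE FROM {full_name} WHERE {record_id_feature} IN "
        f"(SELECT {record_id_feature} FROM {full_name} WHERE is_deleted = 'True')",
        # Hard delete, part 1: expire snapshots that still contain the records.
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{database}.{table}', "
        f"older_than => TIMESTAMP '{older_than}', "
        f"retain_last => {retain_last})",
        # Hard delete, part 2: remove files no longer referenced by the table.
        f"CALL {catalog}.system.remove_orphan_files(table => '{database}.{table}')",
    ]


# Example usage (requires a SparkSession `spark` configured for Iceberg):
# for statement in build_spark_delete_statements(
#         "my_catalog", "my_database", "iceberg_table_name",
#         "record_id_feature_name", "2024-02-01 00:00:00"):
#     spark.sql(statement)
```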