LSOPS06-BP01 Validate data quality during ingestion
Define data quality measurements and implement checks during ingestion.
Desired outcome: Data is clean and ready for analysis upon ingestion.
Benefits of establishing this best practice:
-
More trusted results.
-
Faster output to conclusions.
Level of risk exposed if this best practice is not established: High
Implementation guidance
Build data quality checks into data ingestion pipelines.
Plan to isolate erroneous data and have a way to review it as needed.
Implementation steps
-
Identify expected patterns in received data.
-
Build data quality checks in AWS Glue Data Quality.
-
Incorporate checks in data pipelines.
-
Build processes to catch and isolate erroneous data. Alert personnel to investigate data errors
Resources
Related examples:
Related tools: