
Getting started - Guidance for an Automotive Data Platform on AWS

Getting started

This section provides step-by-step instructions for deploying the Predictive Maintenance solution, including model training, inference pipeline setup, and alert configuration.

How the Predictive Maintenance Model Works

The solution implements a tire pressure anomaly detection system using a multi-stage machine learning pipeline. Here’s how it works, step by step:

Step 1: Data Collection and Preparation

The system begins by collecting tire pressure telemetry for your vehicle fleet from Amazon Redshift. An AWS Glue ETL job runs hourly to extract new sensor readings and transform them into a standardized format.

What happens:

  • Tire pressure readings are extracted from Redshift (or from Amazon S3 if you use a data lake)

  • Data is validated and cleansed to remove sensor errors

  • Readings are normalized and aggregated by vehicle and tire position

  • Processed data is stored in S3 in Parquet format for efficient querying

Key outputs: Hourly batches of clean telemetry data ready for analysis
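The cleansing and aggregation steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the solution's Glue job. The record keys and the 0–100 PSI plausibility range are hypothetical, and writing the result to Parquet in S3 is omitted.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical plausibility bounds for a tire pressure sensor (assumption).
MIN_VALID_PSI = 0.0
MAX_VALID_PSI = 100.0


def cleanse_and_aggregate(readings):
    """Drop implausible readings, then average the rest per vehicle, tire position, and hour.

    Each reading is a dict with hypothetical keys:
    vehicle_id, tire_position, ts (datetime), psi (float or None).
    """
    buckets = defaultdict(list)
    for r in readings:
        psi = r.get("psi")
        # Missing or out-of-range values are treated as sensor errors.
        if psi is None or not MIN_VALID_PSI <= psi <= MAX_VALID_PSI:
            continue
        hour = r["ts"].replace(minute=0, second=0, microsecond=0)
        buckets[(r["vehicle_id"], r["tire_position"], hour)].append(psi)

    # One row per (vehicle, tire position, hour), in a stable sort order.
    return [
        {"vehicle_id": v, "tire_position": t, "hour": h,
         "mean_psi": round(mean(values), 2), "count": len(values)}
        for (v, t, h), values in sorted(buckets.items())
    ]


rows = cleanse_and_aggregate([
    {"vehicle_id": "V1", "tire_position": "FL", "ts": datetime(2024, 1, 1, 8, 5), "psi": 32.0},
    {"vehicle_id": "V1", "tire_position": "FL", "ts": datetime(2024, 1, 1, 8, 40), "psi": 31.0},
    {"vehicle_id": "V1", "tire_position": "FL", "ts": datetime(2024, 1, 1, 8, 50), "psi": 250.0},
])
print(rows[0]["mean_psi"], rows[0]["count"])  # 31.5 2
```

The 250 PSI reading is discarded as a sensor error, so the hourly mean uses only the two plausible values.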

Step 2: Feature Engineering

Raw pressure readings are transformed into meaningful features that the ML model can learn from. This includes calculating pressure trends, rate of change, and statistical patterns.

What happens:

  • Time-series features are calculated (rolling averages, standard deviations)

  • Pressure drop rates are computed over 6-hour, 12-hour, and 24-hour windows

  • Contextual features are added (temperature, vehicle load, driving conditions)

  • Historical baseline pressures are retrieved for comparison

Key outputs: Feature dataset with 20+ engineered attributes per tire reading
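The windowed statistics above can be computed per tire as shown below. This is a minimal sketch that assumes readings arrive as time-sorted (hours, PSI) pairs. The feature names are hypothetical, and the sketch leaves out the contextual features (temperature, load, driving conditions) and the historical baselines.

```python
from statistics import mean, pstdev

WINDOWS_HOURS = (6, 12, 24)  # drop-rate windows listed in Step 2


def window_features(series):
    """Rolling statistics and pressure-drop rates for the latest reading of one tire.

    `series` is a time-sorted list of (hours_since_start, psi) tuples.
    Feature names are hypothetical.
    """
    t_now, p_now = series[-1]
    features = {}
    for w in WINDOWS_HOURS:
        window = [(t, p) for t, p in series if t_now - t <= w]
        pressures = [p for _, p in window]
        features[f"mean_psi_{w}h"] = mean(pressures)
        features[f"std_psi_{w}h"] = pstdev(pressures)
        t_first, p_first = window[0]
        elapsed = t_now - t_first
        # Positive value = PSI lost per hour across the window.
        features[f"drop_rate_{w}h"] = (p_first - p_now) / elapsed if elapsed > 0 else 0.0
    return features


# A tire losing 0.05 PSI per hour, sampled hourly for one day.
series = [(h, 32.0 - 0.05 * h) for h in range(25)]
feats = window_features(series)
print(round(feats["drop_rate_6h"], 3), round(feats["drop_rate_24h"], 3))  # 0.05 0.05
```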

Step 3: Model Training

A Random Cut Forest (RCF) algorithm trains on historical data to learn normal tire pressure patterns. The model identifies what "healthy" tire behavior looks like across different conditions.

What happens:

  • Amazon SageMaker trains an RCF model on 30 days of historical data

  • The model learns normal pressure patterns for different vehicle types and conditions

  • Training runs weekly (configurable) to adapt to seasonal changes

  • Model artifacts are versioned and stored in S3

Key outputs: Trained anomaly detection model that scores tire readings from 0-1 (0=normal, 1=anomalous)
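SageMaker's built-in Random Cut Forest returns non-negative anomaly scores that are not bounded to 0–1. A 0–1 scale therefore implies a normalization step that this page does not detail. One plausible approach, shown here purely as an assumption, is min–max scaling against the scores the model assigned to its own training data, clipped to [0, 1].

```python
def make_score_normalizer(training_scores):
    """Return a function that maps raw RCF anomaly scores onto [0, 1].

    Assumption: the lowest training score maps to 0, the highest to 1,
    and scores outside that range are clipped.
    """
    lo, hi = min(training_scores), max(training_scores)
    span = (hi - lo) or 1.0  # avoid dividing by zero if every score is identical

    def normalize(raw_score):
        return min(1.0, max(0.0, (raw_score - lo) / span))

    return normalize


normalize = make_score_normalizer([1.0, 1.5, 2.0, 3.0])
print(normalize(2.5))  # 0.75
print(normalize(5.0))  # 1.0 (clipped)
```

Min–max scaling is sensitive to outliers in the training scores. A common alternative rule of thumb for RCF is to flag scores more than three standard deviations above the mean training score.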

Step 4: Batch Inference

The trained model processes new tire readings daily, generating anomaly scores that indicate the likelihood of a tire issue developing.

What happens:

  • SageMaker Batch Transform runs inference on the latest telemetry data

  • Each tire reading receives an anomaly score

  • Scores above 0.7 trigger alerts for potential issues

  • Predictions provide 7–14 days of advance warning before a likely failure

Key outputs: Daily predictions with anomaly scores and estimated failure dates
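Below is a minimal sketch of turning scores into alerts. It assumes each scored reading also carries its current pressure and an estimated leak rate. The 0.7 threshold comes from this step. This page does not describe how the solution derives the estimated failure date, so the linear extrapolation to a hypothetical 25 PSI failure pressure is an assumption.

```python
from datetime import date, timedelta

ALERT_THRESHOLD = 0.7  # "scores above 0.7 trigger alerts" (Step 4)
MIN_SAFE_PSI = 25.0    # hypothetical pressure treated as failure (assumption)


def to_alert(reading, today):
    """Return an alert dict for a scored reading, or None if it is below the threshold.

    `reading` uses hypothetical keys: vehicle_id, tire_position, score,
    psi (current pressure), loss_psi_per_day (estimated leak rate).
    """
    if reading["score"] <= ALERT_THRESHOLD:
        return None
    failure_date = None
    loss = reading["loss_psi_per_day"]
    if loss > 0:
        # Linear extrapolation: whole days until pressure reaches MIN_SAFE_PSI.
        days_left = max(0.0, (reading["psi"] - MIN_SAFE_PSI) / loss)
        failure_date = today + timedelta(days=int(days_left))
    return {
        "vehicle_id": reading["vehicle_id"],
        "tire_position": reading["tire_position"],
        "score": reading["score"],
        "estimated_failure_date": failure_date,
    }


alert = to_alert(
    {"vehicle_id": "V7", "tire_position": "RL", "score": 0.82,
     "psi": 31.0, "loss_psi_per_day": 0.5},
    today=date(2024, 3, 1),
)
print(alert["estimated_failure_date"])  # 2024-03-13
```

Here a 6 PSI margin at 0.5 PSI/day gives 12 days, which falls inside the 7–14 day warning window described above.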

Step 5: Filter-Based Validation

A parallel statistical filter validates ML predictions using physics-based rules. This catches rapid pressure drops that might indicate immediate leaks.

What happens:

  • Pressure drop rates are compared against threshold values

  • Leak rates are calculated using time-series regression

  • Alerts are generated for drops exceeding 2 PSI per hour

  • Results are cross-referenced with ML predictions

Key outputs: Validated alerts with both ML and statistical confidence scores
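The leak-rate calculation can be sketched as an ordinary least-squares fit of pressure against time, compared with the 2 PSI/hour threshold. Fitting a slope across several readings is less sensitive to a single noisy sample than differencing two points. Whether the filtering-algorithm Lambda uses exactly this regression is an assumption.

```python
LEAK_ALERT_PSI_PER_HOUR = 2.0  # threshold from Step 5


def leak_rate_psi_per_hour(samples):
    """Ordinary least-squares slope of pressure vs. time, as PSI lost per hour.

    `samples` is a list of (hours, psi) pairs with at least two distinct times.
    """
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_p = sum(p for _, p in samples) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("need readings at two or more distinct times")
    return -cov / var  # negate the slope so that falling pressure gives a positive rate


def is_rapid_leak(samples):
    return leak_rate_psi_per_hour(samples) > LEAK_ALERT_PSI_PER_HOUR


puncture = [(0.0, 32.0), (0.5, 30.5), (1.0, 29.0)]  # 3 PSI lost in one hour
slow = [(0.0, 32.0), (1.0, 31.5), (2.0, 31.0)]      # 0.5 PSI per hour
print(is_rapid_leak(puncture), is_rapid_leak(slow))  # True False
```

To cross-reference with the ML pipeline, flags like these could be joined with the Step 4 alerts on vehicle ID and tire position.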

Step 6: Alert Consolidation and Delivery

Alerts from both pipelines are merged, deduplicated, and delivered to your maintenance systems via API or SNS notifications.

What happens:

  • Duplicate alerts are removed (same vehicle/tire from both pipelines)

  • Severity levels are assigned (Critical: >5 PSI drop, Warning: 2-5 PSI drop)

  • Alert state is tracked in DynamoDB to prevent duplicate notifications

  • Alerts are sent to fleet management systems via REST API or email

Key outputs: Actionable maintenance alerts with vehicle ID, tire position, severity, and predicted failure date
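The consolidation rules can be sketched as follows. Note these assumptions: `drop_psi` is a hypothetical field carrying each alert's pressure drop, a drop of exactly 5 PSI counts as Warning, drops under 2 PSI produce no alert, and an in-memory set stands in for the DynamoDB tire-alerts table.

```python
def severity(drop_psi):
    """Map a pressure drop to the Step 6 severity bands (boundary handling is assumed)."""
    if drop_psi > 5:
        return "Critical"
    if drop_psi >= 2:
        return "Warning"
    return None


def consolidate(ml_alerts, filter_alerts, already_notified):
    """Merge both pipelines, keep one alert per (vehicle, tire), and skip ones already sent.

    `already_notified` is a set standing in for the DynamoDB alert-state table.
    """
    merged = {}
    for alert in ml_alerts + filter_alerts:
        key = (alert["vehicle_id"], alert["tire_position"])
        # When both pipelines report the same tire, keep the larger drop.
        if key not in merged or alert["drop_psi"] > merged[key]["drop_psi"]:
            merged[key] = alert
    out = []
    for key, alert in sorted(merged.items()):
        level = severity(alert["drop_psi"])
        if level is None or (key, level) in already_notified:
            continue
        already_notified.add((key, level))
        out.append({**alert, "severity": level})
    return out


state = set()
ml = [{"vehicle_id": "V1", "tire_position": "FL", "drop_psi": 3.0}]
flt = [{"vehicle_id": "V1", "tire_position": "FL", "drop_psi": 6.0},
       {"vehicle_id": "V2", "tire_position": "RR", "drop_psi": 1.0}]
print(consolidate(ml, flt, state))  # one Critical alert for V1/FL
print(consolidate(ml, flt, state))  # [] because it was already sent
```

Keying the notification state on (tire, severity) rather than on the tire alone lets a tire that escalates from Warning to Critical trigger a second notification.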

Implementation Steps

# Install CDK dependencies
cd deployment
npm install

# Install Python dependencies
pip3 install -r requirements.txt

# Return to project root
cd ..

Configure Environment Variables

# Copy the example environment file
cp .env.example .env

# Edit the .env file
nano .env

Required environment variables:

# AWS Configuration
AWS_ACCOUNT_ID=123456789012
AWS_REGION=us-east-1
AWS_PROFILE=default

# Redshift Configuration
REDSHIFT_DATASHARE_ARN=arn:aws:redshift:us-east-1:123456789012:datashare:...
REDSHIFT_DATABASE=telemetry_db
REDSHIFT_SCHEMA=public

# S3 Configuration
RAW_DATA_BUCKET=mmt-predictive-maintenance-raw
ETL_DATA_BUCKET=mmt-predictive-maintenance-etl
ML_FEATURES_BUCKET=mmt-predictive-maintenance-ml-features

# ML Configuration
TRAINING_INSTANCE_TYPE=ml.m5.xlarge
INFERENCE_INSTANCE_TYPE=ml.m5.large
# Weekly, Sunday 2 AM
MODEL_TRAINING_SCHEDULE=cron(0 2 ? * SUN *)
# Daily, 6 AM
INFERENCE_SCHEDULE=cron(0 6 * * ? *)

# Alerts Configuration
ALERT_SNS_EMAIL=fleet-managers@example.com
ALERT_API_ENDPOINT=https://relay-garage-system.example.com/api/alerts
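Before deploying, it can help to confirm that .env defines every variable listed above. The sketch below is a hypothetical pre-flight check and is not part of the solution. Its parser handles only simple KEY=VALUE lines, and this page does not describe how the CDK app itself loads .env.

```python
# Variables listed as required in the example .env above.
REQUIRED_VARS = [
    "AWS_ACCOUNT_ID", "AWS_REGION", "AWS_PROFILE",
    "REDSHIFT_DATASHARE_ARN", "REDSHIFT_DATABASE", "REDSHIFT_SCHEMA",
    "RAW_DATA_BUCKET", "ETL_DATA_BUCKET", "ML_FEATURES_BUCKET",
    "TRAINING_INSTANCE_TYPE", "INFERENCE_INSTANCE_TYPE",
    "MODEL_TRAINING_SCHEDULE", "INFERENCE_SCHEDULE",
    "ALERT_SNS_EMAIL", "ALERT_API_ENDPOINT",
]


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and full-line # comments.

    Double-quoted values are unquoted; for unquoted values, a trailing " # comment" is dropped.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        else:
            value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value
    return values


def missing_vars(text):
    """Names of required variables that are absent or empty."""
    values = parse_env(text)
    return [name for name in REQUIRED_VARS if not values.get(name)]


sample = "AWS_REGION=us-east-1\nINFERENCE_SCHEDULE=cron(0 6 * * ? *) # Daily 6 AM\n"
print(parse_env(sample)["INFERENCE_SCHEDULE"])  # cron(0 6 * * ? *)
print(len(missing_vars(sample)))                # 13
```

Against a real file, a call such as `missing_vars(open(".env").read())` should return an empty list before you run `cdk deploy`.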

Bootstrap CDK (First-Time Only)

# Bootstrap CDK
cdk bootstrap aws://ACCOUNT-ID/REGION

Deploy Infrastructure Stacks

# Synthesize CloudFormation templates
cdk synth

# Deploy all stacks
cdk deploy --all

# Or deploy stacks individually:
cdk deploy DataStack
cdk deploy EtlStack
cdk deploy MlStack
cdk deploy FilteringStack
cdk deploy AlertsStack
cdk deploy MonitoringStack

Deployment time: approximately 30 minutes

What gets deployed:

  1. DataStack

    • S3 buckets: raw, etl, ml-features, predictions

    • Glue database: mmt_predictive_maintenance

    • DynamoDB table: tire-alerts

  2. EtlStack

    • Lambda: redshift-query-lambda

    • Glue job: root-etl-pipeline

    • CloudWatch Events: Hourly triggers

    • IAM roles: Glue and Lambda execution roles

  3. MlStack

    • Step Functions: ml-etl-pipeline, ml-training-pipeline, ml-inference-pipeline

    • Lambda: Path resolvers, monitoring functions

    • Glue job: ml-feature-engineering

    • SSM Parameter: /mmt/predictive-maintenance/latest-model

  4. FilteringStack

    • Step Function: filtering-pipeline

    • Lambda: filtering-algorithm

    • CloudWatch Events: Daily trigger

  5. AlertsStack

    • Lambda: generate-alerts

    • SNS topic: tire-alert-notifications

    • API Gateway: alerts-api

    • S3 event notifications

  6. MonitoringStack

    • CloudWatch dashboards

    • CloudWatch alarms

    • X-Ray tracing

Verification:

# Check all stacks
aws cloudformation list-stacks \
  --stack-status-filter CREATE_COMPLETE \
  --region us-east-1 \
  --query 'StackSummaries[?contains(StackName, `mmt-predictive-maintenance`)].StackName'

# Verify S3 buckets
aws s3 ls | grep mmt-predictive-maintenance

# Verify Glue database
aws glue get-database \
  --name mmt_predictive_maintenance \
  --region us-east-1

# Verify Step Functions
aws stepfunctions list-state-machines \
  --region us-east-1 \
  --query 'stateMachines[?contains(name, `ml`)].name'

Manual Step: Configure Redshift Datashare

Important: This step must be completed manually before the ETL pipeline can run.

Option 1: Redshift Datashare (Recommended)

-- In the source Redshift cluster, create the datashare
CREATE DATASHARE tire_telemetry_share;

-- Add the schema to the datashare
ALTER DATASHARE tire_telemetry_share ADD SCHEMA public;

-- Add tables to the datashare
ALTER DATASHARE tire_telemetry_share ADD TABLE public.tire_telemetry;
ALTER DATASHARE tire_telemetry_share ADD TABLE public.vehicle_metadata;

-- Grant usage to the consumer account
GRANT USAGE ON DATASHARE tire_telemetry_share TO ACCOUNT '123456789012';

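For cross-account sharing, an administrator in the producer account must also authorize the datashare, and an administrator in the consumer account must associate it (for example, in the Amazon Redshift console). Then, in the consumer account (where the solution is deployed):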

-- Create a database from the datashare
CREATE DATABASE tire_telemetry_db
  FROM DATASHARE tire_telemetry_share
  OF ACCOUNT '987654321098' NAMESPACE 'source-namespace-guid';

-- Grant permissions to the Lambda execution role
GRANT USAGE ON DATABASE tire_telemetry_db TO IAM_ROLE 'arn:aws:iam::123456789012:role/mmt-lambda-execution-role';
GRANT SELECT ON ALL TABLES IN SCHEMA public TO IAM_ROLE 'arn:aws:iam::123456789012:role/mmt-lambda-execution-role';

Option 2: S3 Unload (Alternative)

If you use S3 UNLOAD instead of a datashare:

  1. Configure Redshift to UNLOAD data to the S3 raw bucket hourly

  2. Remove redshift-query-lambda from the deployment

  3. Update the root-etl-pipeline Glue job to read from S3 directly
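The hourly UNLOAD in step 1 could be generated per hour as sketched below. Everything beyond the table name and raw bucket from this page is an assumption: the `reading_ts` column, the date-partitioned S3 prefix, and the IAM role are hypothetical. Writing each hour to its own prefix avoids UNLOAD failing on existing files.

```python
from datetime import datetime, timedelta


def build_hourly_unload(table, bucket, iam_role_arn, hour_start):
    """Build a Redshift UNLOAD statement that exports one hour of readings as Parquet.

    Assumptions: `table` has a timestamp column named reading_ts, and files are
    written under a hypothetical s3://<bucket>/tire_telemetry/YYYY/MM/DD/HH/ prefix.
    """
    hour_end = hour_start + timedelta(hours=1)
    ts_fmt = "%Y-%m-%d %H:%M:%S"
    query = (
        f"SELECT * FROM {table} "
        f"WHERE reading_ts >= '{hour_start:{ts_fmt}}' "
        f"AND reading_ts < '{hour_end:{ts_fmt}}'"
    )
    # Single quotes inside the quoted UNLOAD query must be doubled.
    escaped = query.replace("'", "''")
    prefix = f"s3://{bucket}/tire_telemetry/{hour_start:%Y/%m/%d/%H}/"
    return (
        f"UNLOAD ('{escaped}') "
        f"TO '{prefix}' "
        f"IAM_ROLE '{iam_role_arn}' "
        "FORMAT AS PARQUET;"
    )


print(build_hourly_unload(
    "public.tire_telemetry",
    "mmt-predictive-maintenance-raw",
    "arn:aws:iam::123456789012:role/example-redshift-unload-role",  # hypothetical role
    datetime(2024, 1, 1, 8),
))
```

One option for running the statement on a schedule is the Redshift Data API (for example, boto3's `redshift-data` client and its `execute_statement` call). The page does not say which scheduler the solution expects.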

Trigger Initial ETL Run

# Manually trigger the query Lambda
aws lambda invoke \
  --function-name redshift-query-lambda \
  --region us-east-1 \
  response.json

# Check the response
cat response.json

# Wait 30 minutes, then trigger the ETL Glue job
aws glue start-job-run \
  --job-name root-etl-pipeline \
  --region us-east-1

# Monitor job status (replace jr_... with the JobRunId returned by start-job-run)
aws glue get-job-run \
  --job-name root-etl-pipeline \
  --run-id jr_... \
  --region us-east-1 \
  --query 'JobRun.JobRunState'
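Instead of copying the run ID by hand, a script can capture the JobRunId returned by `start_job_run` and poll until the run finishes. The minimal sketch below takes the state lookup as a callable, so you can exercise it without AWS access. The commented boto3 lines show one way to wire it up. The polling interval and timeout are arbitrary assumptions.

```python
import time

# Glue job-run states after which the run will not change again.
GLUE_TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def wait_for_job_run(get_state, poll_seconds=30, timeout_seconds=3600):
    """Call get_state() until it returns a terminal Glue job-run state, then return it."""
    deadline = time.monotonic() + timeout_seconds
    state = get_state()
    while state not in GLUE_TERMINAL_STATES:
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job run still {state} after {timeout_seconds} s")
        time.sleep(poll_seconds)
        state = get_state()
    return state

# Example wiring (requires boto3 and AWS credentials):
# import boto3
# glue = boto3.client("glue", region_name="us-east-1")
# run_id = glue.start_job_run(JobName="root-etl-pipeline")["JobRunId"]
# final_state = wait_for_job_run(
#     lambda: glue.get_job_run(JobName="root-etl-pipeline", RunId=run_id)["JobRun"]["JobRunState"])
```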

Trigger Initial ML Training

# Start the ML ETL pipeline
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:ml-etl-pipeline \
  --region us-east-1

# Wait for completion (check in the console, or poll the status with
# aws stepfunctions describe-execution --execution-arn <executionArn from the output above> --query status)

# Start the ML training pipeline
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:ml-training-pipeline \
  --region us-east-1

# Monitor training in the SageMaker console
# Training takes ~30-45 minutes
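You can script the "wait for completion" step so that the training pipeline starts only after ml-etl-pipeline succeeds. The minimal sketch below uses the Step Functions `start_execution` and `describe_execution` calls as boto3 exposes them. The polling interval is an arbitrary assumption, and the sketch treats any status other than RUNNING or SUCCEEDED as a failure.

```python
import time


def run_in_order(sfn, state_machine_arns, poll_seconds=30):
    """Start each state machine only after the previous execution has succeeded.

    `sfn` is a boto3 Step Functions client (or any object with the same two methods).
    Returns the execution ARNs in the order they ran.
    """
    executions = []
    for arn in state_machine_arns:
        execution_arn = sfn.start_execution(stateMachineArn=arn)["executionArn"]
        status = sfn.describe_execution(executionArn=execution_arn)["status"]
        while status == "RUNNING":
            time.sleep(poll_seconds)
            status = sfn.describe_execution(executionArn=execution_arn)["status"]
        if status != "SUCCEEDED":
            raise RuntimeError(f"{arn} finished with status {status}")
        executions.append(execution_arn)
    return executions

# Example (requires boto3 and AWS credentials):
# import boto3
# sfn = boto3.client("stepfunctions", region_name="us-east-1")
# prefix = "arn:aws:states:us-east-1:123456789012:stateMachine:"
# run_in_order(sfn, [prefix + "ml-etl-pipeline", prefix + "ml-training-pipeline"])
```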

Test Inference Pipeline

# After training completes, run inference
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:us-east-1:123456789012:stateMachine:ml-inference-pipeline \
  --region us-east-1

# Check predictions in S3
aws s3 ls s3://mmt-predictive-maintenance-processed-predictions-$(aws sts get-caller-identity --query Account --output text)/

# Query predictions in DynamoDB
aws dynamodb scan \
  --table-name tire-alerts \
  --region us-east-1 \
  --limit 10

Configure Alert Notifications

# Subscribe an email address to the SNS topic
aws sns subscribe \
  --topic-arn arn:aws:sns:us-east-1:123456789012:tire-alert-notifications \
  --protocol email \
  --notification-endpoint fleet-manager@example.com \
  --region us-east-1

# Confirm the subscription via the confirmation email

CMS Integration: Quick Start

For customers using the Connected Mobility Guidance, follow these steps to connect the tire prediction model to your CMS telemetry pipeline.

Step 1: Generate training data

The solution includes a synthetic data generator that creates realistic tire telemetry with injected anomalies:

cd guidance-for-predictive-maintenance
python3 scripts/generate_training_data.py

This creates 721,024 records across 50 vehicles over 6 months, including:

  • Normal driving patterns with seasonal temperature effects

  • Slow leaks (8% of vehicle-tires, 0.3–1.2 PSI/day loss)

  • Punctures (4%, sudden pressure drop)

  • Valve failures (3%, intermittent pressure loss)

  • Overinflation events (2%)

Output: data/training/tire_telemetry_full.parquet (17.5 MB)
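To see what a slow-leak trace looks like, here is a small stand-alone generator. It is not scripts/generate_training_data.py. It reproduces only the slow-leak case, using the quoted 0.3–1.2 PSI/day rate. Everything else (starting pressure, hourly sampling, noise level) is an assumption, and the sketch omits seasonal temperature effects and the other anomaly types.

```python
import random


def slow_leak_series(days, start_psi=32.0, noise_psi=0.2, seed=None):
    """Hourly pressure readings for one tire with an injected slow leak.

    The leak rate is drawn from the 0.3-1.2 PSI/day range quoted above; the start
    pressure, hourly sampling, and Gaussian noise level are illustrative assumptions.
    Returns (leak_psi_per_day, readings).
    """
    rng = random.Random(seed)
    leak_psi_per_day = rng.uniform(0.3, 1.2)
    readings = []
    for hour in range(days * 24):
        expected = start_psi - leak_psi_per_day * hour / 24
        readings.append(round(expected + rng.gauss(0, noise_psi), 2))
    return leak_psi_per_day, readings


rate, readings = slow_leak_series(days=7, seed=42)
first_day = sum(readings[:24]) / 24
last_day = sum(readings[-24:]) / 24
print(f"leak {rate:.2f} PSI/day, day-1 mean {first_day:.1f}, day-7 mean {last_day:.1f}")
```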

Step 2: Train the model

Option A: Command line

python3 scripts/train_model.py \
  --region us-east-2 \
  --role-arn arn:aws:iam::ACCOUNT:role/cms-sagemaker-execution-role \
  --bucket cms-tire-prediction-ACCOUNT-REGION \
  --deploy

Option B: SageMaker notebook

Open notebooks/train_tire_model.ipynb in SageMaker Studio or a local Jupyter environment. The notebook provides:

  • Data exploration and visualization (pressure distributions, slow leak examples)

  • Feature preparation and normalization

  • Model training with progress monitoring

  • Evaluation with precision/recall/F1 metrics

  • Anomaly score distribution visualization

  • Endpoint deployment and SSM configuration

Both options train a SageMaker Random Cut Forest model (~3 minutes), deploy a real-time endpoint (~5 minutes), and save configuration to SSM Parameter Store.
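The synthetic data records which tires received injected anomalies, so you can evaluate the unsupervised model's scores with supervised metrics. The sketch below calculates precision, recall, and F1 for one threshold. The function and variable names are hypothetical, and the notebook's own implementation may differ.

```python
def precision_recall_f1(scores, labels, threshold):
    """Precision, recall, and F1 for "score > threshold" against ground-truth labels.

    labels[i] is True when an anomaly was injected into record i.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s <= threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


scores = [0.9, 0.8, 0.2, 0.75, 0.1]
labels = [True, False, True, True, False]
p, r, f1 = precision_recall_f1(scores, labels, threshold=0.7)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")  # precision=0.67 recall=0.67 f1=0.67
```

One plausible way to choose the value saved as /tire-prediction/{stage}/anomaly-threshold is to sweep the threshold and pick one with acceptable precision. This page does not say how the solution chooses that threshold.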

Step 3: Deploy CMS integration

Deploy the CDK stack to create the prediction Lambdas and EventBridge schedule:

cd source/infrastructure
DEPLOYMENT_STAGE=prod cdk deploy tire-predictive-maintenance-stack

This creates:

  • cms-{stage}-daily-tire-check Lambda — runs daily, detects slow leak trends

  • cms-{stage}-blowout-risk Lambda — real-time highway blowout risk assessment

  • EventBridge schedule (daily at 10 AM UTC)

  • IAM roles with least-privilege permissions

  • S3 bucket for training artifacts

Step 4: Verify end-to-end

Start a simulation in the CMS Fleet Manager UI with the "Tire pressure below safe threshold" maintenance event selected. Within 2 minutes:

  1. The simulator gradually drops tire pressure from 32 PSI toward 20 PSI

  2. The Flink MaintenanceProcessor detects maintenance.tire_pressure when pressure crosses 28 PSI

  3. A maintenance alert appears on the vehicle detail page with a $35 estimated repair cost

  4. The daily tire check Lambda (when run) detects the pressure trend and writes a prediction.tire_slow_leak warning

For highway blowout risk testing, select "Highway blowout risk", which creates a composite condition: tire pressure drops below 30 PSI while vehicle speed exceeds 60 mph. The SageMaker endpoint evaluates the multi-signal risk pattern and writes a prediction.blowout_risk alert.
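For reference during testing, you can write the two trigger conditions described above as plain predicates. This sketch only restates the simulator's scenario thresholds and assumes that "crosses 28 PSI" means falling below it. This page does not show the Flink MaintenanceProcessor's implementation, and the SageMaker endpoint scores a learned multi-signal pattern rather than applying this fixed rule.

```python
MAINTENANCE_PSI = 28.0    # maintenance.tire_pressure fires when pressure crosses this
BLOWOUT_PSI = 30.0        # blowout scenario: pressure below 30 PSI...
BLOWOUT_SPEED_MPH = 60.0  # ...while speed exceeds 60 mph


def first_crossing(pressures, threshold=MAINTENANCE_PSI):
    """Index of the first reading that falls below `threshold` after being at or above it."""
    for i in range(1, len(pressures)):
        if pressures[i - 1] >= threshold > pressures[i]:
            return i
    return None


def blowout_condition(psi, speed_mph):
    """The composite condition created by the "Highway blowout risk" scenario."""
    return psi < BLOWOUT_PSI and speed_mph > BLOWOUT_SPEED_MPH


simulated = [32.0, 31.0, 30.0, 29.0, 28.0, 27.0, 26.0]  # drifting toward 20 PSI
print(first_crossing(simulated))      # 5
print(blowout_condition(29.5, 65.0))  # True
print(blowout_condition(29.5, 55.0))  # False
```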

SSM Parameters

After training, the following parameters are available:

  • /tire-prediction/{stage}/normalization-stats: Feature normalization statistics (mean and standard deviation per feature)

  • /tire-prediction/{stage}/anomaly-threshold: Anomaly score threshold for blowout risk detection

  • /tire-prediction/{stage}/endpoint-name: SageMaker endpoint name for real-time inference
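At inference time, features presumably need the same normalization used in training before they are sent to the endpoint. The minimal sketch below assumes the normalization-stats parameter holds a JSON object with a mean and standard deviation for each feature. This page does not document the actual format, so treat the structure and feature names as hypothetical.

```python
import json


def normalize_features(features, stats_json):
    """Z-score each feature using per-feature mean/std statistics.

    Assumed (hypothetical) parameter format: {"feature": {"mean": m, "std": s}, ...}.
    """
    stats = json.loads(stats_json)
    normalized = {}
    for name, value in features.items():
        mean, std = stats[name]["mean"], stats[name]["std"]
        normalized[name] = (value - mean) / (std or 1.0)  # guard against a zero std
    return normalized


# In a Lambda, the JSON string might come from SSM (requires boto3 and AWS access):
#   import boto3
#   stats_json = boto3.client("ssm").get_parameter(
#       Name="/tire-prediction/prod/normalization-stats")["Parameter"]["Value"]
stats_json = '{"psi": {"mean": 32.0, "std": 2.0}, "speed_mph": {"mean": 50.0, "std": 10.0}}'
print(normalize_features({"psi": 30.0, "speed_mph": 65.0}, stats_json))
# {'psi': -1.0, 'speed_mph': 1.5}
```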