
Best practices for Lambda durable functions

Durable functions use a replay-based execution model that requires different patterns than traditional Lambda functions. Follow these best practices to build reliable, cost-effective workflows.

Write deterministic code

During replay, your function runs from the beginning and must follow the same execution path as the original run. Code outside durable operations must be deterministic, producing the same results given the same inputs.

Wrap non-deterministic operations in steps:

Random number generation and UUIDs
Current time or timestamps
External API calls and database queries
File system operations
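The effect of wrapping a non-deterministic call in a step can be illustrated with a toy replay model. The sketch below does not use the durable execution SDK: `MiniContext`, its `step` method, and the step name are hypothetical stand-ins that only mimic the "run once, return the cached result on replay" behavior described above.

```python
import uuid


class MiniContext:
    """Toy stand-in for a durable context (hypothetical, not the real SDK)."""

    def __init__(self, checkpoints):
        # The checkpoint dict outlives a single run, like persisted step results.
        self.checkpoints = checkpoints

    def step(self, name, fn):
        if name not in self.checkpoints:
            # First run: execute the function and record its result.
            self.checkpoints[name] = fn()
        # Replay: return the recorded result without running fn again.
        return self.checkpoints[name]


def workflow(ctx):
    # Wrapped in a step: the same ID comes back on every replay.
    order_id = ctx.step("generate-order-id", lambda: str(uuid.uuid4()))
    # Outside a step: a new ID is generated on every replay.
    unsafe_id = str(uuid.uuid4())
    return order_id, unsafe_id


checkpoints = {}
first = workflow(MiniContext(checkpoints))   # original run
replay = workflow(MiniContext(checkpoints))  # simulated replay
```

In this model `first[0]` and `replay[0]` are identical, while `first[1]` and `replay[1]` differ, which is the kind of divergence that can send a replay down a different execution path.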

Important

Don't use global variables or closures to share state between steps. Pass data through return values. Global state breaks during replay because steps return cached results but global variables reset.

Avoid closure mutations: Variables captured in closures can lose mutations during replay. Steps return cached results, but variable updates outside the step aren't replayed.

TypeScript

// ❌ WRONG: Mutations lost on replay
export const handler = withDurableExecution(async (event, context) => {
  const items = event.items; // Assumes the input event carries an items array
  let total = 0;
  
  for (const item of items) {
    await context.step(async () => {
      total += item.price; // ⚠️ Mutation lost on replay!
      return saveItem(item);
    });
  }
  
  return { total }; // Inconsistent value!
});

// ✅ CORRECT: Accumulate with return values
export const handler = withDurableExecution(async (event, context) => {
  const items = event.items; // Assumes the input event carries an items array
  let total = 0;
  
  for (const item of items) {
    total = await context.step(async () => {
      const newTotal = total + item.price;
      await saveItem(item);
      return newTotal; // Return updated value
    });
  }
  
  return { total }; // Consistent!
});

// ✅ EVEN BETTER: Use map for parallel processing
export const handler = withDurableExecution(async (event, context) => {
  const items = event.items; // Assumes the input event carries an items array
  const results = await context.map(
    items,
    async (ctx, item) => {
      await ctx.step(async () => saveItem(item));
      return item.price;
    }
  );
  
  const total = results.getResults().reduce((sum, price) => sum + price, 0);
  return { total };
});

Python

# ❌ WRONG: Mutations lost on replay
@durable_execution
def handler(event, context: DurableContext):
    items = event['items']  # Assumes the input event carries an 'items' list
    total = 0
    
    for item in items:
        context.step(
            lambda _: save_item_and_mutate(item, total),  # ⚠️ Mutation lost on replay!
            name=f'save-item-{item["id"]}'
        )
    
    return {'total': total}  # Inconsistent value!

# ✅ CORRECT: Accumulate with return values
@durable_execution
def handler(event, context: DurableContext):
    items = event['items']  # Assumes the input event carries an 'items' list
    total = 0
    
    for item in items:
        total = context.step(
            lambda _: save_item_and_return_total(item, total),
            name=f'save-item-{item["id"]}'
        )
    
    return {'total': total}  # Consistent!

# ✅ EVEN BETTER: Use map for parallel processing
@durable_execution
def handler(event, context: DurableContext):
    items = event['items']  # Assumes the input event carries an 'items' list

    def process_item(ctx, item):
        ctx.step(lambda _: save_item(item))
        return item['price']
    
    results = context.map(items, process_item)
    total = sum(results.get_results())
    
    return {'total': total}

Design for idempotency

Operations may execute more than once because of retries or replay. Operations that aren't idempotent can then cause duplicate side effects, such as charging a customer twice or sending the same email multiple times.

Use idempotency tokens: Generate tokens inside steps and include them with external API calls to prevent duplicate operations.
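As a rough illustration of the token pattern, the sketch below generates a token once inside a cached step and sends it with every call to a payment service. Everything here is hypothetical: `FakePaymentApi`, `charge_customer`, and the `step` helper are toy stand-ins, not the durable execution SDK or a real payment API.

```python
import uuid


class FakePaymentApi:
    """Hypothetical external service that deduplicates on an idempotency token."""

    def __init__(self):
        self.charges = {}    # token -> amount actually charged
        self.call_count = 0  # how many requests arrived, including duplicates

    def charge(self, token, amount):
        self.call_count += 1
        if token not in self.charges:
            self.charges[token] = amount  # only the first request charges
        return {"token": token, "amount": self.charges[token]}


checkpoints = {}


def step(name, fn):
    """Toy step: run fn once, then return the cached result on later calls."""
    if name not in checkpoints:
        checkpoints[name] = fn()
    return checkpoints[name]


def charge_customer(api, amount):
    # The token is created inside a step, so a retry or replay reuses it.
    token = step("create-charge-token", lambda: str(uuid.uuid4()))
    return api.charge(token, amount)


api = FakePaymentApi()
first_receipt = charge_customer(api, 50)
retry_receipt = charge_customer(api, 50)  # simulated retry of the same operation
```

The service receives two requests but records a single charge, because both requests carry the same token.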

Use at-most-once semantics: For critical operations that must never duplicate (financial transactions, inventory deductions), configure at-most-once execution mode.

Database idempotency: Use check-before-write patterns, conditional updates, or upsert operations to prevent duplicate records.
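A conditional insert is one way to make a write safe to repeat. The sketch below uses an in-memory SQLite table purely as a stand-in for whatever database the workflow writes to; the table, column names, and `record_order` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, status TEXT)")


def record_order(order_id, status):
    """Insert the order only if it is not already present.

    Returns True when a row was written and False when the write was a duplicate.
    """
    cursor = conn.execute(
        "INSERT OR IGNORE INTO orders (order_id, status) VALUES (?, ?)",
        (order_id, status),
    )
    conn.commit()
    return cursor.rowcount == 1


first_write = record_order("order-123", "PLACED")
second_write = record_order("order-123", "PLACED")  # simulated retry
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The retry is silently ignored, so the table still holds exactly one row for the order. With DynamoDB, a conditional write on the key would play the same role.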

Manage state efficiently

Every checkpoint saves state to persistent storage. Large state objects increase costs and slow down checkpointing. Store only the data that the workflow needs for coordination.

Keep state minimal:

Store IDs and references, not full objects
Fetch detailed data within steps as needed
Use Amazon S3 or DynamoDB for large data, pass references in state
Avoid passing large payloads between steps
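The difference between checkpointing a payload and checkpointing a reference can be seen by comparing serialized sizes. In this sketch `object_store` is a plain dictionary standing in for Amazon S3 or DynamoDB, and `put_large_object`, the bucket name, and the key are hypothetical.

```python
import json

object_store = {}  # hypothetical stand-in for Amazon S3 or DynamoDB


def put_large_object(key, payload):
    """Store the payload externally and return a small reference to it."""
    object_store[key] = payload
    return {"bucket": "example-bucket", "key": key}


# A large result that a step might produce.
report = {"rows": [{"id": i, "value": "x" * 100} for i in range(1000)]}

# Return the reference from the step instead of the report itself.
reference = put_large_object("reports/large-report.json", report)

full_size = len(json.dumps(report))     # what would be checkpointed otherwise
ref_size = len(json.dumps(reference))   # what is checkpointed with a reference
```

A later step can use the reference to fetch the report when it actually needs the rows.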

Design effective steps

Steps are the fundamental unit of work in durable functions. Well-designed steps make workflows easier to understand, debug, and maintain.

Step design principles:

Use descriptive names - Names like validate-order instead of step1 make logs and errors easier to understand
Keep names static - Don't build names from timestamps or random values. Step names must be deterministic for replay. Names derived from deterministic input, such as an item ID, are fine
Balance granularity - Break complex operations into focused steps, but avoid excessive tiny steps that increase checkpoint overhead
Group related operations - Operations that should succeed or fail together belong in the same step
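To see why step names need to be deterministic, consider a toy model in which cached results are looked up by name. The `step` helper and the step names below are hypothetical, and a real SDK may react to a name mismatch differently (for example, by raising an error); the sketch only shows that a random name never matches its earlier checkpoint.

```python
import uuid

checkpoints = {}
executions = []  # names of steps whose body actually ran


def step(name, fn):
    """Toy step: results are cached and looked up by step name."""
    if name not in checkpoints:
        executions.append(name)
        checkpoints[name] = fn()
    return checkpoints[name]


def workflow():
    # Static name: found in the cache on replay, so the body runs once.
    step("validate-order", lambda: "ok")
    # Random name: never found in the cache, so the body runs on every replay.
    step(f"charge-{uuid.uuid4()}", lambda: "charged")


workflow()  # original run
workflow()  # simulated replay

static_runs = sum(1 for name in executions if name == "validate-order")
dynamic_runs = sum(1 for name in executions if name.startswith("charge-"))
```

Here `static_runs` is 1 and `dynamic_runs` is 2: the randomly named step repeats its side effect on replay.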

Use wait operations efficiently

Wait operations suspend execution without consuming resources or incurring costs. Use them instead of keeping Lambda running.

Time-based waits: Use context.wait() for delays instead of setTimeout or sleep.

External callbacks: Use context.waitForCallback() when waiting for external systems. Always set timeouts to prevent indefinite waits.

Polling: Use context.waitForCondition() with exponential backoff to poll external services without overwhelming them.
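An exponential backoff schedule for polling can be sketched as a plain function. The function name and parameters below are assumptions for illustration and are not the configuration options of context.waitForCondition().

```python
def backoff_delays(initial_seconds, rate, max_seconds, attempts):
    """Return the wait before each poll, multiplying by `rate` up to a cap."""
    delays = []
    delay = initial_seconds
    for _ in range(attempts):
        delays.append(min(delay, max_seconds))  # never wait longer than the cap
        delay *= rate
    return delays


schedule = backoff_delays(initial_seconds=2, rate=2, max_seconds=60, attempts=7)
# schedule == [2, 4, 8, 16, 32, 60, 60]
```

Capping the delay keeps later polls from drifting arbitrarily far apart while still reducing load on the external service.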

Additional considerations

Error handling: Retry transient failures like network timeouts and rate limits. Don't retry permanent failures like invalid input or authentication errors. Configure retry strategies with appropriate max attempts and backoff rates. For detailed examples, see Error handling and retries.
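The distinction between transient and permanent failures can be sketched with two exception types and a small retry loop. The class names and `run_with_retries` are hypothetical; in a durable function the retry strategy would normally be configured on the step, and backoff delays are omitted here for brevity.

```python
class TransientError(Exception):
    """Failure that may succeed on retry, such as a timeout or rate limit."""


class PermanentError(Exception):
    """Failure that will not succeed on retry, such as invalid input."""


def run_with_retries(operation, max_attempts=3):
    """Retry only transient failures; permanent failures propagate at once."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted


calls = {"flaky": 0, "invalid": 0}


def flaky():
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise TransientError("timeout")
    return "done"


def invalid():
    calls["invalid"] += 1
    raise PermanentError("invalid input")


result = run_with_retries(flaky, max_attempts=5)

permanent_failed = False
try:
    run_with_retries(invalid, max_attempts=5)
except PermanentError:
    permanent_failed = True
```

The flaky operation succeeds on its third attempt, while the invalid one is attempted exactly once.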

Performance: Minimize checkpoint size by storing references instead of full payloads. Use context.parallel() and context.map() to execute independent operations concurrently. Batch related operations to reduce checkpoint overhead.

Versioning: Invoke functions with version numbers or aliases to pin executions to specific code versions. Ensure new code versions can handle state from older versions. Don't rename steps or change their behavior in ways that break replay.

Serialization: Use JSON-compatible types for operation inputs and results. Convert dates to ISO strings and custom objects to plain objects before passing them to durable operations.
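As a minimal sketch of this conversion, the example below turns a dataclass with a datetime field into a JSON-compatible dictionary and back. The `Order` type and the `to_step_result` helper are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class Order:
    order_id: str
    created_at: datetime


def to_step_result(order):
    """Convert an Order into a plain dict that json.dumps can handle."""
    data = asdict(order)
    data["created_at"] = order.created_at.isoformat()  # datetime -> ISO string
    return data


order = Order("order-123", datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc))
payload = to_step_result(order)
encoded = json.dumps(payload)  # would fail if created_at were still a datetime

# Rebuild the object from the plain dict after it comes back from a step.
restored = Order(
    payload["order_id"],
    datetime.fromisoformat(payload["created_at"]),
)
```

Keeping the time zone in the ISO string lets the restored value compare equal to the original.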

Monitoring: Enable structured logging with execution IDs and step names. Set up CloudWatch alarms for error rates and execution duration. Use tracing to identify bottlenecks. For detailed guidance, see Monitoring and debugging.

Testing: Test happy path, error handling, and replay behavior. Test timeout scenarios for callbacks and waits. Use local testing to reduce iteration time. For detailed guidance, see Testing durable functions.

Common mistakes to avoid: Don't nest context.step() calls; use child contexts instead. Don't leave non-deterministic operations outside steps. Don't wait for callbacks without a timeout. Don't create so many tiny steps that checkpoint overhead dominates. Don't store large objects in state; store references instead.

Additional resources

Python SDK documentation - Complete API reference, testing patterns, and advanced examples
TypeScript SDK documentation - Complete API reference, testing patterns, and advanced examples
