Stream the integration response for your proxy integrations in API Gateway - Amazon API Gateway

Stream the integration response for your proxy integrations in API Gateway

You can configure your proxy integration to control how API Gateway returns your integration response. By default, API Gateway waits to receive the complete response before beginning transmission. However, if you set your integration's response transfer mode to STREAM, API Gateway doesn't wait for a response to be completely computed before sending it to the client. Response streaming works for all REST API endpoint types.

Use response streaming for the following use cases:

  • Lower the time-to-first-byte (TTFB) for generative AI applications like chatbots.

  • Stream large image, video, or music files without using an S3 pre-signed URL.

  • Perform long-running operations while reporting incremental progress like server-sent events (SSE).

  • Exceed API Gateway's 10 MB response payload limit.

  • Exceed API Gateway's 29 second timeout limit without requesting an integration timeout limit increase.

  • Receive a binary payload without configuring the binary media types.

Considerations for response payload streaming

The following considerations might impact your use of response payload streaming:

  • You can only use response payload streaming for HTTP_PROXY or AWS_PROXY integration types. This includes Lambda proxy integrations and private integrations that use HTTP_PROXY integrations.

  • The default transfer mode setting is BUFFERED. To use response streaming you must change the response transfer mode to STREAM.

  • Response streaming is only supported for REST APIs.

  • Request streaming isn't supported.

  • You can stream your response for up to 15 minutes.

  • Your streams are subject to idle timeouts. For Regional or private endpoints, the timeout is 5 minutes. For edge-optimized endpoints, the timeout is 30 seconds.

  • If you use response streaming for a Regional REST API with your own CloudFront distribution, you can achieve an idle time out greater than 30 seconds by increasing the response timeout of your CloudFront distribution. For more information, see Response timeout.

  • When the response transfer mode is set to STREAM, API Gateway can’t support features that require buffering the entire integration response. Because of this, the following features aren't supported with response streaming:

    • Endpoint caching

    • Content encoding. If you want to compress your integration response, do this in your integration.

    • Response transformation with VTL

  • Within each streaming response, the first 10MB of response payload is not subject to any bandwidth restrictions. Response payload data exceeding 10MB is restricted to 2MB/s.

  • When the connection between the client and API Gateway or between API Gateway and Lambda is closed due to timeout, the Lambda function might continue to execute. For more information, see Configure Lambda function timeout.

  • Response streaming incurs a cost. For more information, see API Gateway Pricing.