Manage throughput quotas

GENREL01: How do you determine throughput quotas (or needs) for foundation models?

Foundation models perform complex tasks over detailed input, and the number of inference requests they can serve at a time is limited. This is particularly true for managed and serverless model hosting paradigms. Understanding and managing these throughput quotas is crucial for maintaining reliable service levels and optimal performance.
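As a rough illustration of how throughput needs might be compared against a quota, the sketch below estimates utilization from expected peak traffic. It is a minimal example under stated assumptions, not an AWS API: the names `Workload`, `Quota`, `utilization`, and `needs_more_throughput`, the per-minute request and token limits, and the 80% headroom threshold are all hypothetical. Actual quota dimensions depend on the model and hosting option you use.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of expected peak traffic."""
    peak_requests_per_minute: float
    avg_input_tokens: float
    avg_output_tokens: float


@dataclass
class Quota:
    """Hypothetical quota, expressed as per-minute limits."""
    requests_per_minute: float
    tokens_per_minute: float


def utilization(workload: Workload, quota: Quota) -> dict:
    """Return the fraction of each quota dimension consumed at peak."""
    tokens_needed = workload.peak_requests_per_minute * (
        workload.avg_input_tokens + workload.avg_output_tokens
    )
    return {
        "requests": workload.peak_requests_per_minute / quota.requests_per_minute,
        "tokens": tokens_needed / quota.tokens_per_minute,
    }


def needs_more_throughput(
    workload: Workload, quota: Quota, headroom: float = 0.8
) -> bool:
    """True when any dimension exceeds the assumed headroom threshold."""
    return max(utilization(workload, quota).values()) > headroom
```

Under these assumptions, a workload is flagged as soon as either the request rate or the token rate passes the headroom threshold, because whichever dimension saturates first is the one that causes throttling.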

Best practices

GENREL01-BP01 Scale and balance foundation model throughput as a function of utilization
