Set up inference for a custom model

After you create a custom model, you can set up inference using one of the following options:

  • Purchase Provisioned Throughput – Purchase dedicated compute capacity for your model, with guaranteed throughput for consistent performance and lower latency (see the first code sketch after this list).

    For more information about Provisioned Throughput, see Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock. For more information about using custom models with Provisioned Throughput, see Purchase Provisioned Throughput for a custom model.

  • Deploy custom model for on-demand inference (Amazon Nova models only) – To set up on-demand inference, you deploy the model with a custom model deployment. After you deploy the model, you invoke it using the ARN of the custom model deployment. With on-demand inference, you pay only for what you use and you don't need to set up provisioned compute resources.

    For more information about deploying custom models for on-demand inference, see Deploy a custom model for on-demand inference. The second code sketch after this list shows a deployment and invocation.
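The following is a minimal boto3 sketch of the first option, purchasing Provisioned Throughput for a custom model with the CreateProvisionedModelThroughput operation. The Region, model ARN, name, and model-unit count are placeholder assumptions; substitute your own values.

```python
# Minimal sketch: purchase Provisioned Throughput for a custom model.
# All names, ARNs, and the Region below are placeholders.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_provisioned_model_throughput(
    provisionedModelName="my-custom-model-pt",  # placeholder name
    # ARN of your custom model (placeholder shown)
    modelId="arn:aws:bedrock:us-east-1:111122223333:custom-model/my-model",
    modelUnits=1,  # omit commitmentDuration for no-commitment throughput
)

# Once the Provisioned Throughput reaches the InService status, invoke
# the model by passing this ARN as the modelId in runtime calls.
print(response["provisionedModelArn"])
```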
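And a sketch of the second option, assuming the CreateCustomModelDeployment operation available in recent boto3 versions and the Converse API for invocation. The deployment name, model ARN, and prompt are placeholders, and the deployment must finish creating before it can be invoked.

```python
# Minimal sketch: deploy a custom Amazon Nova model for on-demand
# inference, then invoke it through the deployment ARN.
# Names, ARNs, and the Region are placeholders.
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")
runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Create the custom model deployment for on-demand inference.
deployment = bedrock.create_custom_model_deployment(
    modelDeploymentName="my-nova-deployment",  # placeholder name
    # ARN of your custom Amazon Nova model (placeholder shown)
    modelArn="arn:aws:bedrock:us-east-1:111122223333:custom-model/my-nova-model",
)
deployment_arn = deployment["customModelDeploymentArn"]

# After the deployment is active, pass its ARN as the modelId.
response = runtime.converse(
    modelId=deployment_arn,
    messages=[{"role": "user", "content": [{"text": "Hello, world."}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```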