Invoke your imported model
The model import job can take several minutes to import your model after you send a CreateModelImportJob
request. You can check the status of your import job in the console or by calling the
GetModelImportJob operation and checking the Status field in the response.
The import job is complete if the Status for the model is Complete.
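For example, you can poll the job status with the AWS SDK for Python (Boto3). This is a minimal sketch; ${region-name} and ${job-arn} are placeholders for your Region and your import job's ARN or name, and the status values shown in the comment are assumptions to confirm against the API reference.

import time
import boto3

bedrock = boto3.client('bedrock', region_name='${region-name}')

# Poll the import job until it leaves the InProgress state.
while True:
    job = bedrock.get_model_import_job(jobIdentifier='${job-arn}')
    status = job['status']  # e.g. InProgress, Completed, or Failed
    print(f"Import job status: {status}")
    if status != 'InProgress':
        break
    time.sleep(30)  # the import can take several minutes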
After your imported model is available in Amazon Bedrock, you can use the model with on-demand throughput by sending InvokeModel or InvokeModelWithResponseStream requests to make inference calls to the model. For more information, see Submit a single prompt with InvokeModel.
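For example, the following sketch streams a response with InvokeModelWithResponseStream. It assumes a simple {"prompt": ...} request body, which your model might not accept as-is; ${region-name} and ${model-arn} are placeholders for your Region and model ARN.

import json
import boto3

br_runtime = boto3.client('bedrock-runtime', region_name='${region-name}')

response = br_runtime.invoke_model_with_response_stream(
    modelId='${model-arn}',
    body=json.dumps({'prompt': 'Hello'}),
    accept='application/json',
    contentType='application/json')

# Each streamed event carries a JSON chunk; the payload shape depends on your model.
for event in response['body']:
    chunk = event.get('chunk')
    if chunk:
        print(json.loads(chunk['bytes'].decode('utf-8')))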
To interface with your imported model using the messages format, you can call the Converse or ConverseStream operations, as sketched after the following note. For more information, see Using the Converse API.
Note
Converse API is not supported for Qwen2.5, Qwen2-VL, Qwen2.5-VL, and GPT-OSS models.
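For models that do support Converse, a minimal call looks like the following sketch; replace the placeholders with your Region and model ARN.

import boto3

br_runtime = boto3.client('bedrock-runtime', region_name='${region-name}')

# Converse uses the messages format instead of a model-specific request body.
response = br_runtime.converse(
    modelId='${model-arn}',
    messages=[{'role': 'user', 'content': [{'text': 'Hello'}]}])

print(response['output']['message']['content'][0]['text'])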
Enhanced API Support: Multiple API Formats
Starting November 17, 2025, Amazon Bedrock Custom Model Import supports comprehensive OpenAI-compatible API formats, providing flexibility in how you integrate and deploy your custom models. All models imported after November 11, 2025, will automatically benefit from these enhanced capabilities with no additional configuration required.
Custom Model Import now supports three API formats:
BedrockCompletion (Text) - Compatible with current Bedrock workflows
OpenAICompletion (Text) - OpenAI Completions schema compatibility
OpenAIChatCompletion (Text and Images) - Full conversational schema compatibility
These enhanced capabilities include structured outputs for enforcing JSON schemas and patterns, enhanced vision support with multi-image processing, log probabilities for model confidence insights, and tool calling capabilities for GPT-OSS models.
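As an illustration of structured outputs, the following request body is a hedged sketch in the OpenAI Chat Completions schema. The response_format field and the colors schema are assumptions included to show the shape of such a request, not a confirmed contract for every model.

# Hypothetical OpenAIChatCompletion request body enforcing a JSON schema.
body = {
    "messages": [
        {"role": "user", "content": "List two primary colors as JSON."}
    ],
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "colors",
            "schema": {
                "type": "object",
                "properties": {
                    "colors": {"type": "array", "items": {"type": "string"}}
                },
                "required": ["colors"]
            }
        }
    }
}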
For detailed API reference documentation, see the official OpenAI documentation:
Completion: OpenAI Completions API
ChatCompletion: OpenAI Chat API
API Format Examples
The following examples demonstrate how to use each of the three supported API formats with your imported models.
You'll need the model ARN to make inference calls to your newly imported model. After the import job completes and your imported model is active, you can get the model ARN in the console or by sending a ListImportedModels request.
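The sketches below show one plausible request body per format, sent through InvokeModel. The exact parameters your model accepts depend on its architecture, so treat the bodies as assumptions to adapt; replace ${region-name} and ${model-arn} with your Region and model ARN.

import json
import boto3

br_runtime = boto3.client('bedrock-runtime', region_name='${region-name}')

# BedrockCompletion (Text): the plain prompt body used by current Bedrock workflows.
bedrock_completion_body = {"prompt": "Hello"}

# OpenAICompletion (Text): fields follow the OpenAI Completions schema.
openai_completion_body = {"prompt": "Hello", "max_tokens": 100}

# OpenAIChatCompletion (Text and Images): fields follow the OpenAI Chat Completions schema.
openai_chat_completion_body = {
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 100
}

for body in (bedrock_completion_body, openai_completion_body, openai_chat_completion_body):
    response = br_runtime.invoke_model(
        modelId='${model-arn}',
        body=json.dumps(body),
        accept='application/json',
        contentType='application/json')
    print(json.loads(response['body'].read()))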
When you invoke your imported model using InvokeModel or InvokeModelWithResponseStream, your request is served within 5 minutes; otherwise, you might get a ModelNotReadyException. To handle this exception, follow the steps in Handling ModelNotReadyException.
Frequently Asked Questions
Q: What API format should I use?
A: For maximum compatibility with various SDKs, we recommend using OpenAICompletion or OpenAIChatCompletion formats as they provide OpenAI-compatible schemas that are widely supported across different tools and libraries.
Q: Does GPT-OSS on Amazon Bedrock Custom Model Import support the Converse API?
A: No. GPT-OSS based custom model import models do not support the Converse API or ConverseStream API. You must use the InvokeModel API with OpenAI-compatible schemas when working with GPT-OSS based custom models.
Q: What models support tool calling?
A: GPT-OSS based custom models support tool calling capabilities. Tool calling enables function calling for complex workflows.
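For example, a tool-calling request in the OpenAI Chat Completions schema could look like the following sketch; get_weather is a hypothetical function used only to show the shape of the tools array.

# Hypothetical request body with a single tool definition.
body = {
    "messages": [
        {"role": "user", "content": "What is the weather in Seattle?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"}
                    },
                    "required": ["city"]
                }
            }
        }
    ]
}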
Q: What about models imported before November 11, 2025?
A: Models imported before November 11, 2025, continue to work as is with their existing API formats and capabilities.
Q: What about generation_config.json for OpenAI-based models?
A: It is critical that you include the correct generation_config.json file when importing OpenAI-based models such as GPT-OSS. You must use the updated configuration file (updated August 13, 2025) available at https://huggingface.co/openai/gpt-oss-20b/blob/main/generation_config.json. The updated file specifies three end-of-sequence tokens ([200002, 199999, 200012]), whereas older versions only included two tokens ([200002, 199999]). Using an outdated generation_config.json file will cause runtime errors during model invocation. This file is essential for proper model behavior and must be included with your OpenAI-based model imports.
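As a quick sanity check before import, you can confirm that your local copy of the file lists all three tokens. This sketch assumes the tokens appear in the standard eos_token_id field of generation_config.json.

import json

with open('generation_config.json') as f:
    config = json.load(f)

# The updated file lists three end-of-sequence tokens; older copies list two.
print(config['eos_token_id'])  # expect [200002, 199999, 200012]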
Handling ModelNotReadyException
Amazon Bedrock Custom Model Import optimizes hardware utilization by removing models that are not active. If you try to invoke
a model that has been removed, you'll get a ModelNotReadyException. When you invoke a removed model for the first time, Custom Model Import
starts to restore it. The restoration time depends on the on-demand fleet size and the model size.
If your InvokeModel or InvokeModelWithResponseStream request returns a ModelNotReadyException,
follow these steps to handle the exception.
-
Configure retries
By default, the request is automatically retried with exponential backoff. You can configure the maximum number of retries.
The following example shows how to configure the retries. Replace ${region-name}, ${model-arn}, and 10 with your Region, model ARN, and maximum number of attempts.

import json
import boto3
from botocore.config import Config

REGION_NAME = '${region-name}'
MODEL_ID = '${model-arn}'

config = Config(
    retries={
        'total_max_attempts': 10,  # customizable
        'mode': 'standard'
    }
)

message = "Hello"

session = boto3.session.Session()
br_runtime = session.client(service_name='bedrock-runtime', region_name=REGION_NAME, config=config)

try:
    invoke_response = br_runtime.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({'prompt': message}),
        accept="application/json",
        contentType="application/json")
    invoke_response["body"] = json.loads(invoke_response["body"].read().decode("utf-8"))
    print(json.dumps(invoke_response, indent=4))
except Exception as e:
    print(e)
    print(e.__repr__())
-
Monitor response codes during retry attempts
Each retry attempt starts the model restoration process. The restoration time depends on the availability of the on-demand fleet and the model size. Monitor the response codes while the restoration is in progress, as sketched below.
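The following sketch, reusing the client, model ID, and message from the previous step, shows one way to inspect the response code on a failed attempt; the exception surfaces through botocore's ClientError.

from botocore.exceptions import ClientError

try:
    invoke_response = br_runtime.invoke_model(
        modelId=MODEL_ID,
        body=json.dumps({'prompt': message}),
        accept="application/json",
        contentType="application/json")
except ClientError as e:
    error_code = e.response['Error']['Code']
    print(f"Attempt failed with {error_code}")
    if error_code == 'ModelNotReadyException':
        # Restoration has started; wait and let the configured retries run.
        pass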
If the retries are consistently failing, continue with the next steps.
-
Verify model was successfully imported
You can verify that the model was successfully imported by checking the status of your import job in the console or by calling the GetModelImportJob operation. Check the Status field in the response. The import job is successful if the Status for the model is Complete.
-
Contact Support for further investigation
Open a ticket with Support. For more information, see Creating support cases.
Include relevant details such as model ID and timestamps in the support ticket.