Invoke a serverless endpoint
To perform inference using a serverless endpoint, you must send an HTTP request to
the endpoint. You can use the InvokeEndpoint API
or the AWS CLI, both of which make a POST request to invoke your endpoint. The maximum request
and response payload size for serverless invocations is 4 MB. For serverless endpoints:
- The model must download and the server must respond successfully to /ping within 3 minutes.
- The timeout for the container to respond to inference requests to /invocations is 1 minute.
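Because oversized requests are rejected, it can help to check the payload size on the client before invoking. The following is a minimal sketch, not part of the SageMaker API: `check_payload_size` and `MAX_PAYLOAD_BYTES` are hypothetical names, and reading "4 MB" as 4 * 1024 * 1024 bytes is an assumption.

```python
# Assumption: the documented "4 MB" limit is interpreted as 4 * 1024 * 1024 bytes.
MAX_PAYLOAD_BYTES = 4 * 1024 * 1024


def check_payload_size(payload: bytes, limit: int = MAX_PAYLOAD_BYTES) -> int:
    """Return the payload size in bytes, or raise ValueError if it exceeds the limit."""
    size = len(payload)
    if size > limit:
        raise ValueError(f"payload is {size} bytes; limit is {limit} bytes")
    return size
```

Calling `check_payload_size(payload)` just before the request surfaces an oversized body locally instead of as a failed invocation.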
To invoke an endpoint
The following example uses the AWS SDK for Python (Boto3) to call InvokeEndpoint. For InvokeEndpoint, you must
use SageMaker Runtime as the client. Specify the following values:
- For endpoint_name, use the name of the in-service serverless endpoint you want to invoke.
- For content_type, specify the MIME type of your input data in the request body (for example, application/json).
- For payload, use your request payload for inference. Your payload should be in bytes or a file-like object.
    import boto3

    # InvokeEndpoint is served by the SageMaker Runtime client.
    runtime = boto3.client("sagemaker-runtime")

    endpoint_name = "<your-endpoint-name>"
    content_type = "<request-mime-type>"
    payload = <your-request-body>  # bytes or a file-like object

    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=payload
    )
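The response returned by `invoke_endpoint` carries the model output in its `Body` field as a streaming, file-like object. The sketch below shows one way to send a JSON request and decode a JSON reply. It assumes the model container accepts and returns `application/json`; `invoke_json` is a hypothetical helper name, and the client is passed in as a parameter so the function can be exercised without AWS credentials.

```python
import json


def invoke_json(runtime, endpoint_name, data):
    """Send `data` as JSON to the endpoint and return the decoded JSON reply.

    Assumes the model accepts and returns application/json.
    """
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(data).encode("utf-8"),
    )
    # response["Body"] is a file-like stream; read() returns the raw bytes.
    return json.loads(response["Body"].read())
```

With a real client, usage might look like `invoke_json(boto3.client("sagemaker-runtime"), "<your-endpoint-name>", {"inputs": "..."})`, where the request shape depends entirely on your model.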