Invoke a serverless endpoint
To perform inference using a serverless endpoint, you must send an HTTP request to
   the endpoint. You can use the InvokeEndpoint API
   or the AWS CLI, both of which make a POST request to invoke your endpoint. The maximum request
   and response payload size for serverless invocations is 4 MB. For serverless endpoints:
- The model must download and the server must respond successfully to /ping within 3 minutes.
- The timeout for the container to respond to inference requests to /invocations is 1 minute.
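To make the two routes concrete, here is a minimal sketch of a container-side server that answers a health check on /ping and inference requests on /invocations, using only the Python standard library. The class name InferenceHandler, the helper make_server, the default port 8080, and the echo "model" are all assumptions made for illustration; a real container would load a model and would need to stay within the time limits listed above.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class InferenceHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: GET /ping for health, POST /invocations for inference."""

    def _reply(self, status, body=b""):
        # Send a status line, minimal headers, and the (possibly empty) body.
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        # Health check: reply 200 once the server is ready to accept requests.
        self._reply(200 if self.path == "/ping" else 404)

    def do_POST(self):
        if self.path != "/invocations":
            self._reply(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length))
        # Placeholder "model" (assumption): echo the request back to the caller.
        self._reply(200, json.dumps({"echo": request}).encode("utf-8"))

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real server would log requests


def make_server(port=8080):
    # Port 8080 is an assumption for this sketch; pass 0 to pick a free port.
    return HTTPServer(("0.0.0.0", port), InferenceHandler)
```

Calling make_server().serve_forever() would start the server. The echo handler only stands in for real model code.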
To invoke an endpoint
The following example uses the AWS SDK for Python (Boto3) to call InvokeEndpoint. For InvokeEndpoint, you must
    use SageMaker Runtime as the client. Specify the following values:
- For endpoint_name, use the name of the in-service serverless endpoint you want to invoke.
- For content_type, specify the MIME type of your input data in the request body (for example, application/json).
- For payload, use your request payload for inference. Your payload should be in bytes or a file-like object.
import boto3

# InvokeEndpoint is served by the SageMaker Runtime client.
runtime = boto3.client("sagemaker-runtime")

endpoint_name = "<your-endpoint-name>"
content_type = "<request-mime-type>"
payload = <your-request-body>

response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Body=payload
)
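As a slightly fuller sketch of the same call, the helper below serializes a JSON payload to bytes, checks it against the 4 MB limit before sending, and reads the response body. The function name invoke_json and the constant MAX_PAYLOAD_BYTES are hypothetical, the model is assumed to accept and return JSON, and 4 MB is interpreted here as 4 * 1024 * 1024 bytes, which is also an assumption.

```python
import json

# Assumption: the 4 MB payload limit is read as 4 * 1024 * 1024 bytes.
MAX_PAYLOAD_BYTES = 4 * 1024 * 1024


def invoke_json(runtime, endpoint_name, data):
    """Send `data` as JSON to an endpoint and decode a JSON reply.

    `runtime` is a SageMaker Runtime client, for example
    boto3.client("sagemaker-runtime"). Hypothetical helper that assumes
    the model accepts and returns JSON.
    """
    payload = json.dumps(data).encode("utf-8")
    if len(payload) > MAX_PAYLOAD_BYTES:
        # Fail early instead of sending a request that would exceed the limit.
        raise ValueError(
            f"payload is {len(payload)} bytes; limit is {MAX_PAYLOAD_BYTES}"
        )
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=payload,
    )
    # response["Body"] is a file-like stream; read() returns the raw bytes.
    return json.loads(response["Body"].read())
```

With a real client, a call might look like invoke_json(boto3.client("sagemaker-runtime"), "<your-endpoint-name>", {"inputs": [1, 2, 3]}). Passing the client as an argument keeps the helper easy to exercise with a stub.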