Include a guardrail with the Converse API
You can use a guardrail to guard conversational apps that you create with the Converse API. For example, if you create a chat app with the Converse API, you can use a guardrail to block inappropriate content entered by the user and inappropriate content generated by the model. For information about the Converse API, see Carry out a conversation with the Converse API operations.
Call the Converse API with guardrails
To use a guardrail, you include configuration information for the guardrail in calls to the Converse or ConverseStream (for streaming responses) operations. Optionally, you can select specific content in the message that you want the guardrail to assess. For information about the models that you can use with guardrails and the Converse API, see Supported models and model features.
Configure a guardrail to work with the Converse API
You specify guardrail configuration information in the guardrailConfig input parameter. The configuration includes the ID and the version of the guardrail that you want to use. You can also enable tracing for the guardrail, which provides information about the content that the guardrail blocked.
With the Converse operation, guardrailConfig is a GuardrailConfiguration object, as shown in the following example.
{ "guardrailIdentifier": "
Guardrail ID
", "guardrailVersion": "Guardrail version
", "trace": "enabled" }
If you use ConverseStream, you pass a GuardrailStreamConfiguration object. Optionally, you can use the streamProcessingMode field to specify that you want the model to complete the guardrail assessment before returning streaming response chunks. Or, you can have the model respond asynchronously while the guardrail continues its assessment in the background. For more information, see Configure streaming response behavior to filter content.
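The following sketch shows one way to set this from Python with boto3. It assumes the same placeholder guardrail and model IDs as before and requests asynchronous processing so that chunks stream while the assessment runs.

# A minimal sketch, assuming boto3 and placeholder IDs. "async" returns chunks
# while the guardrail assessment runs in the background; "sync" (the default)
# completes the assessment before chunks are returned.
import boto3

client = boto3.client("bedrock-runtime")

response = client.converse_stream(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model
    messages=[{"role": "user", "content": [{"text": "Create a playlist of 2 pop songs."}]}],
    guardrailConfig={
        "guardrailIdentifier": "abc1234567",  # placeholder guardrail ID
        "guardrailVersion": "1",              # placeholder guardrail version
        "trace": "enabled",
        "streamProcessingMode": "async",      # or "sync"
    },
)

for event in response["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="")
    elif "metadata" in event:
        # The guardrail trace, when enabled, arrives in the final metadata event.
        print("\n", event["metadata"].get("trace"))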
Evaluate only specific content in a message
When you pass a Message to a model, your guardrail assesses the content in the message. You can also assess specific parts of a message by using the guardContent (GuardrailConverseContentBlock) field.
Tip
Using the guardContent field is similar to using input tags with InvokeModel and InvokeModelWithResponseStream. For more information, see Apply tags to user input to filter content.
For example, in the following message list, the guardrail evaluates only the content in the guardContent field and not the rest of the message. This is useful when you want the guardrail to assess only the most recent message in a conversation.
[ { "role": "user", "content": [ { "text": "Create a playlist of 2 pop songs." } ] }, { "role": "assistant", "content": [ { "text": "Sure! Here are two pop songs:\n1. \"Bad Habits\" by Ed Sheeran\n2. \"All Of The Lights\" by Kanye West\n\nWould you like to add any more songs to this playlist?" } ] }, { "role": "user", "content": [ { "guardContent": { "text": { "text": "Create a playlist of 2 heavy metal songs." } } } ] } ]
Another use case for guardContent is providing additional context for a message without having your guardrail assess that context. In the following example, the guardrail assesses only "Create a playlist of heavy metal songs." and ignores "Only answer with a list of songs.".
messages = [
    {
        "role": "user",
        "content": [
            {"text": "Only answer with a list of songs."},
            {
                "guardContent": {
                    "text": {"text": "Create a playlist of heavy metal songs."}
                }
            },
        ],
    }
]
If content isn't in a guardContent block, that doesn't necessarily mean it won't be evaluated. This behavior depends on which filtering policies the guardrail uses.
The following example shows two guardContent blocks with contextual grounding checks (based on the qualifiers fields). The contextual grounding checks in the guardrail will evaluate only the content in these blocks. However, if the guardrail also has a word filter that blocks the word "background", the text "Some additional background information." will still be evaluated, even though it's not in a guardContent block.
[{ "role": "user", "content": [{ "guardContent": { "text": { "text": "London is the capital of UK. Tokyo is the capital of Japan.", "qualifiers": ["grounding_source"] } } }, { "text": "Some additional background information." }, { "guardContent": { "text": { "text": "What is the capital of Japan?", "qualifiers": ["query"] } } } ] }]
Guarding a system prompt sent to the Converse API
You can use guardrails with system prompts that you send to the Converse API. To guard a system prompt, specify the guardContent (SystemContentBlock) field in the system prompt that you pass to the API, as shown in the following example.
[ { "guardContent": { "text": { "text": "Only respond with Welsh heavy metal songs." } } } ]
If you don't provide the guardContent field, the guardrail doesn't assess the system prompt message.
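Putting this together in Python, a call that guards the system prompt might look like the following sketch (again with placeholder IDs).

# A minimal sketch, assuming boto3 and placeholder IDs. The guardContent block
# in the system list is the only part of the system prompt the guardrail assesses.
import boto3

client = boto3.client("bedrock-runtime")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example model
    system=[
        {"guardContent": {"text": {"text": "Only respond with Welsh heavy metal songs."}}}
    ],
    messages=[{"role": "user", "content": [{"text": "Create a playlist of 2 songs."}]}],
    guardrailConfig={
        "guardrailIdentifier": "abc1234567",  # placeholder guardrail ID
        "guardrailVersion": "1",              # placeholder guardrail version
    },
)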
Message and system prompt guardrail behavior
The guardrail assesses the guardContent field differently depending on whether it appears in the system prompt or in the messages, as shown in the following table.
| | System prompt has guardrail block | System prompt doesn't have guardrail block |
|---|---|---|
| Messages have guardrail block | System: Guardrail investigates content in guardrail block. Messages: Guardrail investigates content in guardrail block. | System: Guardrail investigates nothing. Messages: Guardrail investigates content in guardrail block. |
| Messages don't have guardrail block | System: Guardrail investigates content in guardrail block. Messages: Guardrail investigates everything. | System: Guardrail investigates nothing. Messages: Guardrail investigates everything. |
Processing the response when using the Converse API
When you call the Converse operation, the guardrail assesses the message that you send. If the guardrail detects blocked content, the following happens.
- The stopReason field in the response is set to guardrail_intervened.
- If you enabled tracing, the trace is available in the trace (ConverseTrace) field. With ConverseStream, the trace is in the metadata (ConverseStreamMetadataEvent) that the operation returns.
- The blocked content text that you configured in the guardrail is returned in the output (ConverseOutput) field. With ConverseStream, the blocked content text is in the streamed message.
The following partial response shows the blocked content text and the trace from the guardrail assessment. The guardrail blocked the Heavy metal topic in the message.
{ "output": { "message": { "role": "assistant", "content": [ { "text": "Sorry, I can't answer questions about heavy metal music." } ] } }, "stopReason": "guardrail_intervened", "usage": { "inputTokens": 0, "outputTokens": 0, "totalTokens": 0 }, "metrics": { "latencyMs": 721 }, "trace": { "guardrail": { "inputAssessment": { "3o06191495ze": { "topicPolicy": { "topics": [ { "name": "Heavy metal", "type": "DENY", "action": "BLOCKED" } ] }, "invocationMetrics": { "guardrailProcessingLatency": 240, "usage": { "topicPolicyUnits": 1, "contentPolicyUnits": 0, "wordPolicyUnits": 0, "sensitiveInformationPolicyUnits": 0, "sensitiveInformationPolicyFreeUnits": 0, "contextualGroundingPolicyUnits": 0 }, "guardrailCoverage": { "textCharacters": { "guarded": 39, "total": 72 } } } } } } } }
Code example for using Converse API with guardrails
This example shows how to guard a conversation with the Converse and ConverseStream operations. The example shows how to prevent a model from creating a playlist that includes songs from the heavy metal genre.
To guard a conversation

1. Create a guardrail by following the instructions at Create your guardrail.
   - Name – Enter Heavy metal.
   - Definition for topic – Enter Avoid mentioning songs that are from the heavy metal genre of music.
   - Add sample phrases – Enter Create a playlist of heavy metal songs.

   In step 9, enter the following:
   - Messaging shown for blocked prompts – Enter Sorry, I can't answer questions about heavy metal music.
   - Messaging for blocked responses – Enter Sorry, the model generated an answer that mentioned heavy metal music.

   You can configure other guardrail options, but that isn't required for this example.

2. Create a version of the guardrail by following the instructions at Create a version of a guardrail.

3. In the following code examples (Converse and ConverseStream), set the following variables:
   - guardrail_id – The ID of the guardrail that you created in step 1.
   - guardrail_version – The version of the guardrail that you created in step 2.
   - text – Use Create a playlist of heavy metal songs.

4. Run the code examples. The output should display the guardrail assessment and the output message Text: Sorry, I can't answer questions about heavy metal music. The guardrail input assessment shows that the guardrail detected the heavy metal topic in the input message.

5. (Optional) Test that the guardrail blocks inappropriate text that the model generates by changing the value of text to List all genres of rock music. Run the examples again. You should see an output assessment in the response.
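The full code examples referenced in step 3 aren't reproduced here; the following is a minimal Python (boto3) sketch of the Converse variant, under the assumptions stated in the comments.

# A minimal sketch of the Converse example, assuming boto3 is installed and AWS
# credentials are configured. Replace the placeholder values with your own;
# ConverseStream works the same way with client.converse_stream.
import json

import boto3
from botocore.exceptions import ClientError

guardrail_id = "abc1234567"   # the ID of the guardrail from step 1 (placeholder)
guardrail_version = "1"       # the version from step 2 (placeholder)
model_id = "anthropic.claude-3-haiku-20240307-v1:0"  # example model that supports guardrails
text = "Create a playlist of heavy metal songs."

client = boto3.client("bedrock-runtime")

try:
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": text}]}],
        guardrailConfig={
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
            "trace": "enabled",
        },
    )
except ClientError as err:
    raise SystemExit(f"Converse call failed: {err}")

# Print the guardrail assessment (present only because trace is enabled).
print(json.dumps(response.get("trace", {}), indent=2))

print(f"Stop reason: {response['stopReason']}")
print(f"Text: {response['output']['message']['content'][0]['text']}")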