Configure content filters for Amazon Bedrock Guardrails
With Amazon Bedrock Guardrails, you can configure content filters to block model prompts and responses in natural language for text and images containing harmful content. For example, an e-commerce site can design its online assistant to avoid using inappropriate language and or images.
Filter classification and blocking levels
Filtering is done based on confidence classification of user inputs and FM
responses across each of the six categories. All user inputs and FM responses are
classified across four strength levels - NONE
, LOW
,
MEDIUM
, and HIGH
. For example, if a statement is
classified as Hate with HIGH
confidence, the likelihood of that
statement representing hateful content is high. A single statement can be classified
across multiple categories with varying confidence levels. For example, a single
statement can be classified as Hate with
HIGH
confidence, Insults with
LOW
confidence, Sexual with
NONE
, and Violence with
MEDIUM
confidence.
Filter strength
You can configure the strength of the filters for each of the content filter categories. The filter strength determines the sensitivity of filtering harmful content. As the filter strength is increased, the likelihood of filtering harmful content increases and the probability of seeing harmful content in your application decreases.
You have four levels of filter strength
-
None — There are no content filters applied. All user inputs and FM-generated outputs are allowed.
-
Low — The strength of the filter is low. Content classified as harmful with
HIGH
confidence will be filtered out. Content classified as harmful withNONE
,LOW
, orMEDIUM
confidence will be allowed. -
Medium — Content classified as harmful with
HIGH
andMEDIUM
confidence will be filtered out. Content classified as harmful withNONE
orLOW
confidence will be allowed. -
High — This represents the strictest filtering configuration. Content classified as harmful with
HIGH
,MEDIUM
andLOW
confidence will be filtered out. Content deemed harmless will be allowed.
Filter strength | Blocked content confidence | Allowed content confidence |
---|---|---|
None | No filtering | None, Low, Medium, High |
Low | High | None, Low, Medium |
Medium | High, Medium | None, Low |
High | High, Medium, Low | None |