Configure content filters for Amazon Bedrock Guardrails
With Amazon Bedrock Guardrails, you can configure content filters to block model prompts and responses in natural language for text and images containing harmful content. For example, an e-commerce site can design its online assistant to avoid using inappropriate language and or images.
Filter classification and blocking levels
Filtering is done based on confidence classification of user inputs and FM
responses across each of the six categories. All user inputs and FM responses are
classified across four strength levels - NONE, LOW,
MEDIUM, and HIGH. For example, if a statement is
classified as Hate with HIGH confidence, the likelihood of that
statement representing hateful content is high. A single statement can be classified
across multiple categories with varying confidence levels. For example, a single
statement can be classified as Hate with
HIGH confidence, Insults with
LOW confidence, Sexual with
NONE, and Violence with
MEDIUM confidence.
Filter strength
You can configure the strength of the filters for each of the content filter categories. The filter strength determines the sensitivity of filtering harmful content. As the filter strength is increased, the likelihood of filtering harmful content increases and the probability of seeing harmful content in your application decreases.
You have four levels of filter strength
-
None — There are no content filters applied. All user inputs and FM-generated outputs are allowed.
-
Low — The strength of the filter is low. Content classified as harmful with
HIGHconfidence will be filtered out. Content classified as harmful withNONE,LOW, orMEDIUMconfidence will be allowed. -
Medium — Content classified as harmful with
HIGHandMEDIUMconfidence will be filtered out. Content classified as harmful withNONEorLOWconfidence will be allowed. -
High — This represents the strictest filtering configuration. Content classified as harmful with
HIGH,MEDIUMandLOWconfidence will be filtered out. Content deemed harmless will be allowed.
| Filter strength | Blocked content confidence | Allowed content confidence |
|---|---|---|
| None | No filtering | None, Low, Medium, High |
| Low | High | None, Low, Medium |
| Medium | High, Medium | None, Low |
| High | High, Medium, Low | None |