GENCOST03-BP04 Annotate user input to enable cost-aware content filtering

Annotate specific sections of input prompts to selectively apply content filtering and reduce token usage costs. By using input tags to mark only the user-provided content for filtering, you can avoid unnecessary processing of system prompts, search results, and conversation history while maintaining essential safeguards.

Desired outcome: Enable more efficient and cost-effective content filtering by processing only the relevant portions of input that require guardrails evaluation.

Benefits of establishing this best practice:

Control resource consumption parameters - By filtering only selected content rather than entire prompts, you minimize the number of tokens processed by content filters.
Optimize model and inference selection - Selective filtering reduces the volume of text evaluated, leading to faster response times.

Level of risk exposed if this best practice is not established: Medium

Implementation guidance

By implementing selective content filtering through input tags, you can significantly reduce token costs while preserving the effectiveness of your content safeguards. Note that input tags are not supported with the ApplyGuardrail API. If you use that API, select the content to send for evaluation in your application code to get the same benefit.

Review your application architecture to identify where content filtering is needed.
Determine which content sections require filtering and which can be treated as trusted content.
Implement input tagging following the Amazon Bedrock documentation.
Test filtering effectiveness and performance impact.
Monitor costs and adjust tag usage to optimize spend while maintaining safety.

Implementation steps

Use XML-style tags to mark specific sections of input prompts for content filtering. Add tags using the format:


<amazon-bedrock-guardrails-guardContent_xyz>
[Content to be filtered]
</amazon-bedrock-guardrails-guardContent_xyz>

Generate a unique random tag suffix (xyz) for each request to mitigate prompt injection attacks. The suffix must be an alphanumeric string of 1 to 20 characters.
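
As a minimal sketch of per-request suffix generation, the following Python function draws characters from a cryptographically secure source so that the suffix is hard to guess. The function name make_tag_suffix and the default length of 12 are illustrative choices, not part of any AWS API.

```python
import secrets
import string

# Letters and digits only, matching the alphanumeric requirement.
_ALPHABET = string.ascii_letters + string.digits


def make_tag_suffix(length: int = 12) -> str:
    """Return a random alphanumeric tag suffix (hypothetical helper)."""
    if not 1 <= length <= 20:
        raise ValueError("tag suffix length must be between 1 and 20")
    # secrets.choice uses the operating system's secure random source.
    return "".join(secrets.choice(_ALPHABET) for _ in range(length))
```

Call the helper once per request and reuse the returned value for both the tags in the prompt and the guardrail configuration.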

Include the tag suffix in the guardrailConfig:


{
    "amazon-bedrock-guardrailConfig": {
        "tagSuffix": "xyz"
    }
}
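
The configuration above can be attached to a request body programmatically. The following sketch assumes the rest of the body is a model-specific payload that you have already built as a Python dictionary. The helper name with_guardrail_config and the payload fields in the usage comment are placeholders.

```python
import json


def with_guardrail_config(model_payload: dict, tag_suffix: str) -> str:
    """Return a JSON request body with the tag suffix attached (hypothetical helper)."""
    valid = tag_suffix.isascii() and tag_suffix.isalnum() and len(tag_suffix) <= 20
    if not valid:
        raise ValueError("tag suffix must be 1 to 20 alphanumeric characters")
    body = dict(model_payload)  # shallow copy so the caller's dict is not modified
    body["amazon-bedrock-guardrailConfig"] = {"tagSuffix": tag_suffix}
    return json.dumps(body)


# Usage (the payload fields are placeholders and depend on the model):
# body = with_guardrail_config({"prompt": tagged_prompt}, "xyz")
```

The resulting string is intended to be sent as the request body of a model invocation that has a guardrail attached. Check the Amazon Bedrock documentation for the exact payload format of the model you use.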

Apply tags selectively to user queries and input, current conversation turns, and new or unverified content.
Leave system prompts, verified search results, historical conversation context, and other trusted content untagged.
Define a minimalist response scheme (for example, 0 for affirmative and 1 for rejection).
Inform the model in the prompt of the requested model response scheme, and ask the model to respond in kind.
Set a hard limit on the response length by configuring the response length hyperparameter accordingly.
Continue testing and optimizing the model's response to verify it satisfies the workload requirements. Monitor and optimize your implementation by:
- Tracking token usage with and without selective filtering
- Measuring latency impact across different tag configurations
- Verifying filtering effectiveness on tagged vs untagged content
- Adjusting tag placement based on application needs
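
To compare evaluated volume with and without selective filtering, a rough offline estimate such as the following can help before you look at billed usage. This sketch assumes that guardrail usage is metered in text units of up to 1,000 characters and that an untagged prompt is evaluated in full. Verify both assumptions against current Amazon Bedrock pricing and documentation. The function names are hypothetical.

```python
import math
import re

GUARD_TAG = "amazon-bedrock-guardrails-guardContent"
TEXT_UNIT_CHARS = 1000  # assumption: one text unit covers up to 1,000 characters


def text_units(text: str) -> int:
    """Estimate the number of text units needed to evaluate the given text."""
    return math.ceil(len(text) / TEXT_UNIT_CHARS)


def compare_guarded_volume(prompt: str, tag_suffix: str) -> dict:
    """Estimate evaluated text units for a tagged prompt versus the same prompt untagged."""
    tag = re.escape(f"{GUARD_TAG}_{tag_suffix}")
    # Collect only the text between matching open and close tags.
    tagged_sections = re.findall(f"<{tag}>(.*?)</{tag}>", prompt, flags=re.DOTALL)
    # Baseline: the same prompt with the tags removed, evaluated as a whole.
    untagged_prompt = re.sub(f"</?{tag}>", "", prompt)
    return {
        "units_without_tags": text_units(untagged_prompt),
        "units_with_tags": text_units("".join(tagged_sections)),
    }
```

Treat the result as an estimate only, and confirm actual savings with your cost and usage reports.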

Example implementation

The following use cases are well-suited for input tagging:

RAG applications: Tag only user queries while leaving retrieved passages unfiltered.
Chat applications: Tag new user messages while preserving conversation history.
Content moderation: Tag user-generated content while allowing verified content to pass through.
Document processing: Tag extracted text portions needing review while trusting source material.
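
As an illustration of the RAG use case, the following sketch assembles a prompt in which only the user query is wrapped in guard tags, while the system prompt and retrieved passages stay untagged. The prompt layout ("Context:" and "Question:" labels) and the function names are assumptions made for this example. Adapt them to your own prompt template.

```python
GUARD_TAG = "amazon-bedrock-guardrails-guardContent"


def tag_for_guardrail(text: str, tag_suffix: str) -> str:
    """Wrap untrusted text in guard tags that carry the per-request suffix."""
    open_tag = f"<{GUARD_TAG}_{tag_suffix}>"
    close_tag = f"</{GUARD_TAG}_{tag_suffix}>"
    # Reject input that already contains the tag, which could end the guarded span early.
    if open_tag in text or close_tag in text:
        raise ValueError("input contains the guard tag; generate a new suffix")
    return f"{open_tag}\n{text}\n{close_tag}"


def build_rag_prompt(system_prompt: str, passages: list, user_query: str, tag_suffix: str) -> str:
    """Assemble a prompt where only the user query is marked for filtering."""
    context = "\n\n".join(passages)  # trusted retrieved content, left untagged
    question = tag_for_guardrail(user_query, tag_suffix)
    return "\n\n".join([system_prompt, "Context:\n" + context, "Question:\n" + question])
```

The same tag_suffix value must also be supplied in the amazon-bedrock-guardrailConfig section of the request so that the tags are recognized.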

Resources

Related best practices:

COST10-BP01

Related videos:

AWS re:Invent 2023 - Prompt Engineering Best Practices for LLMs on Amazon Bedrock (AIM377)

Related examples:

Amazon Bedrock Prompt Management is now Available in GA
