

# Using Amazon ElastiCache for Valkey for semantic caching
<a name="semantic-caching"></a>

Large language models (LLMs) are the foundation for generative AI and agentic AI applications that power use cases from chatbots and search assistants to code generation tools and recommendation engines. As the use of AI applications in production grows, customers seek ways to optimize cost and performance. Most AI applications invoke the LLM for every user query, even when queries are repeated or semantically similar. Semantic caching is a method to reduce cost and latency in generative AI applications by using vector embeddings to reuse responses for identical or semantically similar requests.

This topic explains how to implement a semantic cache using vector search on Amazon ElastiCache for Valkey, including the concepts, architecture, implementation, benchmarks, and best practices.

**Topics**
+ [Overview of semantic caching](semantic-caching-overview.md)
+ [Why ElastiCache for Valkey for semantic caching](semantic-caching-why-elasticache.md)
+ [Solution architecture](semantic-caching-architecture.md)
+ [Prerequisites](semantic-caching-prerequisites.md)
+ [Implementing a semantic cache with ElastiCache for Valkey](semantic-caching-implementation.md)
+ [Impact and benchmarks](semantic-caching-benchmarks.md)
+ [Multi-turn conversation caching](semantic-caching-multi-turn.md)
+ [Best practices](semantic-caching-best-practices.md)
+ [Related resources](semantic-caching-related-resources.md)

# Overview of semantic caching
<a name="semantic-caching-overview"></a>

Unlike traditional caches that rely on exact string matches, a semantic cache retrieves data based on semantic similarity. A semantic cache uses vector embeddings produced by models like Amazon Titan Text Embeddings to capture semantic meaning in a high-dimensional vector space.

In generative AI applications, a semantic cache stores vector representations of queries and their corresponding responses. The system compares the vector embedding of each new query against cached vectors of prior queries to determine if a similar query has been answered before. If the cache contains a similar query above a configured similarity threshold, the system returns the previously generated response instead of invoking the LLM. Otherwise, the system invokes the LLM to generate a response and caches the query embedding and response together for future reuse.
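The decision logic described above can be sketched with plain cosine similarity. The following is a minimal, self-contained illustration with toy two-dimensional vectors standing in for real high-dimensional embeddings; the `lookup` helper, its threshold value, and the sample answers are illustrative, not part of the ElastiCache API:

```
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def lookup(query_vec, cache, threshold=0.8):
    """Return the cached answer most similar to query_vec,
    or None if nothing clears the similarity threshold."""
    if not cache:
        return None
    best_vec, best_answer = max(
        cache, key=lambda entry: cosine_similarity(query_vec, entry[0])
    )
    if cosine_similarity(query_vec, best_vec) >= threshold:
        return best_answer
    return None

# Toy cache: two prior queries with their stored answers
cache = [([1.0, 0.0], "VPN setup steps..."), ([0.0, 1.0], "Password reset steps...")]
print(lookup([0.9, 0.1], cache))  # Close to the first entry -> cache hit
print(lookup([0.7, 0.7], cache))  # Similarity ~0.71 is below 0.8 -> None (miss)
```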

## Why semantic, not exact match?
<a name="semantic-caching-why-semantic"></a>

Consider an IT help chatbot where thousands of users ask the same question. The following queries are different strings but carry the same meaning:
+ "How do I install the VPN app on my laptop?"
+ "Can you guide me through setting up the company VPN?"
+ "Steps to get VPN working on my computer"

An exact-match cache treats each query as unique and invokes the LLM three times. A semantic cache recognizes these queries as semantically equivalent and returns the cached response for all three, invoking the LLM only once.
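The exact-match behavior is easy to see with an ordinary dictionary cache: each paraphrase is a distinct key, so every variant misses. The strings, counter, and placeholder answer below are illustrative:

```
exact_cache = {}
llm_calls = 0

for query in [
    "How do I install the VPN app on my laptop?",
    "Can you guide me through setting up the company VPN?",
    "Steps to get VPN working on my computer",
]:
    if query not in exact_cache:
        llm_calls += 1  # Exact match: every new phrasing is a miss
        exact_cache[query] = "VPN setup instructions..."

print(llm_calls)  # 3 LLM calls for one underlying question
```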

## Key benefits
<a name="semantic-caching-benefits"></a>

Semantic caching provides the following benefits for generative AI and agentic AI applications:
+ **Reduced costs** – Reusing answers for similar questions reduces the number of LLM calls and overall inference spend. In benchmarks, semantic caching reduced LLM inference cost by up to 86%.
+ **Lower latency** – Serving answers from the cache provides faster responses than running LLM inference. Cache hits return responses in milliseconds rather than seconds, achieving up to 88% latency reduction.
+ **Improved scalability** – Reducing LLM calls for similar or repeated queries enables you to serve more requests within the same model throughput limits without increasing capacity.
+ **Improved consistency** – Using the same cached response for semantically similar requests helps deliver a consistent answer for the same underlying question.

## Where semantic caching is effective
<a name="semantic-caching-effective-use-cases"></a>

Semantic caching is particularly effective for the following types of applications:


| Application type | Description | Example | 
| --- | --- | --- | 
| RAG-based assistants and copilots | Many queries are duplicate requests from different users against a shared knowledge base | IT help chatbot, product FAQ bot, documentation assistant | 
| Agentic AI applications | Agents break tasks into multiple small steps that may repeatedly look up similar information | Compliance agent reusing policy lookups, research agent reusing prior findings | 
| Multimodal applications | Matching similar audio segments, images, or video queries | Automated phone systems reusing guidance for repeated requests like store hours | 

# Why ElastiCache for Valkey for semantic caching
<a name="semantic-caching-why-elasticache"></a>

Semantic caching workloads continuously write, search, and evict cache entries to serve the stream of incoming user queries while keeping responses fresh. The cache store must meet the following requirements:
+ **Real-time vector updates** – New queries and responses must be immediately available in the cache to maintain hit rates.
+ **Low-latency lookups** – The cache sits in the online request path of every query, so lookups must not add perceptible delay to end-user response time.
+ **Efficient ephemeral management** – Entries are frequently written, read, and evicted, requiring efficient management of a hot set.

ElastiCache for Valkey meets these requirements:
+ **Lowest latency vector search** – At the time of writing, ElastiCache for Valkey delivers the lowest latency vector search with the highest throughput and best price-performance at 95%+ recall rate among popular vector databases on AWS. Latency is as low as microseconds with up to 99% recall.
+ **Multithreaded architecture** – Vector search on ElastiCache uses a multithreaded architecture that supports real-time vector updates and high write throughput while maintaining low latency for search requests.
+ **Built-in cache features** – TTL (time to live), eviction policies (`allkeys-lru`), and atomic operations help manage the ephemeral hot set of entries that semantic caching creates.
+ **Vector index support** – ElastiCache supports both HNSW (Hierarchical Navigable Small World) and FLAT index algorithms with COSINE, Euclidean, and inner product distance metrics.
+ **Zero-downtime scalability** – ElastiCache supports scaling without downtime, allowing you to adjust capacity as your cache grows.
+ **Framework integration** – ElastiCache for Valkey integrates with Amazon Bedrock AgentCore through the LangGraph framework, enabling you to implement a Valkey-backed semantic cache for agents built on Amazon Bedrock.

# Solution architecture
<a name="semantic-caching-architecture"></a>

The following architecture implements a read-through semantic cache for an agent on Amazon Bedrock AgentCore. A request follows one of two paths:
+ **Cache hit** – If ElastiCache finds a prior query above the configured similarity threshold, AgentCore returns the cached answer immediately. This path invokes only the embedding model, completes with millisecond-level end-to-end latency, and incurs no LLM inference cost.
+ **Cache miss** – If no similar prior query is found, AgentCore invokes the LLM to generate a new answer and returns it to the user. The application then caches the prompt's embedding and answer in ElastiCache so that future similar prompts can be served from the cache.

# Prerequisites
<a name="semantic-caching-prerequisites"></a>

To implement semantic caching with ElastiCache for Valkey, you need:

1. An AWS account with access to Amazon Bedrock, including Amazon Bedrock AgentCore Runtime, Amazon Titan Text Embeddings v2 model, and an LLM such as Amazon Nova Premier enabled in the US East (N. Virginia) Region.

1. The AWS Command Line Interface (AWS CLI) installed and configured, and Python 3.11 or later.

1. An Amazon Elastic Compute Cloud (Amazon EC2) instance inside your Amazon VPC with the following packages installed:

   ```
   pip install numpy pandas valkey bedrock-agentcore \
               langchain-aws 'langgraph-checkpoint-aws[valkey]'
   ```

1. An ElastiCache for Valkey cluster running version 8.2 or later, which supports vector search. For instructions on creating a cluster, see [Creating a cluster for Valkey or Redis OSS](Clusters.Create.md).

# Implementing a semantic cache with ElastiCache for Valkey
<a name="semantic-caching-implementation"></a>

The following walkthrough shows how to implement a read-through semantic cache using ElastiCache for Valkey with Amazon Bedrock.

## Step 1: Create an ElastiCache for Valkey cluster
<a name="semantic-caching-step1"></a>

Create an ElastiCache for Valkey cluster with version 8.2 or later using the AWS CLI:

```
aws elasticache create-replication-group \
  --replication-group-id "valkey-semantic-cache" \
  --cache-node-type cache.r7g.large \
  --engine valkey \
  --engine-version 8.2 \
  --num-node-groups 1 \
  --replicas-per-node-group 1
```

## Step 2: Connect to the cluster and configure embeddings
<a name="semantic-caching-step2"></a>

From your application code running on your Amazon EC2 instance, connect to the ElastiCache cluster and set up the embedding model:

```
from valkey.cluster import ValkeyCluster
from langchain_aws import BedrockEmbeddings

# Connect to ElastiCache for Valkey
valkey_client = ValkeyCluster(
    host="mycluster.xxxxxx.clustercfg.use1.cache.amazonaws.com",  # Your cluster endpoint
    port=6379,
    decode_responses=False
)

# Set up Amazon Bedrock Titan embeddings
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v2:0",
    region_name="us-east-1"
)
```

Replace the host value with your ElastiCache cluster's configuration endpoint. For instructions on finding your cluster endpoint, see [Accessing your ElastiCache cluster](accessing-elasticache.md).

## Step 3: Create the vector index for the semantic cache
<a name="semantic-caching-step3"></a>

Configure a ValkeyStore that automatically embeds queries using an HNSW index with COSINE distance for vector search:

```
from langgraph_checkpoint_aws import ValkeyStore
from hashlib import md5

store = ValkeyStore(
    client=valkey_client,
    index={
        "collection_name": "semantic_cache",
        "embed": embeddings,
        "fields": ["query"],           # Fields to vectorize
        "index_type": "HNSW",          # Vector search algorithm
        "distance_metric": "COSINE",   # Similarity metric
        "dims": 1024                   # Titan V2 produces 1024-d vectors
    }
)
store.setup()

def cache_key_for_query(query: str):
    """Generate a deterministic cache key for a query."""
    return md5(query.encode("utf-8")).hexdigest()
```

**Note**  
ElastiCache for Valkey uses an index to provide fast and accurate vector search. The `FT.CREATE` command creates the underlying index. For more information, see [Vector search for ElastiCache](vector-search.md).

## Step 4: Implement cache search and update functions
<a name="semantic-caching-step4"></a>

Create functions to search the cache for semantically similar queries and to store new query-response pairs:

```
def search_cache(user_message: str, k: int = 3, min_similarity: float = 0.8):
    """Look up a semantically similar cached response from ElastiCache."""
    hits = store.search(
        namespace="semantic-cache",
        query=user_message,
        limit=k
    )
    if not hits:
        return None

    # Sort by similarity score (highest first)
    hits = sorted(hits, key=lambda h: h["score"], reverse=True)
    top_hit = hits[0]
    score = top_hit["score"]

    if score < min_similarity:
        return None  # Below similarity threshold

    return top_hit["value"]["answer"]  # Return cached answer


def store_cache(user_message: str, result_message: str):
    """Store a new query-response pair in the semantic cache."""
    key = cache_key_for_query(user_message)
    store.put(
        namespace="semantic-cache",
        key=key,
        value={
            "query": user_message,
            "answer": result_message
        }
    )
```

## Step 5: Implement the read-through cache pattern
<a name="semantic-caching-step5"></a>

Integrate the cache into your application's request handling:

```
import time

def handle_query(user_message: str) -> dict:
    """Handle a user query with read-through semantic cache."""
    start = time.time()

    # Step 1: Search the semantic cache
    cached_response = search_cache(user_message, min_similarity=0.8)

    if cached_response:
        # Cache hit - return cached response
        elapsed = (time.time() - start) * 1000
        return {
            "response": cached_response,
            "source": "cache",
            "latency_ms": round(elapsed, 1),
        }

    # Step 2: Cache miss - invoke LLM
    llm_response = invoke_llm(user_message)  # Your LLM invocation function

    # Step 3: Store the response in cache for future reuse
    store_cache(user_message, llm_response)

    elapsed = (time.time() - start) * 1000
    return {
        "response": llm_response,
        "source": "llm",
        "latency_ms": round(elapsed, 1),
    }
```

## Underlying Valkey commands
<a name="semantic-caching-valkey-commands"></a>

The following table shows the Valkey commands used to implement the semantic cache:


| Operation | Valkey command | Typical latency | 
| --- | --- | --- | 
| Create index | FT.CREATE semantic\_cache SCHEMA query TEXT answer TEXT embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1024 DISTANCE\_METRIC COSINE | One-time setup | 
| Cache lookup | FT.SEARCH semantic\_cache "\*=>[KNN 3 @embedding \$query\_vec]" PARAMS 2 query\_vec [bytes] DIALECT 2 | Microseconds | 
| Store response | HSET cache:\{hash\} query "..." answer "..." embedding [bytes] | Microseconds | 
| Set TTL | EXPIRE cache:\{hash\} 82800 | Microseconds | 
| LLM inference (miss) | External API call to Amazon Bedrock | 500–6000 ms | 

# Impact and benchmarks
<a name="semantic-caching-benchmarks"></a>

AWS evaluated the approach on 63,796 real user chatbot queries and their paraphrased variants from the public SemBenchmarkLmArena dataset. This dataset captures user interactions with the Chatbot Arena platform across general assistant use cases such as question answering, writing, and analysis.

The evaluation used the following configuration:
+ ElastiCache `cache.r7g.large` instance as the semantic cache store
+ Amazon Titan Text Embeddings V2 for embeddings
+ Claude 3 Haiku for LLM inference

The cache was started empty, and all 63,796 queries were streamed as random incoming user traffic, simulating real-world application traffic.

## Cost and accuracy at different similarity thresholds
<a name="semantic-caching-cost-accuracy"></a>

The following table summarizes the trade-off between cost reduction, latency improvement, and accuracy across different similarity thresholds:


| Similarity threshold | Cache hit ratio | Accuracy of cached responses | Total daily cost | Cost savings | Average latency (s) | Latency reduction | 
| --- | --- | --- | --- | --- | --- | --- | 
| Baseline (no cache) | – | – | \$49.50 | – | 4.35 | – | 
| 0.99 (very strict) | 23.5% | 92.1% | \$41.70 | 15.8% | 3.60 | 17.1% | 
| 0.95 (strict) | 56.0% | 92.6% | \$23.80 | 51.9% | 1.84 | 57.7% | 
| 0.90 (moderate) | 74.5% | 92.3% | \$13.60 | 72.5% | 1.21 | 72.2% | 
| 0.80 (balanced) | 87.6% | 91.8% | \$7.60 | 84.6% | 0.60 | 86.1% | 
| 0.75 (relaxed) | 90.3% | 91.2% | \$6.80 | 86.3% | 0.51 | 88.3% | 
| 0.50 (very relaxed) | 94.3% | 87.5% | \$5.90 | 88.0% | 0.46 | 89.3% | 

At a similarity threshold of 0.75, semantic caching reduced LLM inference cost by up to 86% while maintaining 91% answer accuracy. The choice of LLM, embedding model, and backing store affects both cost and latency. Semantic caching delivers proportionally larger benefits when used with bigger, higher-cost LLMs.
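The relationship between hit ratio and remaining LLM spend follows directly from the read-through pattern: only misses invoke the model. The following back-of-the-envelope sketch uses an assumed illustrative baseline cost rather than the benchmark's exact cost model, which also includes embedding and cache overhead:

```
def remaining_llm_cost(baseline_daily_cost, hit_ratio):
    """LLM spend left after caching: only cache misses invoke the model."""
    return baseline_daily_cost * (1 - hit_ratio)

# Illustrative: a $100/day LLM bill at the 0.75-threshold hit ratio of 90.3%
print(round(remaining_llm_cost(100.0, 0.903), 2))  # 9.7
```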

## Individual query latency improvements
<a name="semantic-caching-latency-improvements"></a>

The following table shows the impact on individual query latency. A cache hit reduced latency by up to 59x, from multiple seconds to roughly a hundred milliseconds:


| Query intent | Cache miss latency | Cache hit latency | Reduction | 
| --- | --- | --- | --- | 
| "Are there instances where SI prefixes deviate from denoting powers of 10, excluding their application?" → paraphrased variant | 6.51 s | 0.11 s | 59x | 
| "Sally is a girl with 3 brothers, and each of her brothers has 2 sisters. How many sisters are there in Sally's family?" → paraphrased variant | 1.64 s | 0.13 s | 12x | 
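The effective average latency a user sees is a weighted mix of hit and miss latencies. A quick sketch using illustrative numbers in the same range as the tables above:

```
def average_latency(hit_ratio, hit_latency_s, miss_latency_s):
    """Expected per-query latency given a cache hit ratio."""
    return hit_ratio * hit_latency_s + (1 - hit_ratio) * miss_latency_s

# Illustrative: 87.6% hit ratio, 0.11 s cache hits, 4.35 s uncached baseline
print(round(average_latency(0.876, 0.11, 4.35), 2))  # 0.64
```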

# Multi-turn conversation caching
<a name="semantic-caching-multi-turn"></a>

For applications with multi-turn conversations, the same user message can mean different things depending on context. For example, "Tell me more" in a conversation about Valkey means something different from "Tell me more" in a conversation about Python.

## The challenge
<a name="semantic-caching-multi-turn-challenge"></a>

Single-prompt caching works well for stateless queries. In multi-turn conversations, you must cache the full conversation context, not just the last message:

```
# "Tell me more" means nothing without context
# Conversation A: "What is Valkey?" -> "Tell me more"  (about Valkey)
# Conversation B: "What is Python?" -> "Tell me more"  (about Python)
```

## Strategy: context-aware cache keys
<a name="semantic-caching-context-aware-keys"></a>

Instead of embedding only the last user message, embed a summary of the full conversation context. This way, similar follow-up questions in similar conversation flows can reuse cached answers.

```
def build_context_string(messages: list) -> str:
    """Build a cacheable context string from conversation messages."""
    # Use last 3 turns (6 messages: user + assistant pairs)
    recent = messages[-6:]
    parts = []
    for msg in recent:
        role = msg["role"]
        content = msg["content"][:200]  # Truncate long messages
        parts.append(f"{role}: {content}")
    return " | ".join(parts)
```
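For example, repeating the helper in a self-contained form, a short conversation collapses to a single pipe-delimited string, and the embedding of that string then serves as the context-aware cache key. The sample messages are illustrative:

```
def build_context_string(messages: list) -> str:
    """Build a cacheable context string from conversation messages."""
    recent = messages[-6:]  # Last 3 turns (user + assistant pairs)
    return " | ".join(f"{m['role']}: {m['content'][:200]}" for m in recent)

messages = [
    {"role": "user", "content": "What is Valkey?"},
    {"role": "assistant", "content": "Valkey is an open source key-value datastore."},
    {"role": "user", "content": "Tell me more"},
]
print(build_context_string(messages))
# user: What is Valkey? | assistant: Valkey is an open source key-value datastore. | user: Tell me more
```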

## Per-user cache isolation with TAG filters
<a name="semantic-caching-tag-filters"></a>

Use TAG fields to isolate cached conversations by user, session, or other dimensions. This prevents one user's cached conversations from being returned for another user:

```
# Create index with TAG field for per-user isolation
valkey_client.execute_command(
    "FT.CREATE", "conv_cache_idx",
    "SCHEMA",
    "context_summary", "TEXT",
    "response", "TEXT",
    "user_id", "TAG",
    "turn_count", "NUMERIC",
    "embedding", "VECTOR", "HNSW", "6",
    "TYPE", "FLOAT32",
    "DIM", "1024",
    "DISTANCE_METRIC", "COSINE",
)
```

Search with hybrid filtering (TAG + KNN):

```
def lookup_conversation_cache(messages: list, user_id: str, threshold: float = 0.12):
    """Search cache for similar conversation contexts, scoped to a user.

    Note: FT.SEARCH with COSINE distance returns a distance score where
    0 = identical and 2 = opposite. A lower score means higher similarity.
    The threshold here is a maximum distance: only return results closer
    than this value.
    """
    context = build_context_string(messages)
    query_vec = get_embedding(context)  # Helper (not shown) returning the embedding as FLOAT32 bytes

    # Hybrid search: filter by user_id TAG + KNN on context embedding
    results = valkey_client.execute_command(
        "FT.SEARCH", "conv_cache_idx",
        f"@user_id:{{{user_id}}}=>[KNN 1 @embedding $query_vec]",
        "PARAMS", "2", "query_vec", query_vec,
        "DIALECT", "2",
    )

    if results[0] > 0:
        # With decode_responses=False, field names and values come back as bytes
        fields = results[2]
        field_dict = {fields[j].decode(): fields[j + 1] for j in range(0, len(fields), 2)}
        distance = float(field_dict.get("__embedding_score", "999"))
        if distance < threshold:  # Lower distance = more similar
            return {
                "hit": True,
                "response": field_dict.get("response", b"").decode(),
                "distance": distance,
            }

    return {"hit": False}
```

**Note**  
The `@user_id:{user_123}` TAG filter ensures that User A's cached conversations don't leak to User B. The hybrid query (TAG + KNN) runs as a single atomic operation: pre-filtering by user, then finding the nearest conversation context.

## Cache isolation strategies
<a name="semantic-caching-isolation-strategies"></a>


| Strategy | TAG filter | Best for | 
| --- | --- | --- | 
| Per-user | @user\_id:\{user\_123\} | Personalized assistants | 
| Per-session | @session\_id:\{sess\_abc\} | Short-lived chats | 
| Global (shared) | No filter (\*) | FAQ bots, common queries | 
| Per-model | @model:\{gpt-4\} | Multi-model deployments | 
| Per-product | @product\_id:\{prod\_456\} | E-commerce assistants | 

# Best practices
<a name="semantic-caching-best-practices"></a>

## Choosing data that can be cached
<a name="semantic-caching-bp-choosing-data"></a>

Semantic caching is well suited for repeated queries whose responses are relatively stable, whereas real-time or highly dynamic responses are often poor candidates for caching.

Use tag and numeric filters derived from existing application context (such as product ID, category, region, or user segment) to decide which queries and responses are eligible for caching and to improve the relevance of cache hits.

## Similarity threshold tuning
<a name="semantic-caching-bp-threshold"></a>

The similarity threshold controls the trade-off between cache hit rate and answer quality. Choose a threshold that balances cost savings with accuracy for your use case:


| Threshold | Hit rate | Quality risk | Best for | 
| --- | --- | --- | --- | 
| 0.95 (strict) | Low (~25%) | Very low | Medical, legal, financial applications | 
| 0.90 (moderate) | Medium (~55%) | Low | General chatbots | 
| 0.80 (balanced) | High (~75%) | Low–Medium | FAQ bots, IT support | 
| 0.75 (relaxed) | Very high (~90%) | Medium | High-volume repetitive queries | 

**Important**  
Start with a higher threshold (0.90–0.95) and gradually lower it while monitoring accuracy. Use A/B testing to find the optimal balance for your workload.

## Standalone queries versus conversations
<a name="semantic-caching-bp-standalone-vs-conversations"></a>
+ **For standalone queries** – Apply semantic caching directly on the user query text.
+ **For multi-turn conversations** – First use your conversation memory to retrieve the key facts and recent messages needed to answer the current turn. Then apply semantic caching to the combination of the current user message and the retrieved context, instead of embedding the entire raw dialogue.

## Setting cache invalidation periods
<a name="semantic-caching-bp-ttl"></a>

Use TTL to control how long cached responses are served before they are regenerated on a cache miss.


| Data type | Recommended TTL | Rationale | 
| --- | --- | --- | 
| Static facts (documentation, policies) | 24 hours | Facts change infrequently | 
| Product information | 12–24 hours | Updated daily in most catalogs | 
| General assistant responses | 1–4 hours | Balance freshness with hit rate | 
| Real-time data (prices, inventory) | 5–15 minutes | Data changes frequently | 
| Conversation context | 30 minutes | Session-scoped, short-lived | 

```
# Set TTL with random jitter to spread out cache invalidations
import random

base_ttl = 82800  # ~23 hours
jitter = random.randint(0, 3600)  # Up to 1 hour of jitter
valkey_client.expire(cache_key, base_ttl + jitter)
```

**Tip**  
Set TTLs that match your application use case and how often your data or model outputs change. Longer TTLs increase cache hit rates but raise the risk of outdated answers. Shorter TTLs keep responses fresher but lower cache hit rates and require more LLM inference.

## Monitoring and cost tracking
<a name="semantic-caching-bp-monitoring"></a>

Track cache performance metrics to optimize your semantic cache over time:

```
def record_cache_event(valkey_client, event_type: str):
    """Track cache hits and misses using atomic counters."""
    valkey_client.incr(f"cache:metrics:{event_type}")

    # Also track hourly for time-series analysis
    from datetime import datetime
    hour_key = datetime.now().strftime("%Y%m%d%H")
    counter_key = f"cache:metrics:{event_type}:{hour_key}"
    valkey_client.incr(counter_key)
    valkey_client.expire(counter_key, 86400 * 7)  # Keep 7 days

def get_cache_stats(valkey_client) -> dict:
    """Get current cache performance metrics."""
    hits = int(valkey_client.get("cache:metrics:hit") or 0)
    misses = int(valkey_client.get("cache:metrics:miss") or 0)
    total = hits + misses
    hit_rate = hits / total if total > 0 else 0

    avg_cost_per_call = 0.015  # Example: ~$0.015 per LLM call
    savings = hits * avg_cost_per_call

    return {
        "total_requests": total,
        "hits": hits,
        "misses": misses,
        "hit_rate": round(hit_rate, 3),
        "estimated_savings_usd": round(savings, 2),
    }
```

## Memory management
<a name="semantic-caching-bp-memory"></a>
+ **Set maxmemory policy** – Configure `maxmemory-policy allkeys-lru` on your ElastiCache cluster to automatically evict least-recently-used cache entries when the cluster reaches its memory limit.
+ **Plan for capacity** – Each cache entry typically requires approximately 4–6 KB (embedding dimensions × 4 bytes + query text + response text). A 1 GB ElastiCache instance can store approximately 170,000 cached entries.
+ **Use cache invalidation for stale data** – When underlying data changes, use text search to find and invalidate related cache entries:

  ```
  def invalidate_by_topic(valkey_client, topic_keyword: str):
      """Remove cached entries matching a topic after a data update."""
      results = valkey_client.execute_command(
          "FT.SEARCH", "semantic_cache",
          f"@query:{topic_keyword}",
          "NOCONTENT",  # Only return keys, not fields
      )
  
      if results[0] > 0:
          keys = results[1:]
          for key in keys:
              valkey_client.delete(key)
          print(f"Invalidated {len(keys)} cached entries for '{topic_keyword}'")
  ```
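The capacity guidance above can be sanity-checked with simple arithmetic. The per-entry text overhead below is an assumption chosen to land at the section's ~6 KB figure, not a measured value:

```
EMBEDDING_DIMS = 1024
BYTES_PER_FLOAT32 = 4
TEXT_OVERHEAD_BYTES = 2048  # Assumed query + response text per entry

entry_bytes = EMBEDDING_DIMS * BYTES_PER_FLOAT32 + TEXT_OVERHEAD_BYTES
instance_bytes = 1 * 1024 ** 3  # 1 GB

print(entry_bytes)                    # 6144 bytes, i.e. ~6 KB per entry
print(instance_bytes // entry_bytes)  # 174762 -> roughly 170,000 entries
```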

# Related resources
<a name="semantic-caching-related-resources"></a>
+ [Vector search for ElastiCache](https://docs.aws.amazon.com/AmazonElastiCache/latest/dg/vector-search.html)
+ [Common ElastiCache use cases](https://docs.aws.amazon.com/AmazonElastiCache/latest/dg/elasticache-use-cases.html)
+ [Lower cost and latency for AI using Amazon ElastiCache as a semantic cache with Amazon Bedrock](https://aws.amazon.com/blogs/database/lower-cost-and-latency-for-ai-using-amazon-elasticache-as-a-semantic-cache-with-amazon-bedrock/) (AWS Database Blog)
+ [Announcing vector search for Amazon ElastiCache](https://aws.amazon.com/blogs/database/announcing-vector-search-for-amazon-elasticache/) (AWS Database Blog)
+ [Amazon Bedrock AgentCore Runtime](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/agents-tools-runtime.html)
+ [LangGraph checkpoint for Valkey](https://pypi.org/project/langgraph-checkpoint-aws/)
+ [Valkey client libraries](https://valkey.io/clients/)