

# Using Amazon ElastiCache for Valkey for agentic memory
<a name="agentic-memory"></a>

Agentic AI applications use external tools, APIs, and multi-step reasoning to complete complex tasks. However, by default, agents don't retain memory between conversations, which limits their ability to provide personalized responses or maintain context across sessions. Amazon ElastiCache for Valkey provides the high-performance, low-latency infrastructure that agentic memory systems require to store, retrieve, and manage persistent memory for AI agents.

This topic explains how to use ElastiCache for Valkey as the storage layer for agentic memory, covering the concepts, architecture, implementation, and best practices for building memory-enabled AI agents.

**Topics**
+ [Overview of agentic memory](agentic-memory-overview.md)
+ [Types of agentic memory](agentic-memory-types.md)
+ [Why ElastiCache for Valkey for agentic memory](agentic-memory-why-elasticache.md)
+ [Solution architecture](agentic-memory-architecture.md)
+ [Prerequisites](agentic-memory-prerequisites.md)
+ [Setting up ElastiCache for Valkey as a vector store for agentic memory](agentic-memory-setup.md)
+ [Performance benefits](agentic-memory-performance.md)
+ [Best practices](agentic-memory-best-practices.md)
+ [Related resources](agentic-memory-related-resources.md)

# Overview of agentic memory
<a name="agentic-memory-overview"></a>

An agentic AI application is a system that takes actions and makes decisions based on input. These agents use external tools, APIs, and multi-step reasoning to complete complex tasks. Without persistent memory, agents forget everything between conversations, making it impossible to deliver personalized experiences or complete multi-step tasks effectively.

Agentic memory handles the persistence, encoding, storage, retrieval, and summarization of knowledge gained through user interactions. This memory system is a critical part of the context management component of an agentic AI application, enabling agents to learn from past conversations and apply that knowledge to future interactions.

Consider the following examples where agentic memory provides value:
+ **Customer support agents** – An agent remembers a customer's previous issues, preferences, and account details across support sessions, avoiding repetitive information gathering and delivering faster resolutions.
+ **Research agents** – An agent that researches GitHub repositories remembers previously discovered project metrics, avoiding redundant web searches and reducing token usage and response time.
+ **Personal assistant agents** – An agent retains a user's scheduling preferences, communication style, and recurring tasks to provide increasingly personalized assistance over time.

# Types of agentic memory
<a name="agentic-memory-types"></a>

## Short-term memory
<a name="agentic-memory-short-term"></a>

Short-term memory maintains context within a single session. It tracks the current conversation flow, recent interactions, and intermediate reasoning steps. Short-term memory is essential for multi-turn conversations where the agent needs to reference earlier parts of the dialogue.

ElastiCache for Valkey supports short-term memory through data structures such as lists (for ordered chat history), hashes (for session metadata), and strings (for tool result caching with TTL-based expiration).
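
These short-term memory patterns can be sketched with a few client calls. The following is a minimal illustration that assumes a connected valkey-py or redis-py style client; the key names and TTL values are examples, not a required schema:

```
# Sketch only: `client` is assumed to be a connected valkey-py or redis-py
# style client; key names and TTLs are illustrative.
SESSION_TTL_SECONDS = 3600  # expire idle session context after one hour

def history_key(session_id: str) -> str:
    """Key for the ordered chat-history list of a session."""
    return f"session:{session_id}:history"

def metadata_key(session_id: str) -> str:
    """Key for the session-metadata hash."""
    return f"session:{session_id}:meta"

def append_turn(client, session_id: str, role: str, text: str) -> None:
    """Append one conversation turn and refresh the session TTL."""
    key = history_key(session_id)
    client.rpush(key, f"{role}: {text}")   # lists keep turns in arrival order
    client.expire(key, SESSION_TTL_SECONDS)

def cache_tool_result(client, tool_name: str, arg: str, result: str, ttl: int = 300) -> None:
    """Cache a tool result as a plain string that self-expires."""
    client.set(f"tool:{tool_name}:{arg}", result, ex=ttl)
```

Refreshing the TTL on every turn keeps an active session alive while letting abandoned sessions expire automatically.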

## Long-term memory
<a name="agentic-memory-long-term"></a>

Long-term memory stores information across multiple sessions. This enables agents to remember user preferences, past decisions, and historical context for future conversations. Long-term memory requires a persistent, searchable store that supports semantic retrieval — finding relevant memories based on meaning rather than exact keyword matches.

ElastiCache for Valkey supports long-term memory through its vector similarity search capabilities (available in Valkey 8.2 and later). Vector search enables semantic memory retrieval, allowing agents to find relevant memories based on meaning by comparing vector embeddings of stored memories against new queries.

## Additional memory types
<a name="agentic-memory-additional-types"></a>


| Memory type | Description | ElastiCache support | 
| --- | --- | --- | 
| Episodic memory | Records of specific past interactions and events | Vector search over stored conversation embeddings | 
| Semantic memory | General knowledge and facts extracted from interactions | Vector similarity search with HNSW or FLAT indexes | 
| Procedural memory | Knowledge about how to perform tasks and use tools | Hash-based storage of tool configurations and workflows | 
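
As an illustration of the procedural pattern, a tool configuration can be stored as a hash. The field names in this sketch are assumptions for illustration, not a Mem0 schema:

```
# Sketch only: field names are illustrative assumptions.
def tool_config_mapping(tool_name: str, endpoint: str, max_retries: int = 3) -> dict:
    """Build the field mapping for HSET tool:<name> (hash values are strings)."""
    return {
        "tool": tool_name,
        "endpoint": endpoint,
        "max_retries": str(max_retries),
    }

# With a connected client:
# client.hset(f"tool:{tool_name}", mapping=tool_config_mapping(tool_name, endpoint))
```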

# Why ElastiCache for Valkey for agentic memory
<a name="agentic-memory-why-elasticache"></a>

ElastiCache for Valkey provides several capabilities that make it well suited as the storage layer for agentic memory:
+ **Sub-millisecond latency** – ElastiCache for Valkey serves memory operations with sub-millisecond (often microsecond-level) latency, making it suitable for real-time agent interactions where memory lookups must not add perceptible delay to the user experience.
+ **Vector similarity search** – Starting with Valkey version 8.2, ElastiCache supports vector similarity search through the valkey-search module. This enables semantic memory retrieval, where agents can find relevant memories based on meaning rather than exact keyword matches.
+ **Real-time index updates** – New memories become immediately searchable after being written. This is critical for agentic applications where the agent may need to recall information it stored moments ago within the same session.
+ **Built-in cache management** – Features such as TTL (time to live), eviction policies (`allkeys-lru`), and atomic operations help manage the memory lifecycle.
+ **Multiple data structures** – Valkey provides hashes, lists, strings, streams, JSON, and vectors — each optimized for different memory patterns. A single ElastiCache instance can support session state (hashes), conversation history (lists), tool result caching (strings with TTL), event logs (streams), and semantic memory (vectors).
+ **Scalability** – ElastiCache scales to handle millions of requests with consistent low latency, supporting applications with large numbers of concurrent users and agents.

# Solution architecture
<a name="agentic-memory-architecture"></a>

The following architecture implements persistent memory for agentic AI applications using ElastiCache for Valkey as the vector storage component.

**Key components:**
+ **Amazon Bedrock AgentCore Runtime** – Provides the hosting environment for deploying and running agents. It provides access to the LLM and embedding models required for the architecture.
+ **Agent framework (for example, Strands Agents)** – Manages LLM invocations, tool execution, and user conversations. Strands Agents supports multiple LLMs, including models from Amazon Bedrock, Anthropic, Google Gemini, and OpenAI.
+ **Mem0** – The memory orchestration layer that sits between AI agents and storage systems. Mem0 manages the memory lifecycle, from extracting information from agent interactions to storing and retrieving it.
+ **Amazon ElastiCache for Valkey** – The managed in-memory data store that serves as the vector storage component. ElastiCache uses Valkey's vector similarity search capabilities to store high-dimensional vector embeddings, enabling semantic memory retrieval.

# Prerequisites
<a name="agentic-memory-prerequisites"></a>

To implement agentic memory with ElastiCache for Valkey, you need:

1. An AWS account with access to Amazon Bedrock, including Amazon Bedrock AgentCore Runtime and embedding models.

1. An ElastiCache cluster running Valkey 8.2 or later. Valkey 8.2 includes support for vector similarity search. For instructions on creating a cluster, see [Creating a cluster for Valkey or Redis OSS](Clusters.Create.md).

1. An Amazon Elastic Compute Cloud (Amazon EC2) instance or other compute resource within the same Amazon VPC as your ElastiCache cluster.

1. Python 3.11 or later with the following packages:

   ```
   pip install strands-agents strands-agents-tools strands-agents-builder
   pip install mem0ai "mem0ai[vector_stores]"
   ```

# Setting up ElastiCache for Valkey as a vector store for agentic memory
<a name="agentic-memory-setup"></a>

The following walkthrough shows how to build a memory-enabled AI agent using Mem0 with ElastiCache for Valkey as the vector store.

## Step 1: Create a basic agent without memory
<a name="agentic-memory-step1"></a>

First, install Strands Agents and create a basic agent:

```
pip install strands-agents strands-agents-tools strands-agents-builder
```

Initialize a basic agent with an HTTP tool for web browsing:

```
from strands import Agent
from strands_tools import http_request

# Initialize agent with access to the tool to browse the web
agent = Agent(tools=[http_request])

# Format messages as expected by Strands
formatted_messages = [
    {
        "role": "user",
        "content": [{"text": "What is the URL for the project mem0 and its most important metrics?"}]
    }
]

result = agent(formatted_messages)
```

Without memory, the agent performs the same research tasks repeatedly for each request. In testing, the agent makes three tool calls to answer the request, using approximately 70,000 tokens and taking over 9 seconds to complete.

## Step 2: Configure Mem0 with ElastiCache for Valkey
<a name="agentic-memory-step2"></a>

Install the Mem0 library with the Valkey vector store connector:

```
pip install mem0ai "mem0ai[vector_stores]"
```

Configure Valkey as the vector store. ElastiCache for Valkey supports vector search capabilities starting with version 8.2:

```
from mem0 import Memory

# Configure Mem0 with ElastiCache for Valkey
config = {
    "vector_store": {
        "provider": "valkey",
        "config": {
            "valkey_url": "your-elasticache-cluster.cache.amazonaws.com:6379",
            "index_name": "agent_memory",
            "embedding_model_dims": 1024,
            "index_type": "flat"
        }
    }
}

m = Memory.from_config(config)
```

Replace *your-elasticache-cluster.cache.amazonaws.com* with your ElastiCache cluster's endpoint. For instructions on finding your cluster endpoint, see [Accessing your ElastiCache cluster](accessing-elasticache.md).

## Step 3: Add memory tools to the agent
<a name="agentic-memory-step3"></a>

Create memory tools that the agent can use to store and retrieve information. The `@tool` decorator transforms regular Python functions into tools the agent can invoke:

```
from strands import Agent, tool
from strands_tools import http_request

@tool
def store_memory_tool(information: str, user_id: str = "user") -> str:
    """Store important information in long-term memory."""
    memory_message = [{"role": "user", "content": information}]

    # Create new memories using Mem0 and store them in Valkey
    m.add(memory_message, user_id=user_id)

    return f"Stored: {information}"

@tool
def search_memory_tool(query: str, user_id: str = "user") -> str:
    """Search stored memories for relevant information."""

    # Search memories using Mem0 stored in Valkey
    results = m.search(query, user_id=user_id)
    if results['results']:
        return "\n".join([r['memory'] for r in results['results']])
    return "No memories found"

# Initialize Strands agent with memory tools
agent = Agent(tools=[http_request, store_memory_tool, search_memory_tool])
```

## Step 4: Test the memory-enabled agent
<a name="agentic-memory-step4"></a>

With memory enabled, the agent stores information from its interactions and retrieves it in subsequent requests:

```
# First request - agent searches the web and stores results in memory
formatted_messages = [
    {
        "role": "user",
        "content": [{"text": "What is the URL for the project mem0 and its most important metrics?"}]
    }
]
result = agent(formatted_messages)

# Second request (same question) - agent retrieves from memory
result = agent(formatted_messages)
```

On the second request, the agent retrieves the information from memory instead of making web tool calls. In testing, this reduced token usage from approximately 70,000 to 6,300 (a 12x reduction) and improved response time from 9.25 seconds to 2 seconds (more than 3x faster).

## How it works under the hood
<a name="agentic-memory-valkey-commands"></a>

The following table shows the Valkey commands that Mem0 uses internally to implement agentic memory with ElastiCache. Mem0 abstracts these commands through its API — the exact schema and key naming may vary depending on the Mem0 version and configuration:


| Operation | Valkey command | Description | 
| --- | --- | --- | 
| Create vector index | FT.CREATE agent\_memory SCHEMA embedding VECTOR HNSW 6 TYPE FLOAT32 DIM 1024 DISTANCE\_METRIC COSINE | Creates a vector index for semantic memory search | 
| Store memory | HSET mem:<id> memory "..." embedding [bytes] user\_id "user\_123" created\_at "..." | Stores a memory with its vector embedding | 
| Search memories | FT.SEARCH agent\_memory "\*=>[KNN 5 @embedding $query\_vec]" PARAMS 2 query\_vec [bytes] DIALECT 2 | Finds the most semantically similar memories | 
| Set expiration | EXPIRE mem:<id> 86400 | Sets TTL for memory entries | 
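
The `[bytes]` placeholders above are vector embeddings serialized as raw FLOAT32 values. A minimal packing helper, assuming numpy and the 1024-dimension index used in this topic, looks like this:

```
import numpy as np

def pack_embedding(vector, dims: int = 1024) -> bytes:
    """Serialize an embedding as FLOAT32 bytes for HSET fields and FT.SEARCH PARAMS."""
    arr = np.asarray(vector, dtype=np.float32)
    if arr.shape != (dims,):
        raise ValueError(f"expected a {dims}-dimensional vector, got {arr.shape}")
    return arr.tobytes()  # 4 bytes per dimension: 1024 dims -> 4096 bytes
```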

# Performance benefits
<a name="agentic-memory-performance"></a>

The following table summarizes the performance improvements observed in testing with a memory-enabled agent versus a stateless agent:


| Metric | Without memory | With memory | Improvement | 
| --- | --- | --- | --- | 
| Tool calls per request | 3 | 0 (memory retrieval) | Eliminated redundant tool calls | 
| Token usage | ~70,000 | ~6,300 | 12x reduction | 
| Response time | 9.25 seconds | 2 seconds | More than 3x faster | 
| Memory lookup latency | N/A | Sub-millisecond | Valkey in-memory performance | 

# Best practices
<a name="agentic-memory-best-practices"></a>

## Memory lifecycle management
<a name="agentic-memory-bp-lifecycle"></a>
+ **Use TTL for short-term memory** – Set appropriate TTL values on memory entries to automatically expire transient information. For session context, use TTLs of 30 minutes to 24 hours. For long-term user preferences, use longer TTLs or persist indefinitely.
+ **Implement memory decay** – Mem0 provides built-in decay mechanisms that remove irrelevant information over time. Configure these to prevent memory bloat as the agent accumulates more interactions.
+ **Deduplicate memories** – Before storing a new memory, check if a similar memory already exists using vector similarity search. Update existing memories rather than creating duplicates.
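
A deduplication check can be sketched as a cosine-similarity comparison before storing. The 0.9 threshold below is an assumption to tune for your embedding model:

```
import numpy as np

def is_duplicate(new_vec, existing_vecs, threshold: float = 0.9) -> bool:
    """Return True if new_vec is cosine-similar to any stored embedding."""
    a = np.asarray(new_vec, dtype=np.float32)
    a = a / np.linalg.norm(a)
    for vec in existing_vecs:
        b = np.asarray(vec, dtype=np.float32)
        b = b / np.linalg.norm(b)
        if float(np.dot(a, b)) >= threshold:
            return True  # update the existing memory instead of storing a new one
    return False
```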

## Vector index configuration
<a name="agentic-memory-bp-index"></a>
+ **Choose the right index type** – Use `FLAT` for smaller memory stores (under 100,000 entries) where exact search is feasible. Use `HNSW` for larger stores where approximate nearest neighbor search provides better performance at scale.
+ **Select appropriate dimensions** – Match the embedding dimensions to your model. Amazon Titan Text Embeddings V2 produces 1024-dimensional vectors. OpenAI's text-embedding-3-small produces 1536-dimensional vectors.
+ **Use COSINE distance metric** – For text embeddings from models like Amazon Titan and OpenAI, COSINE distance is typically the most appropriate metric for measuring semantic similarity.
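
Both index types use the same `FT.CREATE` shape shown earlier in this topic. The following sketch builds the command arguments; the index and field names are examples:

```
def create_index_args(name: str, dims: int, index_type: str = "FLAT") -> list:
    """Build FT.CREATE arguments for a FLAT or HNSW vector index."""
    if index_type not in ("FLAT", "HNSW"):
        raise ValueError("index_type must be FLAT or HNSW")
    return [
        "FT.CREATE", name,
        "SCHEMA", "embedding",
        "VECTOR", index_type, "6",
        "TYPE", "FLOAT32",
        "DIM", str(dims),
        "DISTANCE_METRIC", "COSINE",
    ]

# With a connected client:
# client.execute_command(*create_index_args("agent_memory", 1024, "HNSW"))
```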

## Multi-user isolation
<a name="agentic-memory-bp-isolation"></a>
+ **Scope memories by user ID** – Always include a `user_id` parameter when storing and searching memories to prevent information leaking between users.
+ **Use TAG filters for efficient isolation** – When querying the vector index, use TAG filters (for example, `@user_id:{user_123}`) to pre-filter results by user before performing KNN search. This runs as a single atomic operation, providing both isolation and performance.

  ```
  # Example: TAG-filtered vector search for user isolation.
  # Assumes `client` is a connected Valkey client, `user_id` identifies the
  # caller, and `query_vec` holds the query embedding packed as FLOAT32 bytes.
  results = client.execute_command(
      "FT.SEARCH", "agent_memory",
      f"@user_id:{{{user_id}}}=>[KNN 5 @embedding $query_vec]",
      "PARAMS", "2", "query_vec", query_vec,
      "DIALECT", "2",
  )
  ```

## Memory management at scale
<a name="agentic-memory-bp-scale"></a>
+ **Set maxmemory policy** – Configure `maxmemory-policy allkeys-lru` on your ElastiCache cluster to automatically evict least-recently-used memory entries when the cluster reaches its memory limit.
+ **Monitor memory usage** – Use Amazon CloudWatch metrics to track memory utilization, cache hit rates, and vector search latency. Set alarms for high memory usage to proactively manage capacity.
+ **Plan for capacity** – Each memory entry typically requires approximately 4–6 KB (embedding dimensions × 4 bytes, plus metadata). A 1 GB ElastiCache instance can store approximately 170,000–250,000 memory entries depending on embedding size and metadata.
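
The capacity estimate can be checked with a quick calculation; the per-entry metadata size below is an assumption consistent with the 4–6 KB figure in this section:

```
def entries_per_gib(dims: int = 1024, metadata_bytes: int = 2048) -> int:
    """Rough count of memory entries that fit in 1 GiB of usable memory."""
    entry_bytes = dims * 4 + metadata_bytes  # FLOAT32 embedding plus metadata
    return (1 << 30) // entry_bytes

# 1024 dims -> a 4,096-byte embedding; with up to ~2 KB of metadata an entry
# is ~4-6 KB, so 1 GiB holds on the order of 175,000-260,000 entries.
```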

# Related resources
<a name="agentic-memory-related-resources"></a>
+ [Vector search for ElastiCache](https://docs.aws.amazon.com/AmazonElastiCache/latest/dg/vector-search.html)
+ [Common ElastiCache use cases](https://docs.aws.amazon.com/AmazonElastiCache/latest/dg/elasticache-use-cases.html)
+ [Build persistent memory for agentic AI applications with Mem0 and Amazon ElastiCache for Valkey](https://aws.amazon.com/blogs/database/build-persistent-memory-for-agentic-ai-applications-with-mem0-open-source-amazon-elasticache-for-valkey-and-amazon-neptune-analytics/) (AWS Database Blog)
+ [Mem0 documentation — Valkey vector store](https://docs.mem0.ai/components/vectordbs/dbs/valkey)
+ [Strands Agents user guide](https://strandsagents.com/latest/documentation/docs/)
+ [Amazon Bedrock AgentCore Runtime](https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/agents-tools-runtime.html)
+ [Valkey vector search documentation](https://valkey.io/blog/introducing-valkey-search/)