If you've played with the agents from the last few days you've probably noticed something frustrating: every time you start a new conversation, the agent has no idea who you are or what you discussed before. That's not a bug — it's the default behaviour. LLMs are stateless; they only know what's in the current context window.
Memory is how we fix that. There are three distinct tiers, each with different trade-offs, and picking the wrong one is a common source of over-engineering (or under-engineering). Let's break them down.
The simplest form of memory is the conversation history you pass in the messages array every turn. The model "remembers" everything in that list.
```python
messages = [
    {"role": "user", "content": "My name is Hitesh."},
    {"role": "assistant", "content": "Hi Hitesh! How can I help?"},
    {"role": "user", "content": "What's my name?"},  # model will say "Hitesh"
]
```

Strengths: Zero setup. Works out of the box with every SDK.
Limitations: Bounded by the model's context window. claude-3-7-sonnet supports up to 200k tokens; gpt-4.1 up to 1M — but token cost scales linearly with history length, and very long conversations slow down inference. In-context memory also vanishes when the process restarts.
Use it when: you need memory only within a single session and the conversation will stay under a few hundred turns.
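If a session does start pushing against those limits, the usual mitigation is to trim the oldest turns before each call. Here's a minimal sketch; `MAX_HISTORY_TOKENS`, `estimate_tokens`, and `trim_history` are illustrative names, and the 4-characters-per-token estimate is a rough heuristic, not a real tokenizer:

```python
MAX_HISTORY_TOKENS = 8_000  # illustrative budget, well under any model's limit

def estimate_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return len(text) // 4

def trim_history(messages: list[dict]) -> list[dict]:
    """Drop the oldest turns until the remaining history fits the budget."""
    trimmed = list(messages)
    total = sum(estimate_tokens(m["content"]) for m in trimmed)
    while trimmed and total > MAX_HISTORY_TOKENS:
        total -= estimate_tokens(trimmed.pop(0)["content"])
    return trimmed
```

A real agent would usually summarise the dropped turns rather than discard them outright, which is exactly what the episodic tier below is for.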
For memory that persists across sessions, you need to write it somewhere outside the process. The two common shapes are:
Key-value store — good for structured facts: user preferences, entity properties, configuration. Fast reads, no semantic search.
Vector store — good for unstructured text you want to query by meaning: past conversations, documents, notes. Slightly more setup, but lets you ask "what have we talked about that's relevant to X?" rather than "give me the record for key Y". A toy sketch of the shape follows below.
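Here's that sketch, an in-memory stand-in where `embed()` is a placeholder for whatever embedding API you use (Day 9 wires up a real one) and cosine similarity does the "query by meaning":

```python
import math

# store holds (text, embedding) pairs; embed() stands in for a real embedding call
store: list[tuple[str, list[float]]] = []

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def add_note(text: str, embed) -> None:
    store.append((text, embed(text)))

def search(query: str, embed, k: int = 3) -> list[str]:
    """Return the k stored texts most similar in meaning to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```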
Here's a minimal persistent memory store backed by a JSON file. It's adequate for low-volume use and easy to swap for Redis or DynamoDB later.
```python
import json
from datetime import datetime, timezone
from pathlib import Path

MEMORY_PATH = Path("agent_memory.json")

def load_memory() -> dict:
    if MEMORY_PATH.exists():
        return json.loads(MEMORY_PATH.read_text())
    return {}

def save_memory(memory: dict) -> None:
    MEMORY_PATH.write_text(json.dumps(memory, indent=2))

def remember(key: str, value: str) -> None:
    """Store a fact in long-term memory."""
    memory = load_memory()
    memory[key] = {"value": value, "updated_at": datetime.now(timezone.utc).isoformat()}
    save_memory(memory)

def recall(key: str) -> str | None:
    """Retrieve a fact from long-term memory."""
    memory = load_memory()
    entry = memory.get(key)
    return entry["value"] if entry else None
```

Expose `remember` and `recall` as tools so the agent can decide what to store and when to look things up:
```python
import anthropic

client = anthropic.Anthropic()

tools = [
    {
        "name": "remember",
        "description": "Store an important fact about the user or the current project for future sessions.",
        "input_schema": {
            "type": "object",
            "properties": {
                "key": {"type": "string", "description": "A short identifier, e.g. 'user_name' or 'preferred_language'"},
                "value": {"type": "string", "description": "The fact to store"},
            },
            "required": ["key", "value"],
        },
    },
    {
        "name": "recall",
        "description": "Look up a previously stored fact by key.",
        "input_schema": {
            "type": "object",
            "properties": {
                "key": {"type": "string"},
            },
            "required": ["key"],
        },
    },
]

def run_tool(name: str, inputs: dict) -> str:
    if name == "remember":
        remember(inputs["key"], inputs["value"])
        return f"Stored: {inputs['key']} = {inputs['value']}"
    if name == "recall":
        value = recall(inputs["key"])
        return value if value else f"No memory found for key '{inputs['key']}'"
    raise ValueError(f"Unknown tool: {name}")
```

Now the agent can say "I'll remember your preferred language is TypeScript" and it will persist across restarts.
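To see these fire end to end, here's a condensed sketch of the tool-use dispatch loop; the structure mirrors the agent loop from earlier in the series, `chat` is a hypothetical wrapper, and error handling is omitted:

```python
def chat(user_text: str) -> str:
    """One user turn, looping until Claude stops asking for tools."""
    messages = [{"role": "user", "content": user_text}]
    while True:
        response = client.messages.create(
            model="claude-3-7-sonnet-20250219",
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response.content[0].text
        # Run every requested tool and feed the results back to the model.
        messages.append({"role": "assistant", "content": response.content})
        results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": run_tool(block.name, block.input),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": results})
```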
Episodic memory is for retaining the gist of past conversations — not every token, but a compressed summary that can be injected into the context of a new session.
The pattern:
```python
def summarise_session(messages: list[dict]) -> str:
    """Ask Claude to produce a one-paragraph episode summary."""
    conversation_text = "\n".join(
        f"{m['role'].upper()}: {m['content']}" for m in messages
    )
    response = client.messages.create(
        model="claude-3-7-sonnet-20250219",
        max_tokens=256,
        messages=[
            {
                "role": "user",
                "content": f"Summarise this conversation in 2–3 sentences, capturing key decisions and context:\n\n{conversation_text}",
            }
        ],
    )
    return response.content[0].text

def build_system_prompt(user_id: str) -> str:
    episodes = recall(f"episodes_{user_id}") or "No previous sessions."
    return f"""You are a helpful assistant.
Previous session context:
{episodes}
Use this context when relevant, but don't mention it unless asked."""
```

This keeps your context window lean while preserving continuity across sessions.
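The missing piece is the end-of-session hook that writes the episode. One way to close the loop, reusing `remember`/`recall` and the datetime imports from the JSON store above (`end_session` and the `keep` cap are hypothetical names, not part of any SDK):

```python
def end_session(user_id: str, messages: list[dict], keep: int = 5) -> None:
    """Summarise the finished session and append it to the user's episodes."""
    summary = summarise_session(messages)
    existing = recall(f"episodes_{user_id}")
    episodes = existing.split("\n---\n") if existing else []
    episodes.append(f"[{datetime.now(timezone.utc).date()}] {summary}")
    # Cap the list so the system prompt stays small.
    remember(f"episodes_{user_id}", "\n---\n".join(episodes[-keep:]))
```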
| Scenario | Recommended tier |
|---|---|
| Multi-turn chat within one session | In-context (messages array) |
| User preferences / profile facts | Long-term key-value |
| "What did we discuss yesterday?" | Episodic summaries |
| "Find docs relevant to my question" | Vector store (see Day 9) |
| All of the above in production | All three, layered |
The JSON file store above is a drop-in for low traffic. When you're ready to scale, swap `json.loads(MEMORY_PATH.read_text())` for `redis.get(key)` and the corresponding write for `redis.set`. No other changes to the tool interface.
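As a rough sketch of that swap, assuming the `redis-py` package and a reachable Redis server (the `memory:` key prefix is an illustrative convention, not required):

```python
import json
from datetime import datetime, timezone

import redis  # assumes the redis-py package and a running Redis instance

r = redis.Redis(decode_responses=True)

def remember(key: str, value: str) -> None:
    """Same interface as the JSON version, one Redis call instead of a file write."""
    entry = {"value": value, "updated_at": datetime.now(timezone.utc).isoformat()}
    r.set(f"memory:{key}", json.dumps(entry))

def recall(key: str) -> str | None:
    raw = r.get(f"memory:{key}")
    return json.loads(raw)["value"] if raw else None
```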
Tomorrow we leave hosted models and go local: NVIDIA Nemotron 3 running on your own machine via Ollama, wired into the same agent loop with an OpenAI-compatible endpoint.