Semantic memory remembers facts. Episodic memory remembers events. The agent that recalls "we decided to ship in Q2" from a chat three weeks ago feels vastly more useful than one that does not. Here is how to build episodic memory without the usual pitfalls.
## Episodic vs. semantic, briefly
| Memory type | Stores | Retrieved by |
|---|---|---|
| Working | Current message array | Always loaded |
| Semantic | Facts, preferences | Always loaded |
| Episodic | Past events with timestamps | Topical or temporal query |
| Procedural | Skills, recipes | Tool definitions |
Episodic memory is the layer that lets an agent answer "what did we discuss about pricing last week?"
## What to capture per episode
A useful episode record:
```json
{
  "id": "ep_2026-04-25_a83",
  "user_id": "u:alex",
  "session_id": "s:9f2c",
  "started_at": "2026-04-25T09:00:00Z",
  "ended_at": "2026-04-25T09:18:00Z",
  "topic_tags": ["pricing", "enterprise tier"],
  "summary": "Discussed enterprise pricing structure...",
  "key_events": [
    { "type": "decision", "what": "Move to per-seat pricing for >50 seat plans" },
    { "type": "open_question", "what": "What is the floor for SOC2-required customers?" }
  ],
  "entities": ["o:acme", "p:dale"],
  "embedding": [...]
}
```
Five fields matter most: timestamps, topic tags, summary (200 tokens max), key events (typed), embedding for retrieval.
## Extraction at session end
A summariser agent runs on the full transcript:
```
Extract from this conversation:
1. A 1-2 sentence summary.
2. Topic tags (max 5).
3. Key events: decisions, open questions, commitments, surprises.
4. Entities mentioned.
Return JSON.
```
Latency budget: 5–10 seconds at the end of each session. Queue the extraction asynchronously if your sessions tend to end in spikes.
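Once the summariser returns its JSON, the output still needs validating before it becomes a stored episode. A minimal sketch of that step — `build_episode`, the id suffix, and the character-based summary cap are all illustrative assumptions, not a fixed API:

```python
import json
import uuid

# Event types the summariser prompt asks for; anything else is dropped.
ALLOWED_EVENT_TYPES = {"decision", "open_question", "commitment", "surprise"}

def build_episode(raw_json: str, user_id: str, session_id: str,
                  started_at: str, ended_at: str) -> dict:
    """Validate the summariser's JSON output and normalise it into an
    episode record matching the schema above (embedding added later)."""
    data = json.loads(raw_json)
    # Keep only well-formed, recognised typed events.
    events = [e for e in data.get("key_events", [])
              if e.get("type") in ALLOWED_EVENT_TYPES and e.get("what")]
    return {
        "id": f"ep_{started_at[:10]}_{uuid.uuid4().hex[:3]}",
        "user_id": user_id,
        "session_id": session_id,
        "started_at": started_at,
        "ended_at": ended_at,
        "topic_tags": data.get("topic_tags", [])[:5],  # cap at 5, per the prompt
        "summary": data.get("summary", "")[:800],      # rough proxy for 200 tokens
        "key_events": events,
        "entities": data.get("entities", []),
    }
```

Validating here, rather than trusting the model's JSON, keeps malformed events out of the typed-query path.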
## Retrieval at session start
Two queries run in parallel:
- Recent — last 5 episodes for this user, regardless of topic.
- Relevant — top 3 episodes by vector similarity to the current opening message.
Merge, dedupe, prepend as a `<previous_episodes>` block in the system prompt. Typical size: 500–1500 tokens.
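The merge-dedupe-format step can be sketched in a few lines; the function names and the per-episode line format are assumptions, not a prescribed layout:

```python
def merge_episodes(recent: list[dict], relevant: list[dict]) -> list[dict]:
    """Merge the recency and similarity result sets, deduping by episode id.
    Recent episodes keep priority; relevant ones fill in behind them."""
    seen: set[str] = set()
    merged: list[dict] = []
    for ep in recent + relevant:
        if ep["id"] not in seen:
            seen.add(ep["id"])
            merged.append(ep)
    return merged

def render_block(episodes: list[dict]) -> str:
    """Format merged episodes as the <previous_episodes> prompt block,
    one dated summary line per episode."""
    lines = [f"- [{ep['started_at'][:10]}] {ep['summary']}" for ep in episodes]
    return "<previous_episodes>\n" + "\n".join(lines) + "\n</previous_episodes>"
```

Because the two source queries overlap often (the most recent episode is frequently also the most similar), deduping by id before rendering keeps the block inside the 500–1500 token budget.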
## Storage layout
A simple two-table layout:
```sql
CREATE TABLE episodes (
  id         text primary key,
  user_id    text not null,
  started_at timestamptz not null,
  ended_at   timestamptz not null,
  summary    text not null,
  topic_tags text[] not null,
  embedding  vector(1536) not null
);

CREATE INDEX ON episodes (user_id, started_at DESC);
CREATE INDEX ON episodes USING ivfflat (embedding vector_cosine_ops);

CREATE TABLE episode_events (
  episode_id text references episodes(id),
  type       text not null,
  what       text not null,
  ts         timestamptz not null
);
```
pgvector handles the embedding side. The events table feeds typed queries ("what decisions has this user made about X?").
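One way to sketch such a typed query against this layout — a parameterised query builder in the `%s` placeholder style used by drivers like psycopg; the function name is illustrative:

```python
def decisions_about(user_id: str, topic: str) -> tuple[str, tuple]:
    """Build a parameterised query answering "what decisions has this
    user made about <topic>?" from the two-table layout above."""
    sql = """
        SELECT ev.what, ev.ts
        FROM episode_events ev
        JOIN episodes ep ON ep.id = ev.episode_id
        WHERE ep.user_id = %s
          AND ev.type = 'decision'
          AND %s = ANY(ep.topic_tags)
        ORDER BY ev.ts DESC
    """
    return sql, (user_id, topic)
```

Keeping `user_id` as a bound parameter in every builder like this is also the cheapest place to enforce the per-user scoping discussed below.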
## Decay and pruning
Episodes get less relevant over time. Three approaches:
- Hard prune — drop episodes older than N months unless tagged "important".
- Soft decay — keep the row but multiply retrieval scores by a decay factor.
- Compression — merge K old episodes into one super-summary.
We default to soft decay + annual compression. Hard prune only on user request (right-to-erasure).
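A minimal sketch of the soft-decay scoring, assuming an exponential half-life; the 90-day default is an illustrative choice, not a recommendation from benchmarks:

```python
def decayed_score(similarity: float, age_days: float,
                  half_life_days: float = 90.0) -> float:
    """Soft decay: scale the raw retrieval score so an episode
    half_life_days old counts half as much as a fresh one."""
    return similarity * 0.5 ** (age_days / half_life_days)
```

This keeps old episodes retrievable by explicit temporal queries while letting fresh ones win topical ties.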
## Privacy and retrieval scoping
Episodic memory is the most PII-dense memory layer. Three guardrails:
- Tenant + user namespacing at every query.
- Purpose tags so support episodes do not leak into marketing prompts.
- User-visible "what do you remember about me?" page.
See GDPR-compliant agents for the legal framing.
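The first two guardrails can be enforced at the query layer with a mandatory filter builder — a sketch under the assumption that every retrieval path goes through one function like this (names are illustrative):

```python
def scoped_filter(tenant_id: str, user_id: str, purpose: str) -> dict:
    """Build the mandatory retrieval filter. Failing loudly on a missing
    scope makes cross-tenant leaks a code error, not a prompt-injection
    risk."""
    if not tenant_id or not user_id or not purpose:
        raise ValueError("episodic retrieval requires tenant, user, and purpose scope")
    return {"tenant_id": tenant_id, "user_id": user_id, "purpose": purpose}
```

Every episode query then merges this dict into its WHERE clause, so an unscoped retrieval cannot be expressed at all.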
## Common mistakes
- Storing the full transcript as the episode — bloats the index, dilutes retrieval.
- No typed events — "summary text" makes querying for decisions hard.
- No decay — old preferences outvote new ones.
- Cross-user retrieval — the disaster scenario; defend in the schema, not the prompt.
## Where this is heading
Two shifts: standardised episodic memory schemas in MCP, and per-user "memory dashboards" that ship as a default UI primitive. Build the schema now, swap implementations later.