Explainer · 3 min read

Agent episodic memory implementation: what to capture, what to drop, how to retrieve

Episodic memory is what the agent remembers about specific past sessions — events, decisions, who-said-what-when. Here is how to capture it without bloating the index, and how to retrieve it without confusing the agent.

Semantic memory remembers facts. Episodic memory remembers events. The agent that recalls "we decided to ship in Q2" from a chat three weeks ago feels vastly more useful than one that does not. This piece walks through building that layer without the usual pitfalls.

Episodic vs. semantic, briefly

Memory type   Stores                        Retrieved by
Working       Current message array         Always loaded
Semantic      Facts, preferences            Always loaded
Episodic      Past events with timestamps   Topical or temporal query
Procedural    Skills, recipes               Tool definitions

Episodic memory is the layer that lets an agent answer "what did we discuss about pricing last week?"

What to capture per episode

A useful episode record:

{
  "id": "ep_2026-04-25_a83",
  "user_id": "u:alex",
  "session_id": "s:9f2c",
  "started_at": "2026-04-25T09:00:00Z",
  "ended_at": "2026-04-25T09:18:00Z",
  "topic_tags": ["pricing", "enterprise tier"],
  "summary": "Discussed enterprise pricing structure...",
  "key_events": [
    { "type": "decision", "what": "Move to per-seat pricing for >50 seat plans" },
    { "type": "open_question", "what": "What is the floor for SOC2-required customers?" }
  ],
  "entities": ["o:acme", "p:dale"],
  "embedding": [...]
}

Five fields matter most: timestamps, topic tags, a summary (200 tokens max), typed key events, and an embedding for retrieval.
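
The same record as a TypeScript type, if you want the event kinds checked at compile time. The union mirrors the four event types the extraction prompt below asks for:

type EpisodeEventType = "decision" | "open_question" | "commitment" | "surprise";

interface EpisodeEvent {
  type: EpisodeEventType;
  what: string;
}

interface Episode {
  id: string;                 // "ep_2026-04-25_a83"
  user_id: string;
  session_id: string;
  started_at: string;         // ISO 8601
  ended_at: string;
  topic_tags: string[];       // max 5
  summary: string;            // 200 tokens max
  key_events: EpisodeEvent[];
  entities: string[];
  embedding: number[];        // 1536 dims to match the schema below
}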

Extraction at session end

A summariser agent runs on the full transcript:

Extract from this conversation:
1. A 1-2 sentence summary.
2. Topic tags (max 5).
3. Key events: decisions, open questions, commitments, surprises.
4. Entities mentioned.

Return JSON.

Latency budget: 5–10 seconds at the end of a session. Queue the job asynchronously if your sessions end in spikes.
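
A sketch of the session-end job. Here llm, db and enqueue are hypothetical stand-ins for your model client, persistence layer and job queue, not a specific library's API:

// Hypothetical clients; swap in your own LLM and persistence layers.
declare const llm: {
  complete(prompt: string): Promise<string>;
  embed(text: string): Promise<number[]>;
};
declare const db: { insertEpisode(record: unknown): Promise<void> };
declare function enqueue(job: () => Promise<void>): void;

async function extractEpisode(sessionId: string, transcript: string): Promise<void> {
  const prompt = [
    "Extract from this conversation:",
    "1. A 1-2 sentence summary.",
    "2. Topic tags (max 5).",
    "3. Key events: decisions, open questions, commitments, surprises.",
    "4. Entities mentioned.",
    "Return JSON.",
    "",
    transcript,
  ].join("\n");

  const parsed = JSON.parse(await llm.complete(prompt)); // validate against a schema in practice
  const embedding = await llm.embed(parsed.summary);     // embed the summary, not the transcript

  await db.insertEpisode({ session_id: sessionId, ...parsed, embedding });
}

// At session end, queue rather than block the response path:
// enqueue(() => extractEpisode(session.id, session.transcript));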

Retrieval at session start

Two queries run in parallel:

  1. Recent — last 5 episodes for this user, regardless of topic.
  2. Relevant — top 3 episodes by vector similarity to the current opening message.

Merge, dedupe, prepend as a <previous_episodes> block in the system prompt. Typical size: 500–1500 tokens.
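
A sketch of that start-of-session step, assuming a node-postgres-style query(text, params) client and an embedding helper (both hypothetical names):

// Hypothetical pg-style client and embedder.
declare const db: { query(text: string, params: unknown[]): Promise<{ rows: any[] }> };
declare const llm: { embed(text: string): Promise<number[]> };

async function previousEpisodesBlock(userId: string, openingMessage: string): Promise<string> {
  const queryEmbedding = await llm.embed(openingMessage);

  // Run both lookups in parallel.
  const [recent, relevant] = await Promise.all([
    db.query(
      `SELECT id, started_at, summary FROM episodes
        WHERE user_id = $1
        ORDER BY started_at DESC LIMIT 5`,
      [userId],
    ),
    db.query(
      `SELECT id, started_at, summary FROM episodes
        WHERE user_id = $1
        ORDER BY embedding <=> $2::vector LIMIT 3`, // cosine distance via pgvector
      [userId, JSON.stringify(queryEmbedding)],
    ),
  ]);

  // Merge and dedupe by id; the recency list goes first so overlaps keep their slot.
  const byId = new Map<string, any>();
  for (const row of [...recent.rows, ...relevant.rows]) byId.set(row.id, row);

  const lines = [...byId.values()]
    .map((e) => `- [${e.started_at}] ${e.summary}`)
    .join("\n");

  return `<previous_episodes>\n${lines}\n</previous_episodes>`;
}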

Storage layout

A simple two-table layout:

-- pgvector provides the vector type and ivfflat index used below
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE episodes (
  id text primary key,
  user_id text not null,
  started_at timestamptz not null,
  ended_at timestamptz not null,
  summary text not null,
  topic_tags text[] not null,
  embedding vector(1536) not null
);
CREATE INDEX ON episodes (user_id, started_at DESC);
CREATE INDEX ON episodes USING ivfflat (embedding vector_cosine_ops);

CREATE TABLE episode_events (
  episode_id text references episodes(id),
  type text not null,
  what text not null,
  ts timestamptz not null
);

pgvector handles the embedding side. The events table feeds typed queries ("what decisions has this user made about X?").
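
A typed query might look like this, again assuming a hypothetical query(text, params) client:

// "What decisions has this user made about <topic>?" via the typed events table.
declare const db: { query(text: string, params: unknown[]): Promise<{ rows: any[] }> };

async function decisionsAbout(userId: string, topicTag: string) {
  const { rows } = await db.query(
    `SELECT ev.what, ev.ts
       FROM episode_events ev
       JOIN episodes ep ON ep.id = ev.episode_id
      WHERE ep.user_id = $1
        AND $2 = ANY(ep.topic_tags)
        AND ev.type = 'decision'
      ORDER BY ev.ts DESC`,
    [userId, topicTag],
  );
  return rows;
}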

Decay and pruning

Episodes get less relevant over time. Three approaches:

  • Hard prune — drop episodes older than N months unless tagged "important".
  • Soft decay — keep the row but multiply retrieval scores by a decay factor.
  • Compression — merge K old episodes into one super-summary.

We default to soft decay + annual compression. Hard prune only on user request (right-to-erasure).
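
Soft decay can be as simple as an exponential half-life applied to the retrieval score; the 90-day half-life below is an illustrative starting point, not a recommendation:

// Soft decay: an episode's retrieval score halves every `halfLifeDays`.
function decayedScore(
  similarity: number,
  endedAt: Date,
  halfLifeDays = 90,
  now = new Date(),
): number {
  const ageDays = (now.getTime() - endedAt.getTime()) / 86_400_000;
  return similarity * Math.pow(0.5, ageDays / halfLifeDays);
}

// With a 90-day half-life, a strong match from six months ago ranks below
// a weaker match from last week:
const now = new Date("2026-04-25T00:00:00Z");
decayedScore(0.82, new Date("2025-10-25T00:00:00Z"), 90, now); // ≈ 0.20
decayedScore(0.55, new Date("2026-04-18T00:00:00Z"), 90, now); // ≈ 0.52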

Privacy and retrieval scoping

Episodic memory is the most PII-dense memory layer. Three guardrails:

  • Tenant + user namespacing at every query (see the sketch after this list).
  • Purpose tags so support episodes do not leak into marketing prompts.
  • User-visible "what do you remember about me?" page.
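
One way to make the first two guardrails structural rather than prompt-level is to force every episodic lookup through a single scoped helper. A sketch, assuming the episodes table gains tenant_id and purpose columns not shown in the schema above:

// Every episodic lookup must carry tenant + user + purpose scope.
interface MemoryScope {
  tenantId: string;
  userId: string;
  purpose: string; // e.g. "support", "sales"
}

declare const db: { query(text: string, params: unknown[]): Promise<{ rows: any[] }> };

async function scopedEpisodes(scope: MemoryScope, limit = 5) {
  if (!scope.tenantId || !scope.userId) {
    throw new Error("episodic retrieval requires tenant + user scope");
  }
  const { rows } = await db.query(
    `SELECT summary, started_at FROM episodes
      WHERE tenant_id = $1 AND user_id = $2 AND purpose = $3
      ORDER BY started_at DESC LIMIT $4`,
    [scope.tenantId, scope.userId, scope.purpose, limit],
  );
  return rows;
}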

See GDPR-compliant agents for the legal framing.

Common mistakes

  • Storing the full transcript as the episode — bloats the index, dilutes retrieval.
  • No typed events — "summary text" makes querying for decisions hard.
  • No decay — old preferences outvote new ones.
  • Cross-user retrieval — the disaster scenario; defend in the schema, not the prompt.

Where this is heading

Two shifts: standardised episodic memory schemas in MCP, and per-user "memory dashboards" that ship as a default UI primitive. Build the schema now, swap implementations later.
