An agent that forgets you between sessions feels like talking to an amnesiac. An agent that remembers everything verbatim runs out of context in three turns. Cross-session memory is the architectural middle ground — and 2026 is the year it stops being optional.
## Why memory is suddenly the differentiator
Until recently, "memory" in LLM apps meant "load the last N messages". That works for a single session. It collapses the moment a user expects continuity across days. Three forces converged in 2026:
- Long-running agents (CS bots, research assistants, SRE copilots) became real products.
- Users discovered they do not want to re-explain themselves every session.
- Memory MCP servers (Mem0, Zep, the official memory MCP) made cross-session memory pluggable.
## Three flavours of memory you actually need
### Working memory
The current conversation. It fits in the context window, has no persistence, and lives in the message array.
### Episodic memory
"What we talked about last Tuesday." Stored as summarised events with timestamps. Retrieved by recency or topic match. Useful for "remind me what we decided about pricing".
### Semantic memory
"What does the user prefer?" Distilled facts: "user is a CTO at a B2B SaaS, prefers concise answers, hates emoji". Retrieved as a small bundle injected into every system prompt.
## The shape of a working memory layer
A pragmatic stack looks like this:
```
System prompt
+ semantic facts (200 tokens)
-------------------------------
Recent messages (last 20 turns)
-------------------------------
Retrieved episodes (top 3 by relevance)
```
Total: ~3-5k tokens, regardless of how long the relationship has lasted.
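Assembling that stack is a few lines. A sketch, assuming messages use the common `{ role, content }` shape; `buildContext` and its argument names are illustrative:

```js
// Assemble the full context: system prompt + semantic facts + retrieved
// episodes in the system slot, then the last 20 turns of working memory.
function buildContext({ systemPrompt, factPreamble, episodes, recentMessages }) {
  const episodeBlock = episodes.length
    ? 'Relevant past episodes:\n' + episodes.map(e => `- ${e}`).join('\n')
    : '';
  const system = [systemPrompt, factPreamble, episodeBlock]
    .filter(Boolean) // drop empty sections
    .join('\n\n');
  return [{ role: 'system', content: system }, ...recentMessages.slice(-20)];
}
```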
## The four hard problems
### 1. What to store
Storing every message bloats the index and dilutes retrieval. The fix: a memory writer agent that runs after each session and extracts only durable facts (preferences, decisions, constraints) and significant episodes (notable conversations).
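The extraction itself needs an LLM, but the write-side filter around it is plain code. A sketch that drops near-duplicates and trivially short candidates before they reach the index; `normalize` and `filterCandidates` are illustrative names, not a library API:

```js
// Normalise whitespace and case so trivially different phrasings
// of the same fact collide.
function normalize(text) {
  return text.trim().toLowerCase().replace(/\s+/g, ' ');
}

// Keep only candidates that are long enough to be durable facts and
// not already stored (or repeated within this batch).
function filterCandidates(candidates, existing) {
  const seen = new Set(existing.map(normalize));
  const kept = [];
  for (const candidate of candidates) {
    const key = normalize(candidate);
    if (key.length < 10) continue; // too short to be a durable fact
    if (seen.has(key)) continue;   // already stored
    seen.add(key);
    kept.push(candidate);
  }
  return kept;
}
```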
### 2. When to forget
Memories should decay. A user's address from 2024 may no longer be true. Patterns that work: timestamped facts with confidence scores, contradiction detection on write, periodic re-validation prompts.
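One way to implement decay, assuming each memory carries a confidence in [0, 1] and a `createdAt` timestamp in milliseconds. The half-life and threshold are tunables, and `retentionScore` / `staleMemories` are illustrative names:

```js
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Exponential decay: the score halves every halfLifeDays.
function retentionScore(memory, now, halfLifeDays = 180) {
  const ageDays = (now - memory.createdAt) / MS_PER_DAY;
  return memory.confidence * Math.pow(0.5, ageDays / halfLifeDays);
}

// Low-scoring memories become candidates for re-validation
// or deletion, not silent drops.
function staleMemories(memories, now, threshold = 0.2) {
  return memories.filter(m => retentionScore(m, now) < threshold);
}
```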
### 3. How to retrieve without hallucination
Vector similarity gives plausibly-relevant results, not actually-relevant ones. Combine vector + keyword + metadata filters. Always show the model the source memory, not a paraphrase, so it can self-correct.
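A sketch of the combined scoring: a hard metadata filter first, then a blend of vector similarity and keyword overlap. The 0.7/0.3 weights are illustrative, not tuned, and `hybridRank` is not a real library API:

```js
// Fraction of query terms that appear in the candidate text.
function keywordOverlap(query, text) {
  const queryTerms = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const textTerms = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
  let hits = 0;
  for (const term of queryTerms) if (textTerms.has(term)) hits++;
  return queryTerms.size ? hits / queryTerms.size : 0;
}

function hybridRank(query, candidates, userId) {
  return candidates
    .filter(c => c.userId === userId) // metadata filter: never leak across users
    .map(c => ({
      ...c,
      score: 0.7 * c.vectorSim + 0.3 * keywordOverlap(query, c.summary),
    }))
    .sort((a, b) => b.score - a.score);
}
```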
### 4. How to keep memory private
Cross-session memory IS persistent PII. GDPR/CCPA right-to-erasure means you need delete-by-user-id from day one. Encrypt at rest. Do not ship memories across tenant boundaries.
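A right-to-erasure sketch over an in-memory array standing in for the memories table; a real implementation must also purge the vector index, embedding caches, and backups. `eraseUser` is an illustrative name:

```js
// Delete every memory row belonging to userId and return the count,
// which belongs in the audit log.
function eraseUser(store, userId) {
  const before = store.length;
  for (let i = store.length - 1; i >= 0; i--) {
    if (store[i].userId === userId) store.splice(i, 1); // delete in place
  }
  return before - store.length;
}
```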
## Build vs buy in 2026
| Option | Best for | Trade-off |
|---|---|---|
| Memory MCP (official) | Single-user dev tools | Knowledge-graph model, manual schema |
| Mem0 | Multi-user products | Hosted SaaS, vendor lock-in |
| Zep | Enterprise, self-host | More infra to run |
| Roll your own (Postgres + pgvector) | Full control | Six months of edge-case work |
## A minimum viable memory in 50 lines
```js
// after each session
const summary = await llm.complete({
  prompt: 'Extract durable facts and notable events from this conversation as JSON.',
  messages: session.messages,
});
await db.insert('memories', {
  user_id,
  summary,
  embedding: await embed(summary),
  created_at: now(),
});

// before each session
const recent = await db.query(
  'SELECT summary FROM memories WHERE user_id = $1 ORDER BY created_at DESC LIMIT 5',
  [user_id],
);
const relevant = await db.vectorSearch('memories', userQuery, { top: 3 });
const memoryBundle = [...recent, ...relevant]
  .map(m => m.summary)
  .join('\n');
```
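One wrinkle in the minimum viable version: the recent and relevant result sets can overlap, which would inject the same memory twice. A sketch of deduplicating by row id before joining, assuming each row carries an `id` alongside its summary; `mergeMemories` is an illustrative name:

```js
// Merge recency-ordered and relevance-ordered results,
// keeping the first occurrence of each row id.
function mergeMemories(recent, relevant) {
  const seen = new Set();
  const merged = [];
  for (const memory of [...recent, ...relevant]) {
    if (seen.has(memory.id)) continue;
    seen.add(memory.id);
    merged.push(memory);
  }
  return merged.map(m => m.summary).join('\n');
}
```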
## Where this is heading
Expect three shifts over the next year: standardised memory schemas in MCP, native memory primitives in the Claude Agent SDK, and per-user memory dashboards letting users see and edit what the agent "knows" about them. The last one is non-negotiable for consumer products.