An agent that forgets you between sessions feels like talking to an amnesiac. An agent that remembers everything verbatim runs out of context in three turns. Cross-session memory is the architectural middle ground — and 2026 is the year it stops being optional.
## Why memory is suddenly the differentiator
Until recently, "memory" in LLM apps meant "load the last N messages". That works for a single session. It collapses the moment a user expects continuity across days. Three forces converged in 2026:
- Long-running agents (CS bots, research assistants, SRE copilots) became real products.
- Users discovered they do not want to re-explain themselves every session.
- Memory MCP servers (Mem0, Zep, the official memory MCP) made cross-session memory pluggable.
## Three flavours of memory you actually need
### Working memory
The current conversation. It fits in the context window, has no persistence, and lives in the message array.
### Episodic memory
"What we talked about last Tuesday." Stored as summarised events with timestamps. Retrieved by recency or topic match. Useful for "remind me what we decided about pricing".
### Semantic memory
"What does the user prefer?" Distilled facts: "user is a CTO at a B2B SaaS, prefers concise answers, hates emoji". Retrieved as a small bundle injected into every system prompt.
## The shape of a working memory layer
A pragmatic stack looks like this:
```
System prompt
+ semantic facts (200 tokens)
-------------------------------
Recent messages (last 20 turns)
-------------------------------
Retrieved episodes (top 3 by relevance)
```
Total: ~3-5k tokens, regardless of how long the relationship has lasted.
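Assembling that stack is a few lines. A sketch, assuming messages use the common `{ role, content }` shape; `buildContext` and its argument names are illustrative:

```js
// Assemble the full context: system prompt + semantic facts + retrieved
// episodes in the system slot, then the last 20 turns of working memory.
function buildContext({ systemPrompt, factPreamble, episodes, recentMessages }) {
  const episodeBlock = episodes.length
    ? 'Relevant past episodes:\n' + episodes.map(e => `- ${e}`).join('\n')
    : '';
  const system = [systemPrompt, factPreamble, episodeBlock]
    .filter(Boolean) // drop empty sections
    .join('\n\n');
  return [{ role: 'system', content: system }, ...recentMessages.slice(-20)];
}
```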
## The four hard problems
### 1. What to store
Storing every message bloats the index and dilutes retrieval. The fix: a memory writer agent that runs after each session and extracts only durable facts (preferences, decisions, constraints) and significant episodes (notable conversations).
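The extraction itself needs an LLM, but the write-side filter around it is plain code. A sketch that drops near-duplicates and trivially short candidates before they reach the index; `normalize` and `filterCandidates` are illustrative names, not a library API:

```js
// Normalise whitespace and case so trivially different phrasings
// of the same fact collide.
function normalize(text) {
  return text.trim().toLowerCase().replace(/\s+/g, ' ');
}

// Keep only candidates that are long enough to be durable facts and
// not already stored (or repeated within this batch).
function filterCandidates(candidates, existing) {
  const seen = new Set(existing.map(normalize));
  const kept = [];
  for (const candidate of candidates) {
    const key = normalize(candidate);
    if (key.length < 10) continue; // too short to be a durable fact
    if (seen.has(key)) continue;   // already stored
    seen.add(key);
    kept.push(candidate);
  }
  return kept;
}
```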
### 2. When to forget
Memories should decay. A user's address from 2024 may no longer be true. Patterns that work: timestamped facts with confidence scores, contradiction detection on write, periodic re-validation prompts.
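One way to implement decay, assuming each memory carries a confidence in [0, 1] and a `createdAt` timestamp in milliseconds. The half-life and threshold are tunables, and `retentionScore` / `staleMemories` are illustrative names:

```js
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Exponential decay: the score halves every halfLifeDays.
function retentionScore(memory, now, halfLifeDays = 180) {
  const ageDays = (now - memory.createdAt) / MS_PER_DAY;
  return memory.confidence * Math.pow(0.5, ageDays / halfLifeDays);
}

// Low-scoring memories become candidates for re-validation
// or deletion, not silent drops.
function staleMemories(memories, now, threshold = 0.2) {
  return memories.filter(m => retentionScore(m, now) < threshold);
}
```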
### 3. How to retrieve without hallucination
Vector similarity gives plausibly-relevant results, not actually-relevant ones. Combine vector + keyword + metadata filters. Always show the model the source memory, not a paraphrase, so it can self-correct.
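A sketch of the combined scoring: a hard metadata filter first, then a blend of vector similarity and keyword overlap. The 0.7/0.3 weights are illustrative, not tuned, and `hybridRank` is not a real library API:

```js
// Fraction of query terms that appear in the candidate text.
function keywordOverlap(query, text) {
  const queryTerms = new Set(query.toLowerCase().split(/\W+/).filter(Boolean));
  const textTerms = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
  let hits = 0;
  for (const term of queryTerms) if (textTerms.has(term)) hits++;
  return queryTerms.size ? hits / queryTerms.size : 0;
}

function hybridRank(query, candidates, userId) {
  return candidates
    .filter(c => c.userId === userId) // metadata filter: never leak across users
    .map(c => ({
      ...c,
      score: 0.7 * c.vectorSim + 0.3 * keywordOverlap(query, c.summary),
    }))
    .sort((a, b) => b.score - a.score);
}
```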
### 4. How to keep memory private
Cross-session memory IS persistent PII. GDPR/CCPA right-to-erasure means you need delete-by-user-id from day one. Encrypt at rest. Do not ship memories across tenant boundaries.
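A right-to-erasure sketch over an in-memory array standing in for the memories table; a real implementation must also purge the vector index, embedding caches, and backups. `eraseUser` is an illustrative name:

```js
// Delete every memory row belonging to userId and return the count,
// which belongs in the audit log.
function eraseUser(store, userId) {
  const before = store.length;
  for (let i = store.length - 1; i >= 0; i--) {
    if (store[i].userId === userId) store.splice(i, 1); // delete in place
  }
  return before - store.length;
}
```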
## Build vs buy in 2026
| Option | Best for | Trade-off |
|---|---|---|
| Memory MCP (official) | Single-user dev tools | Knowledge-graph model, manual schema |
| Mem0 | Multi-user products | Hosted SaaS, vendor lock-in |
| Zep | Enterprise, self-host | More infra to run |
| Roll your own (Postgres + pgvector) | Full control | Six months of edge-case work |
## A minimum viable memory in 50 lines
```js
// after each session
const summary = await llm.complete({
  prompt: 'Extract durable facts and notable events from this conversation as JSON.',
  messages: session.messages,
});
await db.insert('memories', {
  user_id,
  summary,
  embedding: await embed(summary),
  created_at: now(),
});

// before each session
const recent = await db.query(
  'SELECT summary FROM memories WHERE user_id = $1 ORDER BY created_at DESC LIMIT 5',
  [user_id],
);
const relevant = await db.vectorSearch('memories', userQuery, { top: 3 });
const memoryBundle = [...recent, ...relevant]
  .map(m => m.summary)
  .join('\n');
```
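One wrinkle in the minimum viable version: the recent and relevant result sets can overlap, which would inject the same memory twice. A sketch of deduplicating by row id before joining, assuming each row carries an `id` alongside its summary; `mergeMemories` is an illustrative name:

```js
// Merge recency-ordered and relevance-ordered results,
// keeping the first occurrence of each row id.
function mergeMemories(recent, relevant) {
  const seen = new Set();
  const merged = [];
  for (const memory of [...recent, ...relevant]) {
    if (seen.has(memory.id)) continue;
    seen.add(memory.id);
    merged.push(memory);
  }
  return merged.map(m => m.summary).join('\n');
}
```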
## Where this is heading
Expect three shifts over the next year: standardised memory schemas in MCP, native memory primitives in the Claude Agent SDK, and per-user memory dashboards letting users see and edit what the agent "knows" about them. The last one is non-negotiable for consumer products.