A shared memory backend is no longer a research project — it is a category with three credible managed services and a fast-moving spec landscape. Here is how mem0, Zep, and Letta actually compare, and where self-hosted is still the right call.
What "shared memory backend" means
A managed service that stores and serves agent memory across sessions, users, and (sometimes) agents. Distinct from:
- A vector database. Embeddings only; no semantic structure on top. See vector memory for AI agents.
- A conversation store. Raw history, no extraction or retrieval.
- A general database. Postgres can be a memory backend with effort; managed memory removes the effort.
The category exists because everyone reinvents the same memory layer (extract facts → embed → retrieve → summarise) badly. Managed services try to ship this once.
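That extract → embed → retrieve loop can be sketched end to end. Everything below is a toy stand-in, not any vendor's implementation: `extract_facts` fakes the LLM extraction call and `embed` is a bag-of-words counter where a real system would call an embedding model.

```python
# Minimal sketch of the memory layer every team reinvents:
# extract facts -> embed -> retrieve (summarisation omitted).
from collections import Counter
import math

def extract_facts(turns):
    # Stand-in for an LLM extraction call: one "fact" per turn.
    return [t.strip() for t in turns if t.strip()]

def embed(text):
    # Toy bag-of-words embedding; replace with a real embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TinyMemory:
    def __init__(self):
        self.facts = []  # (fact, embedding) pairs

    def add(self, turns):
        for fact in extract_facts(turns):
            self.facts.append((fact, embed(fact)))

    def search(self, query, k=3):
        q = embed(query)
        ranked = sorted(self.facts, key=lambda f: cosine(q, f[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]

mem = TinyMemory()
mem.add(["alice prefers express shipping", "alice lives in berlin"])
print(mem.search("shipping preferences", k=1))  # → ['alice prefers express shipping']
```

Thirty lines gets you a demo; the managed services exist because the production version of each of these functions is where the real work hides.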
The three contenders
mem0
Opinionated, fact-extraction-first. You write conversation turns; mem0 extracts atomic facts and stores them. On retrieval, it returns relevant facts as a flat list.
```python
from mem0 import MemoryClient

client = MemoryClient(api_key=...)
client.add(messages, user_id="alice")   # mem0 extracts and stores atomic facts
context = client.search(query="shipping preferences", user_id="alice")
```
Strengths: Simple API, good fact extraction, free open-source tier. Weaknesses: No native graph traversal. Less control over retrieval ranking.
Zep
Temporal knowledge graph. Stores facts as edges in a graph keyed by time. Retrieval blends semantic search with graph traversal.
```python
from zep_python.client import AsyncZep

zep = AsyncZep(api_key=...)
await zep.memory.add(session_id, messages=messages)
memory = await zep.memory.get(session_id)  # → graph context
```
Strengths: Temporal reasoning ("what changed between March and now"), strong for long-running sessions. Weaknesses: Heavier mental model. Per-session structure does not always fit cross-user memory.
Letta (formerly MemGPT)
Memory-first agent runtime. The "agent" and "memory" are the same product — memory is paged into the model's context as the runtime decides.
```python
from letta_client import Letta

client = Letta(token=...)
agent = client.agents.create(memory_blocks=[{"label": "persona", "value": "…"}])
client.agents.messages.create(agent_id=agent.id, messages=[…])
```
Strengths: Tight model + memory integration. Good if you want memory invisible to your code. Weaknesses: You buy the runtime, not just the memory. Harder to mix with other agent frameworks.
Side-by-side
| Dimension | mem0 | Zep | Letta | Self-hosted Postgres |
|---|---|---|---|---|
| Storage model | facts | temporal graph | paged context | whatever you build |
| Retrieval | semantic | semantic + graph | runtime-managed | yours |
| Cross-session | yes | yes | yes | yes |
| Cross-user | yes | yes | yes | yes |
| Open source tier | yes | yes (community) | yes (server) | n/a |
| Runtime coupling | none | none | tight | none |
| Hosted SLA | yes | yes | yes | n/a |
| Vendor lock-in | low | medium | high | none |
When self-hosted still wins
Three cases where rolling your own beats managed:
- Strict data residency. Healthcare, finance, EU-only deployments. Even managed-with-VPC adds compliance load. Postgres in your VPC is one less audit conversation.
- You already operate Postgres at scale. Adding pgvector and a small extraction pipeline is straightforward; adopting a new managed service is a procurement event.
- Memory shape is your moat. If your retrieval algorithm is competitive differentiation, do not outsource it.
A minimal self-hosted stack:
```
[Conversation turns]
        │
        ▼
[Extraction worker] ── LLM call → facts (subject, predicate, object)
        │
        ▼
[Postgres + pgvector]
        ├── facts table (text + embedding + subject_id)
        ├── relations table (subject → object, edge type)
        └── summaries table (rolling per-user)
        │
        ▼
[Retrieval API] — semantic + relational + temporal slice
```
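The three tables in that stack might look like the following. Every table name, column name, and the embedding dimension is illustrative, and the retrieval query is one possible "semantic + temporal slice", not a prescribed design:

```sql
-- Hypothetical minimal schema; adjust names and dimensions to taste.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE facts (
  id         bigserial PRIMARY KEY,
  subject_id text NOT NULL,             -- e.g. a user or entity id
  body       text NOT NULL,             -- the extracted fact
  embedding  vector(1536),              -- dimension depends on your model
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE relations (
  subject_id text NOT NULL,
  object_id  text NOT NULL,
  edge_type  text NOT NULL,             -- e.g. 'prefers', 'located_in'
  PRIMARY KEY (subject_id, object_id, edge_type)
);

CREATE TABLE summaries (
  subject_id text PRIMARY KEY,
  body       text NOT NULL,             -- rolling per-user summary
  updated_at timestamptz NOT NULL DEFAULT now()
);

-- Nearest facts for one user since a cutoff date.
-- <=> is pgvector's cosine-distance operator.
SELECT body
FROM facts
WHERE subject_id = $1 AND created_at >= $2
ORDER BY embedding <=> $3
LIMIT 10;
```

The temporal slice falls out of an ordinary `created_at` predicate, which is exactly the part that gets expensive to hand-roll once you need "what changed between March and now" rather than "since a date".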
Pair with persistent agent memory architecture for the architectural background.
When managed is the right call
- Time-to-prototype matters. mem0 in an afternoon beats two weeks of pgvector schema work.
- Memory is not your moat. You are building a vertical agent and want to focus on the domain.
- You need temporal reasoning out of the box. Zep's graph beats hand-rolled temporal SQL by months of work.
Migration paths
The good news: the data shape is similar across all three managed services and Postgres. A migration script per pair exists or is straightforward to write. Lock-in fear is overblown — the harder lock-in is API surface coupling, not data format. Wrap the memory client behind your own interface from day one and the rest is mechanics.
```typescript
interface MemoryStore {
  add(userId: string, turns: Message[]): Promise<void>;
  search(userId: string, query: string, k: number): Promise<Fact[]>;
  facts(userId: string, subject?: string): Promise<Fact[]>;
}
```
Three implementations, one interface, swap freely.
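In Python the same wrapper idea might look like this. `MemoryStore` and `Fact` are our own local types; only `add` and `search` on the wrapped client are real mem0 calls, and the response shape assumed in `search` (dicts with `memory` and `score` keys) should be checked against the client version you actually use:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Fact:
    text: str
    score: float

class MemoryStore(Protocol):
    def add(self, user_id: str, turns: list[dict]) -> None: ...
    def search(self, user_id: str, query: str, k: int) -> list[Fact]: ...

class Mem0Store:
    """Adapter: hides the vendor client behind our MemoryStore interface."""
    def __init__(self, client):
        self.client = client  # e.g. a mem0 MemoryClient

    def add(self, user_id, turns):
        self.client.add(turns, user_id=user_id)

    def search(self, user_id, query, k):
        hits = self.client.search(query=query, user_id=user_id, limit=k)
        # Assumed response shape; adapt to your client version.
        return [Fact(h["memory"], h.get("score", 0.0)) for h in hits]
```

A `Zep­Store` or `PostgresStore` slots in behind the same Protocol, which is the whole point: the swap cost is one adapter class, not a rewrite.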
Pricing reality (April 2026)
| Service | Free tier | Paid | Cost driver |
|---|---|---|---|
| mem0 | 1k memories | usage-based | API calls |
| Zep | community OSS | per-seat or volume | session count |
| Letta | self-host free | hosted varies | agent count |
| Postgres | n/a | infra cost | storage + reads |
For most teams in the prototype-to-1k-users range, all three managed options come in under $200/mo. Above that, model the costs against expected memory growth — managed-service costs can scale super-linearly with memory volume once you cross pricing tiers.
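That modeling exercise can be a one-liner. All numbers below are hypothetical placeholders, not any vendor's published pricing; plug in the actual per-memory rate and a tier exponent fitted from the pricing page:

```python
# Back-of-envelope memory-cost projection; every number is hypothetical.
def projected_monthly_cost(users, memories_per_user,
                           price_per_1k=0.5, overage_exponent=1.0):
    """Linear by default; raise overage_exponent above 1.0 to model
    super-linear pricing tiers."""
    memories = users * memories_per_user
    return price_per_1k * (memories / 1_000) ** overage_exponent

# 1k users, 200 memories each, linear pricing:
print(round(projected_monthly_cost(1_000, 200), 2))  # → 100.0
```

Re-running it with a tier exponent of 1.2 instead of 1.0 roughly triples the projection at this volume, which is the kind of cliff worth finding in a spreadsheet rather than an invoice.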