A shared memory backend is no longer a research project — it is a category with three credible managed services and a fast-moving spec landscape. Here is how mem0, Zep, and Letta actually compare, and where self-hosted is still the right call.
What "shared memory backend" means
A managed service that stores and serves agent memory across sessions, users, and (sometimes) agents. Distinct from:
- A vector database. Embeddings only; no semantic structure on top. See vector memory for AI agents.
- A conversation store. Raw history, no extraction or retrieval.
- A general database. Postgres can be a memory backend with effort; managed memory removes the effort.
The category exists because everyone reinvents the same memory layer (extract facts → embed → retrieve → summarise) badly. Managed services try to ship this once.
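That extract → embed → retrieve loop can be sketched end to end. Everything below is a toy stand-in, not any vendor's implementation: `extract_facts` fakes the LLM extraction call and `embed` is a bag-of-words counter where a real system would call an embedding model.

```python
# Minimal sketch of the memory layer every team reinvents:
# extract facts -> embed -> retrieve (summarisation omitted).
from collections import Counter
import math

def extract_facts(turns):
    # Stand-in for an LLM extraction call: one "fact" per turn.
    return [t.strip() for t in turns if t.strip()]

def embed(text):
    # Toy bag-of-words embedding; replace with a real embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class TinyMemory:
    def __init__(self):
        self.facts = []  # (fact, embedding) pairs

    def add(self, turns):
        for fact in extract_facts(turns):
            self.facts.append((fact, embed(fact)))

    def search(self, query, k=3):
        q = embed(query)
        ranked = sorted(self.facts, key=lambda f: cosine(q, f[1]), reverse=True)
        return [fact for fact, _ in ranked[:k]]

mem = TinyMemory()
mem.add(["alice prefers express shipping", "alice lives in berlin"])
print(mem.search("shipping preferences", k=1))  # → ['alice prefers express shipping']
```

Thirty lines gets you a demo; the managed services exist because the production version of each of these functions is where the real work hides.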
The three contenders
mem0
Opinionated, fact-extraction-first. You write conversation turns; mem0 extracts atomic facts and stores them. On retrieval, it returns relevant facts as a flat list.
```python
from mem0 import MemoryClient

client = MemoryClient(api_key=...)
client.add(messages, user_id="alice")   # mem0 extracts and stores atomic facts
context = client.search(query="shipping preferences", user_id="alice")
```
Strengths: Simple API, good fact extraction, free open-source tier. Weaknesses: No native graph traversal. Less control over retrieval ranking.
Zep
Temporal knowledge graph. Stores facts as edges in a graph keyed by time. Retrieval blends semantic search with graph traversal.
```python
from zep_python.client import AsyncZep

zep = AsyncZep(api_key=...)
await zep.memory.add(session_id, messages=messages)
memory = await zep.memory.get(session_id)  # → graph context
```
Strengths: Temporal reasoning ("what changed between March and now"), strong for long-running sessions. Weaknesses: Heavier mental model. Per-session structure does not always fit cross-user memory.
Letta (formerly MemGPT)
Memory-first agent runtime. The "agent" and "memory" are the same product — memory is paged into the model's context as the runtime decides.
```python
from letta_client import Letta

client = Letta(token=...)
agent = client.agents.create(memory_blocks=[{"label": "persona", "value": "…"}])
client.agents.messages.create(agent_id=agent.id, messages=[…])
```
Strengths: Tight model + memory integration. Good if you want memory invisible to your code. Weaknesses: You buy the runtime, not just the memory. Harder to mix with other agent frameworks.
Side-by-side
| Dimension | mem0 | Zep | Letta | Self-hosted Postgres |
|---|---|---|---|---|
| Storage model | facts | temporal graph | paged context | whatever you build |
| Retrieval | semantic | semantic + graph | runtime-managed | yours |
| Cross-session | yes | yes | yes | yes |
| Cross-user | yes | yes | yes | yes |
| Open source tier | yes | yes (community) | yes (server) | n/a |
| Runtime coupling | none | none | tight | none |
| Hosted SLA | yes | yes | yes | n/a |
| Vendor lock-in | low | medium | high | none |
When self-hosted still wins
Three cases where rolling your own beats managed:
- Strict data residency. Healthcare, finance, EU-only deployments. Even managed-with-VPC adds compliance load. Postgres in your VPC is one less audit conversation.
- You already operate Postgres at scale. Adding pgvector and a small extraction pipeline is straightforward; adopting a new managed service is a procurement event.
- Memory shape is your moat. If your retrieval algorithm is competitive differentiation, do not outsource it.
A minimal self-hosted stack:
```
[Conversation turns]
        │
        ▼
[Extraction worker] ── LLM call → facts (subject, predicate, object)
        │
        ▼
[Postgres + pgvector]
        ├── facts table (text + embedding + subject_id)
        ├── relations table (subject → object, edge type)
        └── summaries table (rolling per-user)
        │
        ▼
[Retrieval API] — semantic + relational + temporal slice
```
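The three tables in that stack might look like the following. Every table name, column name, and the embedding dimension is illustrative, and the retrieval query is one possible "semantic + temporal slice", not a prescribed design:

```sql
-- Hypothetical minimal schema; adjust names and dimensions to taste.
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE facts (
  id         bigserial PRIMARY KEY,
  subject_id text NOT NULL,             -- e.g. a user or entity id
  body       text NOT NULL,             -- the extracted fact
  embedding  vector(1536),              -- dimension depends on your model
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE relations (
  subject_id text NOT NULL,
  object_id  text NOT NULL,
  edge_type  text NOT NULL,             -- e.g. 'prefers', 'located_in'
  PRIMARY KEY (subject_id, object_id, edge_type)
);

CREATE TABLE summaries (
  subject_id text PRIMARY KEY,
  body       text NOT NULL,             -- rolling per-user summary
  updated_at timestamptz NOT NULL DEFAULT now()
);

-- Nearest facts for one user since a cutoff date.
-- <=> is pgvector's cosine-distance operator.
SELECT body
FROM facts
WHERE subject_id = $1 AND created_at >= $2
ORDER BY embedding <=> $3
LIMIT 10;
```

The temporal slice falls out of an ordinary `created_at` predicate, which is exactly the part that gets expensive to hand-roll once you need "what changed between March and now" rather than "since a date".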
Pair with persistent agent memory architecture for the architectural background.
When managed is the right call
- Time-to-prototype matters. mem0 in an afternoon beats two weeks of pgvector schema work.
- Memory is not your moat. You are building a vertical agent and want to focus on the domain.
- You need temporal reasoning out of the box. Zep's graph beats hand-rolled temporal SQL by months of work.
Migration paths
The good news: the data shape is similar across all three managed services and Postgres. A migration script per pair exists or is straightforward to write. Lock-in fear is overblown — the harder lock-in is API surface coupling, not data format. Wrap the memory client behind your own interface from day one and the rest is mechanics.
```typescript
interface MemoryStore {
  add(userId: string, turns: Message[]): Promise<void>;
  search(userId: string, query: string, k: number): Promise<Fact[]>;
  facts(userId: string, subject?: string): Promise<Fact[]>;
}
```
Three implementations, one interface, swap freely.
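In Python the same wrapper idea might look like this. `MemoryStore` and `Fact` are our own local types; only `add` and `search` on the wrapped client are real mem0 calls, and the response shape assumed in `search` (dicts with `memory` and `score` keys) should be checked against the client version you actually use:

```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Fact:
    text: str
    score: float

class MemoryStore(Protocol):
    def add(self, user_id: str, turns: list[dict]) -> None: ...
    def search(self, user_id: str, query: str, k: int) -> list[Fact]: ...

class Mem0Store:
    """Adapter: hides the vendor client behind our MemoryStore interface."""
    def __init__(self, client):
        self.client = client  # e.g. a mem0 MemoryClient

    def add(self, user_id, turns):
        self.client.add(turns, user_id=user_id)

    def search(self, user_id, query, k):
        hits = self.client.search(query=query, user_id=user_id, limit=k)
        # Assumed response shape; adapt to your client version.
        return [Fact(h["memory"], h.get("score", 0.0)) for h in hits]
```

A `Zep­Store` or `PostgresStore` slots in behind the same Protocol, which is the whole point: the swap cost is one adapter class, not a rewrite.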
Pricing reality (April 2026)
| Service | Free tier | Paid | Cost driver |
|---|---|---|---|
| mem0 | 1k memories | usage-based | API calls |
| Zep | community OSS | per-seat or volume | session count |
| Letta | self-host free | hosted varies | agent count |
| Postgres | n/a | infra cost | storage + reads |
For most teams in the prototype-to-1k-users range, all three managed options come in under $200/mo. Above that, model the costs against expected memory growth — managed-service costs can scale super-linearly with memory volume once you cross pricing tiers.
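That modeling exercise can be a one-liner. All numbers below are hypothetical placeholders, not any vendor's published pricing; plug in the actual per-memory rate and a tier exponent fitted from the pricing page:

```python
# Back-of-envelope memory-cost projection; every number is hypothetical.
def projected_monthly_cost(users, memories_per_user,
                           price_per_1k=0.5, overage_exponent=1.0):
    """Linear by default; raise overage_exponent above 1.0 to model
    super-linear pricing tiers."""
    memories = users * memories_per_user
    return price_per_1k * (memories / 1_000) ** overage_exponent

# 1k users, 200 memories each, linear pricing:
print(round(projected_monthly_cost(1_000, 200), 2))  # → 100.0
```

Re-running it with a tier exponent of 1.2 instead of 1.0 roughly triples the projection at this volume, which is the kind of cliff worth finding in a spreadsheet rather than an invoice.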