
Shared memory backends for AI agents: mem0 vs Zep vs Letta vs roll-your-own

Managed memory services are emerging as their own category. Here is what mem0, Zep, and Letta actually do — and when self-hosted Postgres still beats them.

A shared memory backend is no longer a research project — it is a category with three credible managed services and a fast-moving spec landscape. Here is how mem0, Zep, and Letta actually compare, and where self-hosted is still the right call.

What "shared memory backend" means

A managed service that stores and serves agent memory across sessions, users, and (sometimes) agents. Distinct from:

  • A vector database. Embeddings only; no semantic structure on top. See vector memory for AI agents.
  • A conversation store. Raw history, no extraction or retrieval.
  • A general database. Postgres can be a memory backend with effort; managed memory removes the effort.

The category exists because everyone reinvents the same memory layer (extract facts → embed → retrieve → summarise) badly. Managed services try to ship this once.
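That pipeline is small enough to sketch end to end. The following is a toy, with deterministic stubs standing in for the LLM extraction call and the embedding model; the function names and list-backed store are illustrative, not any vendor's API. The stub embedding only matches exact text, so real semantic retrieval needs a real model behind `embed`.

```python
# The memory layer everyone reinvents: extract facts -> embed -> retrieve.
# Extraction and embedding are stubbed; in production both are model calls.
import hashlib
import math

def extract_facts(turn: str) -> list[str]:
    # Stand-in for an LLM extraction call: one "fact" per sentence.
    return [s.strip() for s in turn.split(".") if s.strip()]

def embed(text: str) -> list[float]:
    # Stand-in for an embedding model: deterministic pseudo-vector.
    digest = hashlib.sha256(text.lower().encode()).digest()
    return [b / 255 for b in digest[:8]]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# (fact text, embedding) pairs; a real store would also keep user/timestamps.
store: list[tuple[str, list[float]]] = []

def add(turn: str) -> None:
    for fact in extract_facts(turn):
        store.append((fact, embed(fact)))

def search(query: str, k: int = 3) -> list[str]:
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [fact for fact, _ in ranked[:k]]

add("Alice prefers overnight shipping. Alice lives in Berlin.")
top = search("alice prefers overnight shipping", k=1)
```

The summarise step is the part the toy omits entirely, and it is usually where the hand-rolled versions go wrong: deciding when to compact old facts is policy, not plumbing.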

The three contenders

mem0

Opinionated, fact-extraction-first. You write conversation turns; mem0 extracts atomic facts and stores them. On retrieval, it returns relevant facts as a flat list.

from mem0 import MemoryClient
client = MemoryClient(api_key=...)

client.add(messages, user_id="alice")
context = client.search(query="shipping preferences", user_id="alice")

Strengths: Simple API, good fact extraction, free open-source tier. Weaknesses: No native graph traversal. Less control over retrieval ranking.

Zep

Temporal knowledge graph. Stores facts as edges in a graph keyed by time. Retrieval blends semantic search with graph traversal.

from zep_python.client import AsyncZep
zep = AsyncZep(api_key=...)

await zep.memory.add(session_id, messages=messages)
memory = await zep.memory.get(session_id)  # → graph context

Strengths: Temporal reasoning ("what changed between March and now"), strong for long-running sessions. Weaknesses: Heavier mental model. Per-session structure does not always fit cross-user memory.

Letta (formerly MemGPT)

Memory-first agent runtime. The "agent" and "memory" are the same product — memory is paged into the model's context as the runtime decides.

from letta_client import Letta
client = Letta(token=...)

agent = client.agents.create(memory_blocks=[{"label":"persona","value":"…"}])
client.agents.messages.create(agent_id=agent.id, messages=[…])

Strengths: Tight model + memory integration. Good if you want memory invisible to your code. Weaknesses: You buy the runtime, not just the memory. Harder to mix with other agent frameworks.

Side-by-side

| Dimension | mem0 | Zep | Letta | Self-hosted Postgres |
|---|---|---|---|---|
| Storage model | facts | temporal graph | paged context | whatever you build |
| Retrieval | semantic | semantic + graph | runtime-managed | yours |
| Cross-session | yes | yes | yes | yes |
| Cross-user | yes | yes | yes | yes |
| Open-source tier | yes | yes (community) | yes (server) | n/a |
| Runtime coupling | none | none | tight | none |
| Hosted SLA | yes | yes | yes | n/a |
| Vendor lock-in | low | medium | high | none |

When self-hosted still wins

Three cases where rolling your own beats managed:

  1. Strict data residency. Healthcare, finance, EU-only deployments. Even managed-with-VPC adds compliance load. Postgres in your VPC is one less audit conversation.
  2. You already operate Postgres at scale. Adding pgvector and a small extraction pipeline is straightforward; adopting a new managed service is a procurement event.
  3. Memory shape is your moat. If your retrieval algorithm is competitive differentiation, do not outsource it.

A minimal self-hosted stack:

[Conversation turns]
       │
       ▼
[Extraction worker] ── LLM call → facts (subject, predicate, object)
       │
       ▼
[Postgres + pgvector]
   ├── facts table (text + embedding + subject_id)
   ├── relations table (subject → object, edge type)
   └── summaries table (rolling per-user)
       │
       ▼
[Retrieval API] — semantic + relational + temporal slice
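The diagram maps onto a small amount of DDL plus one query. A sketch, kept as strings so any Postgres driver can run them; it assumes pgvector's `vector` type with 1536-dim embeddings (OpenAI-sized) and psycopg-style `%(name)s` placeholders, and every table and column name is illustrative. `<=>` is pgvector's cosine-distance operator.

```python
# Illustrative schema for the self-hosted stack above.
SCHEMA = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE facts (
    id         bigserial PRIMARY KEY,
    subject_id text NOT NULL,          -- e.g. user id
    body       text NOT NULL,          -- extracted fact
    embedding  vector(1536) NOT NULL,  -- pgvector column
    created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE relations (
    subject_id text NOT NULL,
    object_id  text NOT NULL,
    edge_type  text NOT NULL           -- e.g. 'prefers', 'lives_in'
);

CREATE TABLE summaries (
    subject_id text PRIMARY KEY,
    body       text NOT NULL,          -- rolling per-user summary
    updated_at timestamptz NOT NULL DEFAULT now()
);
"""

# Semantic + temporal slice: nearest facts for one user since a cutoff,
# newest first on distance ties.
RETRIEVE = """
SELECT body
FROM facts
WHERE subject_id = %(subject_id)s
  AND created_at >= %(since)s
ORDER BY embedding <=> %(query_embedding)s, created_at DESC
LIMIT %(k)s;
"""
```

The relational and summary reads are plain SQL joins on top of the same tables, which is the point: once the schema exists, the "memory backend" is mostly a retrieval policy.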

Pair with persistent agent memory architecture for the architectural background.

When managed is the right call

  • Time-to-prototype matters. mem0 in an afternoon beats two weeks of pgvector schema work.
  • Memory is not your moat. You are building a vertical agent and want to focus on the domain.
  • You need temporal reasoning out of the box. Zep's graph beats hand-rolled temporal SQL by months of work.

Migration paths

The good news: the data shape is similar across all three managed services and Postgres. A migration script per pair exists or is straightforward to write. Lock-in fear is overblown — the harder lock-in is API surface coupling, not data format. Wrap the memory client behind your own interface from day one and the rest is mechanics.

interface MemoryStore {
  add(userId: string, turns: Message[]): Promise<void>;
  search(userId: string, query: string, k: number): Promise<Fact[]>;
  facts(userId: string, subject?: string): Promise<Fact[]>;
}

Three implementations, one interface, swap freely.

Pricing reality (April 2026)

| Service | Free tier | Paid | Cost driver |
|---|---|---|---|
| mem0 | 1k memories | usage-based | API calls |
| Zep | community OSS | per-seat or volume | session count |
| Letta | self-host free | hosted varies | agent count |
| Postgres | n/a | infra cost | storage + reads |

For most teams in the prototype-to-1k-users range, all three managed options come in under $200/mo. Above that, model costs against your expected memory growth: managed pricing scales with stored memory volume and can grow faster than linearly.

© 2026 Loadout. Built on Angular 21 SSR.