Skip to main content
Comparison4 min read

Shared memory backends for AI agents: mem0 vs Zep vs Letta vs roll-your-own

Managed memory services are emerging as their own category. Here is what mem0, Zep, and Letta actually do — and when self-hosted Postgres still beats them.

A shared memory backend is no longer a research project — it is a category with three credible managed services and a fast-moving spec landscape. Here is how mem0, Zep, and Letta actually compare, and where self-hosted is still the right call.

What "shared memory backend" means

A managed service that stores and serves agent memory across sessions, users, and (sometimes) agents. Distinct from:

  • A vector database. Embeddings only; no semantic structure on top. See vector memory for AI agents.
  • A conversation store. Raw history, no extraction or retrieval.
  • A general database. Postgres can be a memory backend with effort; managed memory removes the effort.

The category exists because everyone reinvents the same memory layer (extract facts → embed → retrieve → summarise) badly. Managed services try to ship this once.

The three contenders

mem0

Opinionated, fact-extraction-first. You write conversation turns; mem0 extracts atomic facts and stores them. On retrieval, it returns relevant facts as a flat list.

from mem0 import MemoryClient
client = MemoryClient(api_key=...)

client.add(messages, user_id="alice")
context = client.search(query="shipping preferences", user_id="alice")

Strengths: Simple API, good fact extraction, free open-source tier. Weaknesses: No native graph traversal. Less control over retrieval ranking.

Zep

Temporal knowledge graph. Stores facts as edges in a graph keyed by time. Retrieval blends semantic search with graph traversal.

from zep_python.client import AsyncZep
zep = AsyncZep(api_key=...)

await zep.memory.add(session_id, messages=messages)
memory = await zep.memory.get(session_id)  # → graph context

Strengths: Temporal reasoning ("what changed between March and now"), strong for long-running sessions. Weaknesses: Heavier mental model. Per-session structure does not always fit cross-user memory.

Letta (formerly MemGPT)

Memory-first agent runtime. The "agent" and "memory" are the same product — memory is paged into the model's context as the runtime decides.

from letta_client import Letta
client = Letta(token=...)

agent = client.agents.create(memory_blocks=[{"label":"persona","value":"…"}])
client.agents.messages.create(agent_id=agent.id, messages=[…])

Strengths: Tight model + memory integration. Good if you want memory invisible to your code. Weaknesses: You buy the runtime, not just the memory. Harder to mix with other agent frameworks.

Side-by-side

Dimension mem0 Zep Letta Self-hosted Postgres
Storage model facts temporal graph paged context whatever you build
Retrieval semantic semantic + graph runtime-managed yours
Cross-session yes yes yes yes
Cross-user yes yes yes yes
Open source tier yes yes (community) yes (server) n/a
Runtime coupling none none tight none
Hosted SLA yes yes yes n/a
Vendor lock-in low medium high none

When self-hosted still wins

Three cases where rolling your own beats managed:

  1. Strict data residency. Healthcare, finance, EU-only deployments. Even managed-with-VPC adds compliance load. Postgres in your VPC is one less audit conversation.
  2. You already operate Postgres at scale. Adding pgvector and a small extraction pipeline is straightforward; adopting a new managed service is a procurement event.
  3. Memory shape is your moat. If your retrieval algorithm is competitive differentiation, do not outsource it.

A minimal self-hosted stack:

[Conversation turns]
       │
       ▼
[Extraction worker] ── LLM call → facts (subject, predicate, object)
       │
       ▼
[Postgres + pgvector]
   ├── facts table (text + embedding + subject_id)
   ├── relations table (subject → object, edge type)
   └── summaries table (rolling per-user)
       │
       ▼
[Retrieval API] — semantic + relational + temporal slice

Pair with persistent agent memory architecture for the architectural background.

When managed is the right call

  • Time-to-prototype matters. mem0 in an afternoon beats two weeks of pgvector schema work.
  • Memory is not your moat. You are building a vertical agent and want to focus on the domain.
  • You need temporal reasoning out of the box. Zep's graph beats hand-rolled temporal SQL by months of work.

Migration paths

The good news: the data shape is similar across all three managed services and Postgres. A migration script per pair exists or is straightforward to write. Lock-in fear is overblown — the harder lock-in is API surface coupling, not data format. Wrap the memory client behind your own interface from day one and the rest is mechanics.

interface MemoryStore {
  add(userId: string, turns: Message[]): Promise<void>;
  search(userId: string, query: string, k: number): Promise<Fact[]>;
  facts(userId: string, subject?: string): Promise<Fact[]>;
}

Three implementations, one interface, swap freely.

Pricing reality (April 2026)

Service Free tier Paid Cost driver
mem0 1k memories usage-based API calls
Zep community OSS per-seat or volume session count
Letta self-host free hosted varies agent count
Postgres n/a infra cost storage + reads

For most teams in the prototype-to-1k-users range, all three managed options are < $200/mo. Above that, model the costs against expected memory growth — the managed services scale super-linearly with memory volume.

Loadout

Build your AI agent loadout

The directory of MCP servers and AI agents that actually work. Pick the right loadout for Slack, Postgres, GitHub, Figma and 20+ integrations — with install commands ready to paste into Claude Desktop, Cursor or your own stack.

© 2026 Loadout. Built on Angular 21 SSR.