Explainer · 3 min read

Semantic memory for AI agents: knowledge graphs that survive contact with production

Semantic memory is the agent's long-term knowledge of facts and relationships. Pure vector stores miss the structure; knowledge graphs catch it. Here is how to build a graph-backed memory layer that holds up at scale.

Vector stores remember what looks similar. Semantic memory remembers what is true. For agents that reason over a user's life — preferences, relationships, decisions, deadlines — you need a knowledge graph alongside the embeddings. Here is how to build it without the usual graph-database bloat.

Why pure vector memory falls short

A vector store retrieves by similarity. That is the right tool when "what did we discuss about pricing" needs nearest-neighbour search over chat summaries. It is the wrong tool when "what is the user's manager's name" needs a precise edge in a typed graph. Three production failure modes:

  • Coreference drift — "Alex" the colleague and "Alex" the customer collapse into one vector cluster.
  • Temporal collapse — last year's address and this week's address have similar embeddings.
  • Inferential gaps — vectors cannot answer "who reports to whom?" without writing the chain into one summary.

Semantic memory solves these by storing typed entities, typed edges, and timestamps as first-class data.

The minimum graph schema

Three node types and three edge types cover 80% of agent use cases:

  • Person — id, name, aliases, attributes.
  • Org — id, name, type.
  • Topic — id, label, parent topic.

Edges:

  • MENTIONS — utterance → entity, weight, ts.
  • RELATES_TO — entity → entity, type (employs, married_to, lives_in), valid_from, valid_to.
  • PREFERS — Person → Topic, polarity, ts.

Store this in any graph DB (Neo4j, Memgraph) or in Postgres with two tables (nodes, edges). It is simpler than it sounds: two tables cover most cases.
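The two-table layout above can be sketched with SQLite standing in for Postgres. The column names (`kind`, `props`, `rel`, `valid_from`, `valid_to`) are assumptions for illustration, not a fixed schema:

```python
import sqlite3

# Minimal two-table graph: every node and edge type from the schema
# above fits into one nodes table and one edges table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id    TEXT PRIMARY KEY,           -- e.g. 'p:alex.k'
    kind  TEXT NOT NULL,              -- Person | Org | Topic
    props TEXT NOT NULL DEFAULT '{}'  -- JSON: name, aliases, attributes
);
CREATE TABLE edges (
    src        TEXT NOT NULL REFERENCES nodes(id),
    dst        TEXT NOT NULL REFERENCES nodes(id),
    kind       TEXT NOT NULL,         -- MENTIONS | RELATES_TO | PREFERS
    rel        TEXT,                  -- employs, married_to, lives_in, ...
    valid_from TEXT,
    valid_to   TEXT                   -- NULL while the fact still holds
);
CREATE INDEX edges_src ON edges (src, kind);
""")
conn.execute("INSERT INTO nodes VALUES ('p:alex.k', 'Person', '{\"name\": \"Alex Karpov\"}')")
conn.execute("INSERT INTO nodes VALUES ('o:acme', 'Org', '{\"name\": \"Acme\"}')")
conn.execute("INSERT INTO edges VALUES ('p:alex.k', 'o:acme', 'RELATES_TO', 'employs', '2026-04', NULL)")

# A typed question ("who employs Alex?") is one indexed lookup, not a search.
row = conn.execute(
    "SELECT dst FROM edges WHERE src = 'p:alex.k' AND rel = 'employs' AND valid_to IS NULL"
).fetchone()
print(row[0])  # o:acme
```

The `valid_to IS NULL` filter is what keeps last year's employer from answering this week's question.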

Writing into the graph

After each session, an extractor agent emits typed assertions, not free-form summary:

[
  { "type": "PERSON", "id": "p:alex.k", "name": "Alex Karpov", "aliases": ["Alex"] },
  { "type": "RELATES_TO", "from": "p:alex.k", "to": "o:acme", "rel": "employs", "valid_from": "2026-04" },
  { "type": "PREFERS", "from": "user", "to": "topic:concise-replies", "polarity": "+1" }
]

These merge with existing nodes by id; conflicts trigger a contradiction-resolution pass (see below).
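The merge-by-id step can be sketched as below, with an in-memory dict standing in for the database. The conflict-list return value is an assumption about how the contradiction-resolution pass receives its input:

```python
def merge_assertions(graph, assertions):
    """Merge extractor output into a graph keyed by node id.
    graph: {"nodes": {id: node}, "edges": [edge]} — a toy stand-in for the DB.
    Returns (id, field, old_value, new_value) tuples for the resolution pass."""
    conflicts = []
    for a in assertions:
        if a["type"] in ("PERSON", "ORG", "TOPIC"):
            existing = graph["nodes"].get(a["id"])
            if existing is None:
                graph["nodes"][a["id"]] = dict(a)
                continue
            # Union the alias sets; never lose a known alias.
            aliases = set(existing.get("aliases", [])) | set(a.get("aliases", []))
            existing["aliases"] = sorted(aliases)
            for k, v in a.items():
                if k in ("type", "id", "aliases"):
                    continue
                if k in existing and existing[k] != v:
                    # Disagreeing scalar field: do not overwrite silently.
                    conflicts.append((a["id"], k, existing[k], v))
                else:
                    existing[k] = v
        else:
            # RELATES_TO, PREFERS, MENTIONS assertions become edges.
            graph["edges"].append(dict(a))
    return conflicts

graph = {"nodes": {}, "edges": []}
merge_assertions(graph, [{"type": "PERSON", "id": "p:alex.k",
                          "name": "Alex Karpov", "aliases": ["Alex"]}])
conflicts = merge_assertions(graph, [{"type": "PERSON", "id": "p:alex.k",
                                      "name": "Alex K.", "aliases": ["A. Karpov"]}])
```

Only scalar fields that disagree are escalated; alias lists simply grow, which is what makes alias resolution (below) possible.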

Reading from the graph

At each turn, two queries run in parallel:

  1. Entity expansion — for every entity mentioned in the user prompt, fetch one-hop neighbours.
  2. Topical retrieval — for the topic of the prompt, fetch top-N preferences and policies.

The result is a small JSON bundle prepended to the system prompt as <memory>...</memory>, typically 200–500 tokens.
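The entity-expansion query and the bundle assembly can be sketched as follows, over the same toy dict graph. The edge field names (`from`, `to`) match the extractor JSON above; everything else is illustrative:

```python
import json

def one_hop(graph, entity_id):
    """Entity expansion: every edge touching the entity, in either direction."""
    return [e for e in graph["edges"]
            if e.get("from") == entity_id or e.get("to") == entity_id]

def memory_bundle(graph, mentioned_ids):
    """Assemble the per-turn bundle that gets prepended to the system prompt."""
    facts = [f for eid in mentioned_ids for f in one_hop(graph, eid)]
    return "<memory>" + json.dumps(facts) + "</memory>"

graph = {"nodes": {}, "edges": [
    {"from": "p:alex.k", "to": "o:acme", "rel": "employs"},
    {"from": "user", "to": "topic:concise-replies", "type": "PREFERS"},
]}
bundle = memory_bundle(graph, ["p:alex.k"])
```

One hop is usually enough: two hops pulls in the whole social graph and blows the token budget.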

The hybrid recipe: graph + vector

Pure graph misses fuzzy retrieval; pure vector misses structure. The working pattern:

  • Graph for facts and relationships (who, where, when, what kind).
  • Vector for utterances and summaries (what was said).
  • Edges in the graph point to vector chunks for source pinning.

Retrieval queries both: graph for typed answers, vector for narrative answers. Merge at the prompt builder.
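The merge at the prompt builder can be sketched like this. The ordering (typed facts first, narrative chunks filling the remainder) and the crude word-count token estimate are assumptions of the sketch, not a prescribed policy:

```python
def build_context(graph_facts, vector_chunks, budget_tokens=500):
    """Merge typed facts and narrative chunks into one prompt bundle.
    Graph facts go first: they are small and precise. Vector chunks
    fill whatever budget is left. Token cost is approximated by word
    count here — swap in a real tokenizer in production."""
    parts, used = [], 0
    for fact in graph_facts:
        parts.append(fact)
        used += len(fact.split())
    for chunk in vector_chunks:
        cost = len(chunk.split())
        if used + cost > budget_tokens:
            break  # stop before overflowing the memory budget
        parts.append(chunk)
        used += cost
    return "\n".join(parts)

ctx = build_context(["a b"], ["c d e", "f g"], budget_tokens=5)
```

Prioritising graph facts means a tight budget degrades to "still factually correct, less narrative colour" rather than the reverse.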

Contradiction resolution

When new assertions disagree with old ones, decide by:

  1. Recency — newer wins by default for time-bounded facts (address, employer).
  2. Confidence — assertions extracted with high confidence beat low.
  3. Source authority — the user asserting beats the agent inferring.

Old assertions are not deleted; they get a valid_to timestamp. The graph keeps its full history, which is useful for audit trails and GDPR access requests.
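One reading of the three rules, sketched below: source authority overrides confidence, confidence (when clearly separated) overrides recency, and recency is the default. The 0.2 confidence margin and the assertion field names are assumptions for illustration:

```python
AUTHORITY = {"user": 2, "agent": 1}  # the user asserting beats the agent inferring

def resolve(old, new):
    """Pick a winner between a stored assertion and a conflicting new one.
    Each assertion: {"value", "ts", "confidence", "source"}.
    The loser is closed with a valid_to timestamp, never deleted."""
    if AUTHORITY[new["source"]] != AUTHORITY[old["source"]]:
        # Rule 3: source authority trumps everything.
        winner = new if AUTHORITY[new["source"]] > AUTHORITY[old["source"]] else old
    elif abs(new["confidence"] - old["confidence"]) > 0.2:
        # Rule 2: a clear confidence gap decides.
        winner = new if new["confidence"] > old["confidence"] else old
    else:
        # Rule 1: recency — newer wins by default for time-bounded facts.
        winner = new if new["ts"] >= old["ts"] else old
    loser = old if winner is new else new
    loser["valid_to"] = new["ts"]  # close the losing assertion, keep the history
    return winner

old = {"value": "123 Old St", "ts": "2026-01", "confidence": 0.9, "source": "user"}
new = {"value": "9 New Rd", "ts": "2026-06", "confidence": 0.9, "source": "agent"}
kept = resolve(old, new)  # user assertion survives despite being older
```

Note the deliberate asymmetry: a fresh agent inference never silently overwrites something the user stated outright.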

Cost model

For 100 sessions/month/user with summaries of 3k tokens:

  • Extractor agent (Haiku) — ~$0.004
  • Graph storage (Postgres) — < $0.001
  • Vector chunks for source — ~$0.002
  • Read queries (cached) — ~$0.001
  • Total — ~$0.008 per user per month

Cheaper than vector-only because the structured queries are tiny.

Common mistakes

  • Free-text properties instead of typed edges — defeats the point.
  • No alias resolution — Alex, A. Karpov, and @alex.k stay separate.
  • No temporal validity — old facts outlive the world.
  • Storing everything — keep entities the agent will actually use; trim the rest.
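The alias-resolution mistake above has a cheap fix: normalise every known alias into a lookup key. A minimal sketch, assuming aliases are already collected on each node by the merge step:

```python
import unicodedata

def normalize_alias(s):
    """Crude alias key: strip accents, lowercase, keep only letters,
    digits and spaces — so 'A. Karpov' and '@alex.k' get stable keys."""
    s = unicodedata.normalize("NFKD", s)
    s = "".join(c for c in s if not unicodedata.combining(c))
    return "".join(c for c in s.lower() if c.isalnum() or c == " ").strip()

def build_alias_index(nodes):
    """Map every normalised alias (and the canonical name) to a node id."""
    index = {}
    for node in nodes:
        for alias in [node["name"]] + node.get("aliases", []):
            index[normalize_alias(alias)] = node["id"]
    return index

def resolve_alias(mention, index):
    return index.get(normalize_alias(mention.lstrip("@")))

nodes = [{"id": "p:alex.k", "name": "Alex Karpov",
          "aliases": ["Alex", "A. Karpov", "@alex.k"]}]
idx = build_alias_index(nodes)
```

This catches formatting variants only; genuinely ambiguous mentions ("Alex" the colleague vs. "Alex" the customer) still need the graph context to disambiguate.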

Where this is heading

Two trends to watch in 2027: standardised semantic memory schemas in the MCP spec, and managed graph services that ship with Anthropic-style integration out of the box. Build the schema yourself now and you will swap implementations without rewriting the agent.

© 2026 Loadout. Built on Angular 21 SSR.