Vector stores remember what looks similar. Semantic memory remembers what is true. For agents that reason over a user's life — preferences, relationships, decisions, deadlines — you need a knowledge graph alongside the embeddings. Here is how to build it without the usual graph-database bloat.
Why pure vector memory falls short
A vector store retrieves by similarity. That is the right tool when "what did we discuss about pricing" needs nearest-neighbour search over chat summaries. It is the wrong one when "what is the name of the user's manager" needs a precise edge in a typed graph. Three production failure modes:
- Coreference drift — "Alex" the colleague and "Alex" the customer collapse into one vector cluster.
- Temporal collapse — last year's address and this week's address have similar embeddings.
- Inferential gaps — vectors cannot answer "who reports to whom?" without writing the chain into one summary.
Semantic memory solves these by storing typed entities, typed edges, and timestamps as first-class data.
The minimum graph schema
Three node types and three edge types cover 80% of agent use cases:
- Person — id, name, aliases, attributes.
- Org — id, name, type.
- Topic — id, label, parent topic.
Edges:
- MENTIONS — utterance → entity, weight, ts.
- RELATES_TO — entity → entity, type (employs, married_to, lives_in), valid_from, valid_to.
- PREFERS — Person → Topic, polarity, ts.
Store this in any graph DB (Neo4j, Memgraph) or in Postgres with two tables (nodes, edges). Simpler than it sounds: two tables cover most cases.
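The two-table layout can be sketched directly. A minimal sketch follows, with SQLite standing in for Postgres; the column names are illustrative, not a fixed schema:

```python
import sqlite3

# Two-table graph: typed nodes, typed edges with temporal validity.
# SQLite as a stand-in for Postgres; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id    TEXT PRIMARY KEY,   -- e.g. 'p:alex.k', 'o:acme', 'topic:pricing'
    type  TEXT NOT NULL,      -- PERSON | ORG | TOPIC
    props TEXT NOT NULL       -- JSON: name, aliases, attributes
);
CREATE TABLE edges (
    src        TEXT NOT NULL REFERENCES nodes(id),
    dst        TEXT NOT NULL REFERENCES nodes(id),
    rel        TEXT NOT NULL, -- MENTIONS | RELATES_TO | PREFERS
    props      TEXT NOT NULL, -- JSON: weight, polarity, relation subtype
    valid_from TEXT,
    valid_to   TEXT           -- NULL = currently valid
);
CREATE INDEX edges_by_src ON edges(src, rel);
CREATE INDEX edges_by_dst ON edges(dst, rel);
""")
```

With both src and dst indexed, one-hop expansion reduces to a pair of indexed lookups.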
Writing into the graph
After each session, an extractor agent emits typed assertions, not free-form summary:
```json
[
  { "type": "PERSON", "id": "p:alex.k", "name": "Alex Karpov", "aliases": ["Alex"] },
  { "type": "RELATES_TO", "from": "p:alex.k", "to": "o:acme", "rel": "employs", "valid_from": "2026-04" },
  { "type": "PREFERS", "from": "user", "to": "topic:concise-replies", "polarity": "+1" }
]
```
These merge with existing nodes by id; conflicts trigger a contradiction-resolution pass (see below).
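The merge-by-id step can be sketched as follows; the table layout is an assumption for illustration, and conflict handling is deliberately minimal so the contradiction-resolution pass below can take over:

```python
import json
import sqlite3

# Sketch: upsert extractor assertions by id. New ids insert; existing ids
# update; a changed value on an existing key flags a contradiction.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, type TEXT, props TEXT)")

def merge_assertion(assertion: dict) -> bool:
    """Upsert one typed assertion; return True if it contradicts stored props."""
    props = {k: v for k, v in assertion.items() if k not in ("type", "id")}
    row = conn.execute("SELECT props FROM nodes WHERE id = ?",
                       (assertion["id"],)).fetchone()
    if row is None:
        conn.execute("INSERT INTO nodes VALUES (?, ?, ?)",
                     (assertion["id"], assertion["type"], json.dumps(props)))
        return False
    old = json.loads(row[0])
    conflict = any(k in old and old[k] != v for k, v in props.items())
    old.update(props)  # newest value kept; resolution pass may revisit this
    conn.execute("UPDATE nodes SET props = ? WHERE id = ?",
                 (json.dumps(old), assertion["id"]))
    return conflict

first = merge_assertion({"type": "PERSON", "id": "p:alex.k", "name": "Alex Karpov"})
second = merge_assertion({"type": "PERSON", "id": "p:alex.k", "name": "Alex K."})
```

Here `first` is False (clean insert) and `second` is True (the name changed, so the assertion is flagged).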
Reading from the graph
At each turn, two queries run in parallel:
- Entity expansion — for every entity mentioned in the user prompt, fetch one-hop neighbours.
- Topical retrieval — for the topic of the prompt, fetch top-N preferences and policies.
The result is a small JSON bundle, typically 200–500 tokens, prepended to the system prompt as <memory>...</memory>.
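The entity-expansion query can be sketched against the two-table layout (names illustrative, data hypothetical):

```python
import json
import sqlite3

# One-hop expansion: fetch currently-valid edges touching each mentioned
# entity, then pack them into a <memory> bundle for the system prompt.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id TEXT PRIMARY KEY, type TEXT, props TEXT);
CREATE TABLE edges (src TEXT, dst TEXT, rel TEXT, props TEXT, valid_to TEXT);
""")
conn.execute("INSERT INTO nodes VALUES (?, ?, ?)",
             ("p:alex.k", "PERSON", json.dumps({"name": "Alex Karpov"})))
conn.execute("INSERT INTO nodes VALUES (?, ?, ?)",
             ("o:acme", "ORG", json.dumps({"name": "Acme"})))
conn.execute("INSERT INTO edges VALUES (?, ?, ?, ?, ?)",
             ("p:alex.k", "o:acme", "RELATES_TO",
              json.dumps({"rel": "employs"}), None))

def one_hop(entity_id):
    """Currently-valid edges touching entity_id, in either direction."""
    rows = conn.execute(
        "SELECT src, dst, rel, props FROM edges "
        "WHERE (src = ? OR dst = ?) AND valid_to IS NULL",
        (entity_id, entity_id)).fetchall()
    return [{"src": s, "dst": d, "rel": r, **json.loads(p)}
            for s, d, r, p in rows]

bundle = "<memory>" + json.dumps({"p:alex.k": one_hop("p:alex.k")}) + "</memory>"
```

Filtering on `valid_to IS NULL` is what keeps expired facts out of the prompt.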
The hybrid recipe: graph + vector
Pure graph misses fuzzy retrieval; pure vector misses structure. The working pattern:
- Graph for facts and relationships (who, where, when, what kind).
- Vector for utterances and summaries (what was said).
- Edges in the graph point to vector chunks for source pinning.
Retrieval queries both: graph for typed answers, vector for narrative answers. Merge at the prompt builder.
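The merge at the prompt builder can be sketched like this; both retrieval functions are stand-ins for the real graph and vector queries:

```python
import json

# Prompt-builder merge sketch: typed graph facts plus fuzzy vector hits
# in one bundle. Both retrievers below are hypothetical stand-ins.
def graph_facts(entities):
    """Stand-in for the typed one-hop queries (who, where, when)."""
    return [{"entity": e, "rel": "employs", "to": "o:acme"} for e in entities]

def vector_hits(query):
    """Stand-in for nearest-neighbour search over summaries (what was said)."""
    return [{"chunk": "Discussed Acme pricing tiers", "score": 0.82}]

def build_memory(entities, query):
    return "<memory>" + json.dumps({
        "facts": graph_facts(entities),
        "narrative": vector_hits(query),
    }) + "</memory>"

bundle = build_memory(["p:alex.k"], "pricing")
```

Keeping the two result sets under separate keys lets the model treat facts as authoritative and narrative as context.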
Contradiction resolution
When new assertions disagree with old ones, decide by:
- Recency — newer wins by default for time-bounded facts (address, employer).
- Confidence — assertions extracted with high confidence beat low.
- Source authority — the user asserting beats the agent inferring.
Old assertions are not deleted; they get a valid_to timestamp. The full history stays queryable, which is useful for audit trails and GDPR access requests.
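The three criteria can be sketched as one ranking function. This is one possible priority ordering (authority, then confidence, then recency), and the field names are illustrative:

```python
# Contradiction resolution sketch: rank competing assertions, keep the
# winner, and close out the loser with valid_to instead of deleting it.
AUTHORITY = {"user_asserted": 2, "agent_inferred": 1}  # illustrative levels

def resolve(old, new):
    """Return (winner, loser); the loser gets a valid_to stamp."""
    def rank(a):
        return (AUTHORITY.get(a["source"], 0), a["confidence"], a["ts"])
    winner, loser = (new, old) if rank(new) >= rank(old) else (old, new)
    loser = {**loser, "valid_to": winner["ts"]}  # audit trail, not deletion
    return winner, loser

old = {"value": "Berlin", "source": "agent_inferred",
       "confidence": 0.6, "ts": "2026-01"}
new = {"value": "Lisbon", "source": "user_asserted",
       "confidence": 0.9, "ts": "2026-05"}
winner, loser = resolve(old, new)
```

Here the user-asserted Lisbon wins and the inferred Berlin is closed out at 2026-05 but kept in the graph.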
Cost model
For 100 sessions/month/user with summaries of 3k tokens:
| Component | Cost |
|---|---|
| Extractor agent (Haiku) | ~$0.004 |
| Graph storage (Postgres) | < $0.001 |
| Vector chunks for source | ~$0.002 |
| Read queries (cached) | ~$0.001 |
| Total | ~$0.008 per user per month |
Cheaper than vector-only because the structured queries are tiny.
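As a sanity check, the line items do sum to the stated total (figures copied from the table, with graph storage taken at its upper bound):

```python
# Per-user monthly cost, figures from the table above.
costs = {
    "extractor_agent": 0.004,
    "graph_storage": 0.001,   # table says < $0.001; upper bound used here
    "vector_chunks": 0.002,
    "read_queries": 0.001,
}
total = round(sum(costs.values()), 3)
```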
Common mistakes
- Free-text properties instead of typed edges — defeats the point.
- No alias resolution — Alex, A. Karpov, and @alex.k stay separate.
- No temporal validity — old facts outlive the world.
- Storing everything — keep entities the agent will actually use; trim the rest.
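Alias resolution, the second mistake in the list, can be as small as a normalisation map applied before merging; the mapping here is illustrative:

```python
# Alias resolution sketch: normalise surface forms to one canonical node id
# before merge, so 'Alex', 'A. Karpov', and '@alex.k' land on the same node.
ALIASES = {
    "alex": "p:alex.k",
    "a. karpov": "p:alex.k",
    "@alex.k": "p:alex.k",
}

def canonical_id(mention):
    """Map a surface form to its canonical node id, or None if unknown."""
    return ALIASES.get(mention.strip().lower())
```

In practice the map is built from the aliases field on each Person node rather than hand-maintained.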
Where this is heading
Two trends to watch in 2027: standardised semantic memory schemas in the MCP spec, and managed graph services that ship with Anthropic-style integration out of the box. Build the schema yourself now and you will swap implementations without rewriting the agent.