Agents waste a surprising amount of work re-asking servers for things that haven't changed. The 2026-07-28 MCP spec fixes that at the protocol level with a new CacheableResult interface: list and read responses can now carry ttlMs and cacheScope (SEP-2549), modelled directly on HTTP Cache-Control. It's a small addition that quietly removes a lot of redundant traffic.
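Here is a minimal TypeScript sketch of what the two fields might look like on a result. The names ttlMs and cacheScope and the public/private values come from the description in this post. Several things are assumptions for illustration, not the spec's published schema: both fields being optional, the ListToolsResult shape with a tools array, and the hasCacheHint helper.

```typescript
// Sketch of the caching hint described in SEP-2549.
// Assumption: both fields are optional, and a result without a positive
// ttlMs is treated as not reusable.
type CacheScope = "public" | "private";

interface CacheableResult {
  ttlMs?: number; // freshness lifetime in milliseconds
  cacheScope?: CacheScope; // who may store the result
}

// Hypothetical minimal tool descriptor, for illustration only.
interface ToolDescriptor {
  name: string;
  description?: string;
}

// Assumed shape of a tools/list result that carries the hint.
interface ListToolsResult extends CacheableResult {
  tools: ToolDescriptor[];
}

// A catalogue that changes roughly daily, advertised as fresh for one hour
// and identical for every caller.
const example: ListToolsResult = {
  tools: [{ name: "search_docs", description: "Full-text search" }],
  ttlMs: 60 * 60 * 1000,
  cacheScope: "public",
};

// Hypothetical helper: a result is reusable only with a positive ttlMs.
function hasCacheHint(result: CacheableResult): boolean {
  return typeof result.ttlMs === "number" && result.ttlMs > 0;
}

console.log(hasCacheHint(example)); // true
console.log(hasCacheHint({})); // false
```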
What the two fields mean
ttlMs is a freshness hint in milliseconds: it tells the client how long it may reuse this result before checking again. A server whose tool catalogue changes maybe once a day can advertise a ttlMs of an hour and let clients stop hammering tools/list on every turn. cacheScope controls who is allowed to keep that copy. public means shared intermediaries (a gateway, a proxy, a fleet-wide cache) may store it; private means only the end client may, because the response is specific to this user or token. If you've used Cache-Control: max-age and public/private on a web API, the model will feel familiar. The main difference is units: max-age counts seconds, while ttlMs counts milliseconds.
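A client that honours the hint only needs to remember when it stored each result. The sketch below uses hypothetical CacheEntry, isFresh and toCacheControl names. It checks freshness against ttlMs and renders the same hint as an HTTP Cache-Control value to show the analogy, including the milliseconds-to-seconds conversion. The header mapping is illustrative only; nothing here suggests the spec requires it.

```typescript
type CacheScope = "public" | "private";

interface CacheHint {
  ttlMs: number;
  cacheScope: CacheScope;
}

// Hypothetical cache entry: one result plus the time (ms) it was stored.
interface CacheEntry<T> {
  value: T;
  hint: CacheHint;
  storedAt: number;
}

// Fresh while less than ttlMs has elapsed since the entry was stored.
function isFresh<T>(entry: CacheEntry<T>, now: number): boolean {
  return now - entry.storedAt < entry.hint.ttlMs;
}

// Express the same hint as an HTTP Cache-Control value.
// max-age is in whole seconds, so milliseconds are rounded down.
function toCacheControl(hint: CacheHint): string {
  const maxAgeSeconds = Math.floor(hint.ttlMs / 1000);
  return `${hint.cacheScope}, max-age=${maxAgeSeconds}`;
}

const entry: CacheEntry<string[]> = {
  value: ["search_docs"],
  hint: { ttlMs: 3600000, cacheScope: "public" },
  storedAt: 0,
};

console.log(isFresh(entry, 59 * 60 * 1000)); // true
console.log(isFresh(entry, 61 * 60 * 1000)); // false
console.log(toCacheControl(entry.hint)); // public, max-age=3600
```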
Which responses can carry it
The fields attach to the results of tools/list, prompts/list, resources/list, resources/read and resources/templates/list. That is the discovery and read surface, not tool calls. The choice is deliberate: listing your tools or reading a static resource is highly cacheable, while a tools/call that books a flight or writes to a database is not. The split keeps side-effecting operations always live and lets the cheap, repetitive metadata fetches be served from cache.
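On the client, that split can be enforced with a simple allow-list. The five method names are the ones listed above. The mayUseCache helper is hypothetical, and a real client would also check that a result actually carried a hint before storing it.

```typescript
// Methods whose results may carry ttlMs/cacheScope, per the list above.
const CACHEABLE_METHODS: ReadonlySet<string> = new Set([
  "tools/list",
  "prompts/list",
  "resources/list",
  "resources/read",
  "resources/templates/list",
]);

// Hypothetical client-side guard: only consult or populate a cache for these
// methods. Everything else, including tools/call, always goes to the server.
function mayUseCache(method: string): boolean {
  return CACHEABLE_METHODS.has(method);
}

console.log(mayUseCache("tools/list")); // true
console.log(mayUseCache("resources/read")); // true
console.log(mayUseCache("tools/call")); // false
```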
Picking public vs private
The choice hinges on whether the response depends on who's asking. A generic tool catalogue that's identical for every user is public — cache it once at the gateway and serve the whole fleet from there. A resource list scoped to a particular account, or a read that reflects that user's permissions, must be private, or you risk leaking one tenant's view to another through a shared cache. When in doubt, default to private; it's the safe choice, and you can widen to public only for results you're certain are identical across every caller.
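The sketch below shows one way to act on that rule. It assumes a hypothetical principal string that identifies the user or token. The server defaults to private unless told otherwise. Private cache keys are partitioned by caller, so one tenant's entry can never answer another tenant's lookup. Shared intermediaries refuse private results outright. The function names are illustrative and not taken from any MCP SDK.

```typescript
type CacheScope = "public" | "private";

// Server side: default to private; widen to public only when the result is
// known to be identical for every caller.
function chooseScope(identicalForAllCallers = false): CacheScope {
  return identicalForAllCallers ? "public" : "private";
}

// Client side: private keys include the caller's identity, so one user's
// entry can never satisfy another user's lookup.
// `principal` is a hypothetical stable identifier for the user or token.
function cacheKey(
  scope: CacheScope,
  method: string,
  params: unknown,
  principal: string,
): string {
  const base = `${method}:${JSON.stringify(params ?? null)}`;
  return scope === "public" ? base : `${principal}|${base}`;
}

// Shared intermediaries (gateways, proxies) may only store public results.
function mayStoreInSharedCache(scope: CacheScope): boolean {
  return scope === "public";
}

const aliceKey = cacheKey("private", "resources/list", {}, "alice");
const bobKey = cacheKey("private", "resources/list", {}, "bob");
console.log(aliceKey === bobKey); // false: per-user entries stay separate
console.log(
  cacheKey("public", "tools/list", {}, "alice") ===
    cacheKey("public", "tools/list", {}, "bob"),
); // true: one shared entry for everyone
console.log(mayStoreInSharedCache(chooseScope())); // false
```

Keying on JSON.stringify(params) is conservative. Two equivalent params objects built with different key order may produce different keys, which costs a cache miss but never a cross-caller hit.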
Why it matters more in a stateless world
This pairs with the rest of the 2026 stateless push. Once servers run behind plain load balancers with no sticky sessions, a shared cache at the gateway becomes the natural place to absorb repeated tools/list calls whose results are marked public. What was per-session chatter turns into a single cached response for everyone. That's a real latency and cost win at scale, and it composes with the new Mcp-Method routing headers and the stateless server migration. For client-side strategies that go beyond the spec hint, see MCP call caching strategies and call latency profiling.
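A gateway-level version of the same idea might look like the following hypothetical SharedGatewayCache. It stores only results marked public with a positive ttlMs, serves them to every session until they expire, and evicts stale entries on read. The injected clock lets expiry be tested without waiting. The class and its method names are assumptions, not an API from the spec or any gateway product.

```typescript
type CacheScope = "public" | "private";

interface CacheableResult {
  ttlMs?: number;
  cacheScope?: CacheScope;
}

interface Entry {
  result: CacheableResult;
  expiresAt: number; // absolute time in ms after which the entry is stale
}

// Hypothetical cache shared by every session behind the load balancer.
// Only public results with a positive ttlMs are stored; private or unhinted
// results are never kept here and always go to a backend server.
class SharedGatewayCache {
  private readonly entries = new Map<string, Entry>();
  private readonly now: () => number;

  constructor(now: () => number = Date.now) {
    this.now = now; // injectable clock so expiry can be tested
  }

  // No caller identity in the key: only public results ever reach this map.
  private key(method: string, params: unknown): string {
    return `${method}:${JSON.stringify(params ?? null)}`;
  }

  get(method: string, params: unknown): CacheableResult | undefined {
    const k = this.key(method, params);
    const entry = this.entries.get(k);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(k); // stale: evict so the next caller refetches
      return undefined;
    }
    return entry.result;
  }

  // Returns true if the result was stored, false if it is not shareable.
  put(method: string, params: unknown, result: CacheableResult): boolean {
    const ttl = result.ttlMs ?? 0;
    if (result.cacheScope !== "public" || ttl <= 0) return false;
    this.entries.set(this.key(method, params), {
      result,
      expiresAt: this.now() + ttl,
    });
    return true;
  }
}

let clock = 0;
const cache = new SharedGatewayCache(() => clock);
const catalogue: CacheableResult & { tools: string[] } = {
  tools: ["search_docs"],
  ttlMs: 60000,
  cacheScope: "public",
};

console.log(cache.put("tools/list", {}, catalogue)); // true
clock = 30000;
console.log(cache.get("tools/list", {}) === catalogue); // true: served from cache
clock = 60000;
console.log(cache.get("tools/list", {})); // undefined: expired
console.log(
  cache.put("resources/list", {}, { ttlMs: 60000, cacheScope: "private" }),
); // false: private results bypass the shared cache
```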
Going further
CacheableResult is one of the quieter wins in the 2026-07-28 spec, but for high-traffic deployments it's among the most valuable. If you're tuning a server for cost, combine it with reducing agent API costs and read how it sits inside an enterprise MCP gateway.