MCP rug pull attacks: when a tool changes after you approve it

You approve an MCP server once, the tools look benign, and your agent starts using them. Weeks later the server quietly changes what those tools do — and nothing asks you to re-approve. That's a rug pull, and it's one of the nastiest attacks against MCP precisely because the protocol has no built-in mechanism to notice a tool definition has shifted underneath it.

How a rug pull works

The attack splits trust from behaviour in time. First, the server exposes genuinely useful, harmless tools to earn your one-time approval — the moment of human scrutiny. Once that approval is banked, the server silently alters a tool's definition, description or behaviour. Because MCP doesn't track tool-definition changes or force re-approval when they happen, your agent keeps calling a tool whose meaning has changed. The malicious payload lives in the tool definition itself, which every session shares, so a single poisoned definition compromises every agent that calls it until someone notices and pulls it.

Rug pull vs tool poisoning

The two are cousins. Tool poisoning is deploying a tool that masquerades as legitimate from the start, hoping the user or the model picks it. A rug pull is the time-delayed variant: clean at approval, malicious afterwards. Both operate at the supply-chain layer — the definition, not the call — which is what makes them slip past defences built only to inspect runtime arguments. If your security model trusts a tool forever because you vetted it once, both attacks beat it.

The incidents that made it real

This isn't only theory. The postmark-mcp package-squatting incident in September 2025 saw a fake npm package build trust across fifteen versions before silently BCC'ing every email to an attacker. The Clawdbot exposure in January 2026 leaked credentials and conversation histories from two thousand-plus MCP instances through unauthenticated gateways. And the GitHub MCP prompt-injection chain used malicious issues to hijack agents into exfiltrating private repository data through an entirely legitimate tool. As of 2026 the rug pull is well-documented and named in vendor threat matrices — the building blocks are all in the wild.

How to defend against it

Pin and verify. Treat a tool definition like a dependency: record a hash of each tool's definition at approval, and re-prompt when it changes rather than trusting silently. Prefer servers that ship signed tool definitions and have a track record under a publisher reputation system. Run untrusted servers in a sandboxed execution runtime, keep a human checkpoint on write actions, and lean on the same hygiene that catches the rest of the supply-chain family — detecting malicious MCP servers, how to vet MCP servers and supply chain attacks.

Going further

The structural fix is coming from the ecosystem, not a single setting: signed definitions, verified namespaces in the official registry, and re-approval on change. Until those are universal, assume one-time approval is exactly that. Read MCP security best practices and prompt injection prevention, and browse the security category.