SOC 2 was written before agents. The Trust Services Criteria still apply — but how you map them to an MCP-driven system changes a lot. Here is the practical control mapping auditors will accept in 2026, plus the evidence trail you need to make it painless.
What SOC 2 cares about, applied to MCP
SOC 2 Type II evaluates controls against five Trust Services Criteria: Security, Availability, Processing Integrity, Confidentiality, Privacy. For an MCP-heavy stack, the changes concentrate in Security and Processing Integrity.
This article assumes you are aiming for Type II and have an existing SOC 2 program. If you are also subject to GDPR, pair this with GDPR compliant AI agents.
The control mapping
CC6.1 — logical access
Auditors will ask: "Who can call which MCP tools, and how do you enforce it?"
Required:
- Per-call authorisation, not standing grants. See MCP access control lists.
- Workload identity for every MCP server (not shared tokens).
- Quarterly access reviews including agent service accounts.
Evidence: policy decision logs, identity issuance logs, access review attestations.
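Per-call authorisation plus a decision log can be very small. The sketch below is illustrative, not an MCP SDK API: `POLICY`, the tool names, and the principal names are all hypothetical, and a real deployment would back the policy with workload identity rather than a hard-coded table.

```python
import time

# Hypothetical policy table: tool name -> principals allowed to call it.
# In production this comes from your policy engine, not a dict.
POLICY = {
    "read_ticket": {"support-agent"},
    "issue_refund": {"billing-agent"},
}

def authorize_call(principal: str, tool: str, decision_log: list) -> bool:
    """Authorize a single MCP tool call and record the allow/deny decision."""
    allowed = principal in POLICY.get(tool, set())
    decision_log.append({
        "ts": time.time(),
        "principal": principal,
        "tool": tool,
        "decision": "allow" if allowed else "deny",
    })
    return allowed
```

The point auditors care about: every call produces a logged decision, and there is no code path that grants standing access.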
CC6.6 — identification of unauthorised access
For MCP: tool calls outside expected scope, denied authorisations, anomalous tool sequences.
Required:
- Real-time alerting on policy denies.
- Trace storage for at least the audit window (12 months for Type II).
- Documented response to detected anomalies.
Evidence: alerting configuration, sample incident tickets, audit trails.
CC6.7 — transmission and disposal
MCP uses JSON-RPC over stdio (local) or HTTP/SSE (remote). Both need encryption in transit when crossing trust boundaries.
Required:
- mTLS for any MCP server reachable over the network.
- Documented data-disposal policy for trace stores (with PII-aware redaction).
Evidence: TLS configuration, retention/disposal policy, sample disposal logs.
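Redaction at write time is the part teams skip. The patterns below are illustrative only — a production redactor should use a maintained PII-detection library, not two regexes — but the shape is right: redact before the trace ever hits storage.

```python
import re

# Illustrative patterns only; real PII detection needs a proper library.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def redact(text: str) -> str:
    """Apply every redaction pattern before the text is written to the trace store."""
    for pattern, replacement in PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```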
CC7.2 — system monitoring
The agent itself is part of the system. Monitoring the LLM is now in scope.
Required:
- Real-time metrics covering tool calls, error rates, latency. See real-time agent monitoring.
- Alerting wired to on-call.
- Documented runbooks per alert.
Evidence: monitoring dashboards, alert policies, runbook docs.
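The metrics themselves are ordinary: per-tool call counts, error counts, latency. A minimal in-process sketch (in practice you would export these to Prometheus or your existing metrics backend):

```python
from collections import defaultdict

class ToolMetrics:
    """Track per-tool call counts, errors, and latency for CC7.2 monitoring."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.errors = defaultdict(int)
        self.latency_ms = defaultdict(list)

    def record(self, tool: str, ok: bool, latency_ms: float) -> None:
        self.calls[tool] += 1
        if not ok:
            self.errors[tool] += 1
        self.latency_ms[tool].append(latency_ms)

    def error_rate(self, tool: str) -> float:
        return self.errors[tool] / self.calls[tool] if self.calls[tool] else 0.0
```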
CC8.1 — change management
When does a model upgrade or a new prompt count as a "change"? Auditors want it to count.
Required:
- Pinned model versions in production (no auto-bumps).
- PR review for prompt changes, with the same rigour as code.
- Regression suite as part of the change pipeline (see continuous agent regression testing).
Evidence: deployment changelog, PR reviews, regression results per release.
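Pinning is easy to claim and easy to verify if every deploy emits a manifest. A sketch (the model ID and prompt names are hypothetical): hash each prompt, record the pinned model, and fail CI when the manifest changes without a reviewed bump.

```python
import hashlib

def build_manifest(model_id: str, prompts: dict) -> dict:
    """Record the pinned model ID and a hash of each prompt for the changelog."""
    return {
        "model_id": model_id,  # pinned version string, never "latest"
        "prompt_hashes": {
            name: hashlib.sha256(text.encode()).hexdigest()
            for name, text in prompts.items()
        },
    }

def assert_unchanged(approved: dict, current: dict) -> None:
    """Fail the pipeline if model or prompts drifted from the reviewed manifest."""
    if approved != current:
        raise RuntimeError("model/prompt change detected: requires PR review")
```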
PI1.4 — processing integrity
For MCP: was the right tool called with the right arguments, and did the result reach the user intact?
Required:
- Trace for every user-visible task, end to end.
- Evidence of validation (schema checks, refusal rates within bounds).
- Documented error-handling and retry semantics — see distributed agent failure recovery.
Evidence: trace samples, validation logs, error-classification reports.
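Argument validation is where most processing-integrity evidence comes from. A minimal sketch of a schema check, assuming a simplified spec of required field names and Python types (production code would validate against the tool's real JSON Schema):

```python
def validate_args(args: dict, schema: dict) -> list:
    """Return a list of problems; empty list means the call arguments pass."""
    problems = []
    for field, expected_type in schema.items():
        if field not in args:
            problems.append(f"missing: {field}")
        elif not isinstance(args[field], expected_type):
            problems.append(f"wrong type: {field}")
    return problems
```

Log the returned problems alongside the trace and you have validation evidence for free.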
Evidence collection — automate from day one
The single biggest difference between a painful audit and an easy one: evidence is automated, not screenshot-based.
Build these collectors as part of your platform, not at audit time:
- Authorisation log archive. Every allow/deny decision, signed and immutable, retained for 12 months.
- Trace archive. Sampled (or full, if storage allows) traces per task, retained for 12 months.
- Model + prompt version manifest. Every deployment writes the active model IDs and prompt hashes to a versioned manifest. Auditors love this.
- Access review automation. A nightly job that lists all human and service principals with their grants, dumps to S3 with a hash, and emails the security team for review.
For each, document:
- What system writes it.
- Where it is stored.
- Retention and disposal.
- Who can access it (and the access log for that).
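"Signed and immutable" can be approximated with a hash chain even before you add KMS signing: each record hashes its predecessor, so any edit breaks verification from that point on. A sketch (in production, sign the chain head with a key your team cannot access directly):

```python
import hashlib
import json

def append_entry(chain: list, entry: dict) -> dict:
    """Append a tamper-evident record; each record hashes its predecessor."""
    prev = chain[-1]["hash"] if chain else "genesis"
    payload = json.dumps(entry, sort_keys=True)
    record = {
        "entry": entry,
        "prev": prev,
        "hash": hashlib.sha256((prev + payload).encode()).hexdigest(),
    }
    chain.append(record)
    return record

def verify(chain: list) -> bool:
    """Recompute every hash; any tampered record fails verification."""
    prev = "genesis"
    for record in chain:
        payload = json.dumps(record["entry"], sort_keys=True)
        expected = hashlib.sha256((prev + payload).encode()).hexdigest()
        if record["prev"] != prev or record["hash"] != expected:
            return False
        prev = record["hash"]
    return True
```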
Vendor and sub-processor management
Every external MCP server is a sub-processor in SOC 2 terms. Auditors will ask:
- Is there a list?
- Does it include data flows?
- Is each item under contract or covered by acceptable terms?
- How do you know they are still trustworthy?
Required:
- Sub-processor registry covering every MCP server (including OSS ones used in prod).
- DPA (or equivalent) with each commercial vendor.
- Periodic re-review tied to the registry trust criteria — see trusted MCP registry providers.
Common gaps auditors find
The first SOC 2 audit of an MCP-heavy stack typically surfaces:
- Shared service-account tokens. Replace with workload identity before the audit, not during.
- Auto-pulled `latest` tags. Pin every dependency by digest.
- Trace store with raw PII. Redact at write time; the trace store is in scope.
- No regression suite. "We test in prod" is not an answer.
- Undocumented break-glass paths. Document them, audit access, time-limit them.
Fixing each post-audit costs 5–10× more than building it in.
A 90-day prep plan
If you are 90 days from your first audit window:
| Days | Focus |
|---|---|
| 0–30 | Inventory every MCP server. Write sub-processor registry. Pin all dependencies. |
| 30–60 | Implement per-call authorisation + audit log. Stand up regression suite. |
| 60–75 | Wire monitoring + alerting + runbooks. |
| 75–90 | Dry-run with internal "auditor" reviewing the evidence trail. Fix gaps. |
If you slip on any of these, slip the audit window. A failed Type II audit is worse than a delayed one.
Where this is heading
AICPA is drafting AI-specific guidance that will likely formalise much of the above. EU equivalents (under the AI Act) are similar in shape — see EU AI Act MCP compliance. Build now to a strict interpretation; the formal guidance will codify what good teams already do.