SOC 2 compliant MCP deployment: the controls auditors

SOC 2 was written before agents. The Trust Services Criteria still apply, but the way you map them to an MCP-driven system changes considerably. Here is a practical control mapping that auditors are likely to accept in 2026, plus the evidence trail you need to make the audit painless.

What SOC 2 cares about, applied to MCP

A SOC 2 Type II report evaluates how controls operated over a review period, against up to five Trust Services Criteria categories: Security (always in scope), Availability, Processing Integrity, Confidentiality, and Privacy. For an MCP-heavy stack, the changes concentrate in Security and Processing Integrity.

This article assumes you are aiming for Type II and have an existing SOC 2 program. If you are also subject to GDPR, pair this with GDPR compliant AI agents.

The control mapping

CC6.1 — logical access

Auditors will ask: "Who can call which MCP tools, and how do you enforce it?"

Required:

Per-call authorisation, not standing grants. See MCP access control lists.
Workload identity for every MCP server (not shared tokens).
Quarterly access reviews including agent service accounts.

Evidence: policy decision logs, identity issuance logs, access review attestations.
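
A minimal sketch of per-call authorisation that produces a policy decision log as a side effect. The `Policy` shape, the SPIFFE-style identity string, and the tool names are illustrative assumptions, not part of MCP or of any particular policy engine; a real deployment would more likely delegate the decision to a dedicated policy service.

```python
import json
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    # Hypothetical policy shape: workload identity -> MCP tool names it may call.
    grants: dict[str, frozenset[str]]


def authorise(policy: Policy, principal: str, tool: str, log: list[str]) -> bool:
    """Decide a single tool call and append the decision to the log."""
    allowed = tool in policy.grants.get(principal, frozenset())
    # Every decision is logged, allow and deny alike, so the log doubles as evidence.
    log.append(json.dumps({
        "ts": time.time(),
        "principal": principal,
        "tool": tool,
        "decision": "allow" if allowed else "deny",
    }, sort_keys=True))
    return allowed


# Example identity and tool names are made up for illustration.
policy = Policy(grants={"spiffe://example.org/agent/billing": frozenset({"invoices.read"})})
decision_log: list[str] = []
authorise(policy, "spiffe://example.org/agent/billing", "invoices.read", decision_log)
authorise(policy, "spiffe://example.org/agent/billing", "invoices.delete", decision_log)
```

The check runs on every call rather than once at session start, which is the difference between per-call authorisation and a standing grant.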

CC6.6 — identification of unauthorised access

For MCP, the signals to watch are tool calls outside expected scope, denied authorisations, and anomalous tool sequences.

Required:

Real-time alerting on policy denies.
Trace storage for at least the audit window (Type II observation periods typically run 3 to 12 months; plan for 12).
Documented response to detected anomalies.

Evidence: alerting configuration, sample incident tickets, audit trails.
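
One way to turn policy denies into an alert signal is a sliding-window counter per principal. This is a sketch under assumptions: the threshold of three denies in 60 seconds is arbitrary, and the `DenyAlerter` class is hypothetical. Some teams will prefer to alert on every single deny instead.

```python
from collections import defaultdict, deque


class DenyAlerter:
    """Flags a principal that accumulates too many denies within a time window."""

    def __init__(self, threshold: int = 3, window_seconds: float = 60.0):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._denies: dict[str, deque] = defaultdict(deque)

    def record_deny(self, principal: str, ts: float) -> bool:
        """Record one deny at time ts; return True if an alert should fire."""
        events = self._denies[principal]
        events.append(ts)
        # Drop denies that have aged out of the window.
        while events and ts - events[0] > self.window_seconds:
            events.popleft()
        return len(events) >= self.threshold
```

Feeding this from the same decision log used for CC6.1 keeps the alerting evidence and the access evidence consistent with each other.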

CC6.7 — transmission and disposal

MCP uses JSON-RPC over stdio (local) or HTTP (remote): Streamable HTTP in current versions of the specification, HTTP with SSE in older ones. Either one needs encryption in transit when it crosses a trust boundary.

Required:

mTLS for any MCP server reachable over the network.
Documented data-disposal policy for trace stores (with PII-aware redaction).

Evidence: TLS configuration, retention/disposal policy, sample disposal logs.
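
As an illustration of what "mTLS" means at the configuration level, here is a sketch using Python's standard `ssl` module. The function name and the file-path parameters are assumptions. The paths are optional only so the policy settings can be inspected without real certificates; a production server must load all three.

```python
import ssl
from typing import Optional


def build_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Server-side TLS context that refuses clients without a valid certificate."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # CERT_REQUIRED is what makes the TLS mutual: the handshake fails
    # unless the client presents a certificate signed by a trusted CA.
    ctx.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)
    return ctx
```

Keeping this configuration in code or version-controlled config makes it usable as TLS evidence without screenshots.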

CC7.2 — system monitoring

The agent itself is part of the system. Monitoring the LLM is now in scope.

Required:

Real-time metrics covering tool calls, error rates, latency. See real-time agent monitoring.
Alerting wired to on-call.
Documented runbooks per alert.

Evidence: monitoring dashboards, alert policies, runbook docs.
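
The three metrics named above can be derived from raw tool-call records. The `ToolCall` record shape below is an assumption, and the p95 uses the nearest-rank method; a production system would normally get these figures from its metrics backend rather than compute them by hand.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    tool: str
    latency_ms: float
    ok: bool


def summarise(calls: list[ToolCall]) -> dict[str, dict[str, float]]:
    """Per-tool call count, error rate, and nearest-rank p95 latency."""
    by_tool: dict[str, list[ToolCall]] = {}
    for call in calls:
        by_tool.setdefault(call.tool, []).append(call)

    summary: dict[str, dict[str, float]] = {}
    for tool, group in by_tool.items():
        latencies = sorted(c.latency_ms for c in group)
        # Nearest-rank p95: ceil(0.95 * n), done in integers to avoid float rounding.
        rank = (95 * len(latencies) + 99) // 100
        summary[tool] = {
            "calls": len(group),
            "error_rate": sum(not c.ok for c in group) / len(group),
            "p95_latency_ms": latencies[rank - 1],
        }
    return summary
```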

CC8.1 — change management

When does a model upgrade or a new prompt count as a "change"? Auditors want it to count.

Required:

Pinned model versions in production (no auto-bumps).
PR review for prompt changes, with the same rigour as code.
Regression suite as part of the change pipeline (see continuous agent regression testing).

Evidence: deployment changelog, PR reviews, regression results per release.
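
A sketch of how a deployment step might record the pinned model and prompt hashes for the changelog. The manifest layout, the model ID, and the rule that an ID ending in `latest` or `stable` counts as floating are all assumptions; adapt the check to however your model provider names its aliases.

```python
import hashlib
import json

# Hypothetical convention: IDs ending in these suffixes are floating aliases.
FLOATING_SUFFIXES = ("latest", "stable")


def prompt_hash(prompt_text: str) -> str:
    """Content hash of a prompt, so any edit shows up as a change."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def build_manifest(release: str, model_id: str, prompts: dict[str, str]) -> str:
    """Return a JSON manifest for one release; refuse unpinned model IDs."""
    if model_id.endswith(FLOATING_SUFFIXES):
        raise ValueError(f"model ID is not pinned: {model_id}")
    manifest = {
        "release": release,
        "model_id": model_id,
        "prompt_hashes": {name: prompt_hash(text) for name, text in prompts.items()},
    }
    return json.dumps(manifest, sort_keys=True, indent=2)


print(build_manifest("2026-01-r1", "example-model-2025-01-15",
                     {"system": "You are a billing assistant."}))
```

Because the manifest stores hashes rather than prompt text, it can be shared with auditors without exposing the prompts themselves.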

PI1.4 — processing integrity

For MCP: was the right tool called with the right arguments, and did the result reach the user correctly?

Required:

Trace for every user-visible task, end to end.
Evidence of validation (schema checks, refusal rates within bounds).
Documented error-handling and retry semantics — see distributed agent failure recovery.

Evidence: trace samples, validation logs, error-classification reports.
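
MCP tools declare their inputs as a JSON Schema, so "right arguments" can be checked mechanically before a call is dispatched. The sketch below is a deliberately simplified stand-in for a real JSON Schema validator: the `SEARCH_SCHEMA` format (name mapped to type and required flag) is an assumption made to keep the example dependency-free.

```python
from typing import Any

# Hypothetical simplified schema: argument name -> (expected type, required?).
SEARCH_SCHEMA: dict[str, tuple[type, bool]] = {
    "query": (str, True),
    "limit": (int, False),
}


def validate_arguments(schema: dict[str, tuple[type, bool]],
                       arguments: dict[str, Any]) -> list[str]:
    """Return a list of validation errors; an empty list means the call is valid."""
    errors: list[str] = []
    for name, (expected, required) in schema.items():
        if name not in arguments:
            if required:
                errors.append(f"missing required argument: {name}")
            continue
        value = arguments[name]
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, expected) or (
            expected is int and isinstance(value, bool)
        )
        if wrong_type:
            errors.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    for name in arguments:
        if name not in schema:
            errors.append(f"unexpected argument: {name}")
    return errors
```

Writing the returned errors to the validation log, rather than only raising, gives you the error-classification evidence listed above.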

Evidence collection — automate from day one

The single biggest difference between a painful audit and an easy one: evidence is automated, not screenshot-based.

Build these collectors as part of your platform, not at audit time:

Authorisation log archive. Every allow/deny decision, signed and immutable, retained for 12 months.
Trace archive. Sampled (or full, if storage allows) traces per task, retained for 12 months.
Model + prompt version manifest. Every deployment writes the active model IDs and prompt hashes to a versioned manifest. Auditors love this.
Access review automation. A nightly job that lists all human and service principals with their grants, dumps to S3 with a hash, and emails the security team for review.
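
To make an authorisation log tamper-evident, one common technique is hash chaining, sketched below. Note the substitution: this is not cryptographic signing, and it does not make storage immutable. A full implementation of "signed and immutable" would add a key-based signature and write-once storage on top. The entry layout and function names are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # Placeholder "previous hash" for the first entry.


def _entry_hash(prev_hash: str, record: dict) -> str:
    payload = json.dumps(record, sort_keys=True)
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], record: dict) -> dict:
    """Append a record whose hash covers both its content and the previous hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "record": record,
             "hash": _entry_hash(prev_hash, record)}
    chain.append(entry)
    return entry


def verify_chain(chain: list[dict]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["record"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Running `verify_chain` on a schedule and archiving the result gives auditors a simple integrity attestation for the whole retention period.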

For each, document:

What system writes it.
Where it is stored.
Retention and disposal.
Who can access it (and the access log for that).

Vendor and sub-processor management

Every external MCP server is a vendor, and potentially a subservice organisation, in SOC 2 terms (a sub-processor in GDPR language). Auditors will ask:

Is there a list?
Does it include data flows?
Is each item under contract or covered by acceptable terms?
How do you know they are still trustworthy?

Required:

Sub-processor registry covering every MCP server (including OSS ones used in prod).
DPA (or equivalent) with each commercial vendor.
Periodic re-review tied to the registry trust criteria — see trusted MCP registry providers.
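
The registry and its checks can be kept as structured data rather than a spreadsheet. In this sketch the `SubProcessor` fields mirror the auditor questions above, and the 365-day review interval is an assumed default, not a SOC 2 requirement.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class SubProcessor:
    name: str
    data_flows: list[str]   # What data goes to this MCP server.
    commercial: bool        # False for self-hosted OSS servers.
    dpa_signed: bool
    last_reviewed: date


def registry_gaps(
    registry: list[SubProcessor],
    today: date,
    review_interval: timedelta = timedelta(days=365),
) -> list[str]:
    """List the findings an auditor would likely raise against the registry."""
    gaps: list[str] = []
    for sp in registry:
        if not sp.data_flows:
            gaps.append(f"{sp.name}: no data flows documented")
        if sp.commercial and not sp.dpa_signed:
            gaps.append(f"{sp.name}: commercial vendor without DPA")
        if today - sp.last_reviewed > review_interval:
            gaps.append(f"{sp.name}: review overdue")
    return gaps
```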

Common gaps auditors find

The first SOC 2 audit of an MCP-heavy stack typically surfaces:

Shared service-account tokens. Replace with workload identity before the audit, not during.
Auto-pulled latest tags. Pin every dependency by digest.
Trace store with raw PII. Redact at write time; the trace store is in scope.
No regression suite. "We test in prod" is not an answer.
Undocumented break-glass paths. Document them, audit access, time-limit them.

Fixing each post-audit costs 5–10× more than building it in.
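
Several of these gaps can be caught by a check in CI. As one example, the sketch below flags container image references that are not pinned by digest. The registry hostnames are made up, and the check assumes the standard `name@sha256:<64 hex characters>` digest form.

```python
import re

# An image reference is digest-pinned when it ends in @sha256:<64 hex chars>.
DIGEST_PINNED = re.compile(r"@sha256:[0-9a-f]{64}$")


def unpinned_images(image_refs: list[str]) -> list[str]:
    """Return the references that rely on a tag (or nothing) instead of a digest."""
    return [ref for ref in image_refs if not DIGEST_PINNED.search(ref)]


refs = [
    "registry.example.com/mcp/search:latest",
    "registry.example.com/mcp/search@sha256:" + "a" * 64,
    "registry.example.com/mcp/files:1.4.2",
]
for ref in unpinned_images(refs):
    print("not pinned by digest:", ref)
```

A version tag such as `1.4.2` is flagged too, because tags can be moved to point at a different image while a digest cannot.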

A 90-day prep plan

If you are 90 days from your first audit window:

Days	Focus
0–30	Inventory every MCP server. Write sub-processor registry. Pin all dependencies.
30–60	Implement per-call authorisation + audit log. Stand up regression suite.
60–75	Wire monitoring + alerting + runbooks.
75–90	Dry-run with internal "auditor" reviewing the evidence trail. Fix gaps.

If you slip on any of these, slip the audit window. A failed Type II audit is worse than a delayed one.

Where this is heading

AICPA is drafting AI-specific guidance that will likely formalise much of the above. EU equivalents (under the AI Act) are similar in shape — see EU AI Act MCP compliance. Build now to a strict interpretation; the formal guidance will codify what good teams already do.
