Air-gapped agent deployment: running an agent stack

"On-prem" usually means "in our data centre with internet". Air-gapped means "no internet at all". Defence, intelligence, and a slice of finance and pharma require it. The architecture is different in important ways. Here is the working stack and the operational playbook.

What air-gap really means

A true air-gap deployment has:

No outbound internet from the agent network. Not even DNS to public resolvers.
No inbound internet that reaches the agent network.
Physical separation or strict enclave gateways with one-way data diodes.

If your "air-gap" allows a managed update channel from the vendor, you are on-prem with strict egress, not air-gapped. Both are valid; do not confuse them.
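One way to keep that distinction honest is to probe for egress from a host inside the enclave. The following is a minimal sketch, not a complete audit: the probe targets and the test hostname are assumptions, and a clean result only shows that these particular paths are closed. The `connect` and `resolve` parameters exist so the logic can be exercised without touching a network.

```python
import socket

# Hypothetical probe targets (well-known public resolvers); pick ones your
# security policy allows you to test against.
PUBLIC_TARGETS = [("1.1.1.1", 53), ("8.8.8.8", 443)]


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def public_dns_resolves(name: str = "example.com") -> bool:
    """Return True if the host's configured resolver can resolve a public name."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def egress_violations(targets=PUBLIC_TARGETS,
                      connect=tcp_reachable,
                      resolve=public_dns_resolves) -> list[str]:
    """Collect a human-readable finding for every egress path that works."""
    violations = []
    for host, port in targets:
        if connect(host, port):
            violations.append(f"tcp {host}:{port} reachable")
    if resolve():
        violations.append("public DNS name resolved")
    return violations
```

Inside a true air-gap, `egress_violations()` should return an empty list. Any entry means the network behaves like on-prem with egress rules, whatever the diagram says.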

Architecture

┌─────────────────────────────────────────────────────────┐
│ Air-gapped enclave                                      │
│                                                         │
│  agent host → MCP gateway → MCP servers                 │
│       ↓             ↓             ↓                     │
│  inference cluster  registry  internal data             │
│       ↑             ↑             ↑                     │
│  weight mirror   artefact store  observability          │
└─────────────────────────────────────────────────────────┘
                           ↑
                   one-way data diode
                           ↑
              staging zone (internet-connected)

Everything inside the enclave runs without internet. The staging zone outside ingests vendor artefacts (model weights, MCP server packages, updates) and pushes them through a one-way diode after verification.
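The verification step in the staging zone can be sketched as a hash check against a manifest. This assumes the vendor supplies a manifest mapping artefact file names to SHA-256 digests, and that the manifest itself has already been authenticated (for example by a signature check, not shown here). The function names and the manifest shape are illustrative, not a vendor format.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so multi-gigabyte weight files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_artefacts(manifest: dict[str, str], root: Path) -> list[str]:
    """Return the names of artefacts that are missing or whose hash differs.

    `manifest` maps a file name (relative to `root`) to its expected
    SHA-256 hex digest. An empty result means every artefact matched.
    """
    failures = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file():
            failures.append(name)
            continue
        # compare_digest avoids leaking match position through timing.
        if not hmac.compare_digest(sha256_file(path), expected.lower()):
            failures.append(name)
    return failures
```

Only a batch for which `verify_artefacts` returns an empty list would be queued for the diode. Anything else stays in staging for investigation.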

Five layers, all inside

1. Model inference

Vendor on-prem licence or open weights (Anthropic, Llama-derived, Mistral, depending on what each vendor actually licenses for fully offline use). Weights shipped on physical media or via the diode. Local model registry tracks versions.

2. MCP infrastructure

Internal registry, gateway, and servers — all artefacts mirrored from the staging zone.

3. Memory and storage

Standard Postgres + pgvector. Backups stay inside the enclave.

4. Observability

Self-hosted Langfuse / Phoenix. Telemetry never leaves the enclave.

5. Identity

Internal IdP. No SaaS dependency. Federation only via diode if absolutely required.
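Since all five layers are meant to live inside the enclave, a cheap static check is to scan the deployment config for endpoints that point anywhere else. The sketch below assumes a flat mapping of layer name to URL and a hypothetical internal DNS zone, `.enclave.internal`; both are illustrative. It treats IP literals in private or loopback ranges, and hostnames under the internal zone, as inside.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical internal DNS zone; replace with the enclave's real one.
INTERNAL_SUFFIX = ".enclave.internal"


def is_internal(url: str) -> bool:
    """Return True if the URL's host looks like it lives inside the enclave."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal, so judge the hostname by its DNS zone.
        return host.endswith(INTERNAL_SUFFIX)
    return addr.is_private or addr.is_loopback


def external_endpoints(config: dict[str, str]) -> dict[str, str]:
    """Return the subset of layer -> URL entries that point outside."""
    return {layer: url for layer, url in config.items() if not is_internal(url)}
```

For example, a config with `"observability": "https://cloud.example.com"` would be flagged, while `"memory": "postgresql://db.enclave.internal:5432/agent"` would pass.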

The update problem

Your single hardest operational problem is keeping software current without internet:

Model upgrades — every 6 months, vendor ships new weights via approved channel; staging zone verifies; diode push.
MCP server updates — internal registry pulls from a vendor mirror in staging; signed, hashed, scanned, then diode-pushed.
CVE patching — separate channel for OS-level CVEs; same diode pattern.

Plan for an update lag of 14–30 days behind the public ecosystem. A shorter lag is possible but costs more in verification and audit effort.
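Update lag is easy to track if the update pipeline records two dates per artefact: when it was released publicly and when it went live inside the enclave. The sketch below is one way to flag items that have fallen behind a 30-day ceiling. The `Update` record and its field names are assumptions for illustration, not part of any vendor tooling.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

MAX_LAG_DAYS = 30  # upper end of the planned 14–30 day lag window


@dataclass(frozen=True)
class Update:
    name: str
    released: date            # public release date
    deployed: Optional[date]  # date it went live in the enclave, None if pending


def lag_days(update: Update, today: date) -> int:
    """Days between public release and enclave deployment (or today if pending)."""
    end = update.deployed or today
    return (end - update.released).days


def overdue(updates: list[Update], today: date,
            max_lag: int = MAX_LAG_DAYS) -> list[str]:
    """Names of updates whose lag exceeds the agreed ceiling."""
    return [u.name for u in updates if lag_days(u, today) > max_lag]
```

A pending OS patch released on 1 January and still undeployed on 15 February has a lag of 45 days and would appear in `overdue`.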

What you give up

Three capabilities are off the table:

Web search tools — no agent that browses the public web.
Cloud SaaS integrations — no Slack, no GitHub.com, no Stripe (unless you have private endpoints).
Real-time threat intel — security signals lag.

Plan workflows around these absences from day one.

Operational playbook

Three rotations to staff:

Inference cluster ops — keep the GPUs healthy.
Update pipeline — own the staging zone and diode pushes.
Incident response — air-gap means you cannot ssh in from home.

Plus a quarterly drill for model rollback (when an update degrades quality).
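The rollback drill is simpler to rehearse if the local model registry keeps an ordered promotion history. Here is a minimal in-memory sketch: `ModelRegistry`, `promote_with_gate`, the evaluation score, and the tolerance value are all hypothetical, and a real registry would persist its state and repoint the inference cluster as part of each step.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ModelRegistry:
    # Promoted versions, oldest first; the last entry is the active one.
    history: list[str] = field(default_factory=list)

    @property
    def active(self) -> Optional[str]:
        return self.history[-1] if self.history else None

    def promote(self, version: str) -> None:
        """Make `version` the active model."""
        self.history.append(version)

    def rollback(self) -> str:
        """Drop the active version and return the one that becomes active."""
        if len(self.history) < 2:
            raise RuntimeError("no earlier version to roll back to")
        self.history.pop()
        return self.history[-1]


def promote_with_gate(registry: ModelRegistry, version: str, score: float,
                      baseline: float, tolerance: float = 0.02) -> str:
    """Promote `version`, then roll back if its eval score regressed.

    `score` and `baseline` come from whatever offline evaluation suite the
    team runs inside the enclave; the tolerance is an illustrative value.
    Returns the version that ends up active.
    """
    registry.promote(version)
    if score < baseline - tolerance:
        return registry.rollback()
    return version
```

A drill would promote a deliberately degraded build, confirm that the gate restores the previous version, and time how long the switch takes.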

Compliance benefits

Air-gap helps satisfy the strictest interpretations of:

US ITAR / EAR for defence-relevant data.
Classified information handling (Secret, Top Secret depending on accreditation).
Some interpretations of "no third-party processing" in EU sovereignty rules.

What it does not solve:

Internal threats — air-gap does not help against an insider.
Bad data hygiene — PII still needs the same handling inside.
Stale models — quality drift if the upgrade pipeline lags too far.

Cost model

Roughly 1.5–2x the cost of an on-prem deployment with the same capability:

Capex higher because you need redundant everything inside the enclave.
Opex higher because of the staging-zone team and diode operations.
Capability gap — some workflows are simply not possible.

Worth it where the regulator or the contract leaves no other option. Not worth it as a "more secure" preference.

Common mistakes

Confusing on-prem with air-gap — different architectures, different costs.
One-way that is not one-way — TCP-based "diodes" that allow handshakes are not diodes.
No update plan — air-gap + no updates = obsolete agent in 18 months.
Skipping observability — running blind is far harder under air-gap than under SaaS, because no vendor dashboard or remote support can fill the gap.
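On the diode point: a genuine one-way link carries no return traffic, so the receiver cannot acknowledge frames or ask for a resend. The sender therefore has to make loss and corruption detectable on the receiving side alone. The sketch below shows one such framing, under the assumption that the expected SHA-256 digest reaches the enclave in an authenticated manifest. The frame layout is illustrative and does not follow any standard diode protocol.

```python
import hashlib
from typing import Optional

Frame = tuple[int, int, bytes]  # (index, total frame count, chunk)


def make_frames(payload: bytes, chunk_size: int) -> list[Frame]:
    """Split a payload into numbered frames for a link with no return path."""
    chunks = [payload[i:i + chunk_size]
              for i in range(0, len(payload), chunk_size)] or [b""]
    total = len(chunks)
    return [(index, total, chunk) for index, chunk in enumerate(chunks)]


def reassemble(frames: list[Frame], expected_sha256: str) -> Optional[bytes]:
    """Rebuild the payload, or return None if anything is missing or altered.

    The receiver cannot request a resend, so any failure means the whole
    transfer is rejected and must be pushed again from staging.
    """
    if not frames:
        return None
    total = frames[0][1]
    by_index = {index: chunk for index, count, chunk in frames if count == total}
    if set(by_index) != set(range(total)):
        return None  # at least one frame never arrived
    payload = b"".join(by_index[i] for i in range(total))
    if hashlib.sha256(payload).hexdigest() != expected_sha256.lower():
        return None  # content does not match the manifest
    return payload
```

If a transfer scheme only works when the receiver can send something back, even a TCP handshake, the link is bidirectional and should be treated as such in the threat model.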

Where this is heading

Two trends to watch for by 2027: turnkey air-gapped agent appliances from defence vendors, and standardised diode protocols for AI weight transfer. Until then, build with the architecture above and budget for the operational reality.