Agent behavior anomaly detection: catching the compromised agent before it acts

Classical SIEM looks at network and process events. Agents need a new layer that looks at decisions and actions. Six signals catch most cases of "this agent is suddenly behaving differently". Here is what they detect, how to compute them, and how to wire alerts.

Why agent anomalies need their own category

Three properties make agent anomalies distinct:

Decisions vary by design — variance is normal; the question is when variance becomes outlier.
Outputs are unstructured — classical signature matching does not work.
Tools amplify — a small behavioural shift can cascade into a large blast.

The signals below are agent-specific. Wire them into the existing SIEM, not as a separate stack.

The six signals

1. Tool-call distribution shift

Baseline: which tools each agent uses, with what frequency. Alert: distribution drifts beyond a threshold (KL divergence works).

Catches: jailbreaks that redirect the agent toward unusual tools.
False positives: legitimate feature changes; mitigate by re-baselining on rollouts.

2. Prompt entropy

Baseline: distribution of incoming prompt characteristics (length, special tokens, language). Alert: outliers, particularly of probing-style prompts.

Catches: active jailbreak attempts, prompt-injection campaigns.
False positives: new user cohorts; segment by population.

3. Latency outliers

Baseline: per-tool latency distribution. Alert: persistent outliers in the slow tail.

Catches: server compromise that causes side effects, exfil through slow channels.
False positives: infrastructure incidents; correlate with deployments.

4. Output similarity collapse

Measure the diversity of the agent's outputs over time. Alert: sudden drop in diversity.

Catches: an attacker pushing the agent into a fixed-template response.
False positives: a prompt change made the agent more deterministic.

5. Scope drift

Track which scopes each agent actually uses. Alert when a scope is used that the agent has never touched before.

Catches: capability misuse; jailbreaks that unlock unused tools.
False positives: legitimate new feature; whitelist on rollout.

6. Identity-vs-action mismatch

Cross-reference the calling user with the agent's action. Alert: a junior account triggering admin-class actions.

Catches: stolen sessions; lateral movement.
False positives: delegated workflows; document those explicitly.

Implementation pattern

Stream all agent telemetry into a feature store. Compute the six signals on rolling windows. Three alert tiers:

WARN: signal exceeds 2σ baseline
  → notification to agent operator

ALERT: signal exceeds 4σ
  → page on-call

CONTAIN: signal exceeds 6σ AND scope drift
  → automatic containment (pause agent, require human re-enable)

Containment is the most impactful change you can make. The cost of a false positive (a paused agent for 10 minutes) is much lower than the cost of a real attack proceeding.

Baselines

Two baseline patterns:

Per-agent rolling baseline — each agent has its own normal; sensitive to per-agent changes.
Cohort baseline — group similar agents; less sensitive but catches cohort-wide attacks.

Use both. Cohort catches campaign attacks; per-agent catches targeted ones.

Storage shape

Wide event table:

CREATE TABLE agent_events (
  ts timestamptz not null default now(),
  agent_id text not null,
  user_id text,
  event_type text not null,
  tool_name text,
  scope text,
  prompt_hash text,
  latency_ms int,
  output_hash text,
  metadata jsonb
);
CREATE INDEX ON agent_events (agent_id, ts);
CREATE INDEX ON agent_events (user_id, ts);

Feed from the audit trail. Compute signals from this table.

Triage workflow

When an alert fires:

Snapshot the agent's recent traces — see trace visualization tools.
Compare against last week's baseline.
Check for prompt-injection patterns in the input — see prompt injection guide.
Decide: false positive, feature change, or actual incident.
If actual: contain, rotate credentials, post-mortem.

What anomaly detection does not catch

Slow-burn attacks — small drift over weeks.
First-time attacks — no baseline yet.
Inside-the-baseline attacks — attacker who knows the threshold.

Combine anomaly detection with hallucination detection, DLP, and gateway-level scope enforcement.

Common mistakes

One global threshold — agents vary too much; per-agent baselines.
No containment automation — alerts go to a tab nobody watches.
Alerting only on the worst signal — combinations matter; correlate across signals.
Skipping the post-mortem — attackers learn; defenders must too.

Where this is heading

Three trends by 2027: agent-aware SIEM products natively ingesting MCP traces, shared anomaly threat intel feeds across organisations, and standardised "agent suspicious activity" tags in the MCP audit shape. Build the basic signals now.