Anti-fraud teams have spent a decade building rules engines and gradient-boosted models. In 2026, agents are eating both. Not because they are smarter at scoring, but because they can investigate — query systems, correlate signals, and explain a decision. Here is the architecture and the trade-offs.
What changed
Classic fraud stacks have two modes:
- Rules — fast, deterministic, easy to game.
- ML scores — adaptive, harder to game, opaque.
Agents add a third:
- Investigation — given a suspicious signal, fetch context, correlate, decide, explain.
Investigation used to be the human analyst's job. Agents now do the first pass on most low- and mid-risk signals.
The architecture
```
transaction
↓
fast scorer (existing ML)
↓
score in suspicious band? → agent investigates
↓
agent: fetch user history, device signals, related accounts, similar past cases
↓
agent decision: clear / hold / escalate to human
↓
human reviews escalations only
```
The fast scorer stays. The agent layer triages between auto-clear and human review. Result: 60–80% reduction in human review queues, with comparable false-negative rates.
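A minimal sketch of that routing layer, assuming a `fast_score` helper for the existing model and illustrative band thresholds (every name and number here is a placeholder, not a prescription):

```python
def fast_score(txn: dict) -> float:
    """Existing ML scorer; assumed to return a risk score in [0, 1]."""
    raise NotImplementedError  # placeholder for the production model

def agent_investigate(txn: dict) -> str:
    """Agent first pass; returns 'clear', 'hold', or 'escalate'."""
    raise NotImplementedError  # placeholder for the agent loop

CLEAR_BELOW = 0.20   # illustrative auto-clear threshold
HOLD_ABOVE = 0.95    # illustrative auto-hold threshold

def route(txn: dict) -> str:
    score = fast_score(txn)          # existing ML, well under 50 ms
    if score < CLEAR_BELOW:
        return "auto_clear"
    if score >= HOLD_ABOVE:
        return "auto_hold"
    # Suspicious band: the transaction is held while the agent investigates.
    return agent_investigate(txn)
```

The design choice that matters: the agent only ever sees the suspicious band, so clear-cut cases never pay the LLM cost.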
Tools the agent needs
Read scopes for:
- Customer profile and tenure.
- Device fingerprint history.
- Related accounts (shared device, IP, payment method).
- Recent transaction history.
- Past dispute outcomes.
- Sanctions / watch lists.
Write scopes only for the actions the agent is authorised to take (hold, request step-up auth, escalate). Side-effect actions go through the consent layer, except in sub-second decisioning, where the policy must pre-approve them.
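One way to enforce this split is a declarative scope table checked on every tool dispatch. A minimal sketch, with hypothetical tool names:

```python
AGENT_TOOL_SCOPES = {
    "read": {
        "customer_profile", "device_fingerprints", "related_accounts",
        "transaction_history", "dispute_outcomes", "watchlists",
    },
    "write": {
        "hold_transaction",        # the only side effects this agent may take
        "request_step_up_auth",
        "escalate_to_human",
    },
}

def check_scope(tool: str, mode: str) -> None:
    # mode is "read" or "write"; writes still pass through the consent layer
    if tool not in AGENT_TOOL_SCOPES[mode]:
        raise PermissionError(f"tool {tool!r} is outside the agent's {mode} scope")
```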
What agents are better at
Three categories:
- Soft signals correlation — combining weak signals across systems faster than humans.
- Explanation — every decision comes with reasoning, citing the data it pulled. See decision explainability.
- Long-tail patterns — rare patterns the rules engine never had time to encode.
What humans are still better at: novel scams, social engineering, and judgement calls in low-data cases.
Where the new false positives come from
Three patterns that surprised teams:
- Hallucinated context — the agent recalled a "related dispute" that never existed.
- Stale data — the agent decided on yesterday's data because the freshness check was missing.
- Confident-but-wrong reasoning — the agent's chain of thought looked plausible; the conclusion was not.
All three are caught by hallucination detection tests and citation requirements. Without those, agent fraud decisions are not auditable.
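A sketch of what those checks can look like in code, assuming each tool result carries a call id and a `fetched_at` timestamp, and each claim in the decision carries a `citation` field (all of these field names are assumptions):

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=5)   # illustrative freshness bound

def validate_decision(decision: dict, tool_results: dict) -> list[str]:
    """Return a list of violations; any violation blocks the decision."""
    errors = []
    # 1. Citation check: every claim must reference a tool call that happened.
    for claim in decision["claims"]:
        if claim.get("citation") not in tool_results:
            errors.append(f"uncited or hallucinated claim: {claim['text']!r}")
    # 2. Freshness check: no decision on yesterday's data.
    now = datetime.now(timezone.utc)
    for call_id, result in tool_results.items():
        if now - result["fetched_at"] > MAX_AGE:
            errors.append(f"stale data in tool call {call_id}")
    return errors
```

A non-empty list forces escalation rather than letting the confident-but-wrong decision through.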
Latency budget
Real-time fraud decisions need < 200 ms end-to-end, which rules out any LLM call in the hot path. The pattern that fits:
- Tier 0: rules engine + fast ML score, < 50 ms. Auto-clear or auto-hold.
- Tier 1: suspicious band → agent runs async after the transaction is held; decisioning in 5–30 seconds.
- Tier 2: complex cases → human queue.
Agents do not replace real-time scoring. They replace the first-pass triage that used to fill the tier-2 human queue.
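A sketch of the tier-1 handoff, assuming hypothetical helpers (`hold_transaction`, `agent_investigate_async`, `apply_decision`) and a hard timeout, so a slow agent falls back to escalation rather than leaving a gap:

```python
import asyncio

def hold_transaction(txn_id: str) -> None: ...                 # placeholder
async def agent_investigate_async(txn_id: str) -> str: ...     # placeholder
def apply_decision(txn_id: str, decision: str) -> None: ...    # placeholder

AGENT_DEADLINE_S = 30   # upper end of the 5-30 s budget

async def tier1_investigate(txn_id: str) -> None:
    hold_transaction(txn_id)   # instant hold, inside the real-time budget
    try:
        decision = await asyncio.wait_for(
            agent_investigate_async(txn_id), timeout=AGENT_DEADLINE_S
        )
    except asyncio.TimeoutError:
        decision = "escalate"  # a slow agent must never leave a gap
    apply_decision(txn_id, decision)
```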
Model choice
For fraud, model choice matters:
- Sonnet for the typical investigation: the right balance of cost and reasoning depth.
- Haiku for tier-1 triage where confidence is high.
- Opus only for genuinely novel cases the team escalates.
See model routing.
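A sketch of the routing rule, assuming the case record carries a triage confidence score and a novelty flag (both assumptions); map the labels to concrete model IDs in your stack:

```python
def pick_model(case: dict) -> str:
    if case.get("escalated_as_novel"):               # flagged by the team
        return "opus"
    if case.get("triage_confidence", 0.0) >= 0.9:    # illustrative cutoff
        return "haiku"
    return "sonnet"   # default for the typical investigation
```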
Audit and compliance
Fraud decisions are heavily regulated. Every agent decision logs:
- Inputs the agent saw (with timestamps).
- Tools called (with argument hashes).
- Decision and confidence.
- Reasoning chain (for human review of escalated cases).
- Outcome (cleared, held, escalated).
All of this feeds the audit trail. Auditors should be able to reconstruct any decision in full.
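A sketch of the per-decision record, with field names shaped to the list above rather than to any real schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class AgentDecisionRecord:
    txn_id: str
    inputs: list[dict]       # snapshots the agent saw, each with a timestamp
    tool_calls: list[dict]   # e.g. {"tool": ..., "args_hash": ..., "at": ...}
    decision: str            # "cleared" | "held" | "escalated"
    confidence: float
    reasoning: str           # chain kept for human review of escalations
    outcome: str             # final outcome once known
    recorded_at: datetime
```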
Adversarial concerns
Fraudsters adapt. Three risks specific to agent-based fraud:
- Prompt injection in user-controlled fields (memo, name) — see prompt injection guide.
- Pattern probing — fraudsters submit borderline transactions to learn the threshold.
- Tool flood — flooding the system to slow agents and create gaps.
Mitigations: strict input sanitisation, decision randomisation across sessions, rate-limiting per actor.
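A sketch of two of those mitigations, with illustrative limits throughout. User-controlled fields are truncated, stripped of markup-like characters, and fenced as data before they reach the prompt; a sliding window counts transactions per actor:

```python
import re
import time
from collections import defaultdict, deque

def fence_user_field(value: str) -> str:
    """Truncate, strip markup-ish characters, and mark the text as data."""
    cleaned = re.sub(r"[^\w\s@.,'-]", "", value)[:200]
    return f"<untrusted_user_field>{cleaned}</untrusted_user_field>"

WINDOW_S, MAX_IN_WINDOW = 60, 20   # illustrative per-actor limit

_recent: dict[str, deque] = defaultdict(deque)

def rate_limited(actor_id: str) -> bool:
    """True once an actor exceeds the per-window transaction budget."""
    now = time.monotonic()
    window = _recent[actor_id]
    while window and now - window[0] > WINDOW_S:
        window.popleft()
    window.append(now)
    return len(window) > MAX_IN_WINDOW
```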
Quality metrics
Beyond the standard fraud metrics:
- Agent-cleared chargeback rate — false negatives the agent let through.
- Agent-held false positive rate — good transactions the agent held.
- Human override rate — escalations the human flipped.
- Decision explainability completeness — % of decisions with full citation chains.
Surface these in error rate dashboards.
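A sketch of how the four metrics might fall out of the decision log, assuming boolean outcome fields per case (`charged_back`, `was_legitimate`, `human_flipped`, `full_citation_chain` are all assumed names):

```python
def agent_quality_metrics(cases: list[dict]) -> dict[str, float]:
    cleared = [c for c in cases if c["decision"] == "cleared"]
    held = [c for c in cases if c["decision"] == "held"]
    escalated = [c for c in cases if c["decision"] == "escalated"]
    return {
        "agent_cleared_chargeback_rate":
            sum(c["charged_back"] for c in cleared) / max(len(cleared), 1),
        "agent_held_false_positive_rate":
            sum(c["was_legitimate"] for c in held) / max(len(held), 1),
        "human_override_rate":
            sum(c["human_flipped"] for c in escalated) / max(len(escalated), 1),
        "explainability_completeness":
            sum(c["full_citation_chain"] for c in cases) / max(len(cases), 1),
    }
```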
Common mistakes
- No human override — fraudsters learn; the agent must be correctable in flight.
- Putting the agent in real-time — latency does not allow it; use tier-1 triage.
- No citation requirements — agent makes confident claims with no evidence trail.
- Single-model decisioning — Haiku alone misses subtle cases; Opus alone is too expensive.
Where this is heading
Three trends by 2027: regulator-blessed agent decision frameworks for financial fraud (the FFIEC and FCA are both consulting), shared fraud-pattern feeds for agents, and cross-bank consortium models specialised for fraud. The bar for agent-based fraud detection rises every quarter.