Anti-fraud teams have spent a decade building rules engines and gradient-boosted models. In 2026, agents are eating both. Not because they are smarter at scoring, but because they can investigate — query systems, correlate signals, and explain a decision. Here is the architecture and the trade-offs.
What changed
Classic fraud stacks have two modes:
- Rules — fast, deterministic, easy to game.
- ML scores — adaptive, harder to game, opaque.
Agents add a third:
- Investigation — given a suspicious signal, fetch context, correlate, decide, explain.
Investigation used to be the human analyst's job. Agents now do the first pass on most low- and mid-risk signals.
The architecture
```
transaction
↓
fast scorer (existing ML)
↓
score in suspicious band? → agent investigates
↓
agent: fetch user history, device signals, related accounts, similar past cases
↓
agent decision: clear / hold / escalate to human
↓
human reviews escalations only
```
The fast scorer stays. The agent layer triages between auto-clear and human review. Result: 60–80% reduction in human review queues, with comparable false-negative rates.
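A minimal sketch of that routing layer, assuming a `fast_score` helper for the existing model and illustrative band thresholds (every name and number here is a placeholder, not a prescription):

```python
def fast_score(txn: dict) -> float:
    """Existing ML scorer; assumed to return a risk score in [0, 1]."""
    raise NotImplementedError  # placeholder for the production model

def agent_investigate(txn: dict) -> str:
    """Agent first pass; returns 'clear', 'hold', or 'escalate'."""
    raise NotImplementedError  # placeholder for the agent loop

CLEAR_BELOW = 0.20   # illustrative auto-clear threshold
HOLD_ABOVE = 0.95    # illustrative auto-hold threshold

def route(txn: dict) -> str:
    score = fast_score(txn)          # existing ML, well under 50 ms
    if score < CLEAR_BELOW:
        return "auto_clear"
    if score >= HOLD_ABOVE:
        return "auto_hold"
    # Suspicious band: the transaction is held while the agent investigates.
    return agent_investigate(txn)
```

The design choice that matters: the agent only ever sees the suspicious band, so clear-cut cases never pay the LLM cost.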
Tools the agent needs
Read scopes for:
- Customer profile and tenure.
- Device fingerprint history.
- Related accounts (shared device, IP, payment method).
- Recent transaction history.
- Past dispute outcomes.
- Sanctions / watch lists.
Write scopes only for the actions the agent is authorised to take (hold, request step-up auth, escalate). Side-effect actions go through the consent layer, except in sub-second decisioning, where the policy must pre-approve them.
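One way to enforce this split is a declarative scope table checked on every tool dispatch. A minimal sketch, with hypothetical tool names:

```python
AGENT_TOOL_SCOPES = {
    "read": {
        "customer_profile", "device_fingerprints", "related_accounts",
        "transaction_history", "dispute_outcomes", "watchlists",
    },
    "write": {
        "hold_transaction",        # the only side effects this agent may take
        "request_step_up_auth",
        "escalate_to_human",
    },
}

def check_scope(tool: str, mode: str) -> None:
    # mode is "read" or "write"; writes still pass through the consent layer
    if tool not in AGENT_TOOL_SCOPES[mode]:
        raise PermissionError(f"tool {tool!r} is outside the agent's {mode} scope")
```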
What agents are better at
Three categories:
- Soft signals correlation — combining weak signals across systems faster than humans.
- Explanation — every decision comes with reasoning, citing the data it pulled. See decision explainability.
- Long-tail patterns — rare patterns the rules engine never had time to encode.
What humans are still better at: novel scams, social engineering, and judgement calls in low-data cases.
Where the new false positives come from
Three patterns that surprised teams:
- Hallucinated context — the agent recalled a "related dispute" that never existed.
- Stale data — the agent decided on yesterday's data because the freshness check was missing.
- Confident-but-wrong reasoning — the agent's chain of thought looked plausible; the conclusion was not.
All three are caught by hallucination detection tests and citation requirements. Without those, agent fraud decisions are not auditable.
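A sketch of what those checks can look like in code, assuming each tool result carries a call id and a `fetched_at` timestamp, and each claim in the decision carries a `citation` field (all of these field names are assumptions):

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(minutes=5)   # illustrative freshness bound

def validate_decision(decision: dict, tool_results: dict) -> list[str]:
    """Return a list of violations; any violation blocks the decision."""
    errors = []
    # 1. Citation check: every claim must reference a tool call that happened.
    for claim in decision["claims"]:
        if claim.get("citation") not in tool_results:
            errors.append(f"uncited or hallucinated claim: {claim['text']!r}")
    # 2. Freshness check: no decision on yesterday's data.
    now = datetime.now(timezone.utc)
    for call_id, result in tool_results.items():
        if now - result["fetched_at"] > MAX_AGE:
            errors.append(f"stale data in tool call {call_id}")
    return errors
```

A non-empty list forces escalation rather than letting the confident-but-wrong decision through.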
Latency budget
Real-time fraud decisions need < 200 ms end-to-end, which rules out any LLM call in the hot path. The pattern that fits:
- Tier 0: rules engine + fast ML score, < 50 ms. Auto-clear or auto-hold.
- Tier 1: suspicious band → agent runs async after the transaction is held; decisioning in 5–30 seconds.
- Tier 2: complex cases → human queue.
Agents do not replace real-time scoring. They replace the first-pass triage that used to fill the tier-2 human queue.
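A sketch of the tier-1 handoff, assuming hypothetical helpers (`hold_transaction`, `agent_investigate_async`, `apply_decision`) and a hard timeout, so a slow agent falls back to escalation rather than leaving a gap:

```python
import asyncio

def hold_transaction(txn_id: str) -> None: ...                 # placeholder
async def agent_investigate_async(txn_id: str) -> str: ...     # placeholder
def apply_decision(txn_id: str, decision: str) -> None: ...    # placeholder

AGENT_DEADLINE_S = 30   # upper end of the 5-30 s budget

async def tier1_investigate(txn_id: str) -> None:
    hold_transaction(txn_id)   # instant hold, inside the real-time budget
    try:
        decision = await asyncio.wait_for(
            agent_investigate_async(txn_id), timeout=AGENT_DEADLINE_S
        )
    except asyncio.TimeoutError:
        decision = "escalate"  # a slow agent must never leave a gap
    apply_decision(txn_id, decision)
```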
Model choice
For fraud, model choice matters:
- Sonnet for the typical investigation: the right balance of cost and reasoning depth.
- Haiku for tier-1 triage where confidence is high.
- Opus only for genuinely novel cases the team escalates.
See model routing.
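A sketch of the routing rule, assuming the case record carries a triage confidence score and a novelty flag (both assumptions); map the labels to concrete model IDs in your stack:

```python
def pick_model(case: dict) -> str:
    if case.get("escalated_as_novel"):               # flagged by the team
        return "opus"
    if case.get("triage_confidence", 0.0) >= 0.9:    # illustrative cutoff
        return "haiku"
    return "sonnet"   # default for the typical investigation
```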
Audit and compliance
Fraud decisions are heavily regulated. Every agent decision logs:
- Inputs the agent saw (with timestamps).
- Tools called (with argument hashes).
- Decision and confidence.
- Reasoning chain (for human review of escalated cases).
- Outcome (cleared, held, escalated).
All of this feeds the audit trail. Auditors should be able to reconstruct any decision in full.
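A sketch of the per-decision record, with field names shaped to the list above rather than to any real schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class AgentDecisionRecord:
    txn_id: str
    inputs: list[dict]       # snapshots the agent saw, each with a timestamp
    tool_calls: list[dict]   # e.g. {"tool": ..., "args_hash": ..., "at": ...}
    decision: str            # "cleared" | "held" | "escalated"
    confidence: float
    reasoning: str           # chain kept for human review of escalations
    outcome: str             # final outcome once known
    recorded_at: datetime
```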
Adversarial concerns
Fraudsters adapt. Three risks specific to agent-based fraud:
- Prompt injection in user-controlled fields (memo, name) — see prompt injection guide.
- Pattern probing — fraudsters submit borderline transactions to learn the threshold.
- Tool flood — flooding the system to slow agents and create gaps.
Mitigations: strict input sanitisation, decision randomisation across sessions, rate-limiting per actor.
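A sketch of two of those mitigations, with illustrative limits throughout. User-controlled fields are truncated, stripped of markup-like characters, and fenced as data before they reach the prompt; a sliding window counts transactions per actor:

```python
import re
import time
from collections import defaultdict, deque

def fence_user_field(value: str) -> str:
    """Truncate, strip markup-ish characters, and mark the text as data."""
    cleaned = re.sub(r"[^\w\s@.,'-]", "", value)[:200]
    return f"<untrusted_user_field>{cleaned}</untrusted_user_field>"

WINDOW_S, MAX_IN_WINDOW = 60, 20   # illustrative per-actor limit

_recent: dict[str, deque] = defaultdict(deque)

def rate_limited(actor_id: str) -> bool:
    """True once an actor exceeds the per-window transaction budget."""
    now = time.monotonic()
    window = _recent[actor_id]
    while window and now - window[0] > WINDOW_S:
        window.popleft()
    window.append(now)
    return len(window) > MAX_IN_WINDOW
```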
Quality metrics
Beyond the standard fraud metrics:
- Agent-cleared chargeback rate — false negatives the agent let through.
- Agent-held false positive rate — good transactions the agent held.
- Human override rate — escalations the human flipped.
- Decision explainability completeness — % of decisions with full citation chains.
Surface these in error rate dashboards.
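A sketch of how the four metrics might fall out of the decision log, assuming boolean outcome fields per case (`charged_back`, `was_legitimate`, `human_flipped`, `full_citation_chain` are all assumed names):

```python
def agent_quality_metrics(cases: list[dict]) -> dict[str, float]:
    cleared = [c for c in cases if c["decision"] == "cleared"]
    held = [c for c in cases if c["decision"] == "held"]
    escalated = [c for c in cases if c["decision"] == "escalated"]
    return {
        "agent_cleared_chargeback_rate":
            sum(c["charged_back"] for c in cleared) / max(len(cleared), 1),
        "agent_held_false_positive_rate":
            sum(c["was_legitimate"] for c in held) / max(len(held), 1),
        "human_override_rate":
            sum(c["human_flipped"] for c in escalated) / max(len(escalated), 1),
        "explainability_completeness":
            sum(c["full_citation_chain"] for c in cases) / max(len(cases), 1),
    }
```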
Common mistakes
- No human override — fraudsters learn; the agent must be correctable in flight.
- Putting the agent in real-time — latency does not allow it; use tier-1 triage.
- No citation requirements — agent makes confident claims with no evidence trail.
- Single-model decisioning — Haiku alone misses subtle cases; Opus alone is too expensive.
Where this is heading
Three trends by 2027: regulator-blessed agent decision frameworks for financial fraud (the FFIEC and FCA are both consulting), shared fraud-pattern feeds for agents, and cross-bank consortium models specialised for fraud. The bar for agent-based fraud detection rises every quarter.