AI agent audit trails: what to log, how long to keep, who can read

Every agent in production needs an audit trail. SOX, SOC 2, the EU AI Act, and many sectoral regulators either require one or expect the evidence that only an audit trail can provide. This guide covers the fields auditors typically look for, retention windows, access controls, and a working schema you can adopt today.

What counts as an audit event

Six events per agent turn should end up in the trail:

User request received.
Model call issued (with prompt hash, model, parameters).
Tool call requested by the model.
Tool result returned.
Final response returned to the user.
Any errors.

The goal: given an outcome weeks later, you can reconstruct exactly what the agent saw and what it decided.
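
One way to check that a turn can be reconstructed is to look for gaps in its event sequence. The sketch below is illustrative only: it uses the event type names from the field table in the next section, and the function name `turn_gaps` and its rules (for example, treating an error as an acceptable end to a tool call) are assumptions, not requirements from any regulation.

```python
from collections import Counter


def turn_gaps(event_types: list[str]) -> list[str]:
    """Return gaps that would block reconstruction of a single agent turn."""
    counts = Counter(event_types)
    gaps = []
    if counts["request"] == 0:
        gaps.append("no request event")
    if counts["model_call"] == 0:
        gaps.append("no model_call event")
    # Every tool call should be closed by a result or an error (assumed rule).
    if counts["tool_call"] > counts["tool_result"] + counts["error"]:
        gaps.append("tool_call without a tool_result or error")
    if counts["response"] == 0 and counts["error"] == 0:
        gaps.append("turn ended without a response or error")
    return gaps
```

Running a check like this on a sample of sessions is a cheap way to find code paths that skip logging before an auditor does.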

Required fields

Field	Why it matters
ts	Ordering, latency analysis.
actor_id	Which user triggered the event.
agent_id	Which agent instance.
session_id	Groups the events of one conversation or run.
event_type	Enum: request, model_call, tool_call, tool_result, response, error.
prompt_hash	Prompt integrity without storing content.
inputs	Arguments or prompt, redacted per policy.
outputs	Result, redacted.
decision	For side-effecting events: who approved.
tokens, cost_usd	For capacity planning and disputes.
prev_hash	Chain of custody: the previous row's hash, which enables tamper detection.
hash	Hash over this row's content plus prev_hash.
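
As a sketch of how an application might represent one event before writing it, the dataclass below mirrors the fields in the table, plus `session_id` from the schema. The class name `AuditEvent` and its validation rules are hypothetical; adapt them to your own stack.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal
from typing import Any, Optional

# Allowed values, taken from the event_type row of the table.
EVENT_TYPES = {"request", "model_call", "tool_call", "tool_result", "response", "error"}


@dataclass(frozen=True)
class AuditEvent:
    """One audit row before it is hashed and stored (hypothetical helper)."""
    actor_id: str
    agent_id: str
    session_id: str
    event_type: str
    ts: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    prompt_hash: Optional[str] = None
    inputs: Optional[dict[str, Any]] = None    # redacted per policy
    outputs: Optional[dict[str, Any]] = None   # redacted per policy
    decision: Optional[dict[str, Any]] = None  # approval record, if any
    tokens: Optional[int] = None
    cost_usd: Optional[Decimal] = None

    def __post_init__(self) -> None:
        if self.event_type not in EVENT_TYPES:
            raise ValueError(f"unknown event_type: {self.event_type!r}")
        if not (self.actor_id and self.agent_id and self.session_id):
            raise ValueError("actor_id, agent_id and session_id are required")
        if self.ts.tzinfo is None:
            raise ValueError("ts must be timezone-aware")
```

Rejecting malformed events at construction time keeps bad rows out of a table that is meant to be append-only.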

The schema

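CREATE TABLE agent_audit (
  id bigserial primary key,
  ts timestamptz not null default now(),
  actor_id text not null,    -- user who triggered the event
  agent_id text not null,    -- agent instance
  session_id text not null,  -- groups the events of one conversation or run
  event_type text not null check (event_type in
    ('request', 'model_call', 'tool_call', 'tool_result', 'response', 'error')),
  prompt_hash text,
  inputs jsonb,              -- redacted per policy
  outputs jsonb,             -- redacted per policy
  decision jsonb,            -- approval record for side-effecting events
  tokens int,
  cost_usd numeric(10,6),
  prev_hash bytea,           -- NULL only for the first row in the chain
  hash bytea not null
);
-- Supports "what did this user trigger in this window" lookups.
CREATE INDEX ON agent_audit (actor_id, ts);
-- Supports replaying a single session.
CREATE INDEX ON agent_audit (session_id);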

Retention windows

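Regulation	Retention guidance
SOX	7 years for financial-control-relevant events.
SOC 2	No fixed period is prescribed; 12 months is common so that evidence covers a full Type II observation window.
EU AI Act (high-risk)	At least 6 months for automatically generated logs, unless other applicable law requires longer; 10 years for technical documentation after the system is placed on the market.
HIPAA	6 years for PHI-relevant events.
GDPR	No fixed period; keep personal data no longer than necessary for the purpose. 12-24 months is typical for audit logs.

Pick the longest minimum that applies to your processing. GDPR pulls in the opposite direction: it limits how long personal data may be kept, so redact or pseudonymise whatever you must retain for the longer windows.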
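
The selection rule can be sketched in a few lines. The figures in `RETENTION_MONTHS` are illustrative placeholders for three of the regimes in the table, and the function names are hypothetical; confirm the real values with counsel before encoding them.

```python
import calendar
from datetime import date

# Illustrative figures in months; not legal advice.
RETENTION_MONTHS = {"sox": 84, "hipaa": 72, "soc2": 12}


def retention_months(applicable: set[str], table: dict[str, int] = RETENTION_MONTHS) -> int:
    """Longest minimum among the regimes that apply to this data."""
    if not applicable:
        raise ValueError("at least one regime must apply")
    unknown = applicable - set(table)
    if unknown:
        raise KeyError(f"no retention figure for: {sorted(unknown)}")
    return max(table[name] for name in applicable)


def earliest_deletion(created: date, months: int) -> date:
    """First day on which a row created on `created` may be deleted."""
    years, month_index = divmod(created.month - 1 + months, 12)
    year, month = created.year + years, month_index + 1
    # Clamp the day for short months (31 Jan plus one month lands on the last day of Feb).
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(created.day, last_day))
```

For example, `earliest_deletion(date(2024, 3, 15), retention_months({"sox", "soc2"}))` gives 15 March 2031, because the 84-month figure wins over the 12-month one.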

Access control

Three roles with distinct reads:

Ops — can read but not delete; used for debugging and investigations.
Compliance — can read and export; used for audits.
Data subject — can request an export of their own rows via a dedicated endpoint.

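No role can edit or delete rows directly. Deletion happens only through the retention process, and that process must skip any rows that are under a legal hold. Enforce this with DB permissions; never rely on application code alone.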

Immutability via hash chaining

Each row stores prev_hash, the previous row's hash, and hash, a hash over its own content plus prev_hash. A daily job publishes the most recent hash to a tamper-evident log (an internal trust store or a public append-only log). Editing or deleting any row changes every hash computed after it, so a verifier that recomputes the chain sees a mismatch against the published value.

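-- Pseudocode. jsonb_canonical() stands for a canonical JSON serialiser (sorted keys,
-- fixed formatting), not a built-in function. enc() is a hypothetical helper that
-- length-prefixes each field so that field boundaries stay unambiguous.
hash = sha256(
  prev_hash || enc(ts) || enc(actor_id) || enc(agent_id) || enc(session_id) || enc(event_type) ||
  enc(jsonb_canonical(inputs)) || enc(jsonb_canonical(outputs)) || enc(jsonb_canonical(decision))
)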
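
A minimal Python sketch of the same idea follows, covering both writing and verification. It is an illustration under stated assumptions: the field list follows the formula with session_id included, the all-zero genesis value and the length-prefix encoding are choices made here, and `json.dumps` with sorted keys is a simplification of a full canonical JSON scheme. The function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Optional

# Fields covered by the hash (assumed list; extend it to every column you need to protect).
HASHED_FIELDS = ("ts", "actor_id", "agent_id", "session_id", "event_type",
                 "inputs", "outputs", "decision")


def canonical(value: Any) -> bytes:
    """Deterministic JSON encoding: sorted keys, no insignificant whitespace."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def row_hash(prev_hash: Optional[bytes], row: dict[str, Any]) -> bytes:
    digest = hashlib.sha256()
    digest.update(prev_hash or b"\x00" * 32)  # genesis value is an assumption
    for name in HASHED_FIELDS:
        encoded = canonical(row.get(name))
        # The length prefix keeps "ab" + "c" distinct from "a" + "bc".
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
    return digest.digest()


def append(chain: list[dict[str, Any]], row: dict[str, Any]) -> None:
    """Append a row, linking it to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else None
    chain.append({**row, "prev_hash": prev, "hash": row_hash(prev, row)})


def first_broken_link(chain: list[dict[str, Any]]) -> Optional[int]:
    """Index of the first row that fails verification, or None if the chain is intact."""
    prev = None
    for index, row in enumerate(chain):
        if row["prev_hash"] != prev or row["hash"] != row_hash(prev, row):
            return index
        prev = row["hash"]
    return None
```

Verification only proves internal consistency. An attacker who can rewrite every row can also recompute every hash, which is why the daily published hash matters: it is the value the recomputed chain has to match.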

Querying for investigations

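The three most common queries:

"Show everything agent X did for user Y between dates": filter on actor_id, agent_id, and a ts range.
"Show every side-effect event that was not approved": event_type = 'tool_call' AND decision IS NULL, restricted to side-effecting tools. The schema above has no tool column, so this assumes the tool name is stored inside inputs (for example inputs->>'tool') or that you add a dedicated column.
"Show error bursts": event_type = 'error', bucketed by minute.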

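The filter logic behind the last two queries can be illustrated over rows already loaded into memory. This is a sketch for clarity, not a replacement for running the queries in the database. It assumes each row is a dict with the schema's column names, that ts is a datetime, and that the tool name lives under `inputs["tool"]`, which is a hypothetical convention.

```python
from collections import Counter
from datetime import datetime
from typing import Any


def unapproved_side_effects(rows: list[dict[str, Any]],
                            side_effect_tools: set[str]) -> list[dict[str, Any]]:
    """Tool calls to side-effecting tools that carry no approval record."""
    return [
        row for row in rows
        if row["event_type"] == "tool_call"
        and row.get("decision") is None
        and (row.get("inputs") or {}).get("tool") in side_effect_tools
    ]


def error_bursts(rows: list[dict[str, Any]], threshold: int) -> dict[datetime, int]:
    """Minutes in which at least `threshold` error events occurred."""
    per_minute = Counter(
        row["ts"].replace(second=0, microsecond=0)
        for row in rows
        if row["event_type"] == "error"
    )
    return {minute: count for minute, count in per_minute.items() if count >= threshold}
```

In production the same conditions belong in SQL, where the indexes on the audit table can do the work.
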
Redaction policy

You cannot log raw PII indefinitely. Apply a redaction policy at write time:

Mask email addresses, phone numbers, credit cards in inputs/outputs.
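Store a hash of the original content separately to answer integrity questions without exposing content. Prefer a keyed hash such as HMAC, because plain hashes of short values like phone numbers can be reversed by brute force.
Keep an unredacted copy, as small as possible, in a separate access-restricted store for the DPO-authorised investigation flow.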
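
A write-time redaction step might look like the sketch below. The regular expressions are deliberately simple and will both miss some formats and over-match others (the phone pattern can catch date-like digit runs, for instance), so treat them as placeholders for a vetted PII detector. The function names and the use of HMAC-SHA256 for the separate fingerprint are assumptions made for this example.

```python
import hashlib
import hmac
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Optional leading "+", then at least nine characters of digits and phone punctuation.
PHONE = re.compile(r"(?<!\w)\+?\d[\d ().-]{7,}\d(?!\w)")


def redact(text: str) -> str:
    """Mask emails, card numbers, and phone numbers before the text is logged."""
    text = EMAIL.sub("[EMAIL]", text)
    # Cards are masked before phones because the phone pattern also matches card numbers.
    text = CARD.sub("[CARD]", text)
    return PHONE.sub("[PHONE]", text)


def content_fingerprint(text: str, key: bytes) -> str:
    """Keyed hash of the original text, stored apart from the redacted copy."""
    return hmac.new(key, text.encode("utf-8"), hashlib.sha256).hexdigest()
```

A typical write path would store `redact(raw)` in inputs or outputs and `content_fingerprint(raw, key)` in a separate column or table, with the key held in a secrets manager rather than alongside the data.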

Common mistakes

Logging prompts verbatim (bloat, PII exposure).
Forgetting to log tool results (you cannot reconstruct the decision).
Using app-level deletes for retention (drive deletion from a DB-level policy instead).
No hash chain (without one, you cannot prove to an auditor that the log was not tampered with).