Every agent in production needs an audit trail: SOX, SOC 2, the EU AI Act, and many sectoral regulators either require one or expect the evidence it provides. This guide covers the fields auditors expect, retention windows, access controls, and a working schema you can adopt today.
What counts as an audit event
Six event types per agent turn should end up in the trail:
- User request received.
- Model call issued (with prompt hash, model, parameters).
- Tool call requested by the model.
- Tool result returned.
- Final response returned to the user.
- Any errors.
The goal: given an outcome weeks later, you can reconstruct exactly what the agent saw and what it decided.
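As a minimal sketch of what "reconstructable" means in practice, the check below takes the ordered event types of one turn and reports gaps that would block reconstruction. The names `EventType` and `check_turn`, and the specific rules, are illustrative assumptions rather than part of any standard.

```python
from enum import Enum


class EventType(str, Enum):
    REQUEST = "request"
    MODEL_CALL = "model_call"
    TOOL_CALL = "tool_call"
    TOOL_RESULT = "tool_result"
    RESPONSE = "response"
    ERROR = "error"


def check_turn(event_types: list[str]) -> list[str]:
    """Return the problems found in one turn's ordered event types."""
    events = [EventType(e) for e in event_types]  # ValueError on unknown types
    problems = []
    if not events or events[0] is not EventType.REQUEST:
        problems.append("turn does not start with a request")
    if not events or events[-1] not in (EventType.RESPONSE, EventType.ERROR):
        problems.append("turn does not end with a response or an error")
    calls = events.count(EventType.TOOL_CALL)
    results = events.count(EventType.TOOL_RESULT)
    # A tool call without a result is acceptable only if an error was logged.
    if calls > results and EventType.ERROR not in events:
        problems.append(f"{calls - results} tool call(s) have no logged result")
    return problems
```

Running a check like this on a sample of sessions is one way to notice missing tool results before an investigation needs them.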
Required fields
| Field | Why it matters |
|---|---|
| ts | Ordering, latency analysis. |
| actor_id | Which user triggered the event. |
| agent_id | Which agent instance. |
| session_id | Which session the event belongs to; used to replay one conversation. |
| event_type | Enum: request, model_call, tool_call, tool_result, response, error. |
| prompt_hash | Prompt integrity without storing content. |
| inputs | Arguments or prompt, redacted per policy. |
| outputs | Result, redacted. |
| decision | For side-effecting events: who approved. |
| tokens, cost_usd | For capacity planning and disputes. |
| prev_hash | Chain of custody: links to the previous row and enables tamper detection. |
| hash | Hash over this row's content plus prev_hash. |
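The fields above could be modelled in application code roughly as follows. This is a sketch under assumptions: the class name `AuditEvent`, the helper `prompt_hash`, and the validation rules are hypothetical, and SHA-256 is assumed as the prompt hash function.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from decimal import Decimal
from typing import Optional

ALLOWED_EVENT_TYPES = {
    "request", "model_call", "tool_call", "tool_result", "response", "error",
}


def prompt_hash(prompt: str) -> str:
    """Hex SHA-256 of the prompt, so integrity can be checked without the text."""
    return hashlib.sha256(prompt.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AuditEvent:
    actor_id: str
    agent_id: str
    event_type: str
    prompt_hash: Optional[str] = None
    inputs: Optional[dict] = None      # redacted per policy before construction
    outputs: Optional[dict] = None     # redacted
    decision: Optional[dict] = None    # who approved a side-effecting event
    tokens: Optional[int] = None
    cost_usd: Optional[Decimal] = None
    prev_hash: Optional[bytes] = None
    ts: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def __post_init__(self) -> None:
        if self.event_type not in ALLOWED_EVENT_TYPES:
            raise ValueError(f"unknown event_type: {self.event_type!r}")
        if self.event_type == "model_call" and self.prompt_hash is None:
            raise ValueError("model_call events need a prompt_hash")
```

Freezing the dataclass keeps an event from being mutated between construction and write, which fits the append-only intent of the trail.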
The schema
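CREATE TABLE agent_audit (
    id          bigserial PRIMARY KEY,
    ts          timestamptz NOT NULL DEFAULT now(),
    actor_id    text NOT NULL,
    agent_id    text NOT NULL,
    session_id  text NOT NULL,
    event_type  text NOT NULL CHECK (event_type IN
                ('request', 'model_call', 'tool_call', 'tool_result', 'response', 'error')),
    prompt_hash text,            -- set for model_call events
    inputs      jsonb,           -- arguments or prompt, redacted per policy
    outputs     jsonb,           -- result, redacted
    decision    jsonb,           -- approval record for side-effecting events
    tokens      int,
    cost_usd    numeric(10,6),
    prev_hash   bytea,           -- NULL only for the first row in the chain
    hash        bytea NOT NULL
);
-- Supports "everything for one user in a time range".
CREATE INDEX ON agent_audit (actor_id, ts);
-- Supports replaying a single session.
CREATE INDEX ON agent_audit (session_id);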
Retention windows
| Regulation | Minimum retention |
|---|---|
| SOX | 7 years for financial-control-relevant events. |
| SOC2 | 12 months commonly, 24 for TSP 1.4. |
| EU AI Act (high-risk) | At least 6 months for automatically generated logs; 10 years after market placement for technical documentation. |
| HIPAA | 6 years for PHI-relevant events. |
| GDPR | As long as lawful basis holds; typical 12-24 months for audit. |
When several regimes apply to the same events, retain them for the longest applicable window.
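The "longest window wins" rule is simple to encode. The sketch below assumes retention is tracked in whole months and uses three figures from the table above purely as sample data; `RETENTION_MONTHS`, `longest_retention`, and `earliest_purge_date` are hypothetical names, and the numbers should be confirmed with your own counsel.

```python
import calendar
from datetime import date

# Sample values copied from the table above (months).
RETENTION_MONTHS = {"SOX": 84, "HIPAA": 72, "SOC2": 12}


def longest_retention(regimes: list[str]) -> int:
    """Months to retain when all of the given regimes apply."""
    return max(RETENTION_MONTHS[r] for r in regimes)


def earliest_purge_date(event_date: date, months: int) -> date:
    """First date on which an event logged on event_date may be purged."""
    total = event_date.month - 1 + months
    year = event_date.year + total // 12
    month = total % 12 + 1
    # Clamp the day for short months, e.g. 29 Feb plus 7 years.
    day = min(event_date.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)
```

For example, an event subject to both SOC 2 and HIPAA would be kept for 72 months under these sample values.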
Access control
Three roles with distinct reads:
- Ops — can read but not delete; used for debugging and investigations.
- Compliance — can read and export; used for audits.
- Data subject — can request an export of their own rows via a dedicated endpoint.
No role can delete or edit without a formal legal hold workflow. Enforce with DB permissions; never rely on application code alone.
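To illustrate enforcement in the database rather than in application code, the sketch below makes a table append-only with triggers. It deliberately swaps in SQLite (through Python's built-in `sqlite3` module) so that it runs without a server; in PostgreSQL the same intent would more typically be expressed by revoking UPDATE and DELETE from the relevant roles. The trimmed table layout and the helper `try_statement` are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE agent_audit (
    id         INTEGER PRIMARY KEY,
    actor_id   TEXT NOT NULL,
    event_type TEXT NOT NULL
);
-- Reject every UPDATE and DELETE inside the database itself.
CREATE TRIGGER agent_audit_no_update BEFORE UPDATE ON agent_audit
BEGIN
    SELECT RAISE(ABORT, 'agent_audit is append-only');
END;
CREATE TRIGGER agent_audit_no_delete BEFORE DELETE ON agent_audit
BEGIN
    SELECT RAISE(ABORT, 'agent_audit is append-only');
END;
""")
conn.execute(
    "INSERT INTO agent_audit (actor_id, event_type) VALUES (?, ?)",
    ("u1", "request"),
)


def try_statement(sql: str) -> bool:
    """Return True if the database accepted the statement, False if it refused."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.DatabaseError:
        return False
```

With this in place, a buggy or compromised application path that issues a DELETE is refused by the database, whatever the application-level checks say.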
Immutability via hash chaining
Each row stores prev_hash, the hash value of the previous row, and hash, a hash over its own content plus prev_hash. A daily job publishes the most recent hash to a tamper-evident log (an internal trust store or a public append-only log). Editing a row invalidates its stored hash, and recomputing the hashes that follow would still contradict the last published value.
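hash = sha256(
    prev_hash || ts || actor_id || agent_id || event_type ||
    jsonb_canonical(inputs) || jsonb_canonical(outputs) || jsonb_canonical(decision)
)
Here || is byte concatenation and jsonb_canonical stands for any deterministic JSON serialization (sorted keys, fixed formatting); it is not a built-in PostgreSQL function. Length-prefix or delimit each field so that bytes cannot be shifted from one field into the next without changing the hash.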
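The formula above can be sketched in application code as follows. This is a minimal illustration, not a reference implementation: `canonical`, `row_hash`, `append`, and `first_broken_index` are hypothetical names, rows are plain dicts with `ts` already serialized as a string, and canonicalization is approximated with sorted-key JSON.

```python
import hashlib
import json
from typing import Optional

FIELDS = ("ts", "actor_id", "agent_id", "event_type", "inputs", "outputs", "decision")


def canonical(value) -> bytes:
    """Deterministic JSON: sorted keys, no insignificant whitespace."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def row_hash(prev_hash: bytes, row: dict) -> bytes:
    h = hashlib.sha256()
    for part in [prev_hash] + [canonical(row.get(name)) for name in FIELDS]:
        h.update(len(part).to_bytes(8, "big"))  # length prefix keeps fields separate
        h.update(part)
    return h.digest()


def append(chain: list, row: dict) -> None:
    """Append row to chain, linking it to the previous row's hash."""
    prev = chain[-1]["hash"] if chain else b""
    chain.append({**row, "prev_hash": prev, "hash": row_hash(prev, row)})


def first_broken_index(chain: list) -> Optional[int]:
    """Index of the first row that fails verification, or None if intact."""
    prev = b""
    for i, row in enumerate(chain):
        if row["prev_hash"] != prev or row["hash"] != row_hash(prev, row):
            return i
        prev = row["hash"]
    return None
```

Verification of this kind only catches edits that leave later hashes untouched. An attacker who can rewrite the whole table can also recompute the chain, which is why the externally published daily hash matters.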
Querying for investigations
The three most common queries:
- "Show everything agent X did for user Y between dates": agent_id + actor_id + ts range (add session_id to narrow to one session).
- "Show every side-effect event that was not approved": event_type = 'tool_call' AND decision IS NULL AND tool IN side_effect_set. The schema has no tool column, so this assumes the tool name is recorded inside inputs.
- "Show error bursts": event_type = 'error', bucketed by minute.
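The last two queries can also be run over exported rows, for example during an offline investigation. The sketch below assumes each row is a dict with a `datetime` in `ts` and that the tool name sits under `inputs["tool"]`; that key, like the function names, is an assumption and not something the schema defines.

```python
from collections import Counter
from datetime import datetime


def unapproved_side_effects(events: list, side_effect_tools: set) -> list:
    """Tool calls to side-effecting tools that carry no approval decision."""
    return [
        e for e in events
        if e["event_type"] == "tool_call"
        and e.get("decision") is None
        and (e.get("inputs") or {}).get("tool") in side_effect_tools
    ]


def error_bursts(events: list, threshold: int) -> dict:
    """Map each minute with at least `threshold` errors to its error count."""
    per_minute = Counter(
        e["ts"].replace(second=0, microsecond=0)
        for e in events
        if e["event_type"] == "error"
    )
    return {minute: n for minute, n in per_minute.items() if n >= threshold}
```

Truncating the timestamp to the minute plays the same role as bucketing by minute in a SQL GROUP BY.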
Redaction policy
You cannot log raw PII indefinitely. Apply a redaction policy at write time:
- Mask email addresses, phone numbers, and credit card numbers in inputs/outputs.
- Store a hash of the original value separately to answer integrity questions without exposing content.
- Keep a small, access-restricted unredacted copy for the DPO-authorised investigation flow.
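A write-time redaction step might look like the sketch below. The regular expressions are deliberately simple assumptions that will miss some formats and over-match others, so treat them as a starting point rather than a complete PII detector; `redact` and the placeholder tokens are hypothetical.

```python
import hashlib
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")
# Loose phone pattern: optional "+", then digits with common separators.
PHONE = re.compile(r"\+?\d[\d ().-]{7,}\d")


def redact(text: str) -> tuple[str, str]:
    """Return (masked text, hex SHA-256 of the original text)."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    masked = EMAIL.sub("[EMAIL]", text)
    masked = CARD.sub("[CARD]", masked)    # cards before phones: both are digit runs
    masked = PHONE.sub("[PHONE]", masked)
    return masked, digest
```

Only the masked text would go into inputs or outputs; the digest is stored alongside so that a later integrity question can be answered by hashing a candidate original and comparing.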
Common mistakes
- Logging prompts verbatim (bloat, PII exposure).
- Forgetting to log tool results (you cannot reconstruct the decision).
- Using app-level delete for retention (always cascade from DB-level policy).
- No hash chain (an auditor may assume tampering, and you have no way to prove otherwise).