Explainer · 4 min read

The hierarchical agent supervisor pattern: why tree-shaped agents beat flat swarms

Flat multi-agent networks collapse under their own chatter. Supervisor-led hierarchies scale — here is the pattern, how LangGraph and AutoGen implement it, and when not to use it.

Three agents talking to each other works. Seven agents talking to each other is a committee meeting that never ends. The supervisor pattern is what replaces the committee — and by 2027 it will be the default shape of multi-agent systems.

The problem flat swarms have

Most early multi-agent demos use a flat topology: every agent can call every other agent. It demos beautifully at five agents and collapses at fifteen. Three failure modes dominate:

  1. Coordination chatter. Agents copy each other on every message, duplicate work, and argue about who owns a task. Token cost explodes.
  2. Context dilution. Each agent sees fragments; none holds the full picture.
  3. No one to blame. When the system produces a wrong answer, there is no single agent you can point to and fix.

Hierarchies fix all three by centralising coordination. A supervisor agent owns the plan, delegates to specialist worker agents, and aggregates results. Workers do not talk to each other directly.

The supervisor's job

A good supervisor agent does five things, and nothing else:

  • Decomposes the user's request into sub-tasks.
  • Routes each sub-task to the right specialist (classifier, retriever, coder, reviewer).
  • Tracks partial results in working memory.
  • Recovers from worker failure — retry, reroute, or ask the user.
  • Synthesises the final answer.

The supervisor does not do the work. The moment it starts writing code or running queries itself, it stops being a supervisor and becomes just another generalist agent — and the pattern breaks.

Reference topology

                ┌──────────────┐
                │  Supervisor  │
                └──────┬───────┘
          ┌────────────┼────────────┐
          ▼            ▼            ▼
   ┌──────────┐ ┌──────────┐ ┌──────────┐
   │ Retriever│ │  Coder   │ │ Reviewer │
   └──────────┘ └──────────┘ └──────────┘
          │            │            │
          ▼            ▼            ▼
       [Index]     [Files/MCP]   [Tests]

Each worker is narrow: it has a small prompt, a small tool-set, and no awareness of siblings. The supervisor is the only agent that sees the whole graph.
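That narrowness can be made explicit as data. A hypothetical sketch (`WorkerSpec` and the tool names are illustrative, not a real framework's types):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkerSpec:
    name: str
    system_prompt: str       # small and role-specific
    tools: tuple[str, ...]   # explicit allow-list, nothing more

# Each worker knows its own job and nothing about its siblings.
RETRIEVER = WorkerSpec("retriever", "Answer only from the index.", ("search_index",))
CODER = WorkerSpec("coder", "Write and edit code via tools.", ("read_file", "write_file"))
REVIEWER = WorkerSpec("reviewer", "Run tests and report pass/fail.", ("run_tests",))
```

Keeping the spec this small makes scope creep visible: if a worker's tool tuple starts growing, it is turning into a generalist.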

How LangGraph and AutoGen implement it

LangGraph models the pattern as a state graph. The supervisor is a node with conditional edges to worker nodes. A "next" field in the shared state determines routing.

from langgraph.graph import StateGraph, START, END

def supervisor(state):
    # An LLM call picks the next worker, or "END" when done.
    # choose_worker is a placeholder for that call.
    return {"next": choose_worker(state)}

graph = StateGraph(AgentState)  # AgentState: shared state with a "next" key
graph.add_node("supervisor", supervisor)
graph.add_node("retriever", retriever_node)
graph.add_node("coder", coder_node)
graph.add_edge(START, "supervisor")
graph.add_conditional_edges(
    "supervisor",
    lambda s: s["next"],
    {"retriever": "retriever", "coder": "coder", "END": END},
)
graph.add_edge("retriever", "supervisor")
graph.add_edge("coder", "supervisor")
app = graph.compile()

Workers always return to the supervisor. The supervisor re-evaluates and either dispatches again or terminates.

AutoGen expresses the pattern through GroupChat with a GroupChatManager. The manager selects the next speaker based on the conversation state. Semantically identical, different API surface.
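Stripped of the framework, the manager's job is a speaker-selection function over the conversation history. A framework-free sketch, where `select_speaker` is a hypothetical stand-in for what a real GroupChatManager does with an LLM call:

```python
def select_speaker(history, agents):
    # Stand-in for the manager's LLM-driven choice: here we route on
    # the last message's content; a real manager prompts a model.
    last = history[-1]["content"] if history else ""
    if "traceback" in last or "error" in last:
        return agents["coder"]       # something broke: send in the coder
    if last.endswith("?"):
        return agents["retriever"]   # open question: fetch context first
    return agents["reviewer"]        # otherwise, check the work

agents = {"retriever": "retriever", "coder": "coder", "reviewer": "reviewer"}
```

The routing rules here are toy heuristics; the point is the shape: one function, fed the whole conversation, returns exactly one next speaker.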

When hierarchies are the wrong shape

The supervisor pattern is not universal. It is wrong when:

  • Sub-tasks are genuinely peer-to-peer. Two negotiating agents (buyer/seller) should talk directly — a supervisor in the middle adds cost and latency for no decision value.
  • Latency is critical. Every round trip through the supervisor adds 1–3 seconds. Real-time voice or gaming needs a flatter topology.
  • You need emergent behaviour. Research on swarm coordination deliberately avoids hierarchy. If you are studying that, do not skip it.

For everything else — production agents shipping real work — hierarchy is the right default.

Pitfalls to avoid

  • Supervisor bloat. Give the supervisor more than one kind of work and its prompt becomes unmanageable. Keep it to planning and routing.
  • Deep trees. Two levels work. Three levels are manageable. Four and beyond, you lose observability. If you need depth, split into two two-level systems with a clear API between them.
  • Context overflow. The supervisor sees everything, so it is always the first context-window casualty. Summarise worker outputs before the supervisor stores them; see persistent agent memory architecture for techniques.
  • Silent failure propagation. Workers must return structured error states, not free-text "I couldn't figure it out." The supervisor needs to branch on failures.
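The last pitfall is worth making concrete: workers should return a typed status the supervisor can branch on, never free text. A hedged sketch, with hypothetical names:

```python
from dataclasses import dataclass
from enum import Enum

class Status(Enum):
    OK = "ok"
    RETRYABLE = "retryable"      # transient: timeout, rate limit
    NEEDS_INPUT = "needs_input"  # missing info only the user has
    FATAL = "fatal"              # wrong tool or malformed task: reroute

@dataclass
class WorkerResult:
    status: Status
    output: str = ""
    error: str = ""

def handle(result: WorkerResult) -> str:
    # The supervisor branches on the status enum, never on free text.
    if result.status is Status.OK:
        return "accept"
    if result.status is Status.RETRYABLE:
        return "retry"
    if result.status is Status.NEEDS_INPUT:
        return "ask_user"
    return "reroute"
```

With this in place, "I couldn't figure it out" becomes `WorkerResult(Status.FATAL, error="...")`, and the supervisor's recovery logic stays deterministic.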

Observability and debugging

Hierarchies make debugging tractable because every call has a parent. Good traces show:

  • The supervisor's decision at each step (which worker, why).
  • The worker's input and output.
  • The supervisor's re-plan after each worker returns.

This maps naturally to the spans in modern agent observability tooling — see ai agent observability platform and agent trace visualization tools.

Where this is heading

By 2027 expect supervisor-first frameworks to ship as defaults: LangGraph's create_supervisor, AutoGen's SelectorGroupChat, Claude Agent SDK's subagents. Flat swarms will remain a research topic, not a production pattern. If you are picking an architecture this year, start with hierarchy and only flatten when you have a specific reason.

© 2026 Loadout. Built on Angular 21 SSR.