Spinning up five agents and averaging their answers does not make them collectively smart. It makes them collectively expensive. Real swarm coordination is structured. Here are the five strategies that work, and how to pick.
When swarms make sense
A swarm is worth it when:
- One model run has unacceptable variance.
- The task has a verifiable answer (so you can pick best of N).
- The cost of being wrong is high.
- The cost of N runs is low relative to the upside.
For chat-style use cases, a single well-prompted Opus usually wins. Swarms shine on planning, code generation, and research synthesis.
The five strategies
1. Vote
N agents independently produce an answer. An aggregator picks the most common answer (discrete outputs) or the centroid (continuous outputs); a minimal sketch follows the bullets below.
- Strengths: trivial; embarrassingly parallel.
- Weaknesses: fails on creative tasks, where every answer comes back different and there is no majority to count.
- Pick when: classification, structured extraction, "yes/no" judgement.
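A minimal sketch of the vote aggregator in Python. `call_agent` is a hypothetical stand-in for whatever client runs one agent; the majority-vote aggregation is the part that matters.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def call_agent(prompt: str, seed: int) -> str:
    """Stand-in for one independent model run; swap in your own client."""
    raise NotImplementedError

def vote(prompt: str, n: int = 5) -> str:
    # Run N agents in parallel; no agent ever sees another's output.
    with ThreadPoolExecutor(max_workers=n) as pool:
        answers = list(pool.map(lambda i: call_agent(prompt, seed=i), range(n)))
    # Discrete aggregation: the most common answer wins.
    return Counter(answers).most_common(1)[0][0]
```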
2. Debate
Two or three agents argue. Each round, they see the others' positions and refine. A judge picks the winner after K rounds.
- Strengths: surfaces edge cases; fights confident-but-wrong outputs.
- Weaknesses: expensive; risk of mutual reinforcement of wrong answers.
- Pick when: open-ended judgement, code review, design decisions.
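One possible shape for the debate loop, assuming the agents and the judge are plain callables. These interfaces are illustrative, not an SDK API.

```python
def debate(task: str, agents: list, judge, rounds: int = 3) -> str:
    # Round 0: each agent drafts a position without seeing anyone else's.
    positions = [agent(task, context="") for agent in agents]
    for _ in range(rounds):
        # Every agent sees all current positions and revises its own.
        shared = "\n\n".join(positions)
        positions = [agent(task, context=shared) for agent in agents]
    # A separate judge picks (or synthesises) the winning position after K rounds.
    return judge(task, positions)
```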
3. Delegate (supervisor + workers)
A planner decomposes the task; specialised workers execute pieces; the planner synthesises. See orchestration patterns.
- Strengths: matches the structure of complex tasks.
- Weaknesses: planner is a bottleneck and a single point of failure.
- Pick when: the task naturally decomposes.
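A sketch of the supervisor/worker loop. The `planner.decompose` and `planner.synthesise` methods and the `workers` mapping are assumed interfaces for illustration.

```python
def delegate(task: str, planner, workers: dict) -> str:
    # The planner decomposes the task into (worker_name, subtask) pairs.
    plan = planner.decompose(task)
    results = []
    for worker_name, subtask in plan:
        # Each piece goes to a specialised worker; an unknown name fails loudly.
        results.append(workers[worker_name](subtask))
    # The planner synthesises the worker outputs into one answer.
    return planner.synthesise(task, results)
```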
4. Blackboard
Agents post claims and evidence to a shared store. Each iteration, agents read the blackboard and add what they can. Termination by an explicit "fullness" metric.
- Strengths: asynchronous; resilient to slow agents.
- Weaknesses: non-termination risk; debugging is hard.
- Pick when: distributed research, monitoring fleets, knowledge harvesting.
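A minimal blackboard loop, assuming a `fullness` function that scores the board between 0 and 1 and agents that read the board and optionally return a new entry. The wall-clock cap guards against the non-termination risk noted above.

```python
import time

def run_blackboard(agents, fullness, max_seconds: float = 600.0) -> list:
    board: list[dict] = []   # shared store of {"claim": ..., "evidence": ...} entries
    start = time.monotonic()
    # Iterate until the board is full enough or the wall-clock budget runs out.
    while fullness(board) < 1.0 and time.monotonic() - start < max_seconds:
        for agent in agents:
            entry = agent(board)   # agent reads the board; may contribute nothing
            if entry is not None:
                board.append(entry)
    return board
```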
5. Market
Agents bid on tasks; each task goes to the lowest bidder that meets the quality bar (see the sketch below). This rewards efficient agents and weeds out slow ones.
- Strengths: self-tunes resource use.
- Weaknesses: complex; hard to align bid prices to real cost.
- Pick when: large heterogeneous fleets where efficiency matters.
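A sketch of one bidding round, assuming each agent exposes a hypothetical `bid(task)` that returns a (price, expected quality) estimate.

```python
def assign_tasks(tasks, agents, quality_floor: float = 0.8) -> list:
    """Pair each task with the cheapest agent whose bid clears the quality bar."""
    assignments = []
    for task in tasks:
        eligible = []
        for agent in agents:
            price, expected_quality = agent.bid(task)  # agent self-estimates cost and quality
            if expected_quality >= quality_floor:
                eligible.append((price, agent))
        if not eligible:
            assignments.append((task, None))  # nobody clears the bar; escalate instead
            continue
        _, winner = min(eligible, key=lambda bid: bid[0])
        assignments.append((task, winner))
    return assignments
```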
Picking by task type
| Task | Strategy |
|---|---|
| Structured extraction from a PDF | Vote |
| Architectural design review | Debate |
| Multi-step research project | Delegate |
| Continuous monitoring of a domain | Blackboard |
| Internal compute-bound workload | Market |
Diversity is the secret ingredient
A swarm of identical agents is just a single agent that you pay for N times. To get the benefit, vary:
- Temperature across instances.
- Prompts with different framings of the same task.
- Models (Opus + Sonnet + Haiku — different strengths emerge).
- Tool sets for delegate-style swarms.
Without diversity, you get diversity-collapse: every instance lands on the same wrong answer.
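One way to make that variation explicit is a small swarm config. The model identifiers, temperatures, and framings below are placeholders for illustration, not recommended values.

```python
from dataclasses import dataclass

@dataclass
class SwarmMember:
    model: str          # mix model tiers, e.g. Opus / Sonnet / Haiku
    temperature: float  # vary sampling temperature across instances
    framing: str        # same task, different prompt framing

MEMBERS = [
    SwarmMember("opus-tier",   0.2, "Answer as a sceptical reviewer."),
    SwarmMember("sonnet-tier", 0.7, "Answer as a domain specialist."),
    SwarmMember("haiku-tier",  1.0, "Answer fast and flag anything uncertain."),
]
```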
Termination conditions
Three patterns, used together:
- Step budget — hard cap per agent.
- Quality threshold — stop when the aggregated answer beats a confidence floor.
- Diminishing returns — stop when the last K iterations did not improve the metric.
The first caps the worst-case bill; the other two let the run stop early when it can.
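The three conditions compose naturally into a single stop check. A sketch, assuming you log one scalar quality score per iteration.

```python
def should_stop(step: int, scores: list[float],
                step_budget: int = 20,
                quality_floor: float = 0.9,
                patience: int = 3) -> bool:
    # Step budget: the hard cap that bounds the worst-case bill.
    if step >= step_budget:
        return True
    # Quality threshold: stop once the aggregated answer clears the confidence floor.
    if scores and scores[-1] >= quality_floor:
        return True
    # Diminishing returns: stop if the last `patience` iterations did not improve the metric.
    if len(scores) > patience and max(scores[-patience:]) <= max(scores[:-patience]):
        return True
    return False
```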
Cost vs. quality
A 5-agent swarm typically costs 4–6x a single Opus call. Quality lift varies:
| Task | Lift over single Opus |
|---|---|
| Structured extraction | 5–15% |
| Code generation (correctness) | 10–25% |
| Open-ended writing | < 5% |
| Math/logic puzzles | 15–40% |
Open-ended writing benefits least; verifiable tasks benefit most. Pick swarms for the latter.
Where this is heading
Two shifts to expect: native swarm primitives in the Claude Agent SDK (declare a strategy, the SDK runs it), and consensus algorithms tuned for LLM uncertainty (probabilistic voting, calibrated debate). Build the strategy choice into your design now.