The 2024 cohort of AI pendants and the 2025 wave of AI glasses settled into a small set of working patterns. The hardware that survived shares an agent architecture: capture cheaply, triage locally, hand off to cloud sparingly, surface results without dominating the wearer's attention or violating the privacy of those around them.
The wearable form factors
Four classes shipping in 2026:
- Smart earbuds — voice-first, no display. Ambient audio capture optional.
- AI pendants and pins — voice + ambient capture, tiny optional screen.
- AI glasses — voice + camera + small display + bone-conduction audio.
- Smart watches with agent layer — established form factor, deeper agent integration.
Architecture is broadly similar across all four; UX implications differ sharply.
The capture problem
Wearables capture more context than other devices: what the wearer hears, sees, says. Three rules that are now non-negotiable:
- Hardware mute — physical control, visible to others.
- Capture indicator — LED or screen showing recording state, visible from outside the device.
- Local processing default — capture stays local until the wearer triggers cloud action.
Companies that violated these in 2024–2025 face active litigation. Do not be among them.
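The three rules reduce to a small state check. A minimal sketch, assuming hypothetical names (`CaptureState`, `may_capture`, `may_upload`) rather than any real SDK:

```python
from dataclasses import dataclass

@dataclass
class CaptureState:
    hardware_mute: bool    # physical switch, wearer-controlled, visible to others
    indicator_on: bool     # LED/screen shows the recording state externally
    cloud_consented: bool  # wearer explicitly triggered a cloud action

def may_capture(state: CaptureState) -> bool:
    """Capture is allowed only when the mute switch is off and the
    external indicator is actually on."""
    return (not state.hardware_mute) and state.indicator_on

def may_upload(state: CaptureState) -> bool:
    """Local processing is the default: a cloud handoff additionally
    requires explicit wearer consent."""
    return may_capture(state) and state.cloud_consented
```

The point of the sketch is the asymmetry: capture is gated twice (mute plus indicator), and upload is gated a third time by consent.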
Agent architecture for a wearable
Five layers:
hardware capture (mic, camera, IMU)
↓
on-device wake / trigger (always on, model in DSP)
↓
on-device intent classification (small model)
↓ if cloud needed
companion device or direct connection
↓
cloud agent (voice + tools + memory)
↓
return: speech, glanceable display, haptic
The first two layers run continuously; the rest activate only on trigger. Battery and privacy both depend on this gate.
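The gate between the always-on layers and the rest can be sketched as a short pipeline. All function names here are illustrative stand-ins for the real DSP wake model and on-device intent model, not a real API:

```python
def wake_detected(audio_frame: bytes) -> bool:
    # Stand-in for the always-on wake model running in the DSP.
    return audio_frame.startswith(b"WAKE")

def classify_intent_locally(utterance: str) -> str:
    # Stand-in for the small on-device intent model: known simple
    # intents resolve locally, everything else escalates.
    local_intents = {"what time is it": "local.time"}
    return local_intents.get(utterance, "cloud.agent")

def handle(audio_frame: bytes, utterance: str) -> str:
    if not wake_detected(audio_frame):
        return "idle"  # layers 3-5 never activate: no battery or privacy cost
    intent = classify_intent_locally(utterance)
    if intent.startswith("local."):
        return f"answered on-device: {intent}"
    return "handed off to companion/cloud"  # layers 4-5, only on trigger
```

The early-return on a missed wake is the whole design: nothing past layer two runs unless the wearer asked for it.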
Output modalities
Three channels, often combined:
- Bone-conduction audio — private, hands-free, but slow for long answers.
- Glanceable display — quick recall, recipe step, navigation cue.
- Haptic — confirmation, alerts.
Pick the lightest channel that conveys the answer. Long monologues fail; quick glances succeed.
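The lightest-channel rule can be sketched as a small selector. The word-count thresholds and channel names here are illustrative assumptions:

```python
def pick_channel(answer_words: int, needs_visual: bool) -> str:
    """Pick the lightest output channel that still conveys the answer."""
    if needs_visual:
        return "glanceable-display"   # e.g. a navigation cue or recipe step
    if answer_words == 0:
        return "haptic"               # bare confirmation, no content
    if answer_words <= 30:
        return "bone-conduction-audio"  # short enough to speak privately
    # Long monologues fail on-ear; push a summary to the display instead.
    return "glanceable-display"
```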
What works on wearables
Five use cases that survived early experiments:
- Hands-busy translation — conversation in a foreign language; the agent translates in near real time. Glasses excel here.
- Captioning — real-time transcription for hard-of-hearing wearers. Strong adoption.
- Capture and recall — "Remind me what they said about pricing": capture, on-device transcription, retrieval later.
- Navigation overlays — glasses with route arrows; light cognitive load.
- Quick lookups — "What is this plant?" via camera → identification → audio answer.
What does not
- Long-form Q&A — the hardware is not a phone; for extended back-and-forth, users put the device down and reach for one.
- Heavy media consumption — battery and weight kill it.
- Continuous always-on transcription — both privacy and battery costs rule it out; make transcription opt-in per session.
Privacy as a product feature
Three commitments that build trust:
- Companion privacy mode — wearer can mute capture for everyone in earshot for N minutes.
- Recording badge — visible when capturing; wearer cannot disable in social mode.
- Local-first storage — captured content stays on device unless explicitly synced.
Products that built these into the hardware are the ones still shipping in 2026.
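The companion privacy mode reduces to a timed mute. A sketch with the clock injected so the behaviour is testable; the class and method names are assumptions:

```python
import time

class PrivacyMode:
    """Mutes all capture for N minutes, for everyone in earshot."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock        # injectable for tests; monotonic by default
        self._muted_until = 0.0

    def mute_for(self, minutes: float) -> None:
        self._muted_until = self._clock() + minutes * 60

    def capture_allowed(self) -> bool:
        return self._clock() >= self._muted_until
```

Using a monotonic clock matters here: a wall-clock change must not silently shorten a mute the wearer promised to the room.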
Cost and latency budget
| Layer | Latency target | Energy |
|---|---|---|
| Wake detection | < 50 ms | very low |
| Local intent | < 200 ms | low |
| Companion phone hop | + 30 ms | medium |
| Cloud round-trip | + 300–600 ms | medium-high |
| TTS streaming | + 200 ms | low |
Keep the total interactive budget under 1.5 s end-to-end; past that, the wearer reaches for their phone instead.
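The table's targets can be checked mechanically against the 1.5 s budget. A sketch using the table's values, taking the upper bound of the cloud round-trip:

```python
# Latency targets from the table, in milliseconds.
LATENCY_MS = {
    "wake": 50,
    "local_intent": 200,
    "phone_hop": 30,
    "cloud_round_trip": 600,  # upper bound of the 300-600 ms range
    "tts": 200,
}

def within_budget(path: list[str], budget_ms: int = 1500) -> bool:
    """True if the summed latency of the given layer path fits the budget."""
    return sum(LATENCY_MS[step] for step in path) <= budget_ms
```

Even the full path (wake → local intent → phone hop → cloud → TTS) sums to about 1.08 s at these targets, leaving headroom before the wearer gives up.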
Companion-app pattern
Most wearables pair with a phone. The phone hosts:
- The agent loop (heavy model interactions).
- Memory and history.
- Settings and consent management.
- Sync with cloud when appropriate.
The wearable hosts:
- Capture and on-device triage.
- Quick responses (cached, lightweight).
- Display / haptic output.
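The split above can be sketched as a two-tier handler: the wearable answers from a lightweight cache and forwards everything heavier. The cached replies and the `phone_agent_loop` stub are illustrative assumptions:

```python
# Lightweight replies the wearable can serve without waking the phone.
CACHED_REPLIES = {"time": "It is 14:02"}

def phone_agent_loop(query: str) -> str:
    # Stand-in for the phone-hosted agent loop: heavy model
    # interactions, memory, and history live here.
    return f"agent answer for: {query}"

def wearable_handle(query: str) -> tuple[str, str]:
    """Returns (where it was answered, the reply)."""
    if query in CACHED_REPLIES:
        return ("wearable", CACHED_REPLIES[query])
    return ("phone", phone_agent_loop(query))
```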
Mobile MCP patterns apply directly — see mobile MCP client implementation.
Common mistakes
- Continuous capture without indicator — legal and social disaster.
- No local fallback — wearables go offline; agents must too. See offline-capable agents.
- Heavy UI on the wearable itself — small screens and brief glances cannot carry it; keep heavy interfaces on the phone.
- Skipping the companion mute — wearables in social settings need group-aware controls.
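The local-fallback point can be sketched as a try-cloud-then-degrade handler, with `cloud_call` as a hypothetical stub for the real network request:

```python
def cloud_call(query: str, online: bool) -> str:
    # Stand-in for the real cloud agent request.
    if not online:
        raise ConnectionError("no network")
    return f"cloud answer: {query}"

def answer(query: str, online: bool) -> str:
    """Try the cloud agent; degrade to an on-device response offline."""
    try:
        return cloud_call(query, online)
    except ConnectionError:
        # On-device fallback: acknowledge and queue for later sync
        # rather than failing silently.
        return f"offline: queued '{query}' for sync"
```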
Where this is heading
Three trends by 2027: dedicated wearable agent OSes (beyond watchOS / Wear OS), MCP-over-BLE for low-power tool calls, and standardised "social mode" indicators across hardware vendors. Build the patterns above; the platforms will catch up.