The 2024 cohort of AI pendants and the 2025 wave of AI glasses settled into a small set of working patterns. The hardware that survived shares an agent architecture: capture cheaply, triage locally, hand off to cloud sparingly, surface results without dominating the wearer's attention or violating the privacy of those around them.
The wearable form factors
Four classes shipping in 2026:
- Smart earbuds — voice-first, no display. Ambient audio capture optional.
- AI pendants and pins — voice + ambient capture, tiny optional screen.
- AI glasses — voice + camera + small display + bone-conduction audio.
- Smart watches with agent layer — established form factor, deeper agent integration.
Architecture is broadly similar across all four; UX implications differ sharply.
The capture problem
Wearables capture more context than other devices: what the wearer hears, sees, says. Three rules that are now non-negotiable:
- Hardware mute — physical control, visible to others.
- Capture indicator — LED or screen showing recording state, visible from outside the device.
- Local processing default — capture stays local until the wearer triggers cloud action.
Companies that violated these in 2024–2025 face active litigation. Do not be among them.
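The three rules reduce to a small state check. A minimal sketch, assuming hypothetical names (`CaptureState`, `may_capture`, `may_upload`) rather than any real SDK:

```python
from dataclasses import dataclass

@dataclass
class CaptureState:
    hardware_mute: bool    # physical switch, wearer-controlled, visible to others
    indicator_on: bool     # LED/screen shows the recording state externally
    cloud_consented: bool  # wearer explicitly triggered a cloud action

def may_capture(state: CaptureState) -> bool:
    """Capture is allowed only when the mute switch is off and the
    external indicator is actually on."""
    return (not state.hardware_mute) and state.indicator_on

def may_upload(state: CaptureState) -> bool:
    """Local processing is the default: a cloud handoff additionally
    requires explicit wearer consent."""
    return may_capture(state) and state.cloud_consented
```

The point of the sketch is the asymmetry: capture is gated twice (mute plus indicator), and upload is gated a third time by consent.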
Agent architecture for a wearable
Five layers:
hardware capture (mic, camera, IMU)
↓
on-device wake / trigger (always on, model in DSP)
↓
on-device intent classification (small model)
↓ if cloud needed
companion device or direct connection
↓
cloud agent (voice + tools + memory)
↓
return: speech, glanceable display, haptic
The first two layers run continuously; the rest activate only on trigger. Battery and privacy both depend on this gate.
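The gate between the always-on layers and the rest can be sketched as a short pipeline. All function names here are illustrative stand-ins for the real DSP wake model and on-device intent model, not a real API:

```python
def wake_detected(audio_frame: bytes) -> bool:
    # Stand-in for the always-on wake model running in the DSP.
    return audio_frame.startswith(b"WAKE")

def classify_intent_locally(utterance: str) -> str:
    # Stand-in for the small on-device intent model: known simple
    # intents resolve locally, everything else escalates.
    local_intents = {"what time is it": "local.time"}
    return local_intents.get(utterance, "cloud.agent")

def handle(audio_frame: bytes, utterance: str) -> str:
    if not wake_detected(audio_frame):
        return "idle"  # layers 3-5 never activate: no battery or privacy cost
    intent = classify_intent_locally(utterance)
    if intent.startswith("local."):
        return f"answered on-device: {intent}"
    return "handed off to companion/cloud"  # layers 4-5, only on trigger
```

The early-return on a missed wake is the whole design: nothing past layer two runs unless the wearer asked for it.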
Output modalities
Three channels, often combined:
- Bone-conduction audio — private, hands-free, but slow for long answers.
- Glanceable display — quick recall, recipe step, navigation cue.
- Haptic — confirmation, alerts.
Pick the lightest channel that conveys the answer. Long monologues fail; quick glances succeed.
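The lightest-channel rule can be sketched as a small selector. The word-count thresholds and channel names here are illustrative assumptions:

```python
def pick_channel(answer_words: int, needs_visual: bool) -> str:
    """Pick the lightest output channel that still conveys the answer."""
    if needs_visual:
        return "glanceable-display"   # e.g. a navigation cue or recipe step
    if answer_words == 0:
        return "haptic"               # bare confirmation, no content
    if answer_words <= 30:
        return "bone-conduction-audio"  # short enough to speak privately
    # Long monologues fail on-ear; push a summary to the display instead.
    return "glanceable-display"
```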
What works on wearables
Five use cases that survived early experiments:
- Hands-busy translation — conversation in a foreign language; the agent translates in near real time. Glasses excel here.
- Captioning — real-time transcription for hard-of-hearing wearers. Strong adoption.
- Capture and recall — "Remind me what they said about pricing": capture, on-device transcription, retrieval later.
- Navigation overlays — glasses with route arrows; light cognitive load.
- Quick lookups — "What is this plant?" via camera → identification → audio answer.
What does not
- Long-form Q&A — the hardware is not a phone; for extended back-and-forth, users put the device down and reach for one.
- Heavy media consumption — battery and weight kill it.
- Continuous always-on transcription — both privacy and battery costs rule it out; make transcription opt-in per session.
Privacy as a product feature
Three commitments that build trust:
- Companion privacy mode — wearer can mute capture for everyone in earshot for N minutes.
- Recording badge — visible when capturing; wearer cannot disable in social mode.
- Local-first storage — captured content stays on device unless explicitly synced.
Products that built these into the hardware are the ones still shipping in 2026.
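The companion privacy mode reduces to a timed mute. A sketch with the clock injected so the behaviour is testable; the class and method names are assumptions:

```python
import time

class PrivacyMode:
    """Mutes all capture for N minutes, for everyone in earshot."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock        # injectable for tests; monotonic by default
        self._muted_until = 0.0

    def mute_for(self, minutes: float) -> None:
        self._muted_until = self._clock() + minutes * 60

    def capture_allowed(self) -> bool:
        return self._clock() >= self._muted_until
```

Using a monotonic clock matters here: a wall-clock change must not silently shorten a mute the wearer promised to the room.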
Cost and latency budget
| Layer | Latency target | Energy |
|---|---|---|
| Wake detection | < 50 ms | very low |
| Local intent | < 200 ms | low |
| Companion phone hop | + 30 ms | medium |
| Cloud round-trip | + 300–600 ms | medium-high |
| TTS streaming | + 200 ms | low |
Keep the total interactive budget under 1.5 s end-to-end; past that, the wearer reaches for their phone instead.
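The table's targets can be checked mechanically against the 1.5 s budget. A sketch using the table's values, taking the upper bound of the cloud round-trip:

```python
# Latency targets from the table, in milliseconds.
LATENCY_MS = {
    "wake": 50,
    "local_intent": 200,
    "phone_hop": 30,
    "cloud_round_trip": 600,  # upper bound of the 300-600 ms range
    "tts": 200,
}

def within_budget(path: list[str], budget_ms: int = 1500) -> bool:
    """True if the summed latency of the given layer path fits the budget."""
    return sum(LATENCY_MS[step] for step in path) <= budget_ms
```

Even the full path (wake → local intent → phone hop → cloud → TTS) sums to about 1.08 s at these targets, leaving headroom before the wearer gives up.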
Companion-app pattern
Most wearables pair with a phone. The phone hosts:
- The agent loop (heavy model interactions).
- Memory and history.
- Settings and consent management.
- Sync with cloud when appropriate.
The wearable hosts:
- Capture and on-device triage.
- Quick responses (cached, lightweight).
- Display / haptic output.
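The split above can be sketched as a two-tier handler: the wearable answers from a lightweight cache and forwards everything heavier. The cached replies and the `phone_agent_loop` stub are illustrative assumptions:

```python
# Lightweight replies the wearable can serve without waking the phone.
CACHED_REPLIES = {"time": "It is 14:02"}

def phone_agent_loop(query: str) -> str:
    # Stand-in for the phone-hosted agent loop: heavy model
    # interactions, memory, and history live here.
    return f"agent answer for: {query}"

def wearable_handle(query: str) -> tuple[str, str]:
    """Returns (where it was answered, the reply)."""
    if query in CACHED_REPLIES:
        return ("wearable", CACHED_REPLIES[query])
    return ("phone", phone_agent_loop(query))
```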
Mobile MCP patterns apply directly — see mobile MCP client implementation.
Common mistakes
- Continuous capture without indicator — legal and social disaster.
- No local fallback — wearables go offline; agents must too. See offline-capable agents.
- Heavy UI on the wearable itself — small screens and brief glances cannot carry it; keep heavy interfaces on the phone.
- Skipping the companion mute — wearables in social settings need group-aware controls.
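The local-fallback point can be sketched as a try-cloud-then-degrade handler, with `cloud_call` as a hypothetical stub for the real network request:

```python
def cloud_call(query: str, online: bool) -> str:
    # Stand-in for the real cloud agent request.
    if not online:
        raise ConnectionError("no network")
    return f"cloud answer: {query}"

def answer(query: str, online: bool) -> str:
    """Try the cloud agent; degrade to an on-device response offline."""
    try:
        return cloud_call(query, online)
    except ConnectionError:
        # On-device fallback: acknowledge and queue for later sync
        # rather than failing silently.
        return f"offline: queued '{query}' for sync"
```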
Where this is heading
Three trends by 2027: dedicated wearable agent OSes (beyond watchOS / Wear OS), MCP-over-BLE for low-power tool calls, and standardised "social mode" indicators across hardware vendors. Build the patterns above; the platforms will catch up.