
Controlling AI Agents with Eyes and Voice - The Next Interface

Fazm Team · 2 min read
gaze-tracking, voice-control, interface, ai-agent, future

Voice commands work well for desktop agents when the task is unambiguous. "Open Slack" is clear. "Send the file" is not - which file, to whom, in which app? The agent either guesses or asks follow-up questions. Both slow you down.

Gaze tracking solves the targeting problem. You look at a file on your desktop, say "send this to James," and the agent knows exactly which file you mean because it knows where your eyes are pointing. No ambiguity, no follow-up questions, no pointing with a mouse.

How Eyes and Voice Work Together

Voice handles the verb - what you want to do. Gaze handles the noun - what you want to do it to. This mirrors how humans naturally communicate. When you tell someone "hand me that," you glance at the thing you want. The listener follows your gaze. Desktop agents can do the same.
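The verb/noun split can be sketched in a few lines of Python. This is only an illustration, not how Fazm implements it: the `Intent` type, the `resolve_intent` function, and the list of pointing words are all hypothetical, and it assumes speech recognition and gaze tracking have already produced a transcript and the name of the element being looked at.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical set of pointing words; a real agent would need a richer grammar.
DEICTIC_WORDS = {"this", "that", "these", "those", "it"}


@dataclass
class Intent:
    verb: str               # what to do, taken from the voice command
    target: Optional[str]   # what to do it to, supplied by gaze when the user points
    rest: str               # remaining words, e.g. "to James"


def resolve_intent(utterance: str, gazed_element: Optional[str]) -> Intent:
    """Bind the first pointing word in the utterance to the element under the user's gaze."""
    words = utterance.strip(" .!?").split()
    if not words:
        raise ValueError("empty utterance")
    verb, tail = words[0].lower(), words[1:]
    target = None
    bound = False
    rest = []
    for word in tail:
        if not bound and word.lower() in DEICTIC_WORDS:
            # The command points at something without naming it,
            # so gaze supplies the noun.
            target = gazed_element
            bound = True
        else:
            rest.append(word)
    return Intent(verb=verb, target=target, rest=" ".join(rest))
```

With this sketch, `resolve_intent("Send this to James", "report.pdf")` yields the verb `send`, the target `report.pdf`, and the remainder `to James`. A command like "Open Slack" contains no pointing word, so the target stays empty and the named object is left in `rest`.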

The built-in cameras on modern MacBooks are good enough for software-based gaze estimation. The estimate is not pixel-accurate, but it doesn't need to be. If the agent knows you're looking at the top-right quadrant of the screen, it can narrow down the target from dozens of elements to two or three. Combined with the voice command, there's usually only one match.
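As a rough sketch of that narrowing step, the Python below filters on-screen elements first by the quadrant the user is looking at, then by whether the spoken verb can apply to them. Everything here is an assumption for illustration: the `Element` type, the normalized coordinates, and the `VERB_KINDS` table are hypothetical, and a real agent would get element positions from the operating system's accessibility or window APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Element:
    name: str
    kind: str   # e.g. "file", "image", "notification"
    x: float    # center of the element, 0.0 (left edge) to 1.0 (right edge)
    y: float    # center of the element, 0.0 (top edge) to 1.0 (bottom edge)


# Hypothetical table of which element kinds each spoken verb can act on.
VERB_KINDS = {
    "send": {"file"},
    "save": {"image", "file"},
    "remind": {"notification"},
}


def quadrant(x: float, y: float) -> Tuple[str, str]:
    """Map a normalized screen position to a coarse (vertical, horizontal) quadrant."""
    return ("top" if y < 0.5 else "bottom", "left" if x < 0.5 else "right")


def candidates(elements: List[Element], gaze_x: float, gaze_y: float, verb: str) -> List[Element]:
    """Keep elements in the gazed quadrant whose kind is compatible with the verb."""
    region = quadrant(gaze_x, gaze_y)
    in_region = [e for e in elements if quadrant(e.x, e.y) == region]
    allowed = VERB_KINDS.get(verb, set())
    return [e for e in in_region if e.kind in allowed]


screen = [
    Element("report.pdf", "file", 0.90, 0.10),
    Element("chart.png", "image", 0.70, 0.30),
    Element("Slack message", "notification", 0.85, 0.05),
    Element("notes.txt", "file", 0.10, 0.90),
]
matches = candidates(screen, gaze_x=0.8, gaze_y=0.2, verb="send")
```

In this example the coarse gaze estimate leaves three elements in the top-right quadrant, and the verb "send" reduces them to a single file. If more than one candidate survived, the agent would still have to ask a follow-up question or fall back to a finer gaze estimate.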

Why This Matters for Flow

The real benefit is that your hands never leave what they're doing. You're typing code, glance at a Slack notification, say "remind me about this in an hour," and go back to typing. You're reading a PDF, look at a chart, say "save this image," and keep reading. The interface disappears.

Keyboard shortcuts require memorization. Mouse interactions require hand movement. Voice alone requires verbal precision. Eyes plus voice require nothing more than what you're already doing - looking at things and talking about them.

This is not science fiction. The hardware exists in every laptop with a front-facing camera. The missing piece is the agent that connects gaze data to voice commands and translates both into action.

Fazm is an open source macOS AI agent, available on GitHub.
