Meta Shipped a Desktop Agent That Runs Terminal Commands - But That's Just Step One
When Meta released a desktop agent that can run terminal commands, it was a meaningful step. But be honest about what terminal access actually buys you: developer tools. Most of the work people do on their computers never happens in a terminal.
Your accountant doesn't use the terminal. Your designer doesn't invoice clients from the command line. Your operations manager doesn't update project timelines in bash. They use Excel, Figma, Asana, Outlook, and dozens of other GUI applications that have no command-line interface at all.
The GUI Control Gap
The hard problem in desktop automation isn't running git commit or curl. It's clicking the right button in a complex application interface. It's navigating a multi-tab spreadsheet to find and update a specific cell. It's filling out a form that changes dynamically based on previous inputs. These are the tasks that eat hours of people's time, and they all happen in graphical applications.
Accessibility APIs are what bridge this gap. On macOS, native applications expose their UI elements through the accessibility tree: buttons, text fields, menus, tables, checkboxes. An agent that can read and interact with this tree can operate those applications much the way a human does. It finds controls by their role and label, without needing screenshots or pixel matching.
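To make "read and interact with this tree" concrete, here is a minimal Python sketch that runs against an in-memory model rather than the real macOS API. AXNode, walk, find, set_value and press are hypothetical names. The role strings ("AXButton", "AXTextField", "AXWindow") mirror real macOS role values. On an actual Mac, an agent would get the tree with AXUIElementCreateApplication and AXUIElementCopyAttributeValue (reading kAXChildrenAttribute). It would type with AXUIElementSetAttributeValue on kAXValueAttribute and click with AXUIElementPerformAction using kAXPressAction.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class AXNode:
    """Hypothetical in-memory stand-in for one accessibility element."""
    role: str                                   # e.g. "AXButton", "AXTextField"
    title: str = ""                             # visible label, if any
    value: str = ""                             # current value of editable elements
    children: list[AXNode] = field(default_factory=list)
    on_press: Optional[Callable[[], None]] = None  # models the press action


def walk(node: AXNode, depth: int = 0) -> Iterator[tuple[int, AXNode]]:
    """Depth-first traversal of the tree, yielding (depth, element) pairs."""
    yield depth, node
    for child in node.children:
        yield from walk(child, depth + 1)


def find(root: AXNode, role: str, title: str) -> Optional[AXNode]:
    """Locate an element by role and label, not by screen coordinates."""
    for _, node in walk(root):
        if node.role == role and node.title == title:
            return node
    return None


def set_value(root: AXNode, title: str, text: str) -> bool:
    """Type into the text field with the given label; False if absent."""
    target = find(root, "AXTextField", title)
    if target is None:
        return False
    target.value = text
    return True


def press(root: AXNode, title: str) -> bool:
    """Press the button with the given label; False if absent or inert."""
    target = find(root, "AXButton", title)
    if target is None or target.on_press is None:
        return False
    target.on_press()
    return True


if __name__ == "__main__":
    window = AXNode("AXWindow", "Invoice", children=[
        AXNode("AXTextField", "Amount"),
        AXNode("AXButton", "Send", on_press=lambda: print("Send pressed")),
    ])
    # Print an outline of the tree, as an agent might when "looking" at a window.
    for depth, node in walk(window):
        print("  " * depth + f"{node.role} {node.title!r}")
    set_value(window, "Amount", "1200.00")
    press(window, "Send")
```

The agent addresses controls by what they are ("the button labeled Send"), not by where they are drawn. That is why this approach survives window resizing and theme changes that break pixel matching.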
Terminal Plus GUI Is the Full Picture
The real breakthrough isn't terminal OR GUI control - it's both. An agent that can run a script to process data, then open the results in Numbers, format the spreadsheet, and email it through Mail.app is doing something genuinely useful. Each piece alone is limited. Together, they cover how people actually work.
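A minimal Python sketch of that division of labor follows, under stated assumptions. ShellStep, GuiStep, run_plan and the gui driver callback are hypothetical names. The GUI side is a stub that a real agent would back with accessibility calls. The sketch shows the plumbing: a terminal step's output becomes the input of a GUI step, so one plan can span both worlds.

```python
from __future__ import annotations

import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ShellStep:
    argv: list[str]   # command run directly, with no shell interpolation


@dataclass
class GuiStep:
    app: str          # target application, e.g. "Numbers" (illustrative)
    action: str       # hypothetical high-level action name
    arg: str = ""     # "{prev}" is replaced by the previous step's output


Step = Union[ShellStep, GuiStep]
# Hypothetical driver signature: (app, action, arg) -> result text.
GuiDriver = Callable[[str, str, str], str]


def run_plan(steps: list[Step], gui: GuiDriver) -> list[str]:
    """Run steps in order, feeding each step's output into the next one."""
    outputs: list[str] = []
    prev = ""
    for step in steps:
        if isinstance(step, ShellStep):
            # check=True stops the plan if the command fails.
            done = subprocess.run(step.argv, capture_output=True, text=True, check=True)
            prev = done.stdout.strip()
        else:
            prev = gui(step.app, step.action, step.arg.replace("{prev}", prev))
        outputs.append(prev)
    return outputs


if __name__ == "__main__":
    def fake_gui(app: str, action: str, arg: str) -> str:
        # Stand-in for an accessibility-driven controller.
        print(f"[{app}] {action}: {arg}")
        return f"{app}:{action}"

    plan = [
        ShellStep([sys.executable, "-c", "print(sum([120, 80, 1000]))"]),
        GuiStep("Numbers", "write_total", "{prev}"),
        GuiStep("Mail", "send_report", "Total attached"),
    ]
    run_plan(plan, fake_gui)
```

Passing argv lists instead of shell strings avoids quoting bugs. Because check=True raises on a non-zero exit, a failed command halts the plan before any GUI step acts on bad data.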
Terminal-first agents are a foundation. But the building that matters is everything above it.
Fazm is an open source macOS AI agent, available on GitHub.