The Missing Tools in the AI Agent Ecosystem
What Tools Do AI Agents Wish Existed?
If you asked an AI agent what tools it needs to do its job better, the answer would not be "a better LLM." It would be "a way to reliably identify UI elements across every application on this computer."
The model layer is advancing fast. The tooling layer is years behind.
The Universal UI Inspector
The single most requested missing tool is a universal UI element inspector that works across every desktop application. Today, agents cobble together accessibility APIs, DOM inspection, and screenshot analysis depending on the app. Each approach has gaps:
- Accessibility APIs work well for native apps but miss custom-rendered UI
- DOM inspection only works in browsers
- Screenshot analysis is slow and unreliable for precise interaction
What agents need is one consistent interface that says "here are all the interactive elements on screen, here are their labels, here are their positions, here is how to interact with them" - regardless of whether the app is a native macOS window, an Electron app, or a browser tab.
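One way to picture that interface is a single element record that every backend maps into. The sketch below is purely illustrative: `UIElement`, `Source`, `merge_elements`, and `find_by_label` are hypothetical names, not part of any existing library, and it assumes each backend (accessibility, DOM, screenshot analysis) has already produced a list of elements.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class Source(Enum):
    """Which backend described the element (hypothetical categories)."""
    ACCESSIBILITY = "accessibility"
    DOM = "dom"
    SCREENSHOT = "screenshot"


@dataclass(frozen=True)
class UIElement:
    """One interactive element, described the same way for every app."""
    label: str
    role: str                          # e.g. "button", "textfield"
    bounds: Tuple[int, int, int, int]  # x, y, width, height in screen pixels
    actions: Tuple[str, ...]           # e.g. ("press",) or ("set_value",)
    source: Source

    def center(self) -> Tuple[int, int]:
        """Point an agent could click to hit the element."""
        x, y, w, h = self.bounds
        return (x + w // 2, y + h // 2)


def merge_elements(*batches: List[UIElement]) -> List[UIElement]:
    """Combine per-backend results into one list.

    Elements with the same role, label, and bounds are treated as
    duplicates; the copy from the earliest batch wins.
    """
    seen = set()
    merged = []
    for batch in batches:
        for el in batch:
            key = (el.role, el.label, el.bounds)
            if key not in seen:
                seen.add(key)
                merged.append(el)
    return merged


def find_by_label(elements: List[UIElement], label: str) -> List[UIElement]:
    """Case-insensitive exact match on the element label."""
    wanted = label.strip().lower()
    return [el for el in elements if el.label.strip().lower() == wanted]
```

Passing the accessibility batch first means its description is preferred whenever two backends report the same element, which matches the idea that screenshot analysis is the least precise fallback.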
Cross-App State Awareness
Agents frequently need to move data between applications. Copy a name from a CRM, paste it into an email, reference a document in a calendar invite. Today, each transition requires the agent to re-orient itself in a new application context.
A cross-app state manager would maintain awareness of what the agent is working on across application boundaries. Instead of losing context every time focus shifts to a new window, the agent would maintain a persistent understanding of the current workflow.
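As a rough sketch of what such a state manager might hold, the example below keeps a goal, the values collected so far, and the order in which apps were focused. `WorkflowContext` and its methods are hypothetical names chosen for illustration; a real implementation would also need persistence and some way to observe focus changes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class WorkflowContext:
    """State that survives focus changes between applications."""
    goal: str
    values: Dict[str, Any] = field(default_factory=dict)
    app_history: List[str] = field(default_factory=list)

    def focus(self, app: str) -> None:
        """Record a focus change without discarding collected values."""
        if not self.app_history or self.app_history[-1] != app:
            self.app_history.append(app)

    def remember(self, key: str, value: Any) -> None:
        """Store a value along with the app it was read from."""
        origin = self.app_history[-1] if self.app_history else None
        self.values[key] = {"value": value, "from_app": origin}

    def recall(self, key: str) -> Optional[Any]:
        """Return a stored value, or None if it was never recorded."""
        entry = self.values.get(key)
        return entry["value"] if entry else None


# Usage: a name copied in the CRM is still available after switching to mail.
ctx = WorkflowContext(goal="Email the new contact")
ctx.focus("CRM")
ctx.remember("contact_name", "Ada Lovelace")
ctx.focus("Mail")
name = ctx.recall("contact_name")
```

Recording the originating app alongside each value lets the agent go back to the right place if a later step needs to re-check the data.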
Reliable Desktop Interaction APIs
The existing options - AppleScript, accessibility APIs, keyboard simulation - all have reliability issues. Actions fail silently. State changes are not reported. Timing is unpredictable.
Agents need desktop interaction APIs that are as reliable as web APIs - with clear success and failure responses, predictable timing, and consistent behavior across application updates.
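Until such APIs exist, an agent can approximate them by wrapping each action in an explicit check. The sketch below is a minimal illustration, not a real desktop API: `perform_verified` and `ActionResult` are hypothetical names, and `action` and `verify` stand in for whatever click, keystroke, or state query the agent actually uses.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ActionResult:
    """Explicit outcome, so a failure is never silent."""
    ok: bool
    detail: str
    elapsed_s: float


def perform_verified(action: Callable[[], None],
                     verify: Callable[[], bool],
                     timeout_s: float = 2.0,
                     poll_s: float = 0.05) -> ActionResult:
    """Run an action, then poll until the expected state appears or time runs out."""
    start = time.monotonic()
    try:
        action()
    except Exception as exc:
        # The action itself failed; report it instead of swallowing it.
        return ActionResult(False, f"action raised: {exc}", time.monotonic() - start)

    deadline = start + timeout_s
    while True:
        if verify():
            return ActionResult(True, "verified", time.monotonic() - start)
        if time.monotonic() >= deadline:
            return ActionResult(False, "timed out waiting for expected state",
                                time.monotonic() - start)
        time.sleep(poll_s)
```

The bounded polling loop gives the caller a known upper limit on how long a step can take, and the result object distinguishes "the action raised an error" from "the action ran but the expected state never appeared."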
Building the Missing Layer
These gaps represent real opportunities. The teams building reliable, universal tooling for desktop agents will define how the next generation of automation works.
Fazm is an open source macOS AI agent, available on GitHub.