You Don't Have a Claude Code Problem, You Have an Architecture Problem
When people complain that Claude Code or any AI agent "does not work" for desktop automation, the real issue is almost always architecture. The same rule that applies to coding agents applies to desktop automation agents: give the LLM thin action primitives - click, type, read - and let it compose them into workflows.
The Monolithic Script Trap
The instinct is to write big, detailed scripts that handle entire workflows. "Open Figma, find the design file, export all assets, rename them, upload to S3." This breaks the moment anything changes - a button moves, a dialog pops up, a loading spinner takes longer than expected.
Thin Primitives, Smart Composition
The better approach is to expose small, reliable actions:
- click(element) - click a UI element by accessibility label or coordinates
- type(text) - type text into the focused field
- read(region) - read text from a screen region or accessibility tree
- wait(condition) - wait for a specific state
The LLM chains these together based on what it observes. If a dialog pops up unexpectedly, the model sees it and adapts. If a button label changes, the model can still find it by context.
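The loop described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `FakeScreen` is an in-memory stand-in for a real UI, `choose_action` is a hard-coded stub standing in for the LLM, and `type_text` plays the role of type(text). The wait primitive is left out for brevity. The point is only the shape: observe, pick one primitive, act, observe again.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeScreen:
    """In-memory stand-in for a real UI (hypothetical, for illustration only)."""
    elements: dict = field(default_factory=dict)  # field label -> current text
    dialog: Optional[str] = None                  # unexpected modal, if any
    focused: Optional[str] = None
    log: list = field(default_factory=list)       # every action attempted

    def click(self, label: str) -> bool:
        self.log.append(("click", label))
        if self.dialog is not None:
            if label == "Dismiss":
                self.dialog = None
                return True
            return False  # a modal blocks everything behind it
        if label in self.elements:
            self.focused = label
            return True
        return False

    def type_text(self, text: str) -> bool:
        self.log.append(("type", text))
        if self.dialog is not None or self.focused is None:
            return False
        self.elements[self.focused] = text
        return True

    def read(self) -> dict:
        # Return a snapshot so the caller cannot mutate the screen by accident.
        return {"dialog": self.dialog, "focused": self.focused,
                "elements": dict(self.elements)}


def choose_action(observation: dict, goal_field: str, goal_text: str):
    """Stub for the LLM: decide the next primitive from the latest observation."""
    if observation["dialog"] is not None:
        return ("click", "Dismiss")      # adapt to the unexpected dialog
    if observation["focused"] != goal_field:
        return ("click", goal_field)
    if observation["elements"][goal_field] != goal_text:
        return ("type", goal_text)
    return ("done", None)


def run(screen: FakeScreen, goal_field: str, goal_text: str, max_steps: int = 10) -> bool:
    actions = {"click": screen.click, "type": screen.type_text}
    for _ in range(max_steps):
        name, arg = choose_action(screen.read(), goal_field, goal_text)
        if name == "done":
            return True
        actions[name](arg)               # one primitive per step, then re-observe
    return False


if __name__ == "__main__":
    screen = FakeScreen(elements={"Search": ""}, dialog="Update available")
    print(run(screen, "Search", "quarterly report"), screen.log)
```

Nothing in `run` knows about the dialog ahead of time. The dismissal happens because the observation showed a dialog, which is the behavior a fixed script cannot provide.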
Why This Works
LLMs are good at reasoning about sequences of simple actions. They are bad at executing complex scripts where one failure cascades into chaos. By keeping each primitive atomic and observable, you let the model do what it does best - plan and adapt - while keeping each individual step reliable.
This is exactly how Fazm's macOS agent works. The accessibility API exposes the UI tree, the agent reads it, and composes actions from simple building blocks. No brittle scripts. No hardcoded coordinates. Just primitives the model orchestrates.
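To make "atomic and observable" concrete, here is a minimal Python sketch of a wait primitive. It is not Fazm's code; the `StepResult` type, the parameter names, and the default timeout are all assumptions chosen for illustration. The idea is that a slow loading spinner produces a structured result the model can read and react to, rather than an exception that takes the whole workflow down.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class StepResult:
    """What the model sees after a primitive runs (hypothetical shape)."""
    ok: bool
    detail: str


def wait(condition: Callable[[], bool],
         timeout: float = 5.0,
         interval: float = 0.1) -> StepResult:
    """Poll `condition` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return StepResult(True, "condition met")
        if time.monotonic() >= deadline:
            # Report the failure instead of raising, so the caller can decide
            # whether to retry, re-read the screen, or try something else.
            return StepResult(False, f"timed out after {timeout}s")
        time.sleep(interval)
```

A caller might use it as `result = wait(lambda: spinner_is_gone(), timeout=10)`, where `spinner_is_gone` is whatever check the host application provides, and then pass `result` back to the model as part of the next observation.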
The Lesson Beyond Desktop Automation
This principle extends to any agent architecture. Keep your tools small and composable. Let the LLM be the orchestration layer. If your agent is failing, look at your tool design before blaming the model.
Fazm is an open source macOS AI agent, available on GitHub.