You Don't Have a Claude Code Problem, You Have an Architecture Problem

Matthew Diakonov

Updated March 19, 2026



When people complain that Claude Code, or any other AI agent, "does not work" for desktop automation, the real issue is almost always the architecture around the model rather than the model itself. The rule that works for coding agents works for desktop automation agents too: give the LLM thin action primitives (click, type, read) and let it compose them into workflows.

The Monolithic Script Trap

The instinct is to write big, detailed scripts that handle entire workflows, such as "Open Figma, find the design file, export all assets, rename them, upload to S3." A script like that breaks the moment anything in the UI changes: a button moves, a dialog pops up, a loading spinner takes longer than expected.
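To make the failure mode concrete, here is a minimal Python sketch of the monolithic approach. `FakeApp` and `export_assets_monolithic` are hypothetical stand-ins, not a real automation API, and the button labels are invented for illustration. The script replays the exact click sequence its author recorded, so a single renamed button aborts the whole run.

```python
class FakeApp:
    """Hypothetical in-memory app: just the set of button labels on screen."""

    def __init__(self, buttons: list[str]) -> None:
        self.buttons = set(buttons)
        self.pressed: list[str] = []

    def press(self, label: str) -> None:
        if label not in self.buttons:
            raise LookupError(f"no button labelled {label!r}")
        self.pressed.append(label)


def export_assets_monolithic(app: FakeApp) -> str:
    """Replays the exact click sequence the author recorded, with no checks."""
    for label in ["File", "Export", "Export All", "Save"]:
        app.press(label)  # any missing or renamed button aborts the whole run
    return "done"


original = FakeApp(["File", "Export", "Export All", "Save"])
print(export_assets_monolithic(original))  # done

# After an app update renames one button, the same script fails midway.
updated = FakeApp(["File", "Export", "Export Assets", "Save"])
try:
    export_assets_monolithic(updated)
except LookupError as err:
    print("script failed:", err)  # script failed: no button labelled 'Export All'
```

Nothing in the script looks at the screen before acting. That missing observation step is what the primitives approach adds back.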

Thin Primitives, Smart Composition

The better approach is to expose small, reliable actions:

click(element) - click a UI element by accessibility label or coordinates
type(text) - type text into the focused field
read(region) - read text from a screen region or accessibility tree
wait(condition) - wait for a specific state
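The four primitives above could be sketched in Python as follows. Everything here is an assumption for illustration. `Screen` and `Element` are a toy in-memory model of what the agent can see, and `type` is renamed `type_text` so it does not shadow Python's built-in. Each function does one small thing and reports whether it succeeded rather than guessing.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Element:
    label: str   # accessibility label
    role: str    # e.g. "button", "text field"
    value: str = ""


@dataclass
class Screen:
    """Hypothetical stand-in for the UI the agent can see and act on."""
    elements: list[Element] = field(default_factory=list)
    focused: Optional[Element] = None


def click(screen: Screen, label: str) -> bool:
    """Click (and focus) the element with this label; False if it is not there."""
    for el in screen.elements:
        if el.label == label:
            screen.focused = el
            return True
    return False


def type_text(screen: Screen, text: str) -> bool:
    """Type into the focused element; False if nothing has focus."""
    if screen.focused is None:
        return False
    screen.focused.value += text
    return True


def read(screen: Screen) -> list[tuple[str, str, str]]:
    """Return (role, label, value) for every visible element."""
    return [(el.role, el.label, el.value) for el in screen.elements]


def wait(condition: Callable[[], bool], timeout: float = 2.0, interval: float = 0.05) -> bool:
    """Poll `condition` until it holds or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()


screen = Screen([Element("Search", "text field"), Element("Go", "button")])
click(screen, "Search")
type_text(screen, "quarterly report")
print(read(screen))  # [('text field', 'Search', 'quarterly report'), ('button', 'Go', '')]
```

Returning a boolean instead of raising keeps each call cheap to retry. The model decides what to do when a primitive reports failure.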

The LLM chains these together based on what it observes. If a dialog pops up unexpectedly, the model sees it and adapts. If a button label changes, the model can still find it by context.
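The observe-decide-act loop described above might look like this minimal sketch. `choose_action` is a hypothetical rule-based placeholder for the model's decision. A real agent would send the observation to the LLM and parse the tool call it returns. The screen here is simply a set of visible button labels. Because the loop re-reads the screen before every step, an unexpected dialog gets dismissed first instead of derailing the run.

```python
from typing import Callable

Action = tuple[str, str]  # (primitive name, argument)


def choose_action(visible: set[str], goal: str) -> Action:
    """Hypothetical stand-in for the LLM: pick the next primitive from what is visible."""
    if "Dismiss" in visible:   # unexpected dialog on screen: clear it first
        return ("click", "Dismiss")
    if goal in visible:
        return ("click", goal)
    return ("wait", "")        # target not visible: do nothing this step, then re-observe


def run_agent(visible: set[str], goal: str,
              on_click: Callable[[str], None], max_steps: int = 10) -> list[Action]:
    """Observe, decide, act; repeat until the goal is clicked or steps run out."""
    history: list[Action] = []
    for _ in range(max_steps):
        action = choose_action(set(visible), goal)  # fresh observation each step
        history.append(action)
        name, arg = action
        if name == "click":
            on_click(arg)
            if arg == goal:
                break
    return history


# A dialog has popped up over the Export button.
visible = {"Export", "Dismiss"}


def on_click(label: str) -> None:
    if label == "Dismiss":
        visible.discard("Dismiss")  # clicking Dismiss closes the dialog


steps = run_agent(visible, "Export", on_click)
print(steps)  # [('click', 'Dismiss'), ('click', 'Export')]
```

The recovery from the dialog is not written anywhere as a special case in a script. It falls out of deciding one step at a time from the current observation.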

Why This Works

LLMs are good at reasoning about sequences of simple actions. They handle long, rigid scripts poorly, because one failed step cascades into every step after it. By keeping each primitive atomic and observable, you let the model do what it does best (plan and adapt) while keeping each individual step reliable.
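One way to keep each primitive atomic and observable is to have every step return the new screen state along with its success or failure, rather than raising and stopping. The sketch below assumes a hypothetical `run_step` wrapper and a toy `state` dict. The point is that a failed click becomes data the model can re-plan from, not the first domino in a cascade.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StepResult:
    ok: bool
    observation: str   # what the model sees after the step
    error: str = ""


def run_step(action: Callable[[], None], observe: Callable[[], str]) -> StepResult:
    """Run one primitive, then always report the resulting screen state."""
    try:
        action()
    except Exception as exc:  # a failed primitive is reported, not propagated
        return StepResult(ok=False, observation=observe(), error=str(exc))
    return StepResult(ok=True, observation=observe())


# Toy state: an "Unsaved changes" dialog covers the Export button.
state = {"dialog": "Unsaved changes"}


def click_export() -> None:
    if state["dialog"]:
        raise RuntimeError("Export is covered by a dialog")


def dismiss_dialog() -> None:
    state["dialog"] = ""


def observe() -> str:
    return f"dialog={state['dialog']!r}"


first = run_step(click_export, observe)    # fails, but still returns an observation
# The next two steps are what a model might choose after seeing `first`.
second = run_step(dismiss_dialog, observe)
third = run_step(click_export, observe)    # succeeds on the retry
print(first)
print(third)
```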

This is how Fazm's macOS agent works. The macOS accessibility API exposes the UI tree, and the agent reads it and composes actions from simple building blocks. No brittle scripts, no coordinates hardcoded into a workflow: just primitives the model orchestrates.
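Here is a rough Python sketch of the read-then-act step. The tree is a hypothetical snapshot stored as plain dicts. Its keys borrow the macOS accessibility attribute names `AXRole`, `AXTitle`, and `AXChildren`, but it is not output from the real API and not Fazm's internal format. `render` produces the text outline a model would reason over. `find` locates elements by role and partial title, so the agent addresses a button by what it is rather than by where it sits on screen.

```python
from typing import Iterator

# Hypothetical accessibility-tree snapshot (plain dicts, AX-style key names).
tree = {
    "AXRole": "AXWindow", "AXTitle": "Export",
    "AXChildren": [
        {"AXRole": "AXTextField", "AXTitle": "File name", "AXChildren": []},
        {"AXRole": "AXGroup", "AXTitle": "", "AXChildren": [
            {"AXRole": "AXButton", "AXTitle": "Cancel", "AXChildren": []},
            {"AXRole": "AXButton", "AXTitle": "Export 12 assets", "AXChildren": []},
        ]},
    ],
}


def walk(node: dict, path: tuple[int, ...] = ()) -> Iterator[tuple[tuple[int, ...], dict]]:
    """Depth-first walk yielding (child-index path, node)."""
    yield path, node
    for i, child in enumerate(node.get("AXChildren", [])):
        yield from walk(child, path + (i,))


def render(node: dict) -> str:
    """Indented text outline of the tree: the 'read' the model reasons over."""
    return "\n".join(
        "  " * len(path) + f"{n['AXRole']} {n.get('AXTitle', '')!r} @{list(path)}"
        for path, n in walk(node)
    )


def find(node: dict, role: str, text: str) -> list[tuple[int, ...]]:
    """Paths of elements with this role whose title contains `text` (case-insensitive)."""
    return [p for p, n in walk(node)
            if n["AXRole"] == role and text.lower() in n.get("AXTitle", "").lower()]


print(render(tree))
print(find(tree, "AXButton", "export"))  # [(1, 1)]
```

Matching on a partial, case-insensitive title is one simple way to tolerate a relabeled button such as "Export" becoming "Export 12 assets". A model reading the rendered outline can make that judgment more flexibly.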

The Lesson Beyond Desktop Automation

This principle extends to any agent architecture. Keep your tools small and composable. Let the LLM be the orchestration layer. If your agent is failing, look at your tool design before blaming the model.
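To make "keep your tools small and composable" concrete, here is a minimal tool-registry sketch. The `tool` decorator, `tool_specs`, and `dispatch` are hypothetical helpers. The spec dicts follow the `name` / `description` / `input_schema` shape that Anthropic's Messages API uses for tool definitions, but every parameter is typed as a string for brevity. The model sees a short list of single-purpose tools, and all orchestration happens in its choice of which one to call next.

```python
import inspect
from typing import Any, Callable

TOOLS: dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a small function as a tool the model may call."""
    TOOLS[fn.__name__] = fn
    return fn


def tool_specs() -> list[dict[str, Any]]:
    """Describe each tool by name, docstring, and (string-typed) parameters."""
    specs = []
    for name, fn in TOOLS.items():
        params = list(inspect.signature(fn).parameters)
        specs.append({
            "name": name,
            "description": inspect.getdoc(fn) or "",
            "input_schema": {
                "type": "object",
                "properties": {p: {"type": "string"} for p in params},
                "required": params,
            },
        })
    return specs


def dispatch(name: str, args: dict[str, Any]) -> Any:
    """Execute one tool call chosen by the model; unknown tools become an error string."""
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**args)


@tool
def click(label: str) -> str:
    """Click the UI element with this accessibility label."""
    return f"clicked {label}"


@tool
def type_text(text: str) -> str:
    """Type text into the focused field."""
    return f"typed {len(text)} chars"


print([s["name"] for s in tool_specs()])      # ['click', 'type_text']
print(dispatch("click", {"label": "Export"}))  # clicked Export
```

If an agent built this way keeps failing, the first things to inspect are the tool list and the descriptions the model is given, which matches the advice above to check tool design before blaming the model.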

Fazm is an open source macOS AI agent, available on GitHub.
