
Building a macOS Desktop Agent with Claude - How AI Wrote Most of Its Own Code

Fazm Team · 3 min read

Tags: claude, ai-coding, swift, macos, developer-tools


Here is something that sounds circular but actually works: using an AI coding assistant to build an AI desktop agent.

Fazm is a macOS app that can see and control your screen. It uses ScreenCaptureKit to grab frames, accessibility APIs to click and type things, and Whisper for voice input. The interesting part is that Claude wrote most of the Swift code itself.
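To make the capture side concrete, here is a minimal sketch of a single-frame grab with ScreenCaptureKit. This is an illustration, not Fazm's actual code; it assumes macOS 14+ and the Screen Recording permission, and the error used for the no-display case is a placeholder:

```swift
import ScreenCaptureKit

// Minimal sketch: capture one frame of the first available display as a CGImage.
// Requires the Screen Recording permission (macOS 14+ APIs).
func captureFrame() async throws -> CGImage {
    // Enumerate the displays and windows the user has allowed us to capture.
    let content = try await SCShareableContent.current
    guard let display = content.displays.first else {
        throw CocoaError(.fileNoSuchFile) // placeholder error for "no display found"
    }
    // Capture the whole display, excluding no windows.
    let filter = SCContentFilter(display: display, excludingWindows: [])
    let config = SCStreamConfiguration()
    config.width = display.width
    config.height = display.height
    return try await SCScreenshotManager.captureImage(
        contentFilter: filter,
        configuration: config
    )
}
```

In the real app a continuous `SCStream` makes more sense than one-off screenshots, but the one-shot form shows the permission and filter plumbing in the fewest lines.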

How It Works in Practice

The key was getting the architecture figured out first. Once we had clear CLAUDE.md files describing the project structure, the component boundaries, and the conventions, Claude got surprisingly good at writing native Mac code.

A typical development session looks like:

  1. Describe the feature in plain language
  2. Claude reads the existing codebase and writes the implementation
  3. Build, test, iterate

For something like adding a new accessibility API interaction - say, reading the contents of a specific text field in a specific app - Claude can look at how existing interactions work and extend the pattern. The Swift type system helps a lot here because the compiler catches most mistakes before runtime.
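As a hedged sketch of what such an interaction looks like, reading the value of the focused text field in another app follows the standard `AXUIElement` pattern. The function name is illustrative, and the caller needs the Accessibility permission:

```swift
import ApplicationServices

// Sketch: read the string value of the focused UI element in the app with the
// given process id. Requires the Accessibility permission; returns nil if the
// element has no readable kAXValue (e.g. it is not a text field).
func focusedFieldValue(pid: pid_t) -> String? {
    let app = AXUIElementCreateApplication(pid)

    // Ask the app which element currently has keyboard focus.
    var focused: CFTypeRef?
    guard AXUIElementCopyAttributeValue(
        app, kAXFocusedUIElementAttribute as CFString, &focused
    ) == .success else { return nil }
    let element = focused as! AXUIElement

    // Read its kAXValue attribute (the text content for text fields).
    var value: CFTypeRef?
    guard AXUIElementCopyAttributeValue(
        element, kAXValueAttribute as CFString, &value
    ) == .success else { return nil }
    return value as? String
}
```

The compiler cannot check AX attribute names or result types, which is exactly why having one known-good example in the codebase for Claude to extend matters so much here.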

What Claude Is Good At

  • Boilerplate and patterns. SwiftUI views, async/await pipelines, accessibility API wrappers - once Claude sees one example, it can produce correct variations quickly.
  • API integration. Given Apple's documentation and existing usage in the codebase, Claude writes correct ScreenCaptureKit and accessibility API code on the first try more often than not.
  • Test scaffolding. Setting up XCTest cases for the agent's action pipeline is tedious work that Claude handles well.
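The test-scaffolding item tends to come out as plain XCTest cases. A hedged sketch, where `ActionParser` is a hypothetical type standing in for a piece of the action pipeline:

```swift
import XCTest

// Hypothetical type under test: parses a planned action string like
// "click:SubmitButton" into a verb and a target.
struct ActionParser {
    func parse(_ raw: String) -> (verb: String, target: String)? {
        let parts = raw.split(separator: ":", maxSplits: 1).map(String.init)
        guard parts.count == 2 else { return nil }
        return (parts[0], parts[1])
    }
}

final class ActionParserTests: XCTestCase {
    func testParsesVerbAndTarget() {
        let parsed = ActionParser().parse("click:SubmitButton")
        XCTAssertEqual(parsed?.verb, "click")
        XCTAssertEqual(parsed?.target, "SubmitButton")
    }

    func testRejectsMalformedInput() {
        XCTAssertNil(ActionParser().parse("nonsense"))
    }
}
```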

What Required Human Architecture

  • The overall pipeline design. How screen capture, LLM processing, and action execution chain together needed human thinking about latency, error handling, and state management.
  • Privacy decisions. What data stays local, what gets sent to the LLM, how voice recordings are handled - these are product decisions, not code decisions.
  • The accessibility API strategy. The decision to use the accessibility tree instead of screenshot-based OCR was a fundamental architecture choice that shaped everything downstream. We explain the tradeoffs between these two approaches in how AI agents see your screen.
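The capture → reason → act loop that came out of that design work can be sketched as a small async pipeline. The protocol and type names here are illustrative, not Fazm's actual types, and the real pipeline adds the latency, error-handling, and state-management concerns mentioned above:

```swift
// Hypothetical sketch of the agent loop's shape.
protocol ScreenReader { func snapshot() async throws -> String }          // AX tree or frame
protocol Planner      { func plan(from state: String) async throws -> [String] }
protocol ActionRunner { func run(_ action: String) async throws }

struct AgentLoop {
    let reader: ScreenReader
    let planner: Planner
    let runner: ActionRunner

    // One iteration: observe the screen, ask the LLM for actions, execute them.
    func step() async throws {
        let state = try await reader.snapshot()
        for action in try await planner.plan(from: state) {
            try await runner.run(action)
        }
    }
}
```

Keeping the three stages behind protocols is also what makes the pipeline testable with mock readers and runners, which feeds directly into the test scaffolding Claude is good at.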

The CLAUDE.md Pattern

The most valuable thing we did was maintaining detailed CLAUDE.md files. These files tell Claude:

  • What each module does and where it lives
  • What conventions the codebase follows
  • What Swift patterns to use (and which to avoid)
  • How to run builds and tests
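A trimmed, hypothetical excerpt shows the shape. The module names and commands here are illustrative, not Fazm's actual layout:

```markdown
# CLAUDE.md (excerpt)

## Modules
- `Capture/` - ScreenCaptureKit wrappers; all frame grabbing lives here
- `Actions/` - accessibility API wrappers for click/type/read

## Conventions
- async/await only; no completion handlers
- New AX interactions extend the existing wrapper pattern in `Actions/`

## Build & test
- Build: `xcodebuild -scheme Fazm build`
- Tests: `xcodebuild -scheme Fazm test`
```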

This sounds like documentation, and it is - but it is documentation optimized for an AI reader rather than a human one. The result is that any new Claude session can pick up where the last one left off without re-discovering the codebase from scratch. We expanded on this idea significantly in the HANDOFF.md pattern, which covers context window management across sessions.

Running Multiple Agents in Parallel

For larger features, we run multiple Claude Code sessions simultaneously. Each agent works on an isolated scope - one might handle the UI layer while another works on the data pipeline. The rule is simple: no two agents edit the same file.

This works surprisingly well when the architecture has clean module boundaries. Each agent reads the shared CLAUDE.md for context but writes to its own set of files. We wrote a dedicated post on running parallel AI agents on one codebase with the full playbook on tmux, branch isolation, and scope assignment.
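A minimal sketch of that setup, assuming git worktrees for file isolation and one tmux window per agent. The scope names are hypothetical, and the live commands are shown as comments:

```shell
#!/bin/sh
# Print the isolation plan: one worktree + branch per agent scope.
for scope in ui-layer data-pipeline; do
  branch="agent/$scope"
  worktree="../fazm-$scope"
  echo "scope=$scope branch=$branch worktree=$worktree"
  # In a live session (assumed tooling):
  #   git worktree add "$worktree" -b "$branch"
  #   tmux new-window -n "$scope" -c "$worktree" "claude"
done
```

Worktrees enforce the "no two agents edit the same file" rule at the filesystem level, since each agent sees only its own checkout.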


This post is based on our experience shared in r/ClaudeAI. Fazm is open source on GitHub.
