The Gap Between Agent Memory and Agent Execution - You Need Both

Fazm Team · 2 min read
Tags: agent-architecture, memory, execution, mcp, desktop-agent

Building Fazm taught us that an AI agent has two halves that are equally important and usually developed unevenly: memory (what the agent knows and remembers) and arms (what the agent can actually do).

Most projects start with one and neglect the other.

The Arms Problem

The first version of Fazm was essentially Claude with access to the macOS accessibility API through an MCP server built in Swift. We called it "building the arms": giving the AI the ability to interact directly with the desktop.

This MCP server wraps macOS accessibility APIs so the agent can:

  • Enumerate every UI element in any running application
  • Click buttons, fill text fields, select menu items
  • Read the current state of any window or dialog
  • Capture screenshots for visual verification
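As a rough illustration of the tool surface listed above, here is a minimal Python sketch. It is not Fazm's Swift MCP server and does not call the macOS accessibility API; `UIElement`, `enumerate_elements`, `click`, `fill_text`, and `read_state` are hypothetical names operating on an in-memory stand-in for an accessibility tree. Screenshot capture is left out.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class UIElement:
    """Hypothetical stand-in for one node of an accessibility tree."""
    role: str                       # e.g. "window", "button", "text_field"
    title: str
    value: str = ""
    children: list["UIElement"] = field(default_factory=list)
    click_count: int = 0            # lets a caller verify that a click landed


def enumerate_elements(root: UIElement) -> Iterator[UIElement]:
    """Yield root and every descendant, depth-first."""
    yield root
    for child in root.children:
        yield from enumerate_elements(child)


def find(root: UIElement, role: str, title: str) -> Optional[UIElement]:
    """Return the first element matching role and title, or None."""
    for element in enumerate_elements(root):
        if element.role == role and element.title == title:
            return element
    return None


def click(root: UIElement, title: str) -> bool:
    """Simulate pressing a button; False means the target was not found."""
    button = find(root, "button", title)
    if button is None:
        return False
    button.click_count += 1
    return True


def fill_text(root: UIElement, title: str, text: str) -> bool:
    """Simulate typing into a text field; False means it was not found."""
    target = find(root, "text_field", title)
    if target is None:
        return False
    target.value = text
    return True


def read_state(root: UIElement) -> list[str]:
    """Flatten the tree into one line per element for the model to read."""
    return [f"{e.role}:{e.title}={e.value!r}" for e in enumerate_elements(root)]
```

Returning `False` for a missing target, rather than raising, is one possible design choice: it lets the planner treat "element not found" as an ordinary outcome it can react to.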

Without these arms, the agent is just a chatbot. It can talk about opening Slack and sending a message, but it cannot actually do it.

The Memory Problem

Arms without memory create a different problem: an agent that forgets everything. Every session starts from zero. It does not remember your preferences, your workflow patterns, which apps you use for what, or what it tried last time that did not work.

This is the gap most desktop agents fall into. They can execute actions but have no persistent understanding of the user's environment.
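To make "persistent understanding" concrete, here is a minimal sketch of a memory store that survives across sessions. It assumes a local SQLite file as the backing store, which is our assumption for illustration and not necessarily how Fazm stores memory; the `MemoryStore` class and its table and method names are hypothetical.

```python
import json
import sqlite3
import time


class MemoryStore:
    """Hypothetical persistent memory: preferences plus an action log."""

    def __init__(self, path: str) -> None:
        # Reopening the same file in a later session restores earlier data.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS preferences ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS action_history ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, ts REAL NOT NULL, "
            "action TEXT NOT NULL, succeeded INTEGER NOT NULL)"
        )
        self.db.commit()

    def set_preference(self, key: str, value) -> None:
        """Store a JSON-serializable preference, overwriting any old value."""
        self.db.execute(
            "INSERT OR REPLACE INTO preferences (key, value) VALUES (?, ?)",
            (key, json.dumps(value)),
        )
        self.db.commit()

    def get_preference(self, key: str, default=None):
        """Return the stored preference, or default if none was saved."""
        row = self.db.execute(
            "SELECT value FROM preferences WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else default

    def record_action(self, action: str, succeeded: bool) -> None:
        """Append one attempted action and its outcome to the history."""
        self.db.execute(
            "INSERT INTO action_history (ts, action, succeeded) VALUES (?, ?, ?)",
            (time.time(), action, int(succeeded)),
        )
        self.db.commit()

    def failed_actions(self) -> list[str]:
        """List actions that did not work, oldest first."""
        rows = self.db.execute(
            "SELECT action FROM action_history WHERE succeeded = 0 ORDER BY id"
        ).fetchall()
        return [row[0] for row in rows]

    def close(self) -> None:
        self.db.close()
```

Recording failures alongside successes is what lets a later session answer "what did I try last time that did not work".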

Bridging the Gap

The solution is to build both systems and connect them through a planning layer:

  • Execution layer. MCP servers that wrap OS-level APIs for actual desktop control.
  • Memory layer. Persistent storage of user preferences, workflow patterns, and action history that carries across sessions.
  • Planning layer. The LLM that sits between memory and execution, using what it knows about you to decide what to do and how to do it.
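The three layers above can be sketched as a small loop. This is a simplified illustration under stated assumptions: the `plan` function is a hard-coded stand-in for the LLM, the tools are plain callables standing in for MCP tool calls, and every name here (`Memory`, `plan`, `run`, `messaging_app`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# A tool takes one string argument and reports whether it succeeded.
Tool = Callable[[str], bool]


@dataclass
class Memory:
    """Memory layer stand-in: what the agent knows about the user."""
    preferences: dict[str, str] = field(default_factory=dict)
    history: list[tuple[str, bool]] = field(default_factory=list)


def plan(goal: str, memory: Memory) -> list[tuple[str, str]]:
    """Planning layer stand-in: turn a goal plus memory into tool steps.

    A real agent would ask an LLM here; this rule only shows where
    remembered preferences enter the decision.
    """
    if goal == "send_message":
        app = memory.preferences.get("messaging_app", "Messages")
        return [("open_app", app), ("click", "Send")]
    return []


def run(goal: str, memory: Memory, tools: dict[str, Tool]) -> bool:
    """Execute the plan step by step, writing each outcome back to memory."""
    steps = plan(goal, memory)
    if not steps:
        return False  # the planner had no idea how to reach this goal
    for tool_name, argument in steps:
        succeeded = tools[tool_name](argument)
        memory.history.append((f"{tool_name}:{argument}", succeeded))
        if not succeeded:
            return False  # stop early; the failure is now remembered
    return True
```

The point of the sketch is the data flow: memory feeds the plan, the plan drives execution, and execution results flow back into memory for the next session.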

An agent with great memory and no arms is a note-taking app. An agent with great arms and no memory is a macro recorder. You need both to build something that actually feels like an assistant.

Fazm is an open source macOS AI agent, available on GitHub.