Agent Workflow: How AI Agents Execute Multi-Step Tasks on Your Desktop

Matthew Diakonov · 12 min read


An agent workflow is a structured sequence of steps that an AI agent follows to complete a task. Instead of a single prompt-and-response, the agent plans, executes, observes results, and decides the next action in a loop. This is what separates a chatbot from an agent: the ability to take action across multiple steps, handle errors, and reach a defined goal.

If you have used an AI coding assistant that edits files, runs tests, reads the output, and fixes failures on its own, you have already experienced an agent workflow. The same pattern applies to desktop automation, data processing, browser tasks, and DevOps.

The Core Agent Loop

Every agent workflow follows the same fundamental cycle, regardless of what the agent is doing:

Perceive (read state) → Plan (decide next) → Execute (take action) → Evaluate (check result) → loop until goal reached or max steps
  1. Perceive - The agent reads its current environment. This could be the contents of a file, the state of a UI, an API response, or the output of a command it just ran.
  2. Plan - Based on what it perceives and what the goal is, the agent decides the next action.
  3. Execute - The agent takes the action: edits a file, clicks a button, sends a request, runs a shell command.
  4. Evaluate - The agent checks whether the action succeeded, whether the goal is met, or whether it needs to adjust and loop back.

This loop continues until the agent reaches its goal, hits a maximum step count, or encounters an error it cannot recover from.

Types of Agent Workflows

Not all workflows are the same. The type you need depends on the complexity and structure of the task.

| Type | Description | Best for | Example |
|---|---|---|---|
| Linear | Steps run in a fixed sequence, one after another | Predictable tasks with known steps | Fill out a form, export to PDF, email the result |
| Branching | Agent chooses different paths based on conditions | Tasks with variable outcomes | If build passes, deploy; if build fails, fix and retry |
| Looping | Agent repeats a step until a condition is met | Iterative refinement | Edit code, run tests, fix failures, repeat until green |
| Parallel | Multiple sub-tasks run at the same time | Independent tasks that can be batched | Scrape 10 pages simultaneously, then merge results |
| Hierarchical | A top-level agent delegates to specialized sub-agents | Complex tasks requiring different skills | A research agent feeds data to a writing agent, which feeds a review agent |

Most real workflows are a mix. A coding agent uses a looping workflow (edit, test, fix), but within each loop iteration, it might branch (different fix strategy depending on the error type).
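That mixed pattern can be sketched in a few lines. The `run_tests` callback and the `fix_*` helpers below are placeholders, not a real API; a real agent would apply fixes via its tools:

```python
def fix_syntax(error):
    # Placeholder: a real agent would edit the offending file here
    return f"applied syntax fix for: {error}"

def fix_assertion(error):
    # Placeholder: a real agent would adjust the failing logic here
    return f"adjusted logic for: {error}"

def edit_test_fix(run_tests, max_iterations=5):
    """Looping workflow with a branch inside each iteration."""
    for i in range(max_iterations):
        passed, error = run_tests()
        if passed:
            return {"status": "green", "iterations": i + 1}
        # Branch: choose a fix strategy based on the error type
        if error and "SyntaxError" in error:
            fix_syntax(error)
        else:
            fix_assertion(error)
    return {"status": "max_iterations_reached"}
```

The outer loop is the looping workflow; the `if/else` on the error type is the branch within each iteration.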

Building Blocks of a Workflow

An agent workflow is built from a few primitives. Getting these right matters more than the LLM you choose.

Tools

Tools are the actions an agent can take. A desktop agent's tools might include:

tools = [
    "read_file",        # read a file from disk
    "write_file",       # write or edit a file
    "run_command",      # execute a shell command
    "click_element",    # click a UI element on screen
    "type_text",        # type into a focused field
    "take_screenshot",  # capture the current screen state
    "search_web",       # look something up online
]

The number of tools should stay small. An agent with 200 tools spends most of its context window figuring out which tool to use instead of making progress. In practice, 10 to 20 well-defined tools cover most desktop automation scenarios.
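In practice each tool carries more than a name: a description and a parameter schema, which is the shape most LLM tool-calling APIs expect. The exact field names vary by provider; this structure is illustrative, not any specific vendor's API:

```python
# Hypothetical tool definitions with name, description, and JSON-schema
# style parameters. Field names are illustrative.
tools = [
    {
        "name": "read_file",
        "description": "Read a file from disk and return its contents.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "run_command",
        "description": "Execute a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
]

def lookup_tool(name):
    # Dispatch a tool call by name; returns None for unknown tools
    return next((t for t in tools if t["name"] == name), None)
```

The description is what the LLM reads when deciding which tool to call, so it is worth writing carefully: vague descriptions are a common cause of wrong tool selection.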

State and Memory

The agent needs to remember what it has done and what still needs doing. There are three levels:

Note

State management is the most common source of bugs in agent workflows. An agent that forgets what it already tried will loop forever. An agent that loses track of its goal will wander.

  • Conversation context - The message history between the LLM and the tool executor. This is the most basic form of memory and is sufficient for short workflows (under ~20 steps).
  • Structured state - A JSON object, database row, or file that tracks progress. Used when the workflow spans many steps or needs to survive a restart. Example: a state.json that records which items in a list have been processed.
  • Long-term memory - Persisted knowledge across separate workflow runs. Used when the agent learns from past executions to improve future ones. Example: remembering that a certain website requires a 2-second delay between requests.
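The structured-state level can be sketched as a small checkpoint file. The `state.json` layout and field names here are illustrative, assuming a workflow that processes a list of items:

```python
import json
import os

def load_state(path):
    # Resume from a previous run if a checkpoint exists
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"processed": []}

def checkpoint(path, state):
    # Write to a temp file and rename, so a crash mid-write
    # never corrupts the checkpoint
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)

def process_items(path, items):
    state = load_state(path)
    for item in items:
        if item in state["processed"]:
            continue  # already handled in a previous run
        # ... do the actual work for this item here ...
        state["processed"].append(item)
        checkpoint(path, state)  # survive a crash after every item
    return state
```

If the workflow dies halfway through, rerunning it skips everything already recorded in `state.json` and picks up where it left off.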

Error Handling

Agents fail. APIs return 500s, UI elements move, files get locked by other processes. The error handling strategy defines whether the workflow is fragile or production-ready.

# Simple retry with exponential backoff: non-retryable errors fall
# through to replanning; repeated failures escalate to a human
import time

def execute_with_retry(action, max_retries=3):
    for attempt in range(max_retries):
        result = agent.execute(action)
        if result.success:
            return result
        if not result.is_retryable:
            return agent.plan_alternative(result.error)
        time.sleep(2 ** attempt)  # wait 1s, 2s, 4s, ...
    return agent.escalate_to_human(action, result.error)

The three strategies, in order of preference:

  1. Retry - Same action, wait and try again. Works for transient errors (network timeouts, rate limits).
  2. Adapt - Different action to achieve the same goal. The agent reads the error, reasons about it, and picks a new approach. This is where LLM reasoning adds the most value.
  3. Escalate - Give up and ask a human. The agent should know when it is stuck. An agent that retries the same failing action 50 times is worse than one that asks for help after 3 failures.

A Practical Example: Desktop File Organization

Here is a real agent workflow for organizing files on a macOS desktop. It is simple enough to follow step by step but shows all the core patterns.

Goal: Move all screenshots from ~/Desktop into ~/Screenshots/{YYYY-MM}/, creating date folders as needed.

# Step 1: Perceive - list files
ls ~/Desktop/Screenshot*.png

# Step 2: Plan - for each file, extract date from filename
# macOS screenshots follow: "Screenshot YYYY-MM-DD at HH.MM.SS.png"

# Step 3: Execute - create folders and move files
for file in ~/Desktop/Screenshot*.png; do
    date_part=$(echo "$file" | grep -oE '[0-9]{4}-[0-9]{2}')
    mkdir -p ~/Screenshots/"$date_part"
    mv "$file" ~/Screenshots/"$date_part"/
done

# Step 4: Evaluate - verify
ls ~/Desktop/Screenshot*.png 2>/dev/null && echo "FAIL: files remain" || echo "PASS: desktop clean"

An AI agent running this workflow would do the same thing, but it would also handle edge cases a static script cannot: files with non-standard names, permission errors, or files that are currently open in Preview.

Agent Workflow vs. Traditional Automation

You might be wondering: how is this different from a shell script, a Zapier zap, or an Automator workflow? The difference comes down to three things.

| Dimension | Traditional automation | Agent workflow |
|---|---|---|
| Error handling | Predefined error paths, fails on unexpected errors | Reasons about errors, adapts approach dynamically |
| Input flexibility | Fixed inputs, brittle to format changes | Handles natural language, varied formats, ambiguous inputs |
| Scope | One task, one path | Can decompose new tasks it has never seen before |

Traditional automation is better when the task is perfectly defined and never changes. Agent workflows shine when the task is fuzzy, the environment is unpredictable, or you need to handle variations without writing a new branch for each one.

Warning

Agent workflows are slower and more expensive than traditional scripts. If a 10-line bash script solves your problem, use the bash script. Agent workflows are for tasks where writing that script would take longer than just telling the agent what you want.

Common Pitfalls

  • No exit condition. An agent without a max step count will loop forever if the goal is ambiguous or unreachable. Always set a ceiling (50 steps is reasonable for most tasks).
  • Too many tools. Each tool costs tokens in the system prompt. An agent with 100 tools performs worse than one with 15 focused tools, even on tasks that need more tools. Swap tool sets between phases instead.
  • Ignoring intermediate state. If the agent crashes at step 12 of a 20-step workflow and has to restart from step 1, you lose time and money. Checkpoint progress to disk.
  • No human escalation path. Some tasks require human judgment (approving a payment, confirming a destructive delete). Build the "ask a human" tool into the workflow from day one, not as an afterthought.
  • Over-engineering the plan. Agents work best with simple, concrete instructions per step. A 500-word plan for each step is worse than a 2-sentence plan because the agent spends tokens re-reading its own plan instead of acting.
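The "swap tool sets between phases" fix from the second pitfall can be sketched in a few lines. Phase names and tool assignments here are illustrative:

```python
# Hypothetical phase-to-tools mapping: only the current phase's tools
# enter the system prompt, keeping the selection space small at every step.
PHASE_TOOLS = {
    "research": ["search_web", "read_file", "take_screenshot"],
    "edit":     ["read_file", "write_file", "run_command"],
    "verify":   ["run_command", "read_file"],
}

def tools_for_phase(phase):
    # Unknown phases get no tools rather than all of them
    return PHASE_TOOLS.get(phase, [])
```

The agent transitions between phases as the workflow progresses, so it never sees more than a handful of tools at once even if the full catalog is large.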

Minimal Working Example

Here is a minimal agent workflow loop in Python. This is the skeleton you would wrap around any task:

class AgentWorkflow:
    def __init__(self, goal, tools, max_steps=50):
        self.goal = goal
        self.tools = tools
        self.max_steps = max_steps
        self.history = []

    def run(self):
        for step in range(self.max_steps):
            # Perceive
            observation = self.get_current_state()

            # Plan
            action = self.llm_decide(
                goal=self.goal,
                observation=observation,
                history=self.history,
                tools=self.tools,
            )

            if action.type == "done":
                return {"status": "success", "steps": step + 1}

            # Execute
            result = self.execute_tool(action.tool, action.args)

            # Evaluate and record
            self.history.append({
                "step": step,
                "action": action.tool,
                "args": action.args,
                "result": result.output,
                "success": result.success,
            })

        return {"status": "max_steps_reached", "steps": self.max_steps}

This 30-line skeleton covers: the core loop, step tracking, history for context, a clean exit on completion, and a safety valve at max_steps. Everything else (retry logic, parallel execution, checkpointing) layers on top.
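To see the skeleton in motion, here is a hypothetical usage sketch. The `Action` and `Result` types and the `CountdownWorkflow` subclass are illustrative stand-ins: a real agent would call an LLM in `llm_decide` and real tools in `execute_tool`:

```python
from dataclasses import dataclass, field

@dataclass
class Action:
    type: str               # "tool" or "done"
    tool: str = ""
    args: dict = field(default_factory=dict)

@dataclass
class Result:
    output: str
    success: bool

class AgentWorkflow:
    # Same loop as the skeleton above, condensed for self-containment
    def __init__(self, goal, tools, max_steps=50):
        self.goal, self.tools, self.max_steps = goal, tools, max_steps
        self.history = []

    def run(self):
        for step in range(self.max_steps):
            observation = self.get_current_state()              # Perceive
            action = self.llm_decide(                           # Plan
                goal=self.goal, observation=observation,
                history=self.history, tools=self.tools,
            )
            if action.type == "done":
                return {"status": "success", "steps": step + 1}
            result = self.execute_tool(action.tool, action.args)  # Execute
            self.history.append({                               # Evaluate
                "step": step, "action": action.tool,
                "result": result.output, "success": result.success,
            })
        return {"status": "max_steps_reached", "steps": self.max_steps}

class CountdownWorkflow(AgentWorkflow):
    # Stub agent: declares "done" once three tool calls are recorded
    def get_current_state(self):
        return f"{len(self.history)} actions taken"

    def llm_decide(self, goal, observation, history, tools):
        if len(history) >= 3:
            return Action(type="done")
        return Action(type="tool", tool="run_command", args={"command": "ls"})

    def execute_tool(self, tool, args):
        return Result(output=f"ran {tool}", success=True)
```

Running `CountdownWorkflow("demo", ["run_command"]).run()` executes three tool steps and exits cleanly on the fourth, which is exactly the shape a real run takes: the only difference is what `llm_decide` and `execute_tool` do inside.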

What Makes a Good Agent Workflow

After building and running hundreds of agent workflows, a few patterns consistently separate the ones that work from the ones that don't:

Patterns that work:

  • Small, well-defined tools that do one thing each
  • A clear goal expressed in one sentence, not a paragraph
  • State checkpointed to disk so crashes are recoverable
  • Human escalation built in from the start

Patterns that fail:

  • Giant tool lists that confuse the LLM's tool selection
  • No step limit, letting the agent run indefinitely on failure
  • Stateless workflows that restart from scratch on every failure

Wrapping Up

An agent workflow is just a loop: perceive, plan, execute, evaluate. The power comes from the LLM's ability to adapt when things go wrong, handle varied inputs, and decompose new tasks without being explicitly programmed for each one. Start with the minimal loop, add checkpointing and error handling as you need it, and always set a step limit.

Fazm is an open-source macOS AI agent that runs agent workflows on your desktop. The source is on GitHub.
