Watch Claude Code work in a desktop UI: the seven streamed blocks a terminal can’t render

You can already read Claude Code in a terminal. The question people typing this into a search bar are really asking is how to watch it: see, at a glance, what tool is running right now, how long it has been stuck, what page the browser is on, and whether the agent is asking for approval or just thinking. That requires a native UI subscribed to the same Agent Client Protocol stream the CLI consumes. Fazm is one such app on macOS, open source at github.com/m13v/fazm. This guide walks through exactly what changes on screen, block by block, with the line numbers in the chat UI source.

Matthew Diakonov
8 min read

Direct answer, verified 2026-05-21

To watch Claude Code work in a UI, run it through an ACP-aware native desktop app. The agent loop is the same @agentclientprotocol/claude-agent-acp adapter the CLI uses. The UI subscribes to the protocol stream and renders each streamed event as its own visual element instead of a line of text. On macOS, Fazm draws seven block types distinctly (text, tool calls with a 5-second elapsed-time guard, thinking, discovery cards, observer cards, system events, browser activity). Source: github.com/m13v/fazm.

The seven block types you can actually see

The raw claude CLI streams a sequence of typed events that the terminal collapses into one long text column. A native UI looks at each event’s type and routes it to a different renderer. Fazm’s grouper is in Desktop/Sources/MainWindow/Components/ChatUIComponents.swift lines 7 through 17. Seven cases. Each is a different visual lane.

ContentBlockGroup, in source order

  1. Text

    Plain assistant reply, rendered as a markdown bubble so headings, lists, and inline code show up the same as in any chat app.

  2. Tool calls

    Grouped under one collapsible row with a spinner while running, a green check on success, and a one-line preview of the running call.

  3. Thinking

    Collapsible block headed by a brain icon and "Thinking..." label; click the chevron to read the reasoning trace.

  4. Discovery card

    A boxed card with a bold title and a short summary that expands to the full body when you click the chevron.

  5. Observer card

    An action card with inline buttons (approve, dismiss, custom action) the agent draws into the conversation when it needs a one-tap decision.

  6. System event

    Bordered side card with an icon, colored border, and Show details toggle. Used for session recovery, tool hang cancellation, interruptions, browser extension reconnect.

  7. Browser activity

    A dedicated row when the agent drives the browser, showing the tool name, the action, the mode, and the URL it is currently on.

The grouper merges consecutive blocks of the same type before rendering. Forty file reads in a row become one collapsible “40 actions” row, not forty rows. A long stretch of assistant prose with a hidden tool call in the middle becomes one continuous bubble instead of two orphan fragments. That logic is lines 47 through 130 of the same file.
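The merge step can be sketched in a few lines of Swift. This is a minimal illustration under stated assumptions, not Fazm's code: BlockKind, Row, and mergeConsecutive are hypothetical names, and only the seven case names and the "40 actions" label come from the description above.

```swift
// Hypothetical stand-in for the article's ContentBlockGroup: the seven case
// names come from the article, everything else here is an assumption.
enum BlockKind: String {
    case text, toolCalls, thinking, discoveryCard, observerCard, systemEvent, browserActivity
}

/// One rendered row: a block kind plus how many consecutive events it absorbed.
struct Row: Equatable {
    let kind: BlockKind
    var count: Int

    /// Label for the collapsed row, e.g. "40 actions" for a run of tool calls.
    var label: String {
        kind == .toolCalls ? "\(count) action\(count == 1 ? "" : "s")" : kind.rawValue
    }
}

/// Merge consecutive events of the same kind into a single row.
func mergeConsecutive(_ kinds: [BlockKind]) -> [Row] {
    var rows: [Row] = []
    for kind in kinds {
        if let last = rows.last, last.kind == kind {
            rows[rows.count - 1].count += 1   // extend the current run
        } else {
            rows.append(Row(kind: kind, count: 1))   // start a new run
        }
    }
    return rows
}

// One reply, forty tool calls, one more reply.
var stream: [BlockKind] = [BlockKind.text]
stream += Array(repeating: BlockKind.toolCalls, count: 40)
stream.append(BlockKind.text)

let rows = mergeConsecutive(stream)
print(rows.map(\.label))   // ["text", "40 actions", "text"]
```

Forty-two events collapse to three rows, which is the property that keeps a long run readable.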

The one anchor that makes this watchable, not just visible

Most UIs that render tool calls show a spinner while the call is running, a check when it is done. Watching that on a fast Mac is fine for sub-second calls; it is useless when the agent has been stuck on the same shell command for two minutes and you cannot tell whether to interrupt or wait.

Fazm’s answer is a tiny rendering guard. The ToolElapsedTime view in ChatUIComponents.swift wraps a TimelineView that ticks once per second, and the guard itself sits on line 295. The relevant lines:

// ChatUIComponents.swift, lines 293-299
TimelineView(.periodic(from: startDate, by: 1)) { context in
    let elapsed = Int(context.date.timeIntervalSince(startDate))
    if elapsed >= 5 {
        // Only show after 5 seconds to avoid flashing on quick tool calls
        Text(formatElapsed(elapsed))
    }
}

A tool call under 5 seconds shows just the spinner. At 5 seconds, an elapsed counter appears next to the spinner and ticks up in real time (5s, 6s, 7s...); past sixty seconds it switches to a minutes-and-seconds form such as 1m 12s (the formatter is lines 302-310). The raw claude CLI has no equivalent indicator. You see when something starts and when it finishes; you do not see “this has been running for 47 seconds” without manually counting.
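For readers who want the guard and the formatter as one testable unit, here is a rough sketch in plain Swift. The function name elapsedLabel and its threshold parameter are hypothetical, and the exact minutes-and-seconds format is an assumption inferred from the "1m 12s" example; the real formatter may differ.

```swift
/// Hypothetical helper: returns the counter text for a running tool call,
/// or nil while the call is still under the threshold.
func elapsedLabel(seconds: Int, threshold: Int = 5) -> String? {
    guard seconds >= threshold else { return nil }       // spinner only, no number
    guard seconds >= 60 else { return "\(seconds)s" }    // "5s" through "59s"
    return "\(seconds / 60)m \(seconds % 60)s"           // e.g. "1m 12s"
}

for s in [3, 5, 47, 72] {
    print(s, elapsedLabel(seconds: s) ?? "spinner only")
}
```

Returning nil below the threshold keeps the "do not flash a number on quick calls" rule in one place, so the view only has to decide whether to draw the label.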

The elapsed counter only appears at five seconds. It is the difference between watching the agent and watching scrollback.

ChatUIComponents.swift, ToolElapsedTime view, line 295

What watching looks like, vs reading the terminal

The same agent loop produces the same events. The terminal flattens them; the UI keeps them typed. The terminal side of that comparison:

Same ACP session, two surfaces

The claude CLI prints the same events as ANSI-styled text. Tool calls land as bullet lines, tool outputs land as fenced blocks, and thinking traces and system events render as italicized notes that scroll past on a long run. Once a line goes off-screen, the only way to see it again is to scroll up through everything.

  • Tool call appears as text "⊳ running bash..."
  • No elapsed timer; you guess from how long the spinner has been alive
  • Thinking dump pushes the last reply off-screen
  • Interruption renders as one italic line you scroll past
  • Session ends with the terminal session; no persistent timeline

A streamed run, one block at a time

Here is what a single “refactor this file” session looks like as it streams in. Each frame is one block type arriving:

Watching a session stream

Frame 1: text

Assistant reply bubble. “Reading the file now, then I will outline the changes before editing.” Standard markdown rendering, monospace for inline code, headings work, same as any chat app.

The cards that have no terminal equivalent

Two of the seven blocks exist purely because a UI can draw something a terminal cannot: an action button you click, and a thin-bordered card that has color and an icon and stays where it is in the stream.

Observer card

Carries an activityId, a content body, and a list of buttons (label + action). The buttons fire a callback the host hooks into the agent protocol. Once acted on, the card disables further input and renders which action was taken. A terminal can ask a y/n but it cannot remember the answer next time you scroll back.
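The act-once behavior can be modeled without any UI code. The sketch below is an assumption-heavy illustration: the field names activityId, content, and buttons follow the description above, while ObserverButton, selectedAction, tap, and the callback shape are hypothetical.

```swift
/// A button on the card: a visible label plus the action key sent to the host.
struct ObserverButton: Equatable {
    let label: String
    let action: String
}

/// Hypothetical model of an observer card that accepts exactly one decision.
struct ObserverCard {
    let activityId: String
    let content: String
    let buttons: [ObserverButton]
    private(set) var selectedAction: String?

    init(activityId: String, content: String, buttons: [ObserverButton]) {
        self.activityId = activityId
        self.content = content
        self.buttons = buttons
        self.selectedAction = nil
    }

    /// The card takes input only until an action has been recorded.
    var isEnabled: Bool { selectedAction == nil }

    /// Records the first valid tap and forwards it to the host callback.
    /// Later taps, and taps on unknown actions, are ignored.
    @discardableResult
    mutating func tap(_ action: String, onAction: (String, String) -> Void) -> Bool {
        guard isEnabled, buttons.contains(where: { $0.action == action }) else { return false }
        selectedAction = action
        onAction(activityId, action)
        return true
    }
}

var card = ObserverCard(
    activityId: "a1",
    content: "Run the build?",
    buttons: [ObserverButton(label: "Approve", action: "approve"),
              ObserverButton(label: "Dismiss", action: "dismiss")]
)
var sent: [String] = []
card.tap("approve") { id, action in sent.append("\(id):\(action)") }
card.tap("dismiss") { id, action in sent.append("\(id):\(action)") }  // ignored
print(sent, card.selectedAction ?? "none")
```

Because the chosen action is stored on the card itself, scrolling back later can show which decision was made, which is the part a y/n prompt cannot offer.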

System event card

Six kinds: sessionRecovered (blue, arrow-circle), sessionRecoveryEmpty (orange, triangle), toolHangCanceled (orange, xmark octagon), taskHangCanceled (orange, person warning), userInterrupted (gray, stop circle), browserExtensionResumed (green, puzzle piece). Each one is a colored-border card with optional Show details toggle.
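As a quick reference, the six kinds and their border colors fit in one enum. This is a sketch only: SystemEventKind and borderColor are hypothetical names, and the colors are plain strings taken from the list above rather than real SwiftUI color values.

```swift
/// The six system-event kinds named in the article.
enum SystemEventKind: String, CaseIterable {
    case sessionRecovered, sessionRecoveryEmpty, toolHangCanceled,
         taskHangCanceled, userInterrupted, browserExtensionResumed

    /// Border color as described in the article (a label, not a SwiftUI Color).
    var borderColor: String {
        switch self {
        case .sessionRecovered:
            return "blue"
        case .sessionRecoveryEmpty, .toolHangCanceled, .taskHangCanceled:
            return "orange"
        case .userInterrupted:
            return "gray"
        case .browserExtensionResumed:
            return "green"
        }
    }
}

// Group the kinds by color to see which events share a warning style.
let byColor = Dictionary(grouping: SystemEventKind.allCases, by: \.borderColor)
print(byColor["orange", default: []].map(\.rawValue))
```

The exhaustive switch means adding a seventh kind would fail to compile until it is given a color.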

Discovery card

Bold title, short summary, expand-to-full chevron. Used when the agent emits a structured synthesis ("here is what I found in this codebase") that is not the same shape as a normal reply.

Browser activity row

Dedicated row when the agent drives the browser. Shows tool name, action verb, mode, and live URL. Same status enum as generic tool calls so the spinner / green-check behavior is consistent.

What the grouping logic actually buys you

The unsexy part of “watching” is that the stream is noisy. A real Claude Code run can fire 40+ tool calls between two assistant replies; some of those are ask_followup calls the host already drew somewhere else; some are reads, some are edits, some are searches.

The grouper at lines 47-130 of ChatUIComponents.swift handles three things the terminal cannot:

  • Consecutive runs of the same block type are merged. Forty reads become one “40 actions” row with a one-line summary of whichever call is running right now.
  • hiddenToolNames is a Set you can pass in to drop tools the host draws separately (the onboarding flow draws ask_followup outside the message stream, so it does not need to appear in the chat too). When every tool between two text blocks is hidden, the surrounding text merges into a single bubble instead of breaking into two orphan fragments.
  • A tool call group’s “inline summary” (line 164) shows the running call’s input summary while it is alive and switches to the first meaningful line of its output once it completes, stripping markdown fences. You can read what just happened without expanding.
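The hidden-tool behavior from the second bullet can be sketched as follows. This is a simplified illustration, not the code at lines 47-130: StreamBlock, RenderGroup, and group are hypothetical names, tool calls carry only a name, and only the hiddenToolNames parameter name comes from the article.

```swift
/// Simplified stream: text chunks and tool calls (name only, for brevity).
enum StreamBlock: Equatable {
    case text(String)
    case toolCall(String)
}

/// What the UI would draw: a text bubble or a collapsible tool-call group.
enum RenderGroup: Equatable {
    case text(String)
    case toolCalls([String])
}

/// Merge consecutive blocks of the same type, skipping hidden tools so the
/// text on either side of them joins into a single bubble.
func group(_ blocks: [StreamBlock], hiddenToolNames: Set<String> = []) -> [RenderGroup] {
    var groups: [RenderGroup] = []
    for block in blocks {
        switch block {
        case .toolCall(let name):
            if hiddenToolNames.contains(name) { continue }   // drawn elsewhere by the host
            if case .toolCalls(let names)? = groups.last {
                groups[groups.count - 1] = .toolCalls(names + [name])
            } else {
                groups.append(.toolCalls([name]))
            }
        case .text(let next):
            if case .text(let prev)? = groups.last {
                groups[groups.count - 1] = .text(prev + next)
            } else {
                groups.append(.text(next))
            }
        }
    }
    return groups
}

let blocks: [StreamBlock] = [
    .text("Let me ask one thing. "),
    .toolCall("ask_followup"),
    .text("Thanks, reading the file now."),
    .toolCall("read"),
    .toolCall("read"),
]
print(group(blocks, hiddenToolNames: ["ask_followup"]))
```

With ask_followup hidden, the two text blocks become one bubble followed by a single two-call group; with an empty set, the same input yields four groups.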

How to actually try it

  1. macOS 14 or newer. The chat UI is built on SwiftUI views such as TimelineView(.periodic(...)), which need a recent SwiftUI runtime.
  2. Download Fazm from fazm.ai/download or clone github.com/m13v/fazm and build it yourself. The chat UI lives under Desktop/Sources/MainWindow.
  3. Sign in with your existing Claude Pro or Max account on first launch. Usage hits your existing plan; the desktop app does not bring its own model.
  4. Start a new chat, paste a real task, and let it run for a minute. Watch the spinner. Wait for the 5s elapsed counter on a longer tool call. Click the chevron on a tool group to see the inputs and outputs. Click Approve on an observer card.
  5. Quit the app, restart your Mac, reopen it. The chat is still there. The system-event card at the top of the conversation tells you the session was recovered.

Frequently asked questions

What does watching Claude Code actually mean in a desktop UI?

The raw claude CLI prints a streaming text transcript. A native UI subscribes to the same Agent Client Protocol (ACP) session and renders the stream as discrete UI elements instead of flat text. Each ACP event (text delta, tool_call, tool_call_update, agent_thought, system_event) becomes its own bubble or card. The agent loop is unchanged. What changes is that a tool call no longer disappears into scrollback, a thinking trace is collapsed by default instead of pushing your last reply off-screen, and an interruption is a card with a colored border instead of an italicized line you miss when you tab back.

Which streamed block types does Fazm draw distinctly?

Seven, defined as the ContentBlockGroup enum in Desktop/Sources/MainWindow/Components/ChatUIComponents.swift lines 7-17: text, toolCalls, thinking, discoveryCard, observerCard, systemEvent, and browserActivity. Consecutive blocks of the same type are merged into one bubble or one collapsible group. Forty file reads do not produce 40 individual rows; they produce one Tool Calls row that says "40 actions" with a one-line summary of whatever it is doing right now.

How do I see when a tool call is taking too long?

Tool call rows show a spinner while running. After 5 seconds, an elapsed-time counter appears next to the spinner: "5s, 6s, 7s…" then "1m 12s" etc. This is gated by the literal line `if elapsed >= 5` in the ToolElapsedTime view at ChatUIComponents.swift line 295. Anything that finishes in under 5 seconds just shows the spinner so the row does not flash a number for a fast call. Anything that hangs gets a visible timer you can use to decide whether to interrupt.

Can I expand a tool call group to see the full input and output?

Yes. The group header shows the action verb plus a one-line summary of the running call. Click the chevron and the group expands into a list of every individual tool call in that run, each with its name, input summary, status (spinner or green check), and the first six lines of output rendered in a monospaced inline block. Markdown code fences are stripped so the inline output stays compact. The detail panel is the same code path the FaqSection uses for its accordion.
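The compact output preview described here is easy to approximate. The sketch below assumes the simplest reading of the behavior (drop any line that opens or closes a Markdown code fence, then keep the first six lines); outputPreview and its maxLines parameter are hypothetical names, and the real view may trim differently.

```swift
import Foundation

/// Hypothetical helper: drop Markdown fence lines, keep the first `maxLines` lines.
func outputPreview(_ output: String, maxLines: Int = 6) -> String {
    // The fence marker is built at runtime so this listing contains no literal fence.
    let fence = String(repeating: "`", count: 3)
    return output
        .split(separator: "\n", omittingEmptySubsequences: false)
        .filter { !$0.trimmingCharacters(in: .whitespaces).hasPrefix(fence) }
        .prefix(maxLines)
        .joined(separator: "\n")
}

// Build a fenced, eight-line tool output to preview.
let marker = String(repeating: "`", count: 3)
var lines = [marker + "text"]
lines += (1...8).map { "line \($0)" }
lines.append(marker)
let raw = lines.joined(separator: "\n")

print(outputPreview(raw))
```

The sample prints "line 1" through "line 6": both fence lines are gone and the last two content lines are cut.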

What is the discovery card and why is it separate from text?

Some agent loops emit a structured summary block ("here is what I found in this codebase") that is not the same shape as a normal assistant reply. The UI gives it a boxed card with a bold title, a short summary, and an expand-to-full-body chevron, so the user can decide whether to read the synthesis or skip it. In a terminal these summaries fight with everything else for vertical real estate; in the UI they get their own visual lane.

What goes in an observer card?

Anything the agent wants a one-tap human decision on. Concretely: "I think this is the right command to run, approve / dismiss / show me a different one." The card carries an activityId, a content body, and a list of buttons, each with a label and an action key. The buttons fire a callback that the host hooks into the agent's protocol; once acted on, the card shows which action was selected and disables further input. A terminal can ask the same question with "y/n" but it cannot remember which choice you already made next time you scroll back through the chat.

What kinds of system events appear as their own cards?

Six right now, defined in SystemEventCardView (ChatUIComponents.swift lines 403-494): sessionRecovered (blue, circular-arrow icon), sessionRecoveryEmpty (orange, warning triangle), toolHangCanceled (orange, xmark.octagon), taskHangCanceled (orange, person-warning), userInterrupted (gray, stop.circle), and browserExtensionResumed (green, puzzle piece). Each is a thin-bordered card centered in the message stream with a colored border, an icon-led header, body text, and an optional Show details toggle that reveals a key-value table.

How is a browser activity row different from a normal tool call?

When the agent uses the browser tool, the event arrives with a toolName, an action (navigate, click, type, screenshot), a mode (visible or headless), and a URL. The UI renders this as a dedicated row instead of folding it into the generic Tool Calls group, so you can see at a glance which page the agent is on, even when the same tool fires dozens of times in a session. The status (running, success, error) is the same enum the generic tool calls use, so you get the same spinner and green check.
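A minimal sketch of that shape, assuming nothing beyond the fields listed above: ToolStatus, BrowserMode, BrowserActivity, and rowText are hypothetical names, the tool name in the sample is invented, and the indicator strings (including the one for errors) are placeholders for whatever the real view draws.

```swift
/// Status shared by generic tool calls and browser rows.
enum ToolStatus {
    case running, success, error

    /// Placeholder text for the indicator the UI would draw.
    var indicator: String {
        switch self {
        case .running: return "spinner"
        case .success: return "green check"
        case .error:   return "error mark"   // assumption: the article does not describe this state
        }
    }
}

enum BrowserMode: String {
    case visible, headless
}

/// Hypothetical model of one browser activity row.
struct BrowserActivity {
    let toolName: String
    let action: String      // e.g. navigate, click, type, screenshot
    let mode: BrowserMode
    let url: String
    var status: ToolStatus

    /// One-line text for the dedicated row.
    var rowText: String {
        "\(toolName) · \(action) · \(mode.rawValue) · \(url) [\(status.indicator)]"
    }
}

var activity = BrowserActivity(toolName: "browser", action: "navigate",
                               mode: .visible, url: "https://example.com",
                               status: .running)
print(activity.rowText)
activity.status = .success
print(activity.rowText)
```

Reusing one status type for both kinds of row is what keeps the spinner and check behavior identical across them.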

Does the UI store the full stream so I can scroll back through long sessions?

Yes. Every event arriving over ACP is persisted as a row in the chat's local store, so the conversation survives a Mac restart and you can scroll back through the entire timeline of one chat. Because each block is a typed object and not a flat string, the UI can re-render tool-call groups and system-event cards correctly even on cold start, without having to re-parse a text transcript.
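To see why typed blocks make cold-start rendering simple, consider a round trip through Codable. This is only an illustration: the article does not say how Fazm's local store serializes rows, so the JSON encoding, the StoredBlock type, and its three cases are all assumptions.

```swift
import Foundation

/// Hypothetical persisted block. Swift synthesizes Codable for enums with
/// associated values (Swift 5.5 and later), so each case keeps its type on disk.
enum StoredBlock: Codable, Equatable {
    case text(String)
    case toolCall(name: String, output: String)
    case systemEvent(kind: String)
}

/// Encode the timeline and decode it again, as a cold start would.
func roundTrip(_ blocks: [StoredBlock]) throws -> [StoredBlock] {
    let data = try JSONEncoder().encode(blocks)
    return try JSONDecoder().decode([StoredBlock].self, from: data)
}

let timeline: [StoredBlock] = [
    .text("Reading the file now."),
    .toolCall(name: "read", output: "42 lines"),
    .systemEvent(kind: "sessionRecovered"),
]

let restored = try? roundTrip(timeline)
print(restored == timeline)   // true when every block survives with its type intact
```

Nothing has to re-parse a transcript after decoding: a tool call comes back as a tool call and a system event as a system event, ready for the same renderers.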

Is Fazm the only desktop UI for Claude Code?

No. Zed has had ACP-aware Claude Code integration for a while, Cursor and Cline have their own surfaces, and the underlying @agentclientprotocol/claude-agent-acp adapter on npm is open for anyone to build against. What is specific to Fazm is the native macOS shell, the seven streamed block types, the 5-second elapsed-time guard, persistent sessions that survive a Mac restart, one-click chat forking, no auto-compacting, and the same agent reaching beyond the terminal into the browser and other Mac apps. The source is at github.com/m13v/fazm if you want to read the chat UI implementation yourself.

Do I have to pay for Claude Code separately to use a desktop UI?

If you have a Claude Pro or Max subscription with Anthropic, the same OAuth account works inside the desktop app and usage hits your existing plan. If you do not, you can plug in an API key or route through a corporate proxy or compatible gateway. The desktop UI does not bring its own model; it is a presentation layer on top of the same agent loop. Whatever you pay for Claude Code at the CLI is what you pay for it in the UI.
