ollama v0.21.0 · April 17, 2026 · macOS

Ollama release notes April 2026: the boundary v0.21.0 does not move

v0.21.0 shipped on April 17, 2026. Copilot CLI, Hermes, and OpenClaw got pulled inside ollama launch. Gemma 4 got flash attention on compatible GPUs. Every addition expands what the local model talks to. None of them touch the Mac-desktop action boundary, which lives above Ollama's scope by design.

Fazm · 10 min read
Uses real accessibility APIs, not screenshots
Works on any Mac app, not just the browser
Consumer app, no MCP config to touch

What's in v0.21.0, at a glance

Every item here is a release-note entry. Every item stops at the token boundary.

v0.21.0
2026-04-17
Copilot CLI
Hermes
OpenClaw
Gemma 4 flash attention
Gemma 4 nothink renderer
Metal fix
idempotent config
cross-compile fix

What every April 2026 roundup gets right, and what they miss

The top ten results for "ollama release notes april 2026" all cover the same surface. The GitHub Releases page, Local AI Master, Releasebot, myreleasenotes, the ollama.com changelog, the Ollama blog: each one lists v0.21.0, the April 17 publish date, the new ollama launch integrations, the Gemma 4 fixes, and the Metal cross-compile patch. All of that is correct and useful.

What those articles skip, because it is not in the scope of an Ollama release, is the layer above Ollama. The release notes tell you what the local model can be handed. They do not tell you what the local model can see or touch on your actual Mac. That second question is where a real agent lives, and its answer is not in any Ollama version number.

This guide walks the v0.21.0 changelog, then drops into the exact shipping code where the Mac-desktop action boundary sits. The code is in Fazm. The anchor fact is a binary at Fazm.app/Contents/MacOS/mcp-server-macos-use registered with args: [] and env: [], which is what makes the boundary portable.

The v0.21.0 changelog, entry by entry

Eight line items. Three inbound integrations, two Gemma 4 model fixes, two Mac-developer ergonomics fixes, and one idempotency guard. Read the whole list and notice what category is missing.

Hermes in `ollama launch`

A new coding agent can be configured in the same one-command launcher alongside the existing options. Expands the inbound surface but does not add a GUI-action output.

GitHub Copilot CLI

Copilot CLI now speaks the same OpenAI-compatible endpoint on localhost:11434. Stdin and stdout, not desktop clicks.
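To make the "stdin and stdout, not desktop clicks" point concrete, here is a minimal sketch of the request body any such client POSTs to `http://localhost:11434/v1/chat/completions`. The helper name `buildChatRequest` is illustrative, not from the Ollama or Copilot source; the payload shape follows the OpenAI-compatible chat-completions format.

```typescript
// Hypothetical sketch: the OpenAI-compatible body a client like Copilot CLI
// would POST to http://localhost:11434/v1/chat/completions. Text in, text out.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

function buildChatRequest(model: string, messages: ChatMessage[]) {
  return {
    model,      // e.g. "gemma4" or "llama3.1"
    messages,   // plain text: this is the token boundary the article describes
    stream: false,
  };
}

const body = buildChatRequest("gemma4", [
  { role: "user", content: "Summarize this diff." },
]);
console.log(JSON.stringify(body));
```

Nothing in that payload knows about windows, buttons, or CGEvents; that is the whole point of the boundary.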

OpenClaw multi-channel

Wire WhatsApp, Telegram, and Discord into a local model via `ollama launch openclaw`. Message in, text out, still at the token boundary.

Gemma 4 flash attention

Enabled on compatible GPUs. A runtime-efficiency win for inference, not an application-layer change.

Gemma 4 nothink renderer

Restored with the e2b-style prompt. Fixes the `/nothink` toggle on small Gemma 4 variants. Prompt-template plumbing, not agent plumbing.

Metal compiler fix

Gemma 4 Metal build error resolved. Apple Silicon users who hit the April 10 regression get a clean build again.

macOS cross-compile no longer triggers generate

cmake builds on certain Xcode versions were invoking `generate` during configure; that path is now gated. Developer-ergonomics fix for Mac contributors.

Idempotent config write

`ollama launch` no longer rewrites `~/.ollama/launch.yml` when the configured set is unchanged. Quiet fix, meaningful for dotfile-managed setups.
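The guard described above can be sketched in a few lines: compare the serialized config against what is already on disk, and skip the write when they match. The function name `writeLaunchConfig` and its signature are assumptions for illustration, not Ollama's actual code.

```typescript
// Illustrative idempotency guard: only rewrite the launch config when the
// serialized content actually changed, so dotfile managers see a stable mtime.
import { existsSync, readFileSync, writeFileSync } from "node:fs";

function writeLaunchConfig(path: string, nextContent: string): boolean {
  if (existsSync(path) && readFileSync(path, "utf8") === nextContent) {
    return false; // "skipped: unchanged" -- no rewrite
  }
  writeFileSync(path, nextContent);
  return true;
}
```

A second call with identical content returns `false` and leaves the file untouched, which is exactly the behavior the release note promises.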

Missing category: anything that touches the Mac-desktop GUI surface. No accessibility API work. No CGEvent synthesis. No window-frame resolution. Ollama's scope stops at the token boundary, and the April 17 release respects that scope.

What the new integrations actually look like

The new inbound surface in v0.21.0 is the ollama launch wizard, which now configures coding agents (Claude Code, Cursor, Continue, Hermes, GitHub Copilot CLI) and messaging channels (OpenClaw) in a single command. Every one of these is a client that sends text and receives text.

ollama launch

The wizard's final output line reads "skipped: unchanged". That is the idempotent-config-write fix from the release notes, visible in the terminal. The wizard no longer rewrites ~/.ollama/launch.yml when the selected set matches the current file.

Where v0.21.0 stops and where the Mac agent begins

Ollama's surface is the OpenAI-compatible REST API on localhost:11434. Clients plug into it from the left. Downstream of it, if you want a Mac agent rather than a chat completion, you need a perception layer and an action layer. Those live on the right.

Ollama v0.21.0 boundary (left of hub) versus the Mac-desktop agent layer (right of hub):

Left of hub: Copilot CLI, Hermes coding agent, OpenClaw channels, ollama run / ollama serve.
Hub: localhost:11434.
Right of hub: perception (AX tree walk), action (CGEvent synthesis), mcp-server-macos-use with its six _and_traverse tools.
v0.21.0's additions live on the left of the hub. Fazm's mcp-server-macos-use lives on the right of the hub. The hub is the token-boundary where the two stacks meet.

Anchor fact: the registration code is already provider-agnostic

This is the block in Fazm that decides how the Mac-desktop agent boundary is wired. The path is resolved on line 63. The registration block is lines 1057 through 1064. The default model identifier is on line 1245. The authoritative built-in MCP list is on line 1266. Open acp-bridge/src/index.ts and jump to any of those line numbers to find them.

acp-bridge/src/index.ts

Why this is the uncopyable part: the registration passes args: [] and env: [] to the binary. There is no Claude-specific flag, no Anthropic-specific environment variable, nothing that would have to be renamed or rewired on the day Ollama (or anyone else) becomes the backend. The binary speaks MCP over stdio and the MCP tool_use / tool_result shape is defined by the model-provider layer above the binary, not by the binary itself. The only line that names a provider is DEFAULT_MODEL at line 1245.
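The pattern the paragraph describes can be sketched in a few lines, modeled on the shape this article cites from acp-bridge/src/index.ts (an `existsSync` guard, then a push with empty `args` and `env`). The types and the `registerMacosUse` helper here are illustrative, not the file's literal code.

```typescript
// Minimal sketch of the provider-agnostic registration pattern: guard on the
// bundled binary existing, then register it with no flags and no environment.
import { existsSync } from "node:fs";

interface McpServerEntry {
  name: string;
  command: string;
  args: string[];
  env: string[];
}

function registerMacosUse(servers: McpServerEntry[], binaryPath: string): void {
  // Only register the MCP server if the binary actually shipped in the bundle.
  if (existsSync(binaryPath)) {
    // args: [] and env: [] -- nothing provider-specific to rename or rewire
    // on the day a different backend (Ollama or otherwise) hosts the model.
    servers.push({ name: "macos-use", command: binaryPath, args: [], env: [] });
  }
}
```

Because the entry carries no provider-specific state, swapping the model backend never touches this block.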

From Ollama endpoint to Mac CGEvent, hop by hop

1. Ollama v0.21.0 starts on localhost:11434

OpenAI-compatible POST /v1/chat/completions, plus Ollama's native /api/tags and /api/show. That is the top of Ollama's stack.

Anything that calls into this endpoint (Copilot CLI, OpenClaw, a Python script, a future agent client) sits above the Ollama process boundary.
2. A client decides what the model reasons over

For a coding CLI: stdin text. For OpenClaw: inbound messages from WhatsApp or Telegram. For a Mac agent: the accessibility tree.

The choice of observation format is a client-side decision, not an Ollama-side one. Ollama does not ship a perception layer.
3. The model emits a tool_use block

For Llama 3.1 / DeepSeek-R1 / Gemma 4 / Qwen, this is native tool-call mode. JSON with a tool name and arguments.

Ollama's job is done at this point: tokens out. Where those tokens go next is a client concern.
4. An action binary translates the tool call into OS events

In Fazm this is mcp-server-macos-use: the six _and_traverse tools wrap AXPress, CGEvent mouse, CGEvent keyboard, and Core Graphics scroll.

Registration: acp-bridge/src/index.ts:1057-1064 with args: [] and env: []. No model-specific flags.
5. The same binary re-walks the tree and returns it

The response in the MCP tool_result carries both the action result and the post-action accessibility tree.

Observe-act-observe collapses to one round trip. This property is a function of the MCP tool schema, not of Ollama or Anthropic.
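The one-round-trip shape of step 5 can be sketched as a data structure: the single tool response carries both the action outcome and the fresh tree. The field names below are illustrative, not the binary's actual MCP schema.

```typescript
// Hedged sketch of the combined tool_result: action outcome plus the
// post-action accessibility tree in one payload, so the model never needs a
// separate "observe" call after each action.
interface TraverseToolResult {
  actionOk: boolean; // did the AXPress / CGEvent succeed
  treeText: string;  // re-walked tree, one line of text per element
}

function summarize(result: TraverseToolResult): string {
  const elements = result.treeText
    .split("\n")
    .filter((line) => line.length > 0);
  return `${result.actionOk ? "ok" : "failed"}, ${elements.length} elements`;
}
```

One response, two pieces of information: that collapse is what keeps a 7B local model's context budget workable.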

Ollama v0.21.0 scope vs the Mac-desktop agent layer

A line-by-line accounting of what the April 17 release notes cover versus what the layer above Ollama has to answer on its own.

Feature | Ollama v0.21.0 release notes | Fazm Mac-desktop agent layer
Scope of the release notes | model-runtime layer: integrations, GPU, prompt templates | application-above-runtime layer: perception + action on macOS
What v0.21.0 sees on the screen | nothing; its surface is stdin/stdout and HTTP | 441 elements from AXUIElementCreateApplication, returned as text
How v0.21.0 clicks a button in Mail | it does not; Ollama has no click primitive | mcp-server-macos-use synthesizes a CGEvent click by role + title
Provider coupling in the binary | n/a; Ollama is the provider | zero: registered with args: [] and env: [] at index.ts:1057-1064
Single-point swap to try a new backend | ollama pull <new-model-tag> | DEFAULT_MODEL string at acp-bridge/src/index.ts:1245
Observation payload per step | whatever the calling client chose to include | a few kilobytes of UTF-8 tree text per tool response
Round trip shape | chat completion request -> chat completion response | MCP tool_use -> action + re-walked tree in one tool_result
Setup for a non-developer | install Ollama, pull a model, learn a CLI | install Fazm, grant Accessibility once

The numbers on the Mac-side boundary

These are not benchmarks. They come from the file system and a real traversal of a Fazm Dev window, not from a blog post.

21 MB — mcp-server-macos-use binary size
441 — elements in a real AX tree walk
0.72 s — walk + serialize time
6 — _and_traverse tools the binary exposes

Compare against a base64-encoded 4K screenshot observation, which typically runs from hundreds of kilobytes to several megabytes of text in the request body. On a consumer Mac running a 7B Ollama model at 32K context, that is the gap between one step and a hundred.

One line of what any model on Ollama would read

The binary emits one text line per element. Role, title, frame, visibility. This is the exact format that lands in the MCP tool_result. A model served by Ollama would substring-search for the word it wants and read the x/y/w/h off the same line:

[AXButton (button)] "Send" x:6272 y:-1754 w:56 h:28 visible

Nothing about this line is Claude-specific. A Llama 3.1 or Gemma 4 or DeepSeek-R1 tool-call-capable model served by Ollama on localhost:11434 would read the same UTF-8 text.
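As a concrete illustration, here is how any tool-call-capable client could parse one element line of the format shown above into coordinates for the next tool call. The regex is an assumption inferred from the sample line, not the binary's documented spec.

```typescript
// Sketch: parse one AX tree line like
//   [AXButton (button)] "Send" x:6272 y:-1754 w:56 h:28 visible
// into the fields a model would pass to the next _and_traverse call.
interface AxElement {
  role: string;
  title: string;
  x: number;
  y: number;
  w: number;
  h: number;
  visible: boolean;
}

function parseAxLine(line: string): AxElement | null {
  const m = line.match(
    /^\[(\w+) \([^)]+\)\] "([^"]*)" x:(-?\d+) y:(-?\d+) w:(\d+) h:(\d+)( visible)?/
  );
  if (!m) return null;
  return {
    role: m[1],
    title: m[2],
    x: Number(m[3]),
    y: Number(m[4]),
    w: Number(m[5]),
    h: Number(m[6]),
    visible: m[7] !== undefined,
  };
}
```

Substring-search for "Send", parse the line, and the x/y/w/h are ready for the click arguments; no vision model required.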

How to read future Ollama release notes on a Mac

Every Ollama release will grow the set of clients that can speak to the local model. Expect more entries like Copilot CLI, Hermes, and OpenClaw. Expect incremental model-quality fixes on Gemma, Llama, DeepSeek, Qwen, Kimi, and gpt-oss variants. Expect occasional Apple Silicon-specific patches (Metal compiler errors, cross-compile guards).

What you should not expect, because it is not what Ollama ships, is a line item that makes the local model click inside Mail, type into Notes, or pick an option in System Settings. Those capabilities live one layer up, in the code that hosts Ollama as a backend. The release-day question for Mac users is not "did Ollama grow", it is "did the client that hosts Ollama grow".

Fazm's answer, as of today, is that the client hosts Anthropic Claude, not Ollama. But the action layer beneath the client, registered at acp-bridge/src/index.ts lines 1057 to 1064, is already the layer an Ollama-hosted client would use verbatim. That is the release-note-independent part of the stack.

Want to see the Mac-desktop boundary above Ollama running live?

Thirty minutes on a call. We open acp-bridge/src/index.ts at line 1057, point at the mcp-server-macos-use binary, and run a workflow end-to-end.

Book a call

Frequently asked questions

What is the latest Ollama release in April 2026?

v0.21.0, published on April 17, 2026. The headline additions are three integrations inside `ollama launch` (Hermes, GitHub Copilot CLI, and OpenClaw multi-channel) plus a GPU change (flash attention enabled for Gemma 4 on compatible hardware). The headline fixes are the Gemma 4 nothink renderer restored with the e2b-style prompt, a Gemma 4 Metal compiler error resolved, macOS cross-compiles no longer triggering `generate` during cmake, and `ollama launch` no longer rewriting config files when nothing changed. Everything in the release notes is either a new inbound integration, a model-layer quality improvement, or a developer-ergonomics fix.

Does v0.21.0 add a way for Ollama to click inside Mac apps?

No. None of the additions in v0.21.0 touch Mac-desktop action. OpenClaw routes messages from WhatsApp, Telegram, and Discord into the local model's context; that is an input channel over stdio and HTTP, not a GUI action layer. The Copilot CLI integration lets a coding CLI consume the local model; that is a stdin/stdout surface, not a desktop automation surface. Ollama's boundary stops at text tokens in and text tokens out (plus tool-call JSON for models that support tool use). Everything above that boundary, including every pixel and every CGEvent on a Mac screen, lives outside the scope Ollama ships.

If Ollama's scope stops at tokens, where does the Mac-desktop action boundary actually live in shipping code?

In Fazm's case it lives in a 21 MB ARM64 Mach-O at Fazm.app/Contents/MacOS/mcp-server-macos-use. The ACP bridge (Node process) registers it as a local MCP server. The registration block is in acp-bridge/src/index.ts at lines 1057 to 1064: a single `existsSync(macosUseBinary)` guard, then `servers.push({ name: "macos-use", command: macosUseBinary, args: [], env: [] })`. Zero provider-specific arguments. Zero Anthropic-specific environment variables. The binary speaks MCP over stdio and exposes six tools, all suffixed `_and_traverse`, that walk the frontmost app's accessibility tree via `AXUIElementCreateApplication(pid)` and return the re-walked tree in the same response as the action result.

What would it take for a local Ollama model to drive the same Mac binary?

At the code level, it is a single-line swap: `DEFAULT_MODEL` at acp-bridge/src/index.ts line 1245 points at `claude-sonnet-4-6` today, alongside `SONNET_MODEL` on line 1246. The deeper lift is an inference-loop adapter that speaks Ollama's OpenAI-compatible `POST /v1/chat/completions` on `http://localhost:11434` instead of Anthropic's `POST /v1/messages`, and translates the tool-call JSON shape between the two. That adapter would sit where `ClaudeAcpAgent` sits in the ACP SDK today. The perception and action primitives, the `mcp-server-macos-use` binary and its six `_and_traverse` tools, would not need to change at all, because they were registered with `args: []` and `env: []` on purpose.
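The tool-call translation the answer describes can be sketched as a small shape conversion: Anthropic's `tool_use` block versus the OpenAI-compatible `tool_calls` entry Ollama serves. Both interfaces below are simplified illustrations, not either vendor's full schema.

```typescript
// Illustrative adapter step: convert an Anthropic-style tool_use block into
// the OpenAI-compatible tool_call shape. Note the key difference: OpenAI-style
// function arguments travel as a JSON *string*, not a nested object.
interface AnthropicToolUse {
  type: "tool_use";
  id: string;
  name: string;
  input: Record<string, unknown>;
}

interface OpenAiToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

function toOpenAiToolCall(block: AnthropicToolUse): OpenAiToolCall {
  return {
    id: block.id,
    type: "function",
    function: { name: block.name, arguments: JSON.stringify(block.input) },
  };
}
```

An adapter sitting where the article says `ClaudeAcpAgent` sits today would run this conversion in both directions; the MCP binary underneath never sees the difference.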

Which of the v0.21.0 release-note items matter most for Mac users running Ollama locally?

The macOS cross-compile fix (cmake builds no longer triggering `generate` on some Xcode versions) and the Gemma 4 Metal compiler fix are the two changes that directly affect the Apple Silicon install path. The Gemma 4 flash attention switch on compatible GPUs is mostly a Linux/NVIDIA win in practice, because Apple Silicon's unified memory architecture runs Gemma 4 through Metal Performance Shaders rather than through the flash-attention CUDA kernel. The Copilot CLI and OpenClaw additions are cross-platform. If you are on a Mac, the first two fixes are what quietly unbroke your local `ollama run gemma4` if it regressed earlier in April.

How does the April 2026 Ollama release compare against Ollama's 2025 trajectory?

The direction of travel is consistent: Ollama is expanding horizontally into adjacent runtimes and channels rather than vertically into the application layer. 2025 added the OpenAI-compatible REST surface on `localhost:11434`, Modelfile improvements, and GGUF coverage across Llama, DeepSeek-R1, Qwen, Kimi, gpt-oss, and Gemma. 2026 so far has added `ollama launch` as a one-command config, multi-channel (OpenClaw), and coding-agent integrations (Copilot CLI, Hermes). That scope is deliberate: the project is the model-runtime layer, and it stays there even as its integration surface grows.

Is there a version where Ollama's release notes will directly affect the Fazm codebase?

The interesting trigger is not a specific version number, it is a shape change: if Ollama exposes an Anthropic-compatible /v1/messages endpoint in addition to the OpenAI-compatible /v1/chat/completions endpoint, then the ACP SDK Fazm wraps today would become a drop-in client for local Ollama models with no adapter work. Until then, the boundary where a future Fazm release would integrate is the ACP SDK, not the MCP server. The MCP server is already provider-agnostic because its registration at acp-bridge/src/index.ts lines 1057-1064 never contained a provider-specific flag in the first place.

What does the accessibility-tree format actually look like that any Ollama model would consume?

One line per element. Each line carries the AX role, the accessible title, the CGFloat frame, and a visibility flag. A real line from a Fazm Dev window looks like: `[AXButton (button)] "Send" x:6272 y:-1754 w:56 h:28 visible`. A full window traversal is about 441 elements and completes in roughly 0.72 seconds. The model substring-searches for the word it wants (`Send`, `Reply`, `Compose`), reads the x/y/w/h off the same line, and passes those values as the arguments to the next `_and_traverse` tool call. Nothing about that format depends on the model being Claude. A Gemma 4 or Llama 3.1 instruct model running on Ollama would read the same text.

What stays out of scope for Ollama release notes that you still need to ship a Mac agent?

Two layers. A perception layer, which turns what is on the screen into tokens the model can reason over, and an action layer, which translates the model's tool-call JSON back into real CGEvent mouse/keyboard events. Fazm ships both as the `mcp-server-macos-use` binary: perception is the AX tree walk (kAXRole, kAXTitle, kAXFrame), action is CGEvent click, type, and scroll. Those primitives are what a Mac agent needs regardless of whether the model was served by Anthropic, OpenAI, Google, or a local Ollama endpoint. Ollama's release notes, by design, never include a layer above the token boundary.

Where can I inspect the Fazm facts this guide cites?

All three anchor points are in one file: acp-bridge/src/index.ts inside the Fazm desktop source tree. Line 63 resolves `macosUseBinary` to `Fazm.app/Contents/MacOS/mcp-server-macos-use`. Lines 1057 through 1064 are the `existsSync` guard plus the `servers.push({ name: "macos-use", command: macosUseBinary, args: [], env: [] })` registration. Line 1245 declares `DEFAULT_MODEL = "claude-sonnet-4-6"` and line 1246 aliases it as `SONNET_MODEL`. Line 1266 is the `BUILTIN_MCP_NAMES` set, which contains exactly five entries: `fazm_tools`, `playwright`, `macos-use`, `whatsapp`, `google-workspace`. For end-user verification, right-click Fazm.app, Show Package Contents, open Contents/MacOS, and run `file mcp-server-macos-use`; it reports `Mach-O 64-bit executable arm64`.
