OLLAMA v0.15.5 → v0.23.1 / FEB → MAY 2026

Ollama's 2026 changelog, every version, plus the one Settings field that turns localhost:11434 into a Mac agent

Twenty-five-plus point releases between February 3 and May 5, 2026. The official release notes do a great job on ollama launch integrations, MLX speedups, and the full Gemma 4 family. Every other Ollama release-notes write-up stops there. This page keeps going: the Anthropic-shaped shim that lets Ollama's OpenAI-compatible endpoint become the brain of a Mac agent, one that uses the macOS accessibility tree to drive any app you have open, not just the ones in the launch ecosystem.

Matthew Diakonov
13 min read
Sourced from github.com/ollama/ollama/releases and the Fazm source tree
v0.20.0 (Apr 2): full Gemma 4 family
v0.21.0 (Apr 16): MLX Gemma 4 mixed-precision
v0.23.0 (May 3): Claude Desktop launch support
v0.23.1 (May 5): 2x Gemma 4 31B speculative decode
ACPBridge.swift line 2045 names Ollama explicitly

DIRECT ANSWER · VERIFIED 2026-05-06

What shipped in Ollama 2026, in one paragraph

Between February 3 and May 5, 2026, Ollama shipped 25+ point releases across nine minor versions (v0.15 → v0.23). The five themes: the ollama launch integration command (debut v0.16.0, Feb 12); the MLX runner for Apple Silicon (v0.17.5+, expanded in v0.19.0 and v0.21.0); the full Gemma 4 family of E2B / E4B / 26B / 31B (v0.20.0, April 2); Claude Desktop integration via ollama launch claude-desktop (v0.23.0, May 3); and Gemma 4 MTP speculative decoding on Apple Silicon for an over-2x speedup on the 31B coding model (v0.23.1, May 5). Source: github.com/ollama/ollama/releases.

THE COMPLETE 2026 RELEASE TABLE

The shipped versions, in order

Drawn from the official ollama/ollama GitHub releases page on May 6, 2026. Twenty-two rows, the headline change in each, and one note on why it matters for Mac users in particular.

| Version | Date | Headline | Why it matters |
|---|---|---|---|
| v0.23.1 | May 5, 2026 | Gemma 4 MTP speculative decoding on Apple Silicon | Over 2x speedup on the Gemma 4 31B coding model. MLX runner threading fixes, Go bumped to 1.26. |
| v0.23.0 | May 3, 2026 | Claude Desktop support via `ollama launch claude-desktop` | Claude Cowork and Claude Code integrations land in the launch ecosystem. Metal init hardened. |
| v0.22.1 | April 28, 2026 | Gemma 4 renderer thinking and tool-calling fixes | Model recommendations update independently. Desktop launch page aligns with CLI integrations. |
| v0.22.0 | April 28, 2026 | NVIDIA Nemotron 3 Omni and Poolside Laguna XS.2 | Two new model integrations, including Poolside's coding model. |
| v0.21.3 | April 24, 2026 | API accepts 'max' as a think value | OpenAI response mapping for reasoning effort aligned to the think parameter. |
| v0.21.2 | April 23, 2026 | OpenClaw onboarding reliability in `ollama launch` | Standardized model recommendation ordering. Web search plugin bundled in OpenClaw. |
| v0.21.1 | April 22, 2026 | Kimi CLI integration via `ollama launch kimi` | MLX runner gained logprobs, fused top-P/top-K sampling, better tokenization, thread safety. |
| v0.21.0 | April 16, 2026 | Hermes Agent integration via `ollama launch hermes` | Gemma 4 on MLX for Apple Silicon with mixed-precision quantization, expanded operators. |
| v0.20.7 | April 13, 2026 | Gemma quality fix when thinking disabled | ROCm bumped to 7.2.1 on Linux. |
| v0.20.6 | April 12, 2026 | Enhanced Gemma 4 tool calling | Improved parallel tool calling for streaming responses. |
| v0.20.5 | April 9, 2026 | OpenClaw channel setup for messaging platforms | Flash attention enabled for Gemma 4 on compatible GPUs. `/save` fix for safetensors imports. |
| v0.20.4 | April 7, 2026 | MLX M5 perf improvements using NAX | Gemma 4 flash-attention enablement. |
| v0.20.0 | April 2, 2026 | Full Gemma 4 family (E2B, E4B, 26B, 31B) | The marquee model release of the spring. Tool use, thinking traces, multimodal in. |
| v0.19.0 | March 27, 2026 | MLX framework integration for Apple Silicon | Web search plugin in `ollama launch pi`. Better KV cache hit rates. Qwen 3.5 tool-call parsing fixed. |
| v0.18.3 | March 25, 2026 | Visual Studio Code integration via GitHub Copilot | GLM tool-call parser improvements. OpenClaw gateway checks. |
| v0.18.0 | March 14, 2026 | Cloud-model perf, Kimi-K2.5 up to 2x, Nemotron-3-Super 122B | Ollama becomes an authentication provider in OpenClaw. |
| v0.17.5 | March 2, 2026 | Qwen 3.5 (0.8B to 35B) and MLX engine memory fixes | First Qwen 3.5 lineup support on Apple Silicon-tuned MLX. |
| v0.17.0 | February 21, 2026 | Tokenizer perf, VRAM-aware default context length | macOS and Windows apps default context length to available VRAM. |
| v0.16.3 | February 19, 2026 | Cline CLI integration via `ollama launch cline` | Model picker aligned with launch command. Extended MLX runner architecture support. |
| v0.16.2 | February 14, 2026 | OLLAMA_NO_CLOUD=1 privacy toggle | Disables cloud-model surface in the app. Image-gen timeout config. |
| v0.16.0 | February 12, 2026 | Introduced `ollama launch` command and Pi launcher | GLM-5 and MiniMax-M2.5 added. Ctrl+G text editor for prompts. |
| v0.15.5 | February 3, 2026 | Qwen3-Coder-Next, GLM-OCR, sub-agent planning | Dynamic context length defaults based on VRAM levels. Token prediction bug fixes. |

One row per release. The Ollama team also tags release candidates (e.g. v0.19.1rc0, v0.22.0-rc11) that are not on this list because their changes roll up into the matching stable.

THE FIVE THEMES OF THE YEAR

Stop reading rows. Read the arc.

The version table is good for lookup. The pattern across the year is what tells you where Ollama is going.

Theme 1: `ollama launch` becomes the platform

Debuted in v0.16.0 on February 12 as a way to run Pi. By May, the same verb runs Hermes, Kimi CLI, Cline, OpenClaw, GitHub Copilot CLI, opencode, and Claude Desktop. Roughly half of the 2026 releases carry an `ollama launch` line. Ollama is positioning itself as the runtime other tools attach to, not just a model server.

Theme 2: MLX is now first-class on Apple Silicon

v0.17.5 on March 2 fixed initial MLX engine memory issues. v0.19.0 on March 27 integrated the MLX framework. v0.21.0 on April 16 added Gemma 4 on MLX with mixed-precision quantization. v0.23.1 on May 5 added Gemma 4 MTP speculative decoding for over 2x speed on the 31B coding model. If you run Ollama on a Mac, the throughput story today is unrecognizable from January.

Theme 3: Gemma 4 dominates the model timeline

v0.20.0 on April 2 shipped the full family: E2B, E4B, 26B, 31B. v0.20.6 (April 12) and v0.22.1 (April 28) tuned tool calling and the renderer. v0.20.5 (April 9) and v0.21.0 (April 16) added flash attention paths. Five releases in six weeks all touched Gemma 4. It is the model the team is optimizing the runtime around.

Theme 4: Cloud and privacy dial in opposite directions

v0.16.2 on February 14 added OLLAMA_NO_CLOUD=1 to disable cloud-model surface in the app. v0.18.0 on March 14 made cloud models 2x faster for Kimi-K2.5. The runtime is supporting both ends: full local with no leakage, and managed cloud. Mac users who want to verify nothing phones home now have one env var.

Theme 5: Tool-calling fidelity improves quietly

Almost every release in 2026 carries a tool-calling fix: v0.17.5, v0.17.6, v0.18.3, v0.20.1, v0.20.6, v0.20.7, v0.21.3, v0.22.1. None individually is dramatic. Together they shift the floor: agent loops that previously demanded Claude Sonnet for tool reliability now work with Gemma 4 26B, Qwen 3.5, and Kimi-K2.5 served locally.

THE GAP IN EVERY OTHER WRITE-UP

Every Ollama release notes page stops at ollama serve. This one keeps going.

Once you have a model loaded and Ollama is listening on http://localhost:11434, you have an OpenAI-compatible chat endpoint. Most pages stop there because Ollama's job is done. The harder question is: what runs on your Mac, on top of that endpoint, that turns a chat completion into actual clicks and keystrokes inside Mail, Finder, Notes, Numbers, Slack, or any other native app you have open?

That layer has to read app state as structured data (not pixels), expose a tool-calling surface the model can drive, and let you swap the model endpoint without recompiling. Almost nothing in the consumer space does all three. Fazm is the one I know about. It is MIT-licensed, reads the macOS accessibility tree directly, and its model endpoint is one TextField in Settings that maps cleanly onto ANTHROPIC_BASE_URL.

The rest of this page is the half of the story Ollama's release notes do not tell.

Desktop/Sources/Chat/ACPBridge.swift, line 2045 (fazm repo):

"When the user has a Custom API Endpoint configured (LM Studio, Ollama, corporate proxy, etc.)..."

THE ANCHOR FACT

The exact place in the Fazm source that names Ollama by name

Most Mac agents do not name the local-LLM runtimes they support in their public marketing because the support is fragile. Fazm has it in a code comment in the friendly-error path, which is a stronger signal than a marketing page: someone has shipped Ollama as a target and trapped its specific failure modes.

Desktop/Sources/Chat/ACPBridge.swift (lines 2045-2058)
// When the user has a Custom API Endpoint configured (LM Studio, Ollama, corporate proxy, etc.),
// raw upstream errors like `API Error: 400 ... "No models loaded ... use the 'lms load' command"`
// are confusing - users blame Fazm for an error that's coming from their local server.
// Detect known custom-endpoint failures and surface an actionable message instead.
let endpoint = UserDefaults.standard.string(forKey: "customApiEndpoint") ?? ""
if !endpoint.isEmpty {
  let lower = cleaned.lowercased()
  if lower.contains("no models loaded") || lower.contains("lms load") {
    return "Your custom API endpoint (\(endpoint)) reported no model is loaded. ..."
  }
  if lower.contains("api error") || lower.contains("connection refused") || lower.contains("econnrefused") {
    return "\(cleaned)\n\nThis came from your custom API endpoint (\(endpoint))..."
  }
}

The actual env-var assignment that makes the bridge work is four lines in the same file.

Desktop/Sources/Chat/ACPBridge.swift (lines 467-470)
// Custom API endpoint (allows proxying through Copilot, corporate gateways, etc.)
if let customEndpoint = defaults.string(forKey: "customApiEndpoint"), !customEndpoint.isEmpty {
  env["ANTHROPIC_BASE_URL"] = customEndpoint
}
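
Why is one dictionary entry enough? Because the Node bridge is a child process, and a child process inherits whatever the parent puts in its environment dictionary. A minimal sketch of that mechanism in Swift, not Fazm's actual spawn code; `bridge.js` is a hypothetical entry point:

import Foundation

// A child process sees whatever environment the parent hands it.
// Sketch only: Fazm's real spawn path wires up much more than this.
let bridge = Process()
bridge.executableURL = URL(fileURLWithPath: "/usr/bin/env")
var env = ProcessInfo.processInfo.environment
env["ANTHROPIC_BASE_URL"] = "http://localhost:4000"  // the shim URL from Settings
bridge.environment = env
bridge.arguments = ["node", "bridge.js"]  // hypothetical bridge entry point
try bridge.run()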

The Settings UI the user actually touches is in SettingsPage.swift around line 962. The placeholder text in the TextField is a real string from the binary.

Desktop/Sources/MainWindow/Pages/SettingsPage.swift (excerpt)
@AppStorage("customApiEndpoint") private var customApiEndpoint: String = ""

Text("Custom API Endpoint")
TextField("http://localhost:4000", text: $customApiEndpoint)
  .onSubmit {
    Task { await chatProvider?.restartBridgeForEndpointChange() }
  }

Text("Route API calls through a custom endpoint (e.g. local LLM bridge, corporate proxy, or GitHub Copilot bridge). Leave empty to use the default Anthropic API.")

From ollama serve to a Mac agent driving apps

The hops are short. Ollama listens on localhost:11434, an Anthropic-shaped shim translates the API shape, Fazm reads the shim URL out of UserDefaults, the Node bridge gets it as an environment variable, and the model talks to your apps via AXUIElement traversal.

One turn: Fazm → shim → Ollama → shim → Fazm

  1. Fazm UI → ACP bridge (Node): user prompt + AX tree snapshot
  2. ACP bridge → shim: POST /v1/messages (Anthropic shape)
  3. Shim → Ollama :11434: POST /v1/chat/completions (OpenAI shape)
  4. Ollama → shim: chat completion + tool calls
  5. Shim → ACP bridge: messages response + tool_use blocks
  6. ACP bridge → Mac app: execute AX action: click / type
  7. Mac app → ACP bridge: updated AX subtree
  8. ACP bridge → Fazm UI: stream answer + next tool call

The whole setup, in terminal form

On a Mac running Ollama v0.23.1 and Fazm built from main, the whole setup condenses to a handful of commands.

ollama + litellm + fazm, first run
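
A minimal sketch of that first run, assuming LiteLLM as the shim on :4000. The config keys and flags are standard LiteLLM proxy usage, not anything Fazm- or Ollama-specific:

# 1. Pull and serve the model (Ollama v0.23.1)
ollama pull gemma4:31b
ollama serve                      # listens on http://localhost:11434

# 2. Write a LiteLLM config that routes to Ollama
cat > litellm_config.yaml <<'EOF'
model_list:
  - model_name: gemma4-31b
    litellm_params:
      model: ollama/gemma4:31b
      api_base: http://localhost:11434
EOF

# 3. Start the shim (LiteLLM defaults to port 4000)
litellm --config litellm_config.yaml

# 4. In Fazm: Settings → Advanced → AI Chat → Custom API Endpoint,
#    paste http://localhost:4000 and hit return.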

Quickstart, four steps

From `ollama pull` to Fazm driving apps

  1. Pull a model in Ollama v0.23.1. `ollama pull gemma4:31b`. The 31B is the one that benefits from v0.23.1's MTP speculative decode on Apple Silicon.

  2. Start an Anthropic-shaped shim. LiteLLM in proxy mode, claude-code-router, or a small FastAPI bridge. Point upstream at http://localhost:11434/v1.

  3. Install Fazm, grant Accessibility. Download from fazm.ai. Grant the macOS Accessibility permission when macOS prompts. Fazm needs it to read AX trees.

  4. Paste the shim URL in Custom API Endpoint. Settings → Advanced → AI Chat → Custom API Endpoint. Toggle on, paste, hit return. The bridge respawns automatically.
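
Once the bridge respawns, a quick end-to-end check before asking Fazm to do anything is to hit the shim directly with an Anthropic-shaped request. A sketch in Swift; the URL and model name match the quickstart above, and the dummy key is there because most local shims ignore auth headers:

import Foundation

// Smoke-test the shim with an Anthropic-shaped /v1/messages request.
// Call from an async context, e.g. `Task { try await smokeTest() }`.
func smokeTest() async throws {
    var req = URLRequest(url: URL(string: "http://localhost:4000/v1/messages")!)
    req.httpMethod = "POST"
    req.setValue("application/json", forHTTPHeaderField: "Content-Type")
    req.setValue("not-a-real-key", forHTTPHeaderField: "x-api-key")  // local shims typically ignore this
    req.httpBody = try JSONSerialization.data(withJSONObject: [
        "model": "gemma4-31b",
        "max_tokens": 128,
        "messages": [["role": "user", "content": "Reply with the word pong."]],
    ])
    let (data, _) = try await URLSession.shared.data(for: req)
    print(String(data: data, encoding: .utf8) ?? "<non-UTF8 response>")
}

If the response is a messages-shaped JSON body with a text content block, the path Fazm will use is live.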

The shims that work in front of Ollama

Fazm speaks Anthropic Messages. Ollama's OpenAI-compatible surface speaks OpenAI chat completions. The shim is the middle. None of these ship with Fazm or with Ollama. They are independent projects, each one presenting an Anthropic-shaped endpoint and routing to whatever you put behind it, including Ollama on localhost:11434.

  • LiteLLM (Anthropic proxy mode)
  • claude-code-router
  • OpenRouter (Anthropic endpoint)
  • Self-hosted FastAPI bridge
  • Cloudflare Worker shim
  • claudette
  • anyscale-anthropic-shim
  • Corporate gateway with anthropic-compat

Test the shim with your real tasks, not a benchmark. Streaming, parallel tool calls, and large tool results are where shim quality diverges, and they are exactly what matters in an agent loop.
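
To make the shape change concrete, here is a minimal sketch of the translation a shim performs, reduced to single-turn text messages. This is illustrative, not any particular shim's code; real shims also map streaming events, tool_use blocks, system prompts, and stop reasons:

import Foundation

// Anthropic request shape (simplified: content as a plain string).
struct AnthropicRequest: Codable {
    let model: String
    let max_tokens: Int
    let messages: [Message]
    struct Message: Codable { let role: String; let content: String }
}

// OpenAI chat-completions shapes (simplified).
struct OpenAIRequest: Codable {
    let model: String
    let max_tokens: Int
    let messages: [AnthropicRequest.Message]  // same role/content pair in this reduction
}
struct OpenAIResponse: Codable {
    struct Choice: Codable { let message: AnthropicRequest.Message }
    let choices: [Choice]
}

// Anthropic response shape: content is a list of typed blocks.
struct AnthropicResponse: Codable {
    struct Block: Codable { let type: String; let text: String }
    let role: String
    let content: [Block]
}

// Request path: /v1/messages body -> /v1/chat/completions body.
func toOpenAI(_ req: AnthropicRequest) -> OpenAIRequest {
    OpenAIRequest(model: req.model, max_tokens: req.max_tokens, messages: req.messages)
}

// Response path: each choice becomes a text content block.
func toAnthropic(_ resp: OpenAIResponse) -> AnthropicResponse {
    AnthropicResponse(role: "assistant",
                      content: resp.choices.map { .init(type: "text", text: $0.message.content) })
}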

Why feed the model the AX tree, not a screenshot

Ollama's 2026 multimodal models can take screenshots. You should not, if you have a choice. The same UI state already exists as a typed tree of roles, titles, values, and positions via the macOS Accessibility API. Fazm reads that tree directly, which saves your local tokens for reasoning, not OCR.

Screenshot substrate vs. accessibility-tree substrate

A 2880x1800 screenshot becomes hundreds of kilobytes of base64. The model pays tokens to OCR 'Send' on a button. It also has to infer click coordinates in pixel space. A UI refresh, a font change, or a sidebar resize breaks everything.

Screenshot substrate:
  • Lossy raster, hundreds of KB
  • Model wastes context on OCR
  • Pixel-space coordinate guessing
  • Breaks on UI redesign or zoom

Accessibility-tree substrate:
  • Typed tree of roles, titles, values, positions
  • Compact text, so context goes to reasoning rather than OCR
  • Actions target named elements, not pixel coordinates
  • Survives refreshes, font changes, and resizes
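
For concreteness, here is what reading that tree looks like with the public AXUIElement API. A sketch, not Fazm's traversal (which adds caching, filtering, and position data); it prints role and title for the frontmost app's elements:

import AppKit
import ApplicationServices

// Read one string-valued attribute off an accessibility element.
func axString(_ element: AXUIElement, _ attribute: String) -> String? {
    var value: CFTypeRef?
    guard AXUIElementCopyAttributeValue(element, attribute as CFString, &value) == .success else { return nil }
    return value as? String
}

// Walk the AX tree, printing role and title. Depth-capped so a busy
// app does not produce an unreadable dump.
func dumpTree(_ element: AXUIElement, depth: Int = 0) {
    guard depth < 6 else { return }
    let role = axString(element, kAXRoleAttribute) ?? "?"
    let title = axString(element, kAXTitleAttribute) ?? ""
    print(String(repeating: "  ", count: depth) + role + (title.isEmpty ? "" : " \"\(title)\""))

    var children: CFTypeRef?
    if AXUIElementCopyAttributeValue(element, kAXChildrenAttribute as CFString, &children) == .success,
       let kids = children as? [AXUIElement] {
        for kid in kids { dumpTree(kid, depth: depth + 1) }
    }
}

// Frontmost app; requires the same Accessibility permission Fazm asks for.
if let app = NSWorkspace.shared.frontmostApplication {
    dumpTree(AXUIElementCreateApplication(app.processIdentifier))
}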

What this stack does and does not do today

Does:
  • Drive Mail, Notes, Calendar, Numbers, Pages via the AX tree
  • Drive any third-party Mac app that exposes accessibility
  • Run fully offline once the Ollama model is pulled
  • Swap the model from Gemma 4 31B to Qwen 3.5 35B with no recompile

Does not:
  • Match frontier closed models' tool-call quality on long agent runs
  • Drive apps that do not expose accessibility (rare on macOS, common in Electron apps with broken AX)
  • Speak Anthropic /v1/messages natively from Ollama (planned by the community, not shipped)
"The Ollama release notes are the most useful changelog I read each month, and they still leave out the substrate question. Once you have a model serving on 11434, what's actually clicking buttons on your laptop? That part has been missing for two years."

Matthew Diakonov, building Fazm at mediar.ai

Running Ollama and want it to drive your Mac apps?

Walk through the shim choice, the Custom API Endpoint field, and which 2026 Ollama release fits your hardware. Fifteen minutes, no slides.

Frequently asked questions

What did Ollama actually ship in 2026 so far?

Between February 3 and May 5, 2026, Ollama shipped 25+ point releases across nine minor versions. v0.15.5 (Feb 3) added Qwen3-Coder-Next, GLM-OCR, and sub-agent planning. v0.16.0 (Feb 12) introduced the `ollama launch` command and Pi launcher. v0.17.5 (March 2) added Qwen 3.5 in 0.8B-35B sizes and fixed MLX engine memory issues. v0.18.0 (March 14) shipped Nemotron-3-Super 122B and 2x faster Kimi-K2.5. v0.19.0 (March 27) integrated the MLX framework for Apple Silicon and added a web search plugin. v0.20.0 (April 2) added the full Gemma 4 family (E2B, E4B, 26B, 31B). v0.21.0 (April 16) added Hermes Agent integration. v0.22.0 (April 28) added NVIDIA Nemotron 3 Omni and Poolside Laguna XS.2. v0.23.0 (May 3) added Claude Desktop support via `ollama launch claude-desktop`. v0.23.1 (May 5) added Gemma 4 MTP speculative decoding for a 2x speedup on the 31B coding model on Apple Silicon.

What is the biggest theme in the 2026 Ollama changelog?

The `ollama launch` ecosystem. It first appeared in v0.16.0 in February as a way to run Pi. By May, the same command launches Hermes Agent, Kimi CLI, Cline, OpenClaw, GitHub Copilot CLI, opencode, and Claude Desktop. Roughly half of the 2026 release notes carry an `ollama launch <something>` line. Ollama is positioning itself as the local-first runtime that other tools can attach to, not just a model server you `curl` against. That is a strategic shift from 2025, where `ollama serve` was the whole story.

Can I use Ollama as the model backend for a Mac agent like Fazm?

Yes, with one caveat. Fazm's chat engine speaks the Anthropic Messages API shape, and Ollama's compatible API is OpenAI-shaped at /v1/chat/completions on port 11434. You need an Anthropic-to-OpenAI shim between them. LiteLLM in Anthropic-proxy mode, claude-code-router, or a small custom FastAPI bridge all work. Point the shim at http://localhost:11434/v1, then paste the shim URL into Fazm's Custom API Endpoint setting. Fazm's ACPBridge.swift at lines 468 and 469 reads that value from UserDefaults and exports it as ANTHROPIC_BASE_URL on the Node subprocess it spawns. No recompile, no fork. The whole switch is one TextField in Settings.

Where does Fazm's source code mention Ollama by name?

ACPBridge.swift line 2045 has a literal comment that reads: 'When the user has a Custom API Endpoint configured (LM Studio, Ollama, corporate proxy, etc.), raw upstream errors like API Error: 400 ... No models loaded ... use the lms load command are confusing.' That comment introduces the friendly-error path at lines 2049 to 2058 that catches connection-refused and no-models-loaded errors from local servers and rewrites them into actionable messages. Ollama is named explicitly because the team has tested this end-to-end. The same file at lines 468 to 469 contains the actual env-var assignment that makes the bridge work.

Which 2026 Ollama release matters most for desktop agent workflows?

v0.23.1 on May 5, for one specific reason: Gemma 4 MTP speculative decoding on Apple Silicon, with Ollama's own release note quoting 'over a 2x speed increase for the Gemma 4 31B model on coding tasks.' Agent loops do a lot of structured-output and tool-call generation, which is exactly the workload that benefits most from speculative decoding. v0.21.0 in mid-April matters too, because that is where MLX got Gemma 4 support with mixed-precision quantization. If you are running Ollama on a 64GB M-series Mac and pointing Fazm at it, those two releases doubled your throughput.

Does this work with Ollama running fully offline?

Yes. Ollama's `ollama serve` listens on http://localhost:11434 by default and does not require network access for inference once the model is pulled. The Anthropic-to-OpenAI shim can also run on localhost (LiteLLM defaults to :4000). Fazm itself does not require network access for the bridge: paste http://localhost:4000 into Custom API Endpoint and the entire request path is a loopback hop on your laptop. The OLLAMA_NO_CLOUD=1 environment variable, added in v0.16.2 on February 14, 2026, also disables Ollama's cloud-model surface in the app, which matters if your security team flags any background calls to ollama.com.

Why does the official Ollama changelog skip the agent half?

Because Ollama is a runtime, and its audience is people running local models. The release notes are written for someone who has just typed `ollama pull <model>` and wants to know what changed. They cover: new models, new flags, performance, integrations launched via `ollama launch`. They do not cover the substrate question: once you have an OpenAI-compatible endpoint serving on localhost:11434, what software on your Mac actually turns that endpoint into clicks, keystrokes, and document edits inside Mail, Finder, Chrome, Notes, Numbers, or any other app you use? That is a consumer-side problem, and it is exactly what the Custom API Endpoint mechanism in Fazm exists to solve.

Can Ollama itself drive Mac apps via `ollama launch`?

No, and that is an important distinction. `ollama launch hermes`, `ollama launch claude-desktop`, `ollama launch opencode`, and similar commands launch other apps and configure them to use Ollama as their model backend. The launched app is responsible for any UI work it does. Ollama is a backend; the launched apps are clients. That arrangement covers chat-style and code-editor-style clients well, but it does not give you an agent that can drive arbitrary Mac apps you did not write yourself, because every launch target has to opt in. Fazm sits on the other side of that arrangement: it is itself an Ollama-backed client, and once attached, it drives every Mac app via the macOS accessibility tree, regardless of whether each app has integrated with Ollama.

What about CVEs or security issues in Ollama 2026?

The 2026 Ollama release stream has been release-note-clean on critical CVEs through v0.23.1 (May 5). The closest item is the OLLAMA_NO_CLOUD=1 setting added in v0.16.2 (Feb 14), which is a privacy-control toggle rather than a vulnerability fix. If you operate Ollama in a multi-user environment, the longstanding rule still applies: do not expose port 11434 directly to the internet; put it behind a gateway that authenticates and rate-limits. The OpenAI-compatible endpoint accepts arbitrary prompts and arbitrary model names with no auth in the default config.

What should I read alongside the Ollama 2026 release notes?

The github.com/ollama/ollama/releases page for the raw changelog and commit lists. The Ollama blog at ollama.com/blog for design rationale. Fazm's source tree at github.com/mediar-ai/fazm to see the consumer-side substrate that turns Ollama's endpoint into a Mac agent, in particular ACPBridge.swift around lines 468 and 2045. And LiteLLM's Anthropic-proxy docs if you want the cleanest shim option between Anthropic-shape clients and Ollama's OpenAI-shape endpoint.