Ollama's 2026 changelog, every version, plus the field that turns localhost:11434 into a Mac agent
Twenty-five-plus point releases between February 3 and May 5, 2026. The official release notes do a great job on ollama launch integrations, MLX speedups, and the full Gemma 4 family. Every other Ollama release-notes write-up stops there. This page keeps going: the Anthropic-shape shim that lets Ollama's OpenAI-compatible endpoint become the brain of a Mac agent that uses the macOS accessibility tree to drive any app you have open, not just the ones in the launch ecosystem.
DIRECT ANSWER · VERIFIED 2026-05-06
What shipped in Ollama 2026, in one paragraph
Between February 3 and May 5, 2026, Ollama shipped 25-plus point releases across nine minor versions (v0.15 → v0.23). The five themes: the ollama launch integration command (debut v0.16.0, Feb 12); the MLX runner for Apple Silicon (v0.17.5+, expanded in v0.19.0 and v0.21.0); the full Gemma 4 family of E2B / E4B / 26B / 31B (v0.20.0, April 2); Claude Desktop integration via ollama launch claude-desktop (v0.23.0, May 3); and Gemma 4 MTP speculative decoding on Apple Silicon for an over-2x speedup on the 31B coding model (v0.23.1, May 5). Source: github.com/ollama/ollama/releases.
THE COMPLETE 2026 RELEASE TABLE
Every shipped version, in order
Drawn from the official ollama/ollama GitHub releases page on May 6, 2026. Twenty-two rows, the headline change in each, and one note on why it matters for Mac users in particular.
| Version | Date | Headline | Why it matters |
|---|---|---|---|
| v0.23.1 | May 5, 2026 | Gemma 4 MTP speculative decoding on Apple Silicon | Over 2x speedup on the Gemma 4 31B coding model. MLX runner threading fixes, Go bumped to 1.26. |
| v0.23.0 | May 3, 2026 | Claude Desktop support via `ollama launch claude-desktop` | Claude Cowork and Claude Code integrations land in the launch ecosystem. Metal init hardened. |
| v0.22.1 | April 28, 2026 | Gemma 4 renderer thinking and tool-calling fixes | Model recommendations update independently. Desktop launch page aligns with CLI integrations. |
| v0.22.0 | April 28, 2026 | NVIDIA Nemotron 3 Omni and Poolside Laguna XS.2 | Two new model integrations, including Poolside's coding model. |
| v0.21.3 | April 24, 2026 | API accepts 'max' as a think value | OpenAI response mapping for reasoning effort aligned to the think parameter. |
| v0.21.2 | April 23, 2026 | OpenClaw onboarding reliability in `ollama launch` | Standardized model recommendation ordering. Web search plugin bundled in OpenClaw. |
| v0.21.1 | April 22, 2026 | Kimi CLI integration via `ollama launch kimi` | MLX runner gained logprobs, fused top-P/top-K sampling, better tokenization, thread safety. |
| v0.21.0 | April 16, 2026 | Hermes Agent integration via `ollama launch hermes` | Gemma 4 on MLX for Apple Silicon with mixed-precision quantization, expanded operators. |
| v0.20.7 | April 13, 2026 | Gemma quality fix when thinking disabled | ROCm bumped to 7.2.1 on Linux. |
| v0.20.6 | April 12, 2026 | Enhanced Gemma 4 tool calling | Improved parallel tool calling for streaming responses. |
| v0.20.5 | April 9, 2026 | OpenClaw channel setup for messaging platforms | Flash attention enabled for Gemma 4 on compatible GPUs. `/save` fix for safetensors imports. |
| v0.20.4 | April 7, 2026 | MLX M5 perf improvements using NAX | Gemma 4 flash-attention enablement. |
| v0.20.0 | April 2, 2026 | Full Gemma 4 family (E2B, E4B, 26B, 31B) | The marquee model release of the spring. Tool use, thinking traces, multimodal in. |
| v0.19.0 | March 27, 2026 | MLX framework integration for Apple Silicon | Web search plugin in `ollama launch pi`. Better KV cache hit rates. Qwen 3.5 tool-call parsing fixed. |
| v0.18.3 | March 25, 2026 | Visual Studio Code integration via GitHub Copilot | GLM tool-call parser improvements. OpenClaw gateway checks. |
| v0.18.0 | March 14, 2026 | Cloud-model perf, Kimi-K2.5 up to 2x, Nemotron-3-Super 122B | Ollama becomes an authentication provider in OpenClaw. |
| v0.17.5 | March 2, 2026 | Qwen 3.5 (0.8B to 35B) and MLX engine memory fixes | First Qwen 3.5 lineup support on Apple Silicon-tuned MLX. |
| v0.17.0 | February 21, 2026 | Tokenizer perf, VRAM-aware default context length | macOS and Windows apps default context length to available VRAM. |
| v0.16.3 | February 19, 2026 | Cline CLI integration via `ollama launch cline` | Model picker aligned with launch command. Extended MLX runner architecture support. |
| v0.16.2 | February 14, 2026 | OLLAMA_NO_CLOUD=1 privacy toggle | Disables cloud-model surface in the app. Image-gen timeout config. |
| v0.16.0 | February 12, 2026 | Introduced `ollama launch` command and Pi launcher | GLM-5 and MiniMax-M2.5 added. Ctrl+G text editor for prompts. |
| v0.15.5 | February 3, 2026 | Qwen3-Coder-Next, GLM-OCR, sub-agent planning | Dynamic context length defaults based on VRAM levels. Token prediction bug fixes. |
One row per headline release; a few micro patches (e.g. v0.17.6, v0.20.1) fold into the adjacent rows, which is how 22 rows covers a 25-plus release stream. The Ollama team also tags release candidates (e.g. v0.19.1rc0, v0.22.0-rc11) that are not on this list because their changes roll up into the matching stable.
THE FIVE THEMES OF THE YEAR
Stop reading rows. Read the arc.
The version table is good for lookup. The pattern across the year is what tells you where Ollama is going.
Theme 1: `ollama launch` becomes the platform
Debuted in v0.16.0 on February 12 as a way to run Pi. By May, the same verb runs Hermes, Kimi CLI, Cline, OpenClaw, GitHub Copilot CLI, opencode, and Claude Desktop. Roughly half of the 2026 releases carry an `ollama launch` line. Ollama is positioning itself as the runtime other tools attach to, not just a model server.
Theme 2: MLX is now first-class on Apple Silicon
v0.17.5 on March 2 fixed initial MLX engine memory issues. v0.19.0 on March 27 integrated the MLX framework. v0.21.0 on April 16 added Gemma 4 on MLX with mixed-precision quantization. v0.23.1 on May 5 added Gemma 4 MTP speculative decoding for over 2x speed on the 31B coding model. If you run Ollama on a Mac, the throughput story today is unrecognizable from January.
Theme 3: Gemma 4 dominates the model timeline
v0.20.0 on April 2 shipped the full family: E2B, E4B, 26B, 31B. v0.20.6 (April 12) and v0.22.1 (April 28) tuned tool calling and the renderer. v0.20.5 (April 9) and v0.21.0 (April 16) added flash attention paths. Five releases in six weeks all touched Gemma 4. It is the model the team is optimizing the runtime around.
Theme 4: Cloud and privacy dial in opposite directions
v0.16.2 on February 14 added OLLAMA_NO_CLOUD=1 to disable cloud-model surface in the app. v0.18.0 on March 14 made cloud models 2x faster for Kimi-K2.5. The runtime is supporting both ends: full local with no leakage, and managed cloud. Mac users who want to verify nothing phones home now have one env var.
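One small way to scope that toggle to a single process instead of your whole shell (Python here purely for illustration; OLLAMA_NO_CLOUD is the real variable from v0.16.2, the helper name is mine):

```python
import os

def local_only_env():
    """Environment for a fully local `ollama serve`: cloud surface off,
    everything else inherited so PATH and other OLLAMA_* settings survive."""
    return dict(os.environ, OLLAMA_NO_CLOUD="1")

# Usage (needs ollama installed):
#   import subprocess
#   subprocess.Popen(["ollama", "serve"], env=local_only_env())
```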
Theme 5: Tool-calling fidelity improves quietly
Almost every release in 2026 carries a tool-calling fix: v0.17.5, v0.17.6, v0.18.3, v0.20.1, v0.20.6, v0.20.7, v0.21.3, v0.22.1. None individually is dramatic. Together they shift the floor: agent loops that previously demanded Claude Sonnet for tool reliability now work with Gemma 4 26B, Qwen 3.5, and Kimi-K2.5 served locally.
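To see why those small fixes compound, here is the shape of the loop they have to survive: a minimal dispatcher for the OpenAI-shaped `tool_calls` objects a compatible endpoint returns, with hypothetical tool names standing in for real agent tools:

```python
import json

# Hypothetical tool registry; a real Mac agent would route these to
# accessibility actions rather than string formatting.
TOOLS = {
    "open_app": lambda name: f"opened {name}",
    "type_text": lambda text: f"typed {len(text)} chars",
}

def dispatch_tool_calls(message):
    """Run each tool_call in an OpenAI-shaped assistant message and
    build the tool-role messages to feed back on the next turn."""
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": TOOLS[fn["name"]](**args),
        })
    return results
```

Every release fix that keeps `fn["arguments"]` valid JSON or keeps parallel calls ordered raises the floor this loop stands on.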
THE GAP IN EVERY OTHER WRITE-UP
Every Ollama release notes page stops at ollama serve. This one keeps going.
Once you have a model loaded and Ollama is listening on http://localhost:11434, you have an OpenAI-compatible chat endpoint. Most pages stop there because Ollama's job is done. The harder question is: what runs on your Mac, on top of that endpoint, that turns a chat completion into actual clicks and keystrokes inside Mail, Finder, Notes, Numbers, Slack, or any other native app you have open?
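Concretely, that endpoint speaks the standard OpenAI chat-completions shape. A minimal sketch of one request against it (the model tag is an assumption; swap in whatever you pulled):

```python
import json
import urllib.request

def build_chat_request(model, user_text):
    """OpenAI-shaped body for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,  # one blocking response; agents usually stream
    }

def chat(base_url, model, user_text):
    """Send one completion request to an OpenAI-compatible server."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_chat_request(model, user_text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("http://localhost:11434", "gemma4:31b", "hello")  # needs Ollama running
```

Getting a completion back is the easy half; the layer that turns it into clicks inside your apps is the rest of this page.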
That layer has to read app state as structured data (not pixels), expose a tool-calling surface the model can drive, and let you swap the model endpoint without recompiling. Almost nothing in the consumer space does all three. Fazm is the one I know about. It is MIT-licensed, reads the macOS accessibility tree directly, and its model endpoint is one TextField in Settings that maps cleanly onto ANTHROPIC_BASE_URL.
The rest of this page is the half of the story Ollama's release notes do not tell.
“When the user has a Custom API Endpoint configured (LM Studio, Ollama, corporate proxy, etc.)...”
Desktop/Sources/Chat/ACPBridge.swift, fazm repo
THE ANCHOR FACT
The exact place in the Fazm source that names Ollama by name
Most Mac agents do not name the local-LLM runtimes they support in their public marketing because the support is fragile. Fazm has it in a code comment in the friendly-error path, which is a stronger signal than a marketing page: someone has shipped Ollama as a target and trapped its specific failure modes.
// When the user has a Custom API Endpoint configured (LM Studio, Ollama, corporate proxy, etc.),
// raw upstream errors like `API Error: 400 ... "No models loaded ... use the 'lms load' command"`
// are confusing - users blame Fazm for an error that's coming from their local server.
// Detect known custom-endpoint failures and surface an actionable message instead.
let endpoint = UserDefaults.standard.string(forKey: "customApiEndpoint") ?? ""
if !endpoint.isEmpty {
let lower = cleaned.lowercased()
if lower.contains("no models loaded") || lower.contains("lms load") {
return "Your custom API endpoint (\(endpoint)) reported no model is loaded. ..."
}
if lower.contains("api error") || lower.contains("connection refused") || lower.contains("econnrefused") {
return "\(cleaned)\n\nThis came from your custom API endpoint (\(endpoint))..."
}
}
The actual env-var assignment that makes the bridge work is a few lines in the same file.
// Custom API endpoint (allows proxying through Copilot, corporate gateways, etc.)
if let customEndpoint = defaults.string(forKey: "customApiEndpoint"), !customEndpoint.isEmpty {
env["ANTHROPIC_BASE_URL"] = customEndpoint
}
The Settings UI the user actually touches is in SettingsPage.swift around line 962. The placeholder text in the TextField is a real string from the binary.
@AppStorage("customApiEndpoint") private var customApiEndpoint: String = ""
Text("Custom API Endpoint")
TextField("http://localhost:4000", text: $customApiEndpoint)
.onSubmit {
Task { await chatProvider?.restartBridgeForEndpointChange() }
}
Text("Route API calls through a custom endpoint (e.g. local LLM bridge, corporate proxy, or GitHub Copilot bridge). Leave empty to use the default Anthropic API.")
From ollama serve to a Mac agent driving apps
The hops are short. Ollama listens on localhost:11434, an Anthropic-shaped shim translates the API shape, Fazm reads the shim URL out of UserDefaults, the Node bridge gets it as an environment variable, and the model talks to your apps via AXUIElement traversal.
One turn: Fazm → shim → Ollama → shim → Fazm
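The request half of that turn is a translation problem. A simplified sketch of what any Anthropic-to-OpenAI shim does on the way in (text content and system-prompt hoisting only; real shims also map tools and images):

```python
def anthropic_to_openai(body):
    """Map an Anthropic Messages request onto OpenAI chat-completions.

    Anthropic carries the system prompt as a top-level field and content
    as typed blocks; OpenAI wants system folded into the messages list
    and content as plain strings.
    """
    messages = []
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    for m in body["messages"]:
        content = m["content"]
        if isinstance(content, list):  # Anthropic-style content blocks
            content = "".join(b["text"] for b in content if b.get("type") == "text")
        messages.append({"role": m["role"], "content": content})
    return {
        "model": body["model"],
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
    }
```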
The whole setup, in terminal form
No mockup. This is the real sequence on a Mac running Ollama v0.23.1 and Fazm built from main.
Quickstart, four steps
From `ollama pull` to Fazm driving apps
- 1
Pull a model in Ollama v0.23.1
`ollama pull gemma4:31b`. The 31B is the one that benefits from v0.23.1's MTP speculative decode on Apple Silicon.
- 2
Start an Anthropic-shaped shim
LiteLLM in proxy mode, claude-code-router, or a small FastAPI bridge. Point upstream at http://localhost:11434/v1.
- 3
Install Fazm, grant Accessibility
Download from fazm.ai. Grant the macOS Accessibility permission when macOS prompts. Fazm needs it to read AX trees.
- 4
Paste the shim URL in Custom API Endpoint
Settings → Advanced → AI Chat → Custom API Endpoint. Toggle on, paste, hit return. Bridge respawns automatically.
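Step 2's shim also has to map responses back the other way. A text-only sketch of that return translation (field names follow the two public API shapes; tool-use blocks and streaming omitted):

```python
def openai_to_anthropic(resp):
    """Map one OpenAI chat-completions response back onto the Anthropic
    Messages shape a client like Fazm expects (text-only sketch)."""
    choice = resp["choices"][0]
    stop_map = {"stop": "end_turn", "length": "max_tokens", "tool_calls": "tool_use"}
    return {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": choice["message"]["content"]}],
        "stop_reason": stop_map.get(choice["finish_reason"], "end_turn"),
        "usage": {
            "input_tokens": resp["usage"]["prompt_tokens"],
            "output_tokens": resp["usage"]["completion_tokens"],
        },
    }
```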
The shims that work in front of Ollama
Fazm speaks Anthropic Messages. Ollama's OpenAI-compatible surface speaks OpenAI chat completions. The shim is the middle. None of these ship with Fazm or with Ollama. They are independent projects, each one presenting an Anthropic-shaped endpoint and routing to whatever you put behind it, including Ollama on localhost:11434.
Test the shim with your real tasks, not a benchmark. Streaming, parallel tool calls, and large tool results are where shim quality diverges, and they are exactly what matters in an agent loop.
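Streaming is the easiest of those to spot-check offline. OpenAI-style servers emit `data:` SSE lines carrying content deltas, and a shim that garbles them is easy to catch with a reassembler like this (the sample lines in the usage are illustrative):

```python
import json

def reassemble_sse(lines):
    """Join content deltas from OpenAI-style 'data:' SSE lines into
    the full assistant message."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # comments and keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

Capture the shim's raw output with curl once, feed it through this, and compare against the non-streaming answer for the same prompt.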
Why feed the model the AX tree, not a screenshot
Ollama's 2026 multimodal models can take screenshots. You should not, if you have a choice. The same UI state already exists as a typed tree of roles, titles, values, and positions via the macOS Accessibility API. Fazm reads that tree directly, which saves your local tokens for reasoning, not OCR.
Screenshot substrate vs. accessibility-tree substrate
A 2880x1800 screenshot becomes hundreds of kilobytes of base64. The model pays tokens to OCR 'Send' on a button, then has to infer click coordinates in pixel space. A UI refresh, a font change, or a sidebar resize breaks everything.
Screenshot substrate:
- Lossy raster, hundreds of KB
- Model wastes context on OCR
- Pixel-space coordinate guessing
- Breaks on UI redesign or zoom
Accessibility-tree substrate:
- Typed roles, titles, values, positions
- Kilobytes of structured text, no OCR
- Elements addressed by identity, not pixels
- Survives resizes, themes, and zoom
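To make the size difference concrete, here is a hypothetical sketch of how a typed accessibility node flattens into a few bytes of structured text. The node shape loosely mirrors AXUIElement attributes; it is not Fazm's actual serialization format:

```python
from dataclasses import dataclass, field

@dataclass
class AXNode:
    """Hypothetical stand-in for one macOS accessibility element."""
    role: str
    title: str = ""
    value: str = ""
    frame: tuple = (0, 0, 0, 0)  # x, y, w, h
    children: list = field(default_factory=list)

def serialize(node, depth=0):
    """Flatten a node tree into indented lines: role, title, value, frame."""
    attrs = " ".join(filter(None, [
        node.role,
        f'"{node.title}"' if node.title else "",
        f"value={node.value!r}" if node.value else "",
        f"@{node.frame}",
    ]))
    lines = ["  " * depth + attrs]
    for child in node.children:
        lines.extend(serialize(child, depth + 1))
    return lines
```

Two short lines for a window and its Send button, versus hundreds of kilobytes of pixels for the same state: that is the whole token argument.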
What this stack does and does not do today
Works today:
- Driving Mail, Notes, Calendar, Numbers, Pages via the AX tree
- Driving any third-party Mac app that exposes accessibility
- Running fully offline once the Ollama model is pulled
- Swapping the model from Gemma 4 31B to Qwen 3.5 35B with no recompile
Not yet:
- Tool-call quality matching frontier closed models on long agent runs
- Driving apps that do not expose accessibility (rare on macOS, common on Electron with broken AX)
- Native /v1/messages support directly in Ollama (planned by community, not shipped)
“The Ollama release notes are the most useful changelog I read each month, and they still leave out the substrate question. Once you have a model serving on 11434, what's actually clicking buttons on your laptop? That part has been missing for two years.”
Running Ollama and want it to drive your Mac apps?
Walk through the shim choice, the Custom API Endpoint field, and which 2026 Ollama release fits your hardware. Fifteen minutes, no slides.
Frequently asked questions
What did Ollama actually ship in 2026 so far?
Between February 3 and May 5, 2026, Ollama shipped 25+ point releases across nine minor versions. v0.15.5 (Feb 3) added Qwen3-Coder-Next, GLM-OCR, and sub-agent planning. v0.16.0 (Feb 12) introduced the `ollama launch` command and Pi launcher. v0.17.5 (March 2) added Qwen 3.5 in 0.8B-35B sizes and fixed MLX engine memory issues. v0.18.0 (March 14) shipped Nemotron-3-Super 122B and 2x faster Kimi-K2.5. v0.19.0 (March 27) integrated MLX framework for Apple Silicon and added a web search plugin. v0.20.0 (April 2) added the full Gemma 4 family (E2B, E4B, 26B, 31B). v0.21.0 (April 16) added Hermes Agent integration. v0.22.0 (April 28) added NVIDIA Nemotron 3 Omni and Poolside Laguna XS.2. v0.23.0 (May 3) added Claude Desktop support via `ollama launch claude-desktop`. v0.23.1 (May 5) added Gemma 4 MTP speculative decoding for an over-2x speedup on the 31B coding model on Apple Silicon.
What is the biggest theme in the 2026 Ollama changelog?
The `ollama launch` ecosystem. It first appeared in v0.16.0 in February as a way to run Pi. By May, the same command launches Hermes Agent, Kimi CLI, Cline, OpenClaw, GitHub Copilot CLI, opencode, and Claude Desktop. Roughly half of the 2026 release notes carry an `ollama launch <something>` line. Ollama is positioning itself as the local-first runtime that other tools can attach to, not just a model server you `curl` against. That is a strategic shift from 2025, where `ollama serve` was the whole story.
Can I use Ollama as the model backend for a Mac agent like Fazm?
Yes, with one caveat. Fazm's chat engine speaks the Anthropic Messages API shape, and Ollama's compatible API is OpenAI-shaped at /v1/chat/completions on port 11434. You need an Anthropic-to-OpenAI shim between them. LiteLLM in Anthropic-proxy mode, claude-code-router, or a small custom FastAPI bridge all work. Point the shim at http://localhost:11434/v1, then paste the shim URL into Fazm's Custom API Endpoint setting. Fazm's ACPBridge.swift at lines 468 and 469 reads that value from UserDefaults and exports it as ANTHROPIC_BASE_URL on the Node subprocess it spawns. No recompile, no fork. The whole switch is one TextField in Settings.
Where does Fazm's source code mention Ollama by name?
ACPBridge.swift line 2045 has a literal comment that reads: 'When the user has a Custom API Endpoint configured (LM Studio, Ollama, corporate proxy, etc.), raw upstream errors like API Error: 400 ... No models loaded ... use the lms load command are confusing.' That comment introduces the friendly-error path at lines 2049 to 2058 that catches connection-refused and no-models-loaded errors from local servers and rewrites them into actionable messages. Ollama is named explicitly because the team has tested this end-to-end. The same file at lines 468 to 469 contains the actual env-var assignment that makes the bridge work.
Which 2026 Ollama release matters most for desktop agent workflows?
v0.23.1 on May 5, for one specific reason: Gemma 4 MTP speculative decoding on Apple Silicon, with Ollama's own release note quoting 'over a 2x speed increase for the Gemma 4 31B model on coding tasks.' Agent loops do a lot of structured-output and tool-call generation, which is exactly the workload that benefits most from speculative decoding. v0.21.0 in mid-April matters too, because that is where MLX got Gemma 4 support with mixed-precision quantization. If you are running Ollama on a 64GB M-series Mac and pointing Fazm at it, those two releases doubled your throughput.
Does this work with Ollama running fully offline?
Yes. Ollama's `ollama serve` listens on http://localhost:11434 by default and does not require network access for inference once the model is pulled. The Anthropic-to-OpenAI shim can also run on localhost (LiteLLM defaults to :4000). Fazm itself does not require network access for the bridge: paste http://localhost:4000 into Custom API Endpoint and the entire request path stays on your laptop's loopback interface. The OLLAMA_NO_CLOUD=1 environment variable, added in v0.16.2 on February 14, 2026, also disables Ollama's cloud-model surface in the app, which matters if your security team flags any background calls to ollama.com.
Why does the official Ollama changelog skip the agent half?
Because Ollama is a runtime, and its audience is people running local models. The release notes are written for someone who has just typed `ollama pull <model>` and wants to know what changed. They cover: new models, new flags, performance, integrations launched via `ollama launch`. They do not cover the substrate question: once you have an OpenAI-compatible endpoint serving on localhost:11434, what software on your Mac actually turns that endpoint into clicks, keystrokes, and document edits inside Mail, Finder, Chrome, Notes, Numbers, or any other app you use? That is a consumer-side problem, and it is exactly what the Custom API Endpoint mechanism in Fazm exists to solve.
Can Ollama itself drive Mac apps via `ollama launch`?
No, and that is an important distinction. `ollama launch hermes`, `ollama launch claude-desktop`, `ollama launch opencode`, and similar commands launch other apps and configure them to use Ollama as their model backend. The launched app is responsible for any UI work it does. Ollama is a backend; the launched apps are clients. That arrangement covers chat-style and code-editor-style clients well, but it does not give you an agent that can drive arbitrary Mac apps you did not write yourself, because every launch target has to opt in. Fazm sits in the other place: it is itself a launched-by-Ollama-style client, and once attached, it drives every Mac app via the macOS accessibility tree, regardless of whether each app has integrated with Ollama.
What about CVEs or security issues in Ollama 2026?
The 2026 Ollama release stream has been release-note-clean on critical CVEs through v0.23.1 (May 5). The closest item is the OLLAMA_NO_CLOUD=1 setting added in v0.16.2 (Feb 14), which is a privacy-control toggle rather than a vulnerability fix. If you operate Ollama in a multi-user environment, the longstanding rule still applies: do not expose port 11434 directly to the internet; put it behind a gateway that authenticates and rate-limits. The OpenAI-compat endpoint accepts arbitrary prompts and arbitrary model names with no auth in the default config.
What should I read alongside the Ollama 2026 release notes?
The github.com/ollama/ollama/releases page for the raw changelog and commit lists. The Ollama blog at ollama.com/blog for design rationale. Fazm's source tree at github.com/mediar-ai/fazm to see the consumer-side substrate that turns Ollama's endpoint into a Mac agent, in particular ACPBridge.swift around lines 468 and 2045. And LiteLLM's Anthropic-proxy docs if you want the cleanest shim option between Anthropic-shape clients and Ollama's OpenAI-shape endpoint.
Keep reading
vLLM release notes 2026
Sibling page for the server-grade local inference engine. v0.18, v0.19, gRPC, async scheduler, CVE-2026-0994.
Local LLM releases, April 2026
The model side of the same month. Gemma 4, Qwen 3, Llama 4, Mistral Medium 3, with the macOS half spelled out.
Ollama local AI, the two layers Ollama does not ship
Architecture-first companion piece. Ollama is the model layer; perception and action are still your problem.