Ollama's 2026 changelog, every version, plus the field that turns localhost:11434 into a Mac agent
Twenty-five-plus point releases between February 3 and May 5, 2026. The official release notes do a great job on ollama launch integrations, MLX speedups, and the full Gemma 4 family. Every other Ollama release-notes write-up stops there. This page keeps going: the Anthropic-shape shim that lets Ollama's OpenAI-compatible endpoint become the brain of a Mac agent that uses the macOS accessibility tree to drive any app you have open, not just the ones in the launch ecosystem.
DIRECT ANSWER · VERIFIED 2026-05-06
What shipped in Ollama 2026, in one paragraph
Between February 3 and May 5, 2026, Ollama shipped 25-plus point releases across nine minor versions (v0.15 → v0.23). The five themes: the ollama launch integration command (debut v0.16.0, Feb 12); the MLX runner for Apple Silicon (v0.17.5+, expanded in v0.19.0 and v0.21.0); the full Gemma 4 family of E2B / E4B / 26B / 31B (v0.20.0, April 2); Claude Desktop integration via ollama launch claude-desktop (v0.23.0, May 3); and Gemma 4 MTP speculative decoding on Apple Silicon for an over-2x speedup on the 31B coding model (v0.23.1, May 5). Source: github.com/ollama/ollama/releases.
THE COMPLETE 2026 RELEASE TABLE
Every shipped version, in order
Drawn from the official ollama/ollama GitHub releases page on May 6, 2026. Twenty-two rows, the headline change in each, and one note on why it matters for Mac users in particular.
| Version | Date | Headline | Why it matters |
|---|---|---|---|
| v0.23.1 | May 5, 2026 | Gemma 4 MTP speculative decoding on Apple Silicon | Over 2x speedup on the Gemma 4 31B coding model. MLX runner threading fixes, Go bumped to 1.26. |
| v0.23.0 | May 3, 2026 | Claude Desktop support via `ollama launch claude-desktop` | Claude Cowork and Claude Code integrations land in the launch ecosystem. Metal init hardened. |
| v0.22.1 | April 28, 2026 | Gemma 4 renderer thinking and tool-calling fixes | Model recommendations update independently. Desktop launch page aligns with CLI integrations. |
| v0.22.0 | April 28, 2026 | NVIDIA Nemotron 3 Omni and Poolside Laguna XS.2 | Two new model integrations, including Poolside's coding model. |
| v0.21.3 | April 24, 2026 | API accepts 'max' as a think value | OpenAI response mapping for reasoning effort aligned to the think parameter. |
| v0.21.2 | April 23, 2026 | OpenClaw onboarding reliability in `ollama launch` | Standardized model recommendation ordering. Web search plugin bundled in OpenClaw. |
| v0.21.1 | April 22, 2026 | Kimi CLI integration via `ollama launch kimi` | MLX runner gained logprobs, fused top-P/top-K sampling, better tokenization, thread safety. |
| v0.21.0 | April 16, 2026 | Hermes Agent integration via `ollama launch hermes` | Gemma 4 on MLX for Apple Silicon with mixed-precision quantization, expanded operators. |
| v0.20.7 | April 13, 2026 | Gemma quality fix when thinking disabled | ROCm bumped to 7.2.1 on Linux. |
| v0.20.6 | April 12, 2026 | Enhanced Gemma 4 tool calling | Improved parallel tool calling for streaming responses. |
| v0.20.5 | April 9, 2026 | OpenClaw channel setup for messaging platforms | Flash attention enabled for Gemma 4 on compatible GPUs. `/save` fix for safetensors imports. |
| v0.20.4 | April 7, 2026 | MLX M5 perf improvements using NAX | Gemma 4 flash-attention enablement. |
| v0.20.0 | April 2, 2026 | Full Gemma 4 family (E2B, E4B, 26B, 31B) | The marquee model release of the spring. Tool use, thinking traces, multimodal in. |
| v0.19.0 | March 27, 2026 | MLX framework integration for Apple Silicon | Web search plugin in `ollama launch pi`. Better KV cache hit rates. Qwen 3.5 tool-call parsing fixed. |
| v0.18.3 | March 25, 2026 | Visual Studio Code integration via GitHub Copilot | GLM tool-call parser improvements. OpenClaw gateway checks. |
| v0.18.0 | March 14, 2026 | Cloud-model perf, Kimi-K2.5 up to 2x, Nemotron-3-Super 122B | Ollama becomes an authentication provider in OpenClaw. |
| v0.17.5 | March 2, 2026 | Qwen 3.5 (0.8B to 35B) and MLX engine memory fixes | First Qwen 3.5 lineup support on Apple Silicon-tuned MLX. |
| v0.17.0 | February 21, 2026 | Tokenizer perf, VRAM-aware default context length | macOS and Windows apps default context length to available VRAM. |
| v0.16.3 | February 19, 2026 | Cline CLI integration via `ollama launch cline` | Model picker aligned with launch command. Extended MLX runner architecture support. |
| v0.16.2 | February 14, 2026 | OLLAMA_NO_CLOUD=1 privacy toggle | Disables cloud-model surface in the app. Image-gen timeout config. |
| v0.16.0 | February 12, 2026 | Introduced `ollama launch` command and Pi launcher | GLM-5 and MiniMax-M2.5 added. Ctrl+G text editor for prompts. |
| v0.15.5 | February 3, 2026 | Qwen3-Coder-Next, GLM-OCR, sub-agent planning | Dynamic context length defaults based on VRAM levels. Token prediction bug fixes. |
One row per headline release; a few micro patches (e.g. v0.17.6, v0.20.1) fold into the adjacent rows, which is how 22 rows covers a 25-plus release stream. The Ollama team also tags release candidates (e.g. v0.19.1rc0, v0.22.0-rc11) that are not on this list because their changes roll up into the matching stable.
THE FIVE THEMES OF THE YEAR
Stop reading rows. Read the arc.
The version table is good for lookup. The pattern across the year is what tells you where Ollama is going.
Theme 1: `ollama launch` becomes the platform
Debuted in v0.16.0 on February 12 as a way to run Pi. By May, the same verb runs Hermes, Kimi CLI, Cline, OpenClaw, GitHub Copilot CLI, opencode, and Claude Desktop. Roughly half of the 2026 releases carry an `ollama launch` line. Ollama is positioning itself as the runtime other tools attach to, not just a model server.
Theme 2: MLX is now first-class on Apple Silicon
v0.17.5 on March 2 fixed initial MLX engine memory issues. v0.19.0 on March 27 integrated the MLX framework. v0.21.0 on April 16 added Gemma 4 on MLX with mixed-precision quantization. v0.23.1 on May 5 added Gemma 4 MTP speculative decoding for over 2x speed on the 31B coding model. If you run Ollama on a Mac, the throughput story today is unrecognizable from January.
Theme 3: Gemma 4 dominates the model timeline
v0.20.0 on April 2 shipped the full family: E2B, E4B, 26B, 31B. v0.20.6 (April 12) and v0.22.1 (April 28) tuned tool calling and the renderer. v0.20.5 (April 9) and v0.21.0 (April 16) added flash attention paths. Five releases in six weeks all touched Gemma 4. It is the model the team is optimizing the runtime around.
Theme 4: Cloud and privacy dial in opposite directions
v0.16.2 on February 14 added OLLAMA_NO_CLOUD=1 to disable cloud-model surface in the app. v0.18.0 on March 14 made cloud models 2x faster for Kimi-K2.5. The runtime is supporting both ends: full local with no leakage, and managed cloud. Mac users who want to verify nothing phones home now have one env var.
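One small way to scope that toggle to a single process instead of your whole shell (Python here purely for illustration; OLLAMA_NO_CLOUD is the real variable from v0.16.2, the helper name is mine):

```python
import os

def local_only_env():
    """Environment for a fully local `ollama serve`: cloud surface off,
    everything else inherited so PATH and other OLLAMA_* settings survive."""
    return dict(os.environ, OLLAMA_NO_CLOUD="1")

# Usage (needs ollama installed):
#   import subprocess
#   subprocess.Popen(["ollama", "serve"], env=local_only_env())
```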
Theme 5: Tool-calling fidelity improves quietly
Almost every release in 2026 carries a tool-calling fix: v0.17.5, v0.17.6, v0.18.3, v0.20.1, v0.20.6, v0.20.7, v0.21.3, v0.22.1. None individually is dramatic. Together they shift the floor: agent loops that previously demanded Claude Sonnet for tool reliability now work with Gemma 4 26B, Qwen 3.5, and Kimi-K2.5 served locally.
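To see why those small fixes compound, here is the shape of the loop they have to survive: a minimal dispatcher for the OpenAI-shaped `tool_calls` objects a compatible endpoint returns, with hypothetical tool names standing in for real agent tools:

```python
import json

# Hypothetical tool registry; a real Mac agent would route these to
# accessibility actions rather than string formatting.
TOOLS = {
    "open_app": lambda name: f"opened {name}",
    "type_text": lambda text: f"typed {len(text)} chars",
}

def dispatch_tool_calls(message):
    """Run each tool_call in an OpenAI-shaped assistant message and
    build the tool-role messages to feed back on the next turn."""
    results = []
    for call in message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": TOOLS[fn["name"]](**args),
        })
    return results
```

Every release fix that keeps `fn["arguments"]` valid JSON or keeps parallel calls ordered raises the floor this loop stands on.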
THE GAP IN EVERY OTHER WRITE-UP
Every Ollama release notes page stops at ollama serve. This one keeps going.
Once you have a model loaded and Ollama is listening on http://localhost:11434, you have an OpenAI-compatible chat endpoint. Most pages stop there because Ollama's job is done. The harder question is: what runs on your Mac, on top of that endpoint, that turns a chat completion into actual clicks and keystrokes inside Mail, Finder, Notes, Numbers, Slack, or any other native app you have open?
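Concretely, that endpoint speaks the standard OpenAI chat-completions shape. A minimal sketch of one request against it (the model tag is an assumption; swap in whatever you pulled):

```python
import json
import urllib.request

def build_chat_request(model, user_text):
    """OpenAI-shaped body for POST /v1/chat/completions."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,  # one blocking response; agents usually stream
    }

def chat(base_url, model, user_text):
    """Send one completion request to an OpenAI-compatible server."""
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(build_chat_request(model, user_text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# chat("http://localhost:11434", "gemma4:31b", "hello")  # needs Ollama running
```

Getting a completion back is the easy half; the layer that turns it into clicks inside your apps is the rest of this page.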
That layer has to read app state as structured data (not pixels), expose a tool-calling surface the model can drive, and let you swap the model endpoint without recompiling. Almost nothing in the consumer space does all three. Fazm is the one I know about. It is MIT-licensed, reads the macOS accessibility tree directly, and its model endpoint is one TextField in Settings that maps cleanly onto ANTHROPIC_BASE_URL.
The rest of this page is the half of the story Ollama's release notes do not tell.
“When the user has a Custom API Endpoint configured (LM Studio, Ollama, corporate proxy, etc.)...”
Desktop/Sources/Chat/ACPBridge.swift, fazm repo
THE ANCHOR FACT
The exact place in the Fazm source that names Ollama by name
Most Mac agents do not name the local-LLM runtimes they support in their public marketing because the support is fragile. Fazm has it in a code comment in the friendly-error path, which is a stronger signal than a marketing page: someone has shipped Ollama as a target and trapped its specific failure modes.
// When the user has a Custom API Endpoint configured (LM Studio, Ollama, corporate proxy, etc.),
// raw upstream errors like `API Error: 400 ... "No models loaded ... use the 'lms load' command"`
// are confusing - users blame Fazm for an error that's coming from their local server.
// Detect known custom-endpoint failures and surface an actionable message instead.
let endpoint = UserDefaults.standard.string(forKey: "customApiEndpoint") ?? ""
if !endpoint.isEmpty {
let lower = cleaned.lowercased()
if lower.contains("no models loaded") || lower.contains("lms load") {
return "Your custom API endpoint (\(endpoint)) reported no model is loaded. ..."
}
if lower.contains("api error") || lower.contains("connection refused") || lower.contains("econnrefused") {
return "\(cleaned)\n\nThis came from your custom API endpoint (\(endpoint))..."
}
}
The actual env-var assignment that makes the bridge work is a few lines in the same file.
// Custom API endpoint (allows proxying through Copilot, corporate gateways, etc.)
if let customEndpoint = defaults.string(forKey: "customApiEndpoint"), !customEndpoint.isEmpty {
env["ANTHROPIC_BASE_URL"] = customEndpoint
}
The Settings UI the user actually touches is in SettingsPage.swift around line 962. The placeholder text in the TextField is a real string from the binary.
@AppStorage("customApiEndpoint") private var customApiEndpoint: String = ""
Text("Custom API Endpoint")
TextField("http://localhost:4000", text: $customApiEndpoint)
.onSubmit {
Task { await chatProvider?.restartBridgeForEndpointChange() }
}
Text("Route API calls through a custom endpoint (e.g. local LLM bridge, corporate proxy, or GitHub Copilot bridge). Leave empty to use the default Anthropic API.")
From ollama serve to a Mac agent driving apps
The hops are short. Ollama listens on localhost:11434, an Anthropic-shaped shim translates the API shape, Fazm reads the shim URL out of UserDefaults, the Node bridge gets it as an environment variable, and the model talks to your apps via AXUIElement traversal.
One turn: Fazm → shim → Ollama → shim → Fazm
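The request half of that turn is a translation problem. A simplified sketch of what any Anthropic-to-OpenAI shim does on the way in (text content and system-prompt hoisting only; real shims also map tools and images):

```python
def anthropic_to_openai(body):
    """Map an Anthropic Messages request onto OpenAI chat-completions.

    Anthropic carries the system prompt as a top-level field and content
    as typed blocks; OpenAI wants system folded into the messages list
    and content as plain strings.
    """
    messages = []
    if body.get("system"):
        messages.append({"role": "system", "content": body["system"]})
    for m in body["messages"]:
        content = m["content"]
        if isinstance(content, list):  # Anthropic-style content blocks
            content = "".join(b["text"] for b in content if b.get("type") == "text")
        messages.append({"role": m["role"], "content": content})
    return {
        "model": body["model"],
        "messages": messages,
        "max_tokens": body.get("max_tokens", 1024),
    }
```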
The whole setup, in terminal form
No mockup. This is the real sequence on a Mac running Ollama v0.23.1 and Fazm built from main.
Quickstart, four steps
From `ollama pull` to Fazm driving apps
- 1
Pull a model in Ollama v0.23.1
`ollama pull gemma4:31b`. The 31B is the one that benefits from v0.23.1's MTP speculative decode on Apple Silicon.
- 2
Start an Anthropic-shaped shim
LiteLLM in proxy mode, claude-code-router, or a small FastAPI bridge. Point upstream at http://localhost:11434/v1.
- 3
Install Fazm, grant Accessibility
Download from fazm.ai. Grant the macOS Accessibility permission when macOS prompts. Fazm needs it to read AX trees.
- 4
Paste the shim URL in Custom API Endpoint
Settings → Advanced → AI Chat → Custom API Endpoint. Toggle on, paste, hit return. Bridge respawns automatically.
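Step 2's shim also has to map responses back the other way. A text-only sketch of that return translation (field names follow the two public API shapes; tool-use blocks and streaming omitted):

```python
def openai_to_anthropic(resp):
    """Map one OpenAI chat-completions response back onto the Anthropic
    Messages shape a client like Fazm expects (text-only sketch)."""
    choice = resp["choices"][0]
    stop_map = {"stop": "end_turn", "length": "max_tokens", "tool_calls": "tool_use"}
    return {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": choice["message"]["content"]}],
        "stop_reason": stop_map.get(choice["finish_reason"], "end_turn"),
        "usage": {
            "input_tokens": resp["usage"]["prompt_tokens"],
            "output_tokens": resp["usage"]["completion_tokens"],
        },
    }
```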
The shims that work in front of Ollama
Fazm speaks Anthropic Messages. Ollama's OpenAI-compatible surface speaks OpenAI chat completions. The shim is the middle. None of these ship with Fazm or with Ollama. They are independent projects, each one presenting an Anthropic-shaped endpoint and routing to whatever you put behind it, including Ollama on localhost:11434.
Test the shim with your real tasks, not a benchmark. Streaming, parallel tool calls, and large tool results are where shim quality diverges, and they are exactly what matters in an agent loop.
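Streaming is the easiest of those to spot-check offline. OpenAI-style servers emit `data:` SSE lines carrying content deltas, and a shim that garbles them is easy to catch with a reassembler like this (the sample lines in the usage are illustrative):

```python
import json

def reassemble_sse(lines):
    """Join content deltas from OpenAI-style 'data:' SSE lines into
    the full assistant message."""
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # comments and keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

Capture the shim's raw output with curl once, feed it through this, and compare against the non-streaming answer for the same prompt.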
Why feed the model the AX tree, not a screenshot
Ollama's 2026 multimodal models can take screenshots. You should not, if you have a choice. The same UI state already exists as a typed tree of roles, titles, values, and positions via the macOS Accessibility API. Fazm reads that tree directly, which saves your local tokens for reasoning, not OCR.
Screenshot substrate vs. accessibility-tree substrate
A 2880x1800 screenshot becomes hundreds of kilobytes of base64. The model pays tokens to OCR 'Send' on a button, then has to infer click coordinates in pixel space. A UI refresh, a font change, or a sidebar resize breaks everything.
Screenshot substrate:
- Lossy raster, hundreds of KB
- Model wastes context on OCR
- Pixel-space coordinate guessing
- Breaks on UI redesign or zoom
Accessibility-tree substrate:
- Typed roles, titles, values, positions
- Kilobytes of structured text, no OCR
- Elements addressed by identity, not pixels
- Survives resizes, themes, and zoom
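To make the size difference concrete, here is a hypothetical sketch of how a typed accessibility node flattens into a few bytes of structured text. The node shape loosely mirrors AXUIElement attributes; it is not Fazm's actual serialization format:

```python
from dataclasses import dataclass, field

@dataclass
class AXNode:
    """Hypothetical stand-in for one macOS accessibility element."""
    role: str
    title: str = ""
    value: str = ""
    frame: tuple = (0, 0, 0, 0)  # x, y, w, h
    children: list = field(default_factory=list)

def serialize(node, depth=0):
    """Flatten a node tree into indented lines: role, title, value, frame."""
    attrs = " ".join(filter(None, [
        node.role,
        f'"{node.title}"' if node.title else "",
        f"value={node.value!r}" if node.value else "",
        f"@{node.frame}",
    ]))
    lines = ["  " * depth + attrs]
    for child in node.children:
        lines.extend(serialize(child, depth + 1))
    return lines
```

Two short lines for a window and its Send button, versus hundreds of kilobytes of pixels for the same state: that is the whole token argument.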
What this stack does and does not do today
Works today:
- Driving Mail, Notes, Calendar, Numbers, Pages via the AX tree
- Driving any third-party Mac app that exposes accessibility
- Running fully offline once the Ollama model is pulled
- Swapping the model from Gemma 4 31B to Qwen 3.5 35B with no recompile
Not yet:
- Tool-call quality matching frontier closed models on long agent runs
- Driving apps that do not expose accessibility (rare on macOS, common on Electron with broken AX)
- Native /v1/messages support directly in Ollama (planned by community, not shipped)
“The Ollama release notes are the most useful changelog I read each month, and they still leave out the substrate question. Once you have a model serving on 11434, what's actually clicking buttons on your laptop? That part has been missing for two years.”
Running Ollama and want it to drive your Mac apps?
Walk through the shim choice, the Custom API Endpoint field, and which 2026 Ollama release fits your hardware. Fifteen minutes, no slides.
Frequently asked questions
What did Ollama actually ship in 2026 so far?
Between February 3 and May 5, 2026, Ollama shipped 25+ point releases across nine minor versions. v0.15.5 (Feb 3) added Qwen3-Coder-Next, GLM-OCR, and sub-agent planning. v0.16.0 (Feb 12) introduced the `ollama launch` command and Pi launcher. v0.17.5 (March 2) added Qwen 3.5 in 0.8B-35B sizes and fixed MLX engine memory issues. v0.18.0 (March 14) shipped Nemotron-3-Super 122B and 2x faster Kimi-K2.5. v0.19.0 (March 27) integrated MLX framework for Apple Silicon and added a web search plugin. v0.20.0 (April 2) added the full Gemma 4 family (E2B, E4B, 26B, 31B). v0.21.0 (April 16) added Hermes Agent integration. v0.22.0 (April 28) added NVIDIA Nemotron 3 Omni and Poolside Laguna XS.2. v0.23.0 (May 3) added Claude Desktop support via `ollama launch claude-desktop`. v0.23.1 (May 5) added Gemma 4 MTP speculative decoding for an over-2x speedup on the 31B coding model on Apple Silicon.
What is the biggest theme in the 2026 Ollama changelog?
The `ollama launch` ecosystem. It first appeared in v0.16.0 in February as a way to run Pi. By May, the same command launches Hermes Agent, Kimi CLI, Cline, OpenClaw, GitHub Copilot CLI, opencode, and Claude Desktop. Roughly half of the 2026 release notes carry an `ollama launch <something>` line. Ollama is positioning itself as the local-first runtime that other tools can attach to, not just a model server you `curl` against. That is a strategic shift from 2025, where `ollama serve` was the whole story.
Can I use Ollama as the model backend for a Mac agent like Fazm?
Yes, with one caveat. Fazm's chat engine speaks the Anthropic Messages API shape, and Ollama's compatible API is OpenAI-shaped at /v1/chat/completions on port 11434. You need an Anthropic-to-OpenAI shim between them. LiteLLM in Anthropic-proxy mode, claude-code-router, or a small custom FastAPI bridge all work. Point the shim at http://localhost:11434/v1, then paste the shim URL into Fazm's Custom API Endpoint setting. Fazm's ACPBridge.swift at lines 468 and 469 reads that value from UserDefaults and exports it as ANTHROPIC_BASE_URL on the Node subprocess it spawns. No recompile, no fork. The whole switch is one TextField in Settings.
Where does Fazm's source code mention Ollama by name?
ACPBridge.swift line 2045 has a literal comment that reads: 'When the user has a Custom API Endpoint configured (LM Studio, Ollama, corporate proxy, etc.), raw upstream errors like API Error: 400 ... No models loaded ... use the lms load command are confusing.' That comment introduces the friendly-error path at lines 2049 to 2058 that catches connection-refused and no-models-loaded errors from local servers and rewrites them into actionable messages. Ollama is named explicitly because the team has tested this end-to-end. The same file at lines 468 to 469 contains the actual env-var assignment that makes the bridge work.
Which 2026 Ollama release matters most for desktop agent workflows?
v0.23.1 on May 5, for one specific reason: Gemma 4 MTP speculative decoding on Apple Silicon, with Ollama's own release note quoting 'over a 2x speed increase for the Gemma 4 31B model on coding tasks.' Agent loops do a lot of structured-output and tool-call generation, which is exactly the workload that benefits most from speculative decoding. v0.21.0 in mid-April matters too, because that is where MLX got Gemma 4 support with mixed-precision quantization. If you are running Ollama on a 64GB M-series Mac and pointing Fazm at it, those two releases doubled your throughput.
Does this work with Ollama running fully offline?
Yes. Ollama's `ollama serve` listens on http://localhost:11434 by default and does not require network access for inference once the model is pulled. The Anthropic-to-OpenAI shim can also run on localhost (LiteLLM defaults to :4000). Fazm itself does not require network access for the bridge: paste http://localhost:4000 into Custom API Endpoint and the entire request path stays on your laptop's loopback interface. The OLLAMA_NO_CLOUD=1 environment variable, added in v0.16.2 on February 14, 2026, also disables Ollama's cloud-model surface in the app, which matters if your security team flags any background calls to ollama.com.
Why does the official Ollama changelog skip the agent half?
Because Ollama is a runtime, and its audience is people running local models. The release notes are written for someone who has just typed `ollama pull <model>` and wants to know what changed. They cover: new models, new flags, performance, integrations launched via `ollama launch`. They do not cover the substrate question: once you have an OpenAI-compatible endpoint serving on localhost:11434, what software on your Mac actually turns that endpoint into clicks, keystrokes, and document edits inside Mail, Finder, Chrome, Notes, Numbers, or any other app you use? That is a consumer-side problem, and it is exactly what the Custom API Endpoint mechanism in Fazm exists to solve.
Can Ollama itself drive Mac apps via `ollama launch`?
No, and that is an important distinction. `ollama launch hermes`, `ollama launch claude-desktop`, `ollama launch opencode`, and similar commands launch other apps and configure them to use Ollama as their model backend. The launched app is responsible for any UI work it does. Ollama is a backend; the launched apps are clients. That arrangement covers chat-style and code-editor-style clients well, but it does not give you an agent that can drive arbitrary Mac apps you did not write yourself, because every launch target has to opt in. Fazm sits in the other place: it is itself a launched-by-Ollama-style client, and once attached, it drives every Mac app via the macOS accessibility tree, regardless of whether each app has integrated with Ollama.
What about CVEs or security issues in Ollama 2026?
The 2026 Ollama release stream has been release-note-clean on critical CVEs through v0.23.1 (May 5). The closest item is the OLLAMA_NO_CLOUD=1 setting added in v0.16.2 (Feb 14), which is a privacy-control toggle rather than a vulnerability fix. If you operate Ollama in a multi-user environment, the longstanding rule still applies: do not expose port 11434 directly to the internet; put it behind a gateway that authenticates and rate-limits. The OpenAI-compat endpoint accepts arbitrary prompts and arbitrary model names with no auth in the default config.
What should I read alongside the Ollama 2026 release notes?
The github.com/ollama/ollama/releases page for the raw changelog and commit lists. The Ollama blog at ollama.com/blog for design rationale. Fazm's source tree at github.com/mediar-ai/fazm to see the consumer-side substrate that turns Ollama's endpoint into a Mac agent, in particular ACPBridge.swift around lines 468 and 2045. And LiteLLM's Anthropic-proxy docs if you want the cleanest shim option between Anthropic-shape clients and Ollama's OpenAI-shape endpoint.
Keep reading
vLLM release notes 2026
Sibling page for the server-grade local inference engine. v0.18, v0.19, gRPC, async scheduler, CVE-2026-0994.
Local LLM releases, April 2026
The model side of the same month. Gemma 4, Qwen 3, Llama 4, Mistral Medium 3, with the macOS half spelled out.
Ollama local AI, the two layers Ollama does not ship
Architecture-first companion piece. Ollama is the model layer; perception and action are still your problem.