2026 field notes
The 2026 AI agent story the leaderboards keep missing
Search for AI tools, agents, Claude, ChatGPT or LLM news in 2026 and almost every result ranks the same handful of models and stops. That is the wrong frame for anyone who actually runs these things all day. The interesting change this year happened one layer up, at the agent loop, and the loudest unsolved problems are not about the model at all.
In 2026 the unit you run became the agent loop, with Anthropic's Claude Code and OpenAI's Codex as the most prominent examples. The model turned into a swappable backend underneath the agent, connected through the open Agent Client Protocol (ACP). And the loudest open problems are now tool-UX problems, not model problems: agent sessions dying on restart, no clean way to fork a conversation, and context that auto-compacts and quietly drops earlier decisions.
The model news is real, but it is not the news that affects your day
2026 has not been quiet on models. New frontier releases landed across Anthropic, OpenAI, Google and others, and the benchmark order reshuffled roughly every few weeks. If you want the dated version of that, I keep a separate 2026 model release timeline. This page is about the part those timelines skip.
Here is the thing the rankings never measure. The day you adopt an agent, the model is the part you touch least. You spend your hours inside the loop: the agent reading your files, running commands, proposing edits, asking for approval. That loop is the product now. The model is an input to it. And the input got commoditized faster than anyone expected, because of one boring protocol.
ACP quietly turned the model into a swappable backend
The Agent Client Protocol is an open, Apache-licensed standard that lets any editor or app talk to any coding agent over JSON-RPC via stdio. It is driven by Zed in collaboration with JetBrains. ACP-compatible agents already include Claude Code, Codex, OpenCode, Cursor, Gemini CLI and Copilot.
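To make "JSON-RPC via stdio" concrete, here is a minimal sketch of the envelope a client writes to an agent's stdin and reads back from its stdout. The envelope fields follow JSON-RPC 2.0. Everything else is an assumption for illustration: the one-message-per-line framing, the helper names `encodeRequest` and `decodeResponse`, and the method name `example/prompt`, which is a made-up placeholder rather than a real ACP method.

```typescript
// JSON-RPC 2.0 envelope shapes.
interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: number;
  method: string;
  params?: unknown;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: number;
  result?: unknown;
  error?: { code: number; message: string };
}

let nextRequestId = 1;

// Build a request and serialize it as a single line, ready to write to the
// agent process's stdin. Assumption: one JSON message per line.
function encodeRequest(method: string, params?: unknown): string {
  const request: JsonRpcRequest = {
    jsonrpc: "2.0",
    id: nextRequestId++,
    method,
    params,
  };
  return JSON.stringify(request) + "\n";
}

// Parse one line read from the agent's stdout into a response envelope.
function decodeResponse(line: string): JsonRpcResponse {
  const message = JSON.parse(line) as JsonRpcResponse;
  if (message.jsonrpc !== "2.0") {
    throw new Error("not a JSON-RPC 2.0 message");
  }
  return message;
}

// "example/prompt" and its params are hypothetical, not taken from the ACP spec.
const wireLine = encodeRequest("example/prompt", { text: "rename this function" });
const sampleReply = decodeResponse('{"jsonrpc":"2.0","id":1,"result":{"ok":true}}');
```

The point of the sketch is how little the client needs to know: any agent that reads and writes these envelopes on its standard streams can sit behind the same editor or app.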
What that buys you in practice: the agent becomes a component you plug in, not a product you marry. The surface you run it in (a terminal, an editor, a native app) is one decision, and which agent backend answers the loop is a different, smaller decision you can change per task. The diagram below is the shape of a single turn once you stop thinking of the model as the thing you run.
One agent turn, with the backend as a swappable participant
The open problems in 2026 are tool problems, not model problems
Spend a week running any of these agents seriously and you hit the same wall, and it has nothing to do with the model's score on a benchmark. These are the gripes that show up in every thread and that no leaderboard captures:
What the agent rankings do not measure
- Your session survives a Mac restart instead of vanishing with the terminal
- You can fork a conversation in one click to try a branch without losing the original
- The context is not silently auto-compacted, dropping decisions you made an hour ago
- The same agent can reach past the terminal into your browser and native apps
- You can talk to it instead of typing every instruction
None of these are model capabilities. They are decisions the tool around the model made, or failed to make. Which is exactly why they are fixable without waiting for the next frontier release. I wrote separately about controlling context compaction and what it takes to run an AI agent on macOS if you want the specifics.
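Forking is a good illustration of how small these tool-layer decisions can be. The sketch below is one way to model it, assuming a session is just an ordered list of turns. The types `Turn` and `Session` and the function `forkSession` are hypothetical names for illustration, not taken from any particular agent.

```typescript
interface Turn {
  role: "user" | "agent";
  text: string;
}

interface Session {
  id: string;
  parentId: string | null; // null for a session that was not forked
  turns: Turn[];
}

// Fork by copying the history into a new session that remembers its parent.
// Each turn is copied so later edits to the branch cannot touch the original.
function forkSession(original: Session, newId: string): Session {
  return {
    id: newId,
    parentId: original.id,
    turns: original.turns.map((turn) => ({ ...turn })),
  };
}

const mainSession: Session = {
  id: "main",
  parentId: null,
  turns: [
    { role: "user", text: "Plan the migration" },
    { role: "agent", text: "Here is a three-step plan" },
  ],
};

// Try a risky idea on the branch; the original conversation is untouched.
const branchSession = forkSession(mainSession, "branch-1");
branchSession.turns.push({ role: "user", text: "What if we skip step two?" });
```

Nothing here depends on which model answers the next turn, which is why this kind of feature can ship without waiting on a frontier release.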
A concrete example: both backends, in one dependency list
I build Fazm, a native macOS app that wraps the real agent loop and fixes the tool-layer gripes above. The "swappable backend" claim is not marketing. It is a line you can read in the source. The bridge that runs the loop declares both agents as direct dependencies: @agentclientprotocol/claude-agent-acp (version 0.29.2) and @zed-industries/codex-acp (version 0.12.0).
Claude Code (Anthropic) and Codex (OpenAI, the ChatGPT maker) both ship in the same binary, selectable per chat. When a model leaderboard flips next month, the response is a dropdown change, not a tool migration. That is the whole bet: the loop and your sessions outlast the model.
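A per-chat backend switch can be as small as the sketch below. This is an illustration of the idea, not Fazm's actual source: the names `BackendId`, `BACKENDS`, `Chat` and `switchBackend` are hypothetical, and only the two adapter package names come from this page.

```typescript
type BackendId = "claude-code" | "codex";

interface BackendSpec {
  label: string;
  adapterPackage: string; // npm package that bridges this agent to ACP
}

// Registry of available backends, keyed by a stable identifier.
const BACKENDS: Record<BackendId, BackendSpec> = {
  "claude-code": {
    label: "Claude Code",
    adapterPackage: "@agentclientprotocol/claude-agent-acp",
  },
  codex: {
    label: "Codex",
    adapterPackage: "@zed-industries/codex-acp",
  },
};

interface Chat {
  id: string;
  backend: BackendId;
  history: string[];
}

// Swapping the backend returns a new chat object that keeps the same id and
// history. Only the backend field changes.
function switchBackend(chat: Chat, backend: BackendId): Chat {
  return { ...chat, backend };
}

const firstChat: Chat = {
  id: "chat-1",
  backend: "claude-code",
  history: ["refactor the parser"],
};
const swappedChat = switchBackend(firstChat, "codex");
```

The design choice worth noticing is that the history lives on the chat, not on the backend, so a swap never costs you the conversation.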
Where the model news still wins
To be fair to the rankings: the model is not irrelevant. For a genuinely hard reasoning task, the gap between a top-tier model and a mid one is real and you feel it. If your work lives at the frontier of what any model can do, the release news matters a lot, and you should read it.
But that describes a minority of day-to-day agent work. Most of it is plumbing: refactors, glue code, data wrangling, repetitive desktop tasks. For that volume, two adjacent frontier models are close enough that the deciding factor is whether the tool around them wastes your time. The honest split is: read the model news for the hard 10 percent, and pick your tool for the other 90.
So what is the actual 2026 takeaway?
If you only track one thing this year, do not make it the leaderboard order. Track the agent layer. ACP made the model a backend. The agent loop is the product. And the work left to do is in the tool: keeping your sessions alive, letting you fork cleanly, refusing to compact your context away, and letting the same agent reach beyond the terminal into the browser and your native apps.
That is the news that changes your day. The model that wins this month is the news that changes a chart.
Want the agent loop without the tool-layer gripes?
Book a short call and I will show you Claude Code and Codex running in one native Mac app, with sessions that survive a restart and one-click forking.
Common questions about the 2026 AI agent landscape
What actually changed in AI tools and agents in 2026?
The unit of work shifted from the chat box to the agent loop. In 2025 you mostly picked a model and pasted text. In 2026 you run an agent (Anthropic's Claude Code, OpenAI's Codex, and similar) that reads files, runs commands, and edits in place. The model underneath became a swappable backend, and the loudest unsolved complaints are now about the tool layer, not the model: sessions dying on restart, no clean way to fork a conversation, and context getting auto-compacted out from under you.
Is Claude Code better than OpenAI's Codex in 2026?
It depends on the task, and that is the point. Codex (OpenAI's coding agent, from the maker of ChatGPT) leans cloud-first and async, which suits well-defined tasks you want to fan out in parallel. Claude Code leans interactive and deep, which suits messy codebases and architectural work. Because both speak the Agent Client Protocol, you no longer have to commit to one. You can run the same loop with either backend and switch per task.
What is the Agent Client Protocol (ACP)?
ACP is an open, Apache-licensed standard that lets any editor or app talk to any coding agent using JSON-RPC over stdio. It is driven by Zed in collaboration with JetBrains, and ACP-compatible agents include Claude Code, Codex, OpenCode, Cursor, Gemini CLI, and Copilot. The practical effect: the agent becomes a pluggable component, so the model layer is where most of the cost and quality now live, and the surface you run it in is a separate decision.
Can one tool run both Claude Code and Codex?
Yes. Fazm's bridge ships both @agentclientprotocol/claude-agent-acp version 0.29.2 and @zed-industries/codex-acp version 0.12.0 as backends you can swap per chat. It is the same agent loop you already use, wrapped in a native macOS UI, so picking a backend is a setting rather than a different app.
Why do AI coding agent sessions disappear when I restart?
Most agents store conversation state in a terminal process or a transient buffer. Quit the terminal, reboot the Mac, or close the window and the thread is gone or hard to recover. This is a tool decision, not a model limitation, which is why a wrapper that persists sessions to disk and auto-restores every window solves it without touching the model.
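As a rough sketch of what "persists sessions to disk" involves, the snippet below serializes open sessions to a versioned JSON snapshot and restores them. It is an assumption-laden illustration, not how any specific tool does it: the types `StoredSession` and `Snapshot` and the functions `toSnapshot` and `fromSnapshot` are hypothetical, and writing the resulting string to a file is left to the caller.

```typescript
interface StoredTurn {
  role: "user" | "agent";
  text: string;
}

interface StoredSession {
  id: string;
  backend: string;
  turns: StoredTurn[];
}

interface Snapshot {
  version: 1; // bump when the stored shape changes
  savedAt: string; // ISO 8601 timestamp
  sessions: StoredSession[];
}

// Serialize every open session into one JSON string. The caller decides where
// it goes on disk and how often to save.
function toSnapshot(sessions: StoredSession[], now: Date = new Date()): string {
  const snapshot: Snapshot = {
    version: 1,
    savedAt: now.toISOString(),
    sessions,
  };
  return JSON.stringify(snapshot);
}

// Restore sessions from a snapshot string, rejecting formats we do not recognize.
function fromSnapshot(json: string): StoredSession[] {
  const parsed = JSON.parse(json) as Partial<Snapshot>;
  if (parsed.version !== 1 || !Array.isArray(parsed.sessions)) {
    throw new Error("unrecognized snapshot format");
  }
  return parsed.sessions;
}

const savedSnapshot = toSnapshot([
  { id: "window-1", backend: "codex", turns: [{ role: "user", text: "run the tests" }] },
]);
const restoredSessions = fromSnapshot(savedSnapshot);
```

Because the state is plain data, it survives anything that kills the process: a closed terminal, a quit app, or a reboot.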
Do the model leaderboards matter for picking an agent?
Less than the roundups imply. Benchmark rankings flip every few weeks as new models ship, and if your tool is tied to one model you re-tool every time the order changes. If the agent loop and your sessions are decoupled from the model, a leaderboard flip is a backend swap, not a migration. The durable choice is the surface you run the loop in.
What does 'no auto-compacting' mean and why care?
Long agent sessions often summarize and drop older turns to fit a context window. That silently discards decisions you made earlier, and the agent later contradicts itself. Keeping the full chat history live for the lifetime of a window avoids that failure mode at the cost of a longer context. For multi-hour sessions where earlier decisions matter, that tradeoff is usually worth it.
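The failure mode is easy to reproduce in miniature. The sketch below uses a deliberately naive compaction rule, assuming "compaction" means keeping the most recent turns and replacing everything older with a placeholder. Real agents produce a model-written summary rather than a placeholder, so treat `compact` and `mentionsDecision` as hypothetical stand-ins that only show the shape of the problem.

```typescript
interface HistoryTurn {
  text: string;
}

// Naive compaction: keep the last `keep` turns and collapse everything older
// into a single placeholder line. The placeholder stands in for a lossy summary.
function compact(history: HistoryTurn[], keep: number): HistoryTurn[] {
  if (history.length <= keep) {
    return history.slice();
  }
  const droppedCount = history.length - keep;
  const recent = history.slice(history.length - keep);
  return [{ text: `[summary of ${droppedCount} earlier turns]` }, ...recent];
}

// Check whether any surviving turn still mentions the decision keyword.
function mentionsDecision(history: HistoryTurn[], keyword: string): boolean {
  return history.some((turn) => turn.text.indexOf(keyword) !== -1);
}

const fullHistory: HistoryTurn[] = [
  { text: "Decision: use Postgres, not SQLite" },
  { text: "Set up the schema" },
  { text: "Write the migration" },
  { text: "Add the connection pool" },
];

const compactedHistory = compact(fullHistory, 2);
const keptInFull = mentionsDecision(fullHistory, "Postgres");
const keptAfterCompaction = mentionsDecision(compactedHistory, "Postgres");
```

Here the full history still contains the Postgres decision and the compacted one does not, which is exactly the situation where an agent later contradicts something you settled an hour ago.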