Latest AI model releases, papers, and open-source projects (May 24 to 25, 2026)
A 48-hour window of three layers. The first is what every dated roundup captures (one paper, one tagged release). The second is a four-commit cleanup day that closes a thread from earlier in the week. The third is a silent Anthropic OAuth policy change at 5:50 PM Pacific on May 25 that has no public announcement at all and is visible only in client-side commit history. This is the on-disk record of all three.
May 24, 2026: no frontier-model announcements, no trending papers, only four agent-cleanup commits in the Fazm repo continuing the prior week's session-interrupt recovery thread. May 25, 2026: Microsoft Research published SkillOpt (arxiv.org/abs/2605.23904, code at github.com/microsoft/SkillOpt), claiming +23.5 / +24.8 / +19.1 point gains on GPT-5.5 in direct chat, Codex, and Claude Code respectively; the paper trended on Hugging Face with about 180 upvotes. Fazm shipped v2.9.37 with agent guardrails for system-altering shell commands and a third browser mode (No browser MCP, Assrt only). Same day at 17:50 PDT, Anthropic enforced a new OAuth policy that rejected any token-exchange request including the expires_in field with HTTP 400 ("Custom expires_in not allowed for scope user:sessions:claude_code user:mcp_servers"); Fazm's two-commit fix landed at 17:50:10 and 17:52:10 PDT but did not ship in a tagged release until v2.9.41 on May 27. Primary sources: huggingface.co/papers/date/2026-05-25, github.com/mediar-ai/fazm/commits/main.
Two days, three signal layers
The May 22 to 23 window I wrote up earlier this week illustrated one ecosystem gap: tagged releases lag the commit log by one to three weeks, so a quiet announcement day can hide a productive code day. The May 24 to 25 window illustrates a different gap, sharper and more useful. There is a layer beneath both releases and commits: server-side platform behavior at the major labs. It changes occasionally, with no announcement, and the only artifact is the cluster of client-side fixes that land within an hour of each other across multiple repos.
On May 24, a quiet Sunday, the only visible work was four small commits on one repo finishing the prior week's interrupt-recovery thread. On May 25, three things happened in order: Microsoft Research dropped a paper that any roundup will cover (SkillOpt), Fazm tagged v2.9.37 with a real guardrail story plus a third browser mode, and at 5:50 PM Pacific Anthropic silently broke every Claude Code wrapper that had been passing a one-year expires_in to its OAuth token-exchange endpoint. The first two are public and indexed. The third is the kind of event that ends up in client commit histories and nowhere else.
May 24: four commits, one carryover thread
If you stopped at the announcement layer, May 24, 2026 was a non-event. No frontier-model drop, no headline paper, no security disclosure, no major open-source release. The Hugging Face dated-papers feed for the date is dominated by carryover from earlier in the week.
One open-source agent project landed exactly four commits, all small, all finishing a thread that started in the May 23 burst:
1c88b298— Increase max recent workspaces limit to 9. A UI ceiling that affected nobody until session restore worked reliably enough to make nine workspaces a realistic state.46bc074b— Support both partial and empty interrupt markers in recovery. The prior-context window from May 23 only handled the "cleanly interrupted with a marker" case; this extends it to streams that died mid-write or never wrote a marker at all.c8371846— Record interrupted tools as cancelled in recent tool summary. The recent-tool list now distinguishes "ran to completion" from "cancelled by interrupt" instead of presenting both as completed.d6743ff6— Clean up tool history and interrupted sessions on session close. When a window is closed mid-stream, the tool history is now drained so the next open does not show ghost in-flight calls.
None of this is announcement-grade. All four commits are bug-class polish on a feature that landed in the prior 48 hours and a half. The reason it is worth a sentence in the May 24 to 25 record is that this is what a Sunday looks like at the agent layer the week before a release, and it is the exact pattern you want to recognize when you are reading the ecosystem: a quiet day after a busy day is often closure on the busy-day thread, not actually nothing.
May 25 morning: SkillOpt is the paper that matters
The single trending paper of the window is SkillOpt: Executive Strategy for Self-Evolving Agent Skills, from Microsoft Research. The arxiv v1 went up on May 22 and the v2 revision on May 25, and it is the v2 that picks up attention on the Hugging Face papers feed (about 180 upvotes by end of day). The repo at github.com/microsoft/SkillOpt is open, MIT-licensed, and includes the executable optimizer plus the validation harness.
The central claim is that a compact natural-language skill document should be treated as the trainable state of an otherwise frozen language agent. A separate optimizer model proposes bounded add, delete, or replace edits on a single skill.md, and an edit is accepted only when it strictly improves a held-out validation score. The result is a deployable best_skill.md artifact that adds zero inference-time model calls. The headline numbers on GPT-5.5 over no-skill baselines:
The +19.1 inside Claude Code is the most useful number on the chart. Direct-chat gains can be explained as filling obvious gaps a chat model has access to but never deploys without prompting; an agentic-loop gain isolates what a trained skill artifact buys inside a harness that already does its own tool-use scaffolding. The artifact also transfers across model scales, between Codex and Claude Code harnesses, and to a nearby math benchmark without re-optimization. For anyone shipping Claude Code (raw or wrapped) into production, this is a real optimization surface, not prompt-engineering theater dressed up.
May 25 early afternoon: Fazm v2.9.37 ships
The tagged release of the window is v2.9.37, dated 2026-05-25 in CHANGELOG.json. Three things are worth pulling out.
- Agent guardrails for system-altering commands. The agent now warns before running shell commands that change system state (disabling Wi-Fi or network interfaces, modifying firewall rules, system shutdown, deleting outside the workspace). The motivation is that an autonomous coding agent with shell access will eventually be asked, by accident or pressure, to do something it should refuse. The version of "refuse then cave under pushback" that any general-purpose chat model defaults to is the wrong failure mode in an agent harness.
- Consistent handling of user-provided passwords. The agent now treats a user-typed password the same way every time instead of refusing then accepting after a follow-up message. This is the corollary to the guardrail: refusing a legitimate sudo request because the password looks sensitive is just as bad as caving on a destructive command.
- No-browser-MCP (Assrt only) mode. A third Settings option under Browser Automation: disable Playwright and the bundled browser-harness MCPs entirely, leave only the Assrt MCP active. The use case is users whose primary Chrome session keeps contending with Playwright's lock file. Removing the implicit browser MCPs eliminates the contention class without disabling AI-driven browser work entirely, because Assrt runs its own browser surface.
v2.9.37 ships, and from the changelog alone it looks like the day's release work is done. It is not.
May 25 at 5:50 PM Pacific: the OAuth break nobody announced
The unannounced event of the window arrives at 17:50:10 PDT on May 25, in a single commit on the Fazm repo modifying nine lines of acp-bridge/src/oauth-flow.ts. The commit message is Remove expires_in from OAuth token exchange to prevent HTTP 400. The next commit at 17:52:10 PDT wires the change into the user-facing Connect-Claude flow. The first commit's comment block is what tells you what really happened on the server side:
acp-bridge/src/oauth-flow.ts before and after May 25, 2026
// Token-exchange request body sent to Anthropic. // Sets a one-year token lifetime explicitly. const TOKEN_EXPIRY_SECONDS = 31536000; // 1 year const body = { grant_type: "authorization_code", code, redirect_uri: redirectUri, client_id: CLIENT_ID, code_verifier: codeVerifier, state, expires_in: TOKEN_EXPIRY_SECONDS, };
- expires_in: 31536000 — what every Claude Code wrapper had been sending
- HTTP 400 from /v1/oauth/token: "Custom expires_in not allowed for scope user:sessions:claude_code user:mcp_servers"
- Connect-Claude flow fails with no public Anthropic announcement of the change
The Anthropic side of this change has no news post, no API changelog entry, no documentation update visible from the public docs site as of two days after the event. The only place the change exists, in writing, is in the inline comment Fazm's commit added to oauth-flow.ts and in the HTTP 400 response body that production endpoint returns to any client passing the now-rejected field. Search the Anthropic news page for the May 24 to 25 window and there is nothing. Search public Claude Code wrapper repos for OAuth-related commits dated May 25, and the pattern of same-day fixes lights up.
“Two commits, two minutes apart, on a Sunday evening, fixing a server-side OAuth policy change that has no public announcement. This is the third signal layer aggregators do not have.”
Fazm git log, 2026-05-25 17:50 to 17:52 PDT
The May 25 timeline, in order
The full Monday cascade is worth seeing in sequence. The morning is public and indexable. The afternoon shifts into the kind of work that only exists in commit history. The day ends with the OAuth fix still un-tagged.
2026-05-25, public and unreleased work in order
Morning — SkillOpt v2 lands on arxiv
Microsoft Research uploads the v2 revision of SkillOpt (arxiv 2605.23904). The Hugging Face papers page picks it up and starts accumulating upvotes (~180 by end of day). The headline number that propagates first is the direct-chat +23.5 on GPT-5.5; the more interesting +19.1 inside Claude Code shows up further down in the abstract.
Early afternoon — Fazm v2.9.37 ships
The agent-guardrails work plus the No-browser-MCP (Assrt only) settings option ship as a tagged release. Same release: Assrt's Gemini default switches to Flash to match the Haiku-tier latency profile, and the Assrt-with-Gemini 401 bug is fixed.
17:50:10 PDT — first OAuth-fix commit
Fazm commit 6b8e7db9 removes expires_in from the OAuth token-exchange body in acp-bridge/src/oauth-flow.ts. The TOKEN_EXPIRY_SECONDS constant (previously 31536000, one year) is removed and a comment is added: Anthropic now controls token lifetime server-side; passing a custom expires_in causes HTTP 400.
17:52:10 PDT — connection-flow fix
Fazm commit a03927f3 wires the OAuth fix into the user-facing Connect-Claude path so a fresh login no longer returns 'Custom expires_in not allowed for scope user:sessions:claude_code user:mcp_servers'. From this point forward, new OAuth connections succeed against the production endpoint.
17:57 to 17:58 PDT — analytics burst
A five-commit burst lands instrumenting chat-message and partial-response analytics so any future OAuth-shaped failure surfaces the actual user prompt and any partial AI response in chat_agent_query_failed. The shape of this burst is itself a signal: an analytics push immediately after a silent platform-side break is what teams do when they want to catch the next one before users do.
Not yet released
Both OAuth-fix commits stay un-tagged through May 25 and May 26. They reach end users only with v2.9.41 on May 27. Any aggregator that read only tagged releases for the May 24 to 25 window saw v2.9.37 and nothing else, missed the policy change entirely, and would not know about the fix for two more days.
Two signals every May 24 to 25 roundup missed
The dated roundups I checked for this window cover SkillOpt, the Cursor Composer 2.5 first-week promo expiry, the Codex Pro 2x extension, and a few minor model-pricing updates. None of them mention:
- The Anthropic OAuth policy enforcement at 17:50 PDT on May 25. The change rejected token-exchange requests including the
expires_infield for theuser:sessions:claude_codeanduser:mcp_serversscopes with HTTP 400 ("Custom expires_in not allowed for scope ..."). Any Claude Code client passing a one-year token lifetime broke at the moment of next login. The fix is trivial (remove the field), but you cannot remove a field you do not know exists, and the only artifact telling you it exists is the 400 response body. - The cleanup-day pattern on May 24. Four commits closing the May 23 interrupt-recovery thread. A roundup that indexes tagged releases shows zero releases for May 24; a roundup that indexes commit subjects ranged by date shows the actual end of that week's interrupt-recovery work.
Both signals are accessible from a single workstation in about three minutes. The OAuth signal requires reading the commit log of any one Claude Code client filtered to the date and grepping for "OAuth" or "auth". The cleanup-day signal requires reading the same commit log for the previous day. Neither is hidden; both are absent from every dated AI roundup I could find for this window.
The honest counterargument
The objection to chasing "silent server-side policy changes" is that most of them do not matter, and the discipline of reading every client repo's commit log for auth-shaped fixes is more cost than signal. That is fair for almost every other 48-hour window in 2026. Most OAuth, auth, and token-related commits are routine maintenance with no platform-side trigger. The reason May 25 is worth singling out is not that OAuth fixes happened, it is that two minutes separated the two commits and the response body contained an enforcement reason that did not exist the prior week. Both of those are detectable signals that this OAuth fix is the consequence of a server-side change, not a client-side rewrite.
The refinement is to spot-check, not to subscribe. Read the dated commit log of one Claude Code client when an aggregator-checked day looks too quiet. If you see an auth-shaped fix clustered tightly in time on that day, it is worth one minute of follow-up: pull the diff, read the inline comment, and check whether the change explains itself as a response to a platform-side enforcement (it usually does, and the comment is usually explicit). If yes, you have a signal nobody else has. If no, you wasted a minute. The signal-to-noise on this discipline is high in practice because client engineers tend to document the why of these fixes in the diff itself, even when their own changelog skips it.
What to predict from this window
Three short predictions, all falsifiable inside two weeks.
- Anthropic will document the new OAuth-lifetime policy in some form (an API changelog entry, a deprecation note, an authentication-docs update) within the next 7 to 14 days. The HTTP 400 response body is already documenting it inadvertently; once the silent-enforcement pattern reaches a critical mass of broken clients, the public docs catch up.
- SkillOpt or an equivalent skill-document optimizer becomes a default surface inside at least one Claude Code wrapper in the next two months. The +19.1 inside Claude Code on GPT-5.5 is too large a gain for the agent ecosystem to ignore, and the artifact-transfer property (one optimized skill.md, no re-optimization, works across harnesses) makes it cheap to integrate.
- The May 25 analytics burst at Fazm (five commits adding message-text and partial-response tracking to chat_agent_query_failed) will translate into a settings option or support-export tool inside two minor versions. That kind of instrumentation push only ships if the team plans to read the resulting analytics, and reading analytics that include user prompts almost always requires a user-facing export or opt-out surface.
Want a walkthrough of the OAuth-break method on your own Claude Code stack?
If you are running or evaluating a Claude Code wrapper and want a 15-minute pass at reading platform-side policy changes through client commit logs, book a call. We will open Fazm together, walk the May 25 OAuth fix end to end, and show you the same discipline applied to your stack.
Frequently asked questions
What AI model releases, papers, and open-source projects shipped on May 24 to 25, 2026?
On May 24, no major lab made an announcement and no significant paper trended. The only verifiable shipped work was four commits on the Fazm repo continuing the session-interrupt recovery thread from May 23: a workspaces-recents bump to 9, partial-and-empty interrupt-marker support in recovery, recording interrupted tools as cancelled in the recent-tool summary, and tool-history cleanup on session close. On May 25, Microsoft Research released SkillOpt (arxiv 2605.23904), a text-space optimizer that treats a single natural-language skill document as the trainable state of a frozen agent; the paper claims +23.5 points in direct chat, +24.8 inside the Codex agentic loop, and +19.1 inside Claude Code on GPT-5.5 over no-skill baselines, and trended on Hugging Face with about 180 upvotes. Fazm tagged v2.9.37 with agent guardrails for system-altering shell commands and a No-browser-MCP (Assrt only) mode. At 5:50 PM Pacific, Anthropic silently enforced a new OAuth policy that rejected any token-exchange request including the expires_in field with HTTP 400 ("Custom expires_in not allowed for scope"), and Fazm's two-commit fix landed at 17:50:10 and 17:52:10 the same day, but did not reach a tagged release until v2.9.41 on May 27.
Why does the Anthropic OAuth policy change on May 25 matter if there was no announcement?
Because it is a textbook example of a signal aggregators never carry. The change was server-side, undocumented at the moment of enforcement, and visible only to clients that exchanged an OAuth code with a custom expires_in field in their token-exchange body (most Claude Code clients did, because the long-lived token shape was the documented norm). The way you discover it is not by reading a release feed, you discover it by getting an HTTP 400 with "Custom expires_in not allowed for scope user:sessions:claude_code user:mcp_servers" in the body when a previously working OAuth flow breaks. Fazm's two-commit fix at 17:50 and 17:52 Pacific on May 25 is the on-disk record of that discovery: one commit removes the offending field, the second wires the fix into the connection flow. Search any release-roundup feed for the May 24 to 25 window and the policy change does not appear. Search every Claude Code wrapper repo for OAuth-related commits dated May 25 and the pattern shows up across multiple projects.
What is SkillOpt and why is the +19.1 inside Claude Code number the most interesting one?
SkillOpt is a Microsoft Research project that treats a compact natural-language skill document as the trainable state of a frozen language agent. A separate optimizer model turns scored rollouts into bounded add, delete, or replace edits on a single skill document, and an edit is accepted only when it strictly improves a held-out validation score. The result is a deployable best_skill.md artifact that adds zero inference-time model calls at runtime. The headline numbers on GPT-5.5 are +23.5 points in direct chat, +24.8 inside the Codex agentic loop, and +19.1 inside Claude Code. The Claude Code number is the interesting one because it isolates what an optimized skill artifact buys inside a harness that already does its own tool-use scaffolding. Direct-chat gains can be explained as filling in obvious gaps; an agentic-loop gain like +19.1 is harder to explain away. The artifact transfers across model scales, between Codex and Claude Code harnesses, and to a nearby math benchmark without re-optimization. The implication for anyone running Claude Code in production is direct: skill-file engineering is a real optimization surface, not just prompt-engineering theater.
What did Fazm v2.9.37 add on May 25, and what is the No-browser-MCP (Assrt only) mode?
v2.9.37 added a safety guardrail so the agent warns before running shell commands that change system state (disabling Wi-Fi or network interfaces, modifying firewall rules, system shutdown, rm -rf on directories outside the workspace) and made the agent handle user-provided passwords consistently rather than refusing then caving under pushback. It also fixed Assrt QA tests failing with HTTP 401 when Gemini was the selected model, switched the Assrt default to Gemini Flash to match the cost and latency profile of the Anthropic Haiku default, and added a third browser mode in Settings: No browser MCP (Assrt only), which disables both Playwright and the bundled browser-harness MCPs and leaves only the Assrt MCP active. The new mode exists because Playwright's recurring lock-file contention with the user's primary Chrome session was the most common support thread that week; removing the implicit browser MCP and falling back to Assrt-only browser tools eliminates that contention class without disabling AI-driven browser work entirely.
Where do I check primary sources for the May 24 to 25, 2026 events?
SkillOpt's arxiv preprint is at arxiv.org/abs/2605.23904 and the open-source repo at github.com/microsoft/SkillOpt. Hugging Face's dated paper index for May 25 is at huggingface.co/papers/date/2026-05-25. Fazm's tagged release for May 25 (v2.9.37) is in CHANGELOG.json at the root of github.com/mediar-ai/fazm. Fazm's same-day OAuth fix is two commits on github.com/mediar-ai/fazm/commits/main, both timestamped 17:50:10 and 17:52:10 PDT on 2026-05-25, modifying acp-bridge/src/oauth-flow.ts. The Anthropic OAuth policy change itself has no public announcement document, the authoritative artifact is the HTTP 400 response body returned by the Anthropic console token endpoint (/v1/oauth/token) to any client passing expires_in in the token-exchange request on or after that date.
Is there a single recommended cadence to track AI in a 48-hour window like May 24 to 25?
Three passes, in order of decreasing aggregator coverage. First pass, the dated trending feed (huggingface.co/papers/date/2026-05-25 for the day in question) plus the major lab news pages. This catches SkillOpt and the v2.9.37 tag in roughly 90 seconds. Second pass, one or two project changelogs you actually run, read by date range. This catches the kind of work that landed in commits but is not tagged yet (in this window, the analytics-instrumentation push that lands in 9 commits between 17:00 and 18:00 Pacific). Third pass, the most-overlooked source: a date-ranged grep across multiple client repos for OAuth, auth, or signature-related fixes that all landed within the same 1-2 hour window. When more than one independently maintained client ships an OAuth fix in the same afternoon, you are looking at a silent server-side policy change you would otherwise never see.
What was the actual OAuth diff that fixed the May 25 break?
Two commits, both on May 25 at 17:50:10 and 17:52:10 PDT, both touching acp-bridge/src/oauth-flow.ts in the Fazm repo. The first removes the TOKEN_EXPIRY_SECONDS constant (previously 31536000, a one-year lifetime) and removes the expires_in field from the token-exchange request body sent to Anthropic's console token endpoint (/v1/oauth/token). The second wires that change into the connection-flow caller and adds an inline comment documenting why the field cannot be set: Anthropic now controls token lifetime server-side for the user:sessions:claude_code and user:mcp_servers scopes, and passing a custom expires_in returns HTTP 400 with "Custom expires_in not allowed for scope ..." in the body. The net change is 9 lines added and 2 removed across one file. The behavior change is that the server now decides the token lifetime and the client honors data.expires_in from the response.
Does Fazm itself help me check 'what shipped in the past 48 hours' from inside the app?
Yes, through the deep-research skill that auto-installs to ~/.claude/skills/deep-research/SKILL.md on first launch. The skill runs an 8-phase pipeline (Scope, Plan, Retrieve, Triangulate, Synthesize, Critique, Refine, Package), launches parallel web searches plus parallel research subagents, verifies citations where possible, and writes a markdown plus HTML plus PDF report into a dated folder under ~/Documents. Because it runs as the local Claude Code, Codex, or Gemini agent on your machine, the answer is grounded against fresh searches from your IP, not a cached newsletter summary. For the May 24 to 25 window, asking the skill for a dated rundown gives a multi-source synthesis you can cross-check against arxiv, the Fazm CHANGELOG, and (where it exists) the lab news pages.
Related on this site
Latest AI model releases, papers, and open-source projects (May 22 to 23, 2026)
The adjacent 48-hour window, read through the commit-log layer instead of the OAuth-break layer. Twenty-three Fazm commits on May 23 previewed the Composio MCP work that has not shipped a tag yet.
Latest AI model releases, papers, and open-source projects (May 21 to 22, 2026)
Two days earlier: the Gemini-as-ACP-backend window. Five tagged Fazm releases (v2.9.32 through v2.9.36) plus a structural shift that made Gemini a free fallback when built-in credits run out.
Anthropic Claude Code 529 outage (May 2026)
An earlier example of how silent server-side behavior at a major lab propagates as immediate-fix commits in client projects. The 529-overloaded pattern broke fallback logic in naive Claude Code clients.
Comments (••)
Leave a comment to see what others are saying.Public and anonymous. No signup.