MAY 2026 / WHAT SHIPPED, AND WHAT IT TAKES TO RUN IT

Hugging Face new models, May 2026. Here is the verified list, and the part every roundup skips.

Most pages answering this question reprint a list of names and stop. The list is the easy half. The hard half, the one a person searching this actually runs into next, is that a new model on Hugging Face is a file, not a working agent. This guide gives the dated, checkable list of what landed in May 2026, then walks the exact gap between a downloaded weight and an agent driving your Mac.

M
Matthew Diakonov
9 min read

Direct answer, verified 2026-05-31

Hugging Face does not publish a single dated announcements page. The authoritative record of what gained support in May 2026 is the Transformers release notes. Across the month, three releases shipped: v5.8.0 (May 5) added DeepSeek-V4, Gemma 4 Assistant, Granite Speech Plus, Granite 4 Vision, EXAONE 4.5, and PP-FormulaNet; v5.8.1 (May 13) was a patch fixing the DeepSeek V4 integration; and v5.9.0 (May 20) added Cohere2Moe, Parakeet TDT, and HRM-Text.

That is the lookup. The rest of this page is the part you need right after: which of these you can actually point at an agent, and how.

The full list, with dates and what each one is

Every row below comes from a tagged Transformers release. The date is the tag date, the model is the class added, and the description is condensed from the release notes and the model card. A patch release is included because the absence of new models in v5.8.1 is itself a signal: DeepSeek V4 was important enough that it got a dedicated fix-only release a week after it landed.

ReleaseDateModelWhat it is
v5.8.0May 5, 2026DeepSeek-V4Next-generation Mixture-of-Experts language model with a hybrid local plus long-range attention design. Ships as -Flash, -Pro, and -Base variants that share the architecture but differ in width, depth, and expert count.
v5.8.0May 5, 2026Gemma 4 AssistantA small, text-only companion model that enables speculative decoding for Gemma 4 using Multi-Token Prediction. It reuses the target model's KV cache and skips the pre-fill phase entirely.
v5.8.0May 5, 2026Granite Speech PlusMultimodal speech-to-text model with an enhanced projector that consumes concatenated encoder hidden states. Transcribes audio with speaker annotation and word-level timestamps.
v5.8.0May 5, 2026Granite 4 VisionVision-language model built for enterprise document data extraction, with chart and table understanding.
v5.8.0May 5, 2026EXAONE 4.5LG AI Research's first open-weight vision-language model. 33 billion parameters total, including a 1.2B vision encoder, strong on document understanding and Korean contextual reasoning.
v5.8.0May 5, 2026PP-FormulaNetLightweight model for table structure recognition in documents and natural scenes.
v5.8.1May 13, 2026DeepSeek-V4 (patch)Patch release. No new models; it fixed the DeepSeek V4 integration shipped a week earlier.
v5.9.0May 20, 2026Cohere2MoeMixture-of-Experts language model with a hybrid attention pattern combining sliding-window and full-attention layers.
v5.9.0May 20, 2026Parakeet TDTSpeech-to-text model added to the audio model family.
v5.9.0May 20, 2026HRM-TextHierarchical Reasoning Model variant using a hierarchical recurrent forward pass across two transformer stacks.

Two more things changed at the platform level that month, both worth knowing if you move weights around. Hugging Face added a Copy to Bucket button on repository pages, backed by Xet server-side transfers, so large files copy from the Hub to a bucket without a round trip through your machine. And the dataset benchmark leaderboards gained a model-size filter, so you can sort by parameter count and see the top performers per size band rather than one giant mixed ranking.

Confirm the release yourself

You do not have to trust the table. The model classes are in the installed library the moment you upgrade. If DeepseekV4ForCausalLM imports, the support is real and dated to the tag you pulled.

verify-transformers.sh

The part the roundups skip: a weight is not an agent

Here is where almost every list of new models leaves you. It tells you EXAONE 4.5 has 33 billion parameters and stops. But the reason most people search for new model announcements is to use one, and a downloaded weight does nothing on its own. It is parameters plus a config. Between that file and an agent that clicks around your Mac sit two pieces of software that no roundup mentions.

What you actually have, versus what you need

A repository of weights, a tokenizer, and a config. It can be loaded by Transformers. It cannot answer a request, call a tool, or take an action until something serves it and something else drives it.

  • Parameters and config files
  • No HTTP endpoint
  • No tool-calling loop
  • No connection to your apps

The first piece is a runtime. LM Studio, vLLM, llama.cpp's server, or a hosted inference endpoint takes the weights and answers HTTP requests. The second piece is the agent client: the loop that sends a prompt, reads the tool calls the model emits, runs those tools, feeds the results back, and repeats until the task is done. The model announcement gives you neither. It gives you the thing in the middle.

Where Fazm fits: the Custom API Endpoint

Fazm is an open-source macOS app that I work on. It wraps Claude Code and Codex in a native interface, which means it is the agent-client half of the picture above: the loop that runs tools, holds full context without auto-compacting, and reaches past the terminal into the browser and native Mac apps through accessibility APIs. By default it talks to Claude. But it does not have to.

There is a setting called Custom API Endpoint. It tells the agent loop to send its requests somewhere other than the built-in Claude connection. The constraint, and this is the anchor of the whole thing, is spelled out in Fazm's own changelog. Version 2.9.25, shipped May 19, 2026, reads: “Clarified Custom API Endpoint setting requires an Anthropic-API-compatible endpoint.” The earlier 2.7.1 release on May 1 named LM Studio as an example endpoint and added a friendlier error when that endpoint returns no models loaded or refuses the connection.

That single requirement is what determines whether a May 2026 model is reachable. If your runtime already speaks the Anthropic messages format, you point Fazm at it and the new model drives the same persistent, forkable, voice-capable UI that Claude does. If your runtime speaks the OpenAI format instead, which most local servers do, you put a small translation shim in front of it that exposes the Anthropic shape. The model does not change. Only the endpoint in front of it does.

One more changelog line matters for honesty about cost. Version 2.9.55, shipped May 29, 2026, states: “Stopped Custom API Endpoint routing from using Fazm's built-in Anthropic key or built-in credits.” When you route the agent at your own model, the requests hit your endpoint and your billing, not Fazm's bundled Claude key. That is the correct behavior for running an open weight you host yourself, and it is the difference between a wrapper that quietly bills you and one that gets out of the way.

Which of May's models are realistic agent drivers

Plumbing one in is not the same as it being good at the job. An agent loop is a brutal workload: long chains of tool calls, structured arguments the model has to format exactly right, and a context window that fills with tool output. Most of May's additions are not built for that, and that is fine, because they were never meant to be.

Granite 4 Vision is a document-extraction model. Granite Speech Plus and Parakeet TDT are speech-to-text. PP-FormulaNet recognizes table structure. EXAONE 4.5 is a vision-language model strong on document understanding. Gemma 4 Assistant is a speculative-decoding helper, not a standalone chat model. None of those is an agent driver, and asking one to run a multi-step task on your desktop is the wrong test for it.

The plausible candidates are the general-purpose Mixture-of-Experts releases: DeepSeek-V4 and Cohere2Moe. Even there, the deciding factor is not a benchmark number. It is tool-calling reliability and usable context length under real load. The only honest way to find out is to wire one in through a Custom API Endpoint and watch it attempt a genuine multi-step task before you trust it with anything that matters. The metric below is the blunt version of that point.

0new models added across the May release cycle
0Transformers releases tagged in May 2026
0Bparams in EXAONE 4.5, the largest new VLM
0of them runs Claude Code out of the box (zero)

The last figure is not a knock on the models. It is a reminder that a model announcement and an agent are different objects. Closing that gap is the work the headline list never shows you.

Want to point a new open model at your Mac?

Book a call and I will walk you through wiring a local or hosted model into Fazm through a Custom API Endpoint, and where the rough edges actually are.

Questions about Hugging Face's May 2026 models

Frequently asked questions

What new models did Hugging Face announce in May 2026?

Hugging Face does not publish a single dated announcements page, so the canonical record is the Transformers release notes. Across May 2026 three releases landed. v5.8.0 (May 5) added first-class support for DeepSeek-V4 (in -Flash, -Pro, and -Base variants), Gemma 4 Assistant, Granite Speech Plus, Granite 4 Vision, EXAONE 4.5, and PP-FormulaNet. v5.8.1 (May 13) was a patch that fixed the DeepSeek V4 integration. v5.9.0 (May 20) added Cohere2Moe, Parakeet TDT, and HRM-Text. That is nine new models added across the month, plus one fix-only release.

Is there an official Hugging Face page that lists new model announcements by month?

No. There is no calendar-indexed announcements feed. The places people treat as 'announcements' are three live, popularity-ranked feeds (huggingface.co/models sorted by trending, huggingface.co/papers/trending, and the model size leaderboards) plus per-lab blog posts. The closest thing to a dated, authoritative list of what gained support is the Transformers Releases page on GitHub, because each tag names the exact models added and is tied to a real timestamp.

What is EXAONE 4.5 and who made it?

EXAONE 4.5 is LG AI Research's first open-weight vision-language model, added in Transformers v5.8.0. It integrates a dedicated visual encoder into the existing EXAONE 4.0 framework. It has 33 billion parameters total, including roughly 1.2 billion in the vision encoder, and is noted for document understanding and Korean contextual reasoning.

Does DeepSeek-V4 come in more than one size?

Yes. The Transformers v5.8.0 implementation covers DeepSeek-V4-Flash and DeepSeek-V4-Pro, plus their -Base pretrained variants. They share the same Mixture-of-Experts architecture with a hybrid local and long-range attention design, but differ in width, depth, expert count, and weights. v5.8.1 followed a week later specifically to fix issues in that integration.

Can I use one of these new Hugging Face models as a coding agent on my Mac?

Not directly from the weight file. A model on Hugging Face is parameters plus a config. To drive an agent you need two more things: a runtime that serves the model behind an HTTP API (LM Studio, vLLM, llama.cpp server, or a hosted inference endpoint), and an agent client that can be pointed at that API and knows how to run tools. Fazm is the agent-client half. Its Custom API Endpoint setting lets you route the agent loop at any Anthropic-API-compatible endpoint instead of the built-in Claude connection.

What kind of endpoint does Fazm's Custom API Endpoint setting expect?

An Anthropic-API-compatible one. Fazm's changelog clarified this explicitly in version 2.9.25 on May 19, 2026: 'Clarified Custom API Endpoint setting requires an Anthropic-API-compatible endpoint.' The earlier 2.7.1 release on May 1 referenced LM Studio as an example endpoint and added a friendlier error when the endpoint returns no models loaded or refuses the connection. So an OpenAI-style server needs a shim that exposes the Anthropic messages format; a server that already speaks Anthropic's API can be used as-is.

If I point Fazm at my own local model, does it still use Fazm's Claude credits?

No. Fazm version 2.9.55, shipped May 29, 2026, changed this on purpose: 'Stopped Custom API Endpoint routing from using Fazm's built-in Anthropic key or built-in credits.' When you route through your own endpoint, the requests go to your endpoint and your billing, not Fazm's bundled Claude key. That is the correct behavior for running an open model you host yourself.

Will any of these new models actually work well as an agent driver?

That is a separate question from whether the plumbing connects. An agent loop hammers a model with long tool-use chains, structured tool calls, and large context. A multimodal document model like Granite 4 Vision or a speech model like Parakeet TDT is not built for that. The general-purpose Mixture-of-Experts releases (DeepSeek-V4, Cohere2Moe) are the plausible candidates, and even then the deciding factor is tool-calling reliability and context length, not benchmark scores. The honest path is to wire one in through a Custom API Endpoint and watch it run a real multi-step task before trusting it.

How did this page land for you?

React to reveal totals

Comments ()

Leave a comment to see what others are saying.

Public and anonymous. No signup.