The latest AI news from Hugging Face, GitHub, and arXiv, May 2026

A primary-source roundup of what actually shipped in the first week of the month, followed by the on-disk pipeline Fazm bundles for reproducing this query on your own Mac, with citations that resolve to a DOI or a GitHub tag rather than to another aggregator.

Matthew Diakonov
11 min read

Direct answer (verified 2026-05-08)

Three sources, three concrete drops to know about. On Hugging Face, Transformers v5.8.0 was tagged on May 5, 2026 and added five model architectures in one release: DeepSeek-V4, Gemma 4 Assistant, GraniteSpeechPlus, Granite4Vision, and EXAONE-4.5. On arXiv, the May 8 cs.AI listing alone surfaced over a thousand papers including MASPO (2605.06623, ICML 2026), SkillOS (2605.06614), and AI Co-Mathematician (2605.06651). On GitHub, trending continued to be led by OpenClaw at roughly 280k stars, with n8n near 162k and LangChain over 100k. Sections below link to each primary source.

The shape of the answer

Three primary sources beat one aggregator

Most pages that surface for this question are scraped lists: top 5 trending repos this week, top Hugging Face models in May, top arXiv papers. They link to each other before they link to the source. A release happens on a tag, a model card, or an arXiv ID, and the aggregator quotes the announcement a day later. If a day-old summary is enough, that is fine. If you want the actual artifact, every fact below points at a primary identifier.

The second half of this guide is the part you will not find on those lists: the local pipeline Fazm bundles for asking the same question on your own machine, with a verification step that catches fabricated citations before the report is written.

Hugging Face, May 5, 2026

Transformers v5.8.0 added five model architectures in one tag: DeepSeek-V4 (1.6T / 49B activated for Pro, 284B / 13B for Flash, hybrid local plus long-range attention), Gemma 4 Assistant (multi-token-prediction speculative decoder for Gemma 4), GraniteSpeechPlus (multimodal STT with speaker annotation), Granite4Vision (IBM Research enterprise VLM for documents and charts), EXAONE-4.5 (LG AI Research, 33B, 256K context). PP-FormulaNet was also added.

arXiv cs.AI, week of May 4 to May 8

Notable IDs from the May 8 listing alone: 2605.06651 (AI Co-Mathematician), 2605.06623 (MASPO multi-agent prompt optimization, ICML 2026), 2605.06614 (SkillOS for self-evolving agents), 2605.06638 (Can RL Teach Long-Horizon Reasoning, expressiveness as the key variable). The cs.AI listing held more than a thousand papers for that single day.

GitHub trending, early May 2026

OpenClaw at roughly 280k stars, n8n near 162k, LangChain over 100k. Breakouts the same week: pi-mono (agent toolkit with a coding CLI, unified LLM API, TUI, web UI, and Slack bot) and Pixelle-Video (text to finished video, with script, visuals, voice, and music).

Hugging Face

Transformers v5.8.0, five model architectures in one tag

The May 5, 2026 release is the cleanest single moment to point at if you want a calendar marker for what changed on Hugging Face this month. Five model architectures landed in the same minor version.

  • DeepSeek-V4. A next-generation Mixture of Experts pair: Pro at 1.6T total parameters with 49B activated, and Flash at 284B with 13B activated. Both target a one-million-token context. The headline architectural changes are a hybrid local plus long-range attention pattern in place of Multi-head Latent Attention, manifold-constrained hyper-connections in place of plain residuals, and a static token-id to expert-id hash table that warm-starts the first MoE layers. Model cards live at deepseek-ai/DeepSeek-V4-Pro and deepseek-ai/DeepSeek-V4-Flash.
  • Gemma 4 Assistant. A small text-only assistant model that enables speculative decoding for the larger Gemma 4 family using Multi-Token Prediction (MTP) with KV cache sharing. The pitch is throughput: the assistant proposes tokens, the target model accepts or rejects them, and because the cache is shared, the assistant's cross-attention is fed by the target's context.
  • GraniteSpeechPlus. A multimodal speech-to-text model that improves on the prior Granite Speech projector by consuming concatenated encoder states. It does word-level timestamps and speaker annotation in response to text prompts.
  • Granite4Vision. IBM Research's enterprise vision-language model focused specifically on document data extraction, chart analysis, and table recognition. The framing is enterprise document pipelines, not general image understanding.
  • EXAONE-4.5. The first open-weight vision language model from LG AI Research, 33B total parameters, 256K context window. Open weights matter here because the prior EXAONE generations were not openly released.
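The static hash table in DeepSeek-V4 is the least familiar of its three headline changes, so here is a minimal Python sketch of hash-based expert routing in general. It is not DeepSeek's code: the vocabulary size, expert count, hash function (hashlib.blake2b), and function names are all assumptions. What it illustrates is that the table depends only on the token id, so the assignment is fixed before any training happens, which is one plausible reading of why such a table can warm-start the first MoE layers.

```python
# Hypothetical sketch of static hash-based expert routing; not DeepSeek-V4's implementation.
import hashlib
from collections import Counter


def build_hash_routing_table(vocab_size: int, num_experts: int) -> list[int]:
    """Precompute a fixed token-id -> expert-id table with no learned parameters."""
    table = []
    for token_id in range(vocab_size):
        # A deterministic hash of the token id picks the expert.
        digest = hashlib.blake2b(token_id.to_bytes(4, "little"), digest_size=8).digest()
        table.append(int.from_bytes(digest, "little") % num_experts)
    return table


def route(token_ids: list[int], table: list[int]) -> list[int]:
    """Look up each token's expert; the same token id always lands on the same expert."""
    return [table[t] for t in token_ids]


if __name__ == "__main__":
    table = build_hash_routing_table(vocab_size=32_000, num_experts=8)  # sizes are illustrative
    print(route([17, 17, 4242], table))  # the first two entries are always equal
    print(Counter(table))                # load per expert, roughly even
```

A learned router would replace the table lookup with a scored choice per token; the hash version trades that flexibility for a balanced, training-free starting point.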

PP-FormulaNet for lightweight table structure recognition also shipped in the same release. The full release notes are on the Transformers releases page on GitHub, linked above. Treat that page as the primary source: if a third-party article disagrees with the release notes, the release notes win.
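For readers who have not met speculative decoding, the accept/reject loop behind Gemma 4 Assistant's pitch can be sketched in a few lines. This is the simplest greedy variant with toy stand-in "models" (plain Python functions over integer tokens), not Gemma; it leaves out Multi-Token Prediction heads and KV cache sharing entirely, and the names draft_next, target_next, and k are illustrative assumptions.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token prefix to the next token (greedy)


def speculative_decode(prefix: List[int], draft_next: NextToken, target_next: NextToken,
                       k: int = 4, max_new: int = 12) -> List[int]:
    """Greedy speculative decoding: a cheap draft proposes k tokens, the target verifies them."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1. The small draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft_next(out + proposal))
        # 2. The target checks each proposed position; a real system does this in one batched pass.
        accepted = 0
        for i, tok in enumerate(proposal):
            if target_next(out + proposal[:i]) == tok:
                accepted += 1
            else:
                break
        out.extend(proposal[:accepted])
        # 3. The target always contributes one token itself (the correction after a mismatch,
        #    or a bonus token after full acceptance), so every round makes progress.
        out.append(target_next(out))
    return out[: len(prefix) + max_new]
```

With greedy acceptance the output is identical to what the target would produce decoding alone; any speedup comes from how many draft tokens the target accepts per verification pass, which is why a well-matched assistant model matters.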

arXiv

cs.AI, week of May 4 to May 8, what stood out

The cs.AI category on arXiv held more than a thousand papers for May 8, 2026 alone. The Computer Science listing for the same day held many times that. No human reads all of it, and no aggregator covers it without a pre-filter. A few IDs from that single day are worth pinning, because they are the kind of paper you will see cited two months from now.

  • 2605.06623, MASPO. Joint prompt optimization for LLM-based multi-agent systems, accepted at ICML 2026. The contribution is optimising prompts across agents jointly instead of one agent at a time, which the paper argues is the right level of abstraction for any pipeline where agents exchange context.
  • 2605.06614, SkillOS. Learning skill curation for self-evolving agents. The framing is what to keep and what to drop as an agent accumulates skills over time, treating the skill set as a curated index rather than a flat append-only list. Worth reading next to any production agent that loads skills from a directory.
  • 2605.06638. Can RL teach long-horizon reasoning to LLMs? The paper's argument is that expressiveness, not scale alone, is the limiting variable. Useful counterweight to any 'just train a bigger model' framing.
  • 2605.06651, AI Co-Mathematician. Accelerating mathematicians with agentic AI. A concrete domain study of agentic systems applied to research mathematics, useful as a grounding for agentic-AI claims that otherwise live at the level of marketing.

These four are not 'the four important papers of the week'. They are four IDs from one day's listing that are easy to point at. The full daily listing is at arxiv.org/list/cs.AI/recent. If you want the canonical version of the cited results in this section, paste each ID into the arXiv search box and read the abstract directly.
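Because new-style arXiv identifiers encode the year and month in their prefix (YYMM.NNNNN, so 2605 means May 2026), a cheap pre-check can flag an ID whose prefix does not match the month a roundup claims to cover. A minimal sketch, assuming the five-digit format arXiv has used since 2015; the function names are hypothetical. A mismatch is a flag for review, not proof of error, and this catches the 2605-to-2606 class of typo but not a typo in the sequence number, which still needs resolution against arXiv.

```python
import re
from datetime import date

# New-style arXiv IDs: YYMM.NNNNN with an optional version suffix such as v2.
ARXIV_ID = re.compile(r"^(?P<yy>\d{2})(?P<mm>\d{2})\.(?P<seq>\d{5})(?:v(?P<ver>\d+))?$")


def arxiv_month(arxiv_id: str) -> date:
    """Return the first day of the month encoded in the ID's YYMM prefix."""
    m = ARXIV_ID.match(arxiv_id)
    if m is None:
        raise ValueError(f"not a new-style arXiv ID: {arxiv_id!r}")
    month = int(m["mm"])
    if not 1 <= month <= 12:
        raise ValueError(f"impossible month in arXiv ID: {arxiv_id!r}")
    return date(2000 + int(m["yy"]), month, 1)


def flag_out_of_window(ids: list[str], year: int, month: int) -> list[str]:
    """IDs whose YYMM prefix does not match the month the report claims to cover."""
    return [i for i in ids if arxiv_month(i) != date(year, month, 1)]


if __name__ == "__main__":
    cited = ["2605.06651", "2605.06623", "2605.06614", "2606.06614"]
    print(flag_out_of_window(cited, 2026, 5))  # ['2606.06614']
```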

GitHub

Trending in the first week of May, in star order

GitHub trending is noisy because it weights stars-per-day, not totals. The headline numbers below are total stars, which is the more honest measure of where the gravity has actually moved this year.

  • OpenClaw, around 280k stars. A personal AI assistant that runs on your own devices and acts as a local gateway between AI models and roughly 50 integrations. The growth curve is the actual story: OpenClaw is reportedly the fastest-growing open source project in GitHub history, and trending pages list it almost every week of 2026.
  • n8n, around 162k stars. The visual workflow automation platform, fair-code and self-hostable, with native AI capabilities and more than 400 integrations. The May cadence is steady rather than headline-grabbing, but the project has been quietly absorbing the slot that no-code workflow tools used to occupy.
  • LangChain, over 100k stars. Still the foundational framework for building LLM agents in Python, with modular components for chains, agents, memory, retrieval, tool use, and multi-agent orchestration. New model releases like the Transformers v5.8.0 batch land here as adapters within days.
  • pi-mono. An AI agent toolkit that bundles a coding agent CLI, a unified LLM API, TUI and web UI libraries, a Slack bot, and vLLM pods. The interesting choice is bundling: the same toolkit covers terminal-shaped, app-shaped, and chat-shaped use without making the user wire up three stacks.
  • Pixelle-Video. Type a topic, get a finished video: automated script writing, AI-generated visuals, voice synthesis, background music, and final composition. The category is crowded; what kept this one trending in early May was the end-to-end finished output rather than a partial pipeline.
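The gap between a per-day view and a total-stars view is easy to see with made-up numbers. The sketch below uses hypothetical repositories and star counts (not real data) and a simple stars-per-day score; it is not GitHub's actual trending algorithm, only an illustration of why a small, fast-growing repo can top a trending page while the total-star order barely moves.

```python
from dataclasses import dataclass


@dataclass
class Repo:
    name: str
    total_stars: int
    stars_last_7_days: int


def by_total(repos: list[Repo]) -> list[str]:
    """Rank by all-time stars: slow-moving, reflects accumulated gravity."""
    return [r.name for r in sorted(repos, key=lambda r: r.total_stars, reverse=True)]


def by_recent_velocity(repos: list[Repo], days: int = 7) -> list[str]:
    """Rank by stars per day over a recent window: a rough proxy for what trending rewards."""
    return [r.name for r in sorted(repos, key=lambda r: r.stars_last_7_days / days, reverse=True)]


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    repos = [
        Repo("big-established-repo", 150_000, 700),
        Repo("new-breakout-repo", 9_000, 6_300),
        Repo("mid-size-repo", 40_000, 1_400),
    ]
    print(by_total(repos))            # ['big-established-repo', 'mid-size-repo', 'new-breakout-repo']
    print(by_recent_velocity(repos))  # ['new-breakout-repo', 'mid-size-repo', 'big-established-repo']
```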

The local pipeline

The on-disk anatomy of the deep-research skill

This is the part you will not find on a roundup page. When you ask Fazm 'what shipped on Hugging Face, GitHub, and arXiv in May 2026', it does not call a hosted research API. It loads a bundled skill from disk and runs it as a multi-phase pipeline on your machine. The skill file is long and prescriptive enough that its behaviour is auditable: you can read it.

  • 856 lines in deep-research.skill.md
  • 4,523 total lines across all bundled skills
  • 16 other skills that ship in the bundle
  • 8 structural checks in validate_report.py
  • 10 sources minimum before a report is allowed to ship
  • 8 Fazm releases in the first 8 days of May 2026

Where the file lives, and what it shares the directory with

In the Fazm source tree the skill lives at Desktop/Sources/BundledSkills/deep-research.skill.md, and it ships inside the app bundle's Resources/BundledSkills directory. Sixteen sibling skills share that directory: ai-browser-profile, canvas-design, doc-coauthoring, docx, find-skills, frontend-design, google-workspace-setup, pdf, pptx, social-autoposter, social-autoposter-setup, telegram, travel-planner, video-edit, web-scraping, and xlsx. Together with deep-research they total 4,523 lines, 856 of them in deep-research alone. At launch, SkillInstaller.swift auto-discovers every *.skill.md file in that directory, computes a SHA-256 over each file, compares it against the existing copy at ~/.claude/skills/<name>/SKILL.md, and copies the bundled version forward when the checksum has changed. No App Store update is required for skill content changes.
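The checksum-and-copy behaviour is simple enough to sketch. SkillInstaller.swift itself is Swift; what follows is a Python approximation under stated assumptions: the function names and return value are hypothetical, and the rule that a skill's name is its file name minus the .skill.md suffix is inferred from the deep-research.skill.md to ~/.claude/skills/deep-research/SKILL.md mapping described in this article, not taken from Fazm's source.

```python
# Hypothetical Python analogue of the install-on-launch step; not Fazm's Swift code.
import hashlib
import shutil
from pathlib import Path

SUFFIX = ".skill.md"


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_bundled_skills(bundle_dir: Path, skills_root: Path) -> list[str]:
    """Copy each bundled *.skill.md to <skills_root>/<name>/SKILL.md when its checksum differs.

    Returns the names of skills that were installed or updated.
    """
    updated = []
    for src in sorted(bundle_dir.glob("*" + SUFFIX)):
        name = src.name[: -len(SUFFIX)]          # deep-research.skill.md -> deep-research
        dest = skills_root / name / "SKILL.md"
        if dest.exists() and sha256_of(dest) == sha256_of(src):
            continue                             # unchanged: leave the installed copy alone
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dest)               # new or changed: copy the bundled version forward
        updated.append(name)
    return updated
```

It would be called with the bundle's skills directory and Path.home() / ".claude" / "skills". One consequence of comparing checksums alone is that a hand-edited SKILL.md differs from the bundled copy and would be overwritten on the next launch; whether Fazm guards against that is not stated here.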

Anchor fact

deep-research.skill.md ships at 856 lines and is paired with two scripts in the same skill: scripts/verify_citations.py resolves DOIs against doi.org and matches title and year metadata, and scripts/validate_report.py runs eight structural checks before a report is allowed to ship. An invented arXiv citation has no real DOI, and an arXiv ID typo either resolves to a different paper or fails resolution outright. The pipeline catches both before the user ever sees the report. That is what makes the local roundup reproducible rather than merely plausible.
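The rejection logic can be sketched without touching the network. The Python below is a hypothetical outline of the checks described for scripts/verify_citations.py, not its code: the Citation fields, the fuzzy title threshold, and the use of difflib are assumptions, and the resolve argument stands in for the real lookup against doi.org (stubbed in the usage note so the example runs offline). It reads "no DOI, no URL" as "neither a DOI nor a URL", which is one interpretation of the rule.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Optional


@dataclass
class Citation:
    number: int
    title: str
    year: int
    doi: Optional[str] = None
    url: Optional[str] = None


# A resolver returns (title, year) for a DOI, or None when resolution fails.
Resolver = Callable[[str], Optional[tuple[str, int]]]


def titles_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Fuzzy, case-insensitive title comparison (threshold is an assumption)."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio() >= threshold


def check_citation(c: Citation, resolve: Resolver) -> list[str]:
    """Return the problems found for one bibliography entry; an empty list means it passed."""
    problems = []
    if c.doi:
        meta = resolve(c.doi)
        if meta is None:
            problems.append("DOI did not resolve")
        else:
            title, year = meta
            if not titles_match(c.title, title):
                problems.append(f"title mismatch: resolved to {title!r}")
            if year != c.year:
                problems.append(f"year mismatch: resolved to {year}")
    elif c.year >= 2024 and not c.url:
        problems.append("2024+ citation with no DOI and no URL")
    return problems
```

For an offline run, a plain dict of known DOIs works as the resolver (its .get method returns None for unknown keys). arXiv papers carry DOIs of the form 10.48550/arXiv.<id>, so the 2605.06614 to 2606.06614 typo shows up either as a failed resolution or as a title mismatch against whatever paper the wrong ID points to.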

The 8 phases in plain language

The skill enforces an explicit 8-phase pipeline:

  • SCOPE turns 'latest AI news Hugging Face GitHub arXiv May 2026' into a defined topic with explicit boundaries and an output expectation.
  • PLAN selects the mode (Quick, Standard, Deep, UltraDeep) based on recognised phrasing. The default for trend queries is Standard: 5 to 10 minutes, 15 to 30 sources, 4,000+ words.
  • RETRIEVE issues 5 to 10 parallel WebSearches in a single message and spawns 3 to 5 parallel Task agents.
  • TRIANGULATE requires three or more sources per claim.
  • SYNTHESIZE produces a draft.
  • CRITIQUE inspects the draft for known failure modes: missing counterevidence, recency mismatch, over-confident inference.
  • REFINE rewrites the parts that fail critique.
  • PACKAGE writes the Markdown, McKinsey-style HTML, and PDF into a dated folder under your Documents directory.
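The phase order and the Standard-mode budget can be written down as plain data. This is a minimal sketch that assumes nothing about how the skill stores them internally: only the Standard figures come from the text, so the other modes are left without budgets rather than guessed, and run_pipeline with its handler dictionary is a hypothetical illustration of a strict in-order runner.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Phase(Enum):
    SCOPE = 1
    PLAN = 2
    RETRIEVE = 3
    TRIANGULATE = 4
    SYNTHESIZE = 5
    CRITIQUE = 6
    REFINE = 7
    PACKAGE = 8


@dataclass(frozen=True)
class ModeBudget:
    minutes: tuple[int, int]
    sources: tuple[int, int]
    min_words: int


# Only Standard's budget is stated in the article; the other modes are deliberately unset.
MODES: dict[str, Optional[ModeBudget]] = {
    "Quick": None,
    "Standard": ModeBudget(minutes=(5, 10), sources=(15, 30), min_words=4000),
    "Deep": None,
    "UltraDeep": None,
}


def run_pipeline(state: dict, handlers: dict[Phase, Callable[[dict], dict]]) -> dict:
    """Run every phase strictly in order; a missing handler raises KeyError instead of skipping."""
    for phase in Phase:  # Enum iteration follows definition order
        state = handlers[phase](state)
        state.setdefault("trace", []).append(phase.name)
    return state
```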

The parallel-execution rule in Phase 3 is loud on purpose: the skill explicitly labels the wrong pattern (WebSearch number 1, wait, WebSearch number 2, wait) as WRONG. For a moving topic like daily AI news, the difference between five minutes and thirty minutes is the difference between a useful report and a stale one.
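The difference between the sequential pattern and the required fan-out is the difference between awaiting each search in turn and starting them all at once. A minimal asyncio sketch, in which a fake web_search coroutine stands in for the real WebSearch tool (which is a tool call made by the agent, not a Python function); the function names and the simulated latency are assumptions, and Task-agent spawning is not modelled.

```python
import asyncio
import time


async def web_search(query: str, latency: float = 0.05) -> str:
    """Hypothetical stand-in for a WebSearch tool call."""
    await asyncio.sleep(latency)
    return f"results for {query!r}"


async def sequential(queries: list[str]) -> list[str]:
    # The pattern the skill marks WRONG: search 1, wait, search 2, wait, ...
    return [await web_search(q) for q in queries]


async def parallel(queries: list[str]) -> list[str]:
    # The required pattern: issue every search at once, then collect results in order.
    return await asyncio.gather(*(web_search(q) for q in queries))


if __name__ == "__main__":
    angles = ["Transformers v5.8.0 release notes", "arXiv cs.AI May 8 2026",
              "GitHub trending AI May 2026", "DeepSeek-V4 model card", "EXAONE-4.5 model card"]
    t0 = time.perf_counter()
    asyncio.run(sequential(angles))
    t1 = time.perf_counter()
    asyncio.run(parallel(angles))
    t2 = time.perf_counter()
    print(f"sequential: {t1 - t0:.2f}s, parallel: {t2 - t1:.2f}s")
```

Total time for the sequential version grows with the number of angles; the parallel version is bounded by the slowest single search.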

How verification actually rejects a bad citation

Once the draft exists, the Verify step runs two scripts in order. The first, scripts/verify_citations.py, opens every numbered citation in the bibliography, attempts DOI resolution against doi.org, matches the reported title and year against the resolved metadata, and flags any citation dated 2024 or later that has no DOI, no URL, or fails resolution. A fabricated but plausible-looking arXiv paper of the kind an LLM can hallucinate fails this step; so does the typo where 2605.06614 silently becomes 2606.06614 and resolves to the wrong paper.

The second, scripts/validate_report.py, runs eight structural checks:

  • executive summary between 50 and 250 words
  • required sections present
  • citations formatted as [1] [2] [3]
  • bibliography entries matching in-text citations
  • no placeholder text
  • word count between 500 and 10,000
  • at least 10 sources
  • no broken internal links

If validation fails, the skill attempts one automatic fix, then one manual correction pass; after two failures it stops and reports the issues instead of shipping a flawed report.
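Several of the eight structural checks are mechanical enough to sketch. The following is a hypothetical Python approximation, not scripts/validate_report.py: it assumes the executive summary sits under a "## Executive Summary" heading, the bibliography under "## Bibliography" with entries written as "[n] ...", and a placeholder list of TBD and TODO. The required-sections and internal-link checks are left out because the article does not say which sections or link formats the real script expects.

```python
import re

PLACEHOLDERS = ("TBD", "TODO")  # assumed placeholder markers


def section(md: str, heading: str) -> str:
    """Text between '## <heading>' and the next '## ' heading (empty if absent)."""
    m = re.search(rf"^## {re.escape(heading)}\s*$(.*?)(?=^## |\Z)", md, re.M | re.S)
    return m.group(1) if m else ""


def words(text: str) -> int:
    return len(re.findall(r"\b\w+\b", text))


def validate(md: str) -> list[str]:
    """Return a list of structural problems; an empty list means these checks passed."""
    errors = []
    body, bib = md.split("## Bibliography", 1) if "## Bibliography" in md else (md, "")
    n_summary = words(section(md, "Executive Summary"))
    if not 50 <= n_summary <= 250:
        errors.append(f"executive summary has {n_summary} words (want 50-250)")
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", body)}        # in-text [n] markers
    listed = {int(n) for n in re.findall(r"^\[(\d+)\]", bib, re.M)}  # bibliography entries
    if cited != listed:
        errors.append(f"citations without entries: {sorted(cited - listed)}; "
                      f"entries never cited: {sorted(listed - cited)}")
    if len(listed) < 10:
        errors.append(f"only {len(listed)} sources (want at least 10)")
    if any(p in md for p in PLACEHOLDERS):
        errors.append("placeholder text present")
    n_total = words(md)
    if not 500 <= n_total <= 10_000:
        errors.append(f"report has {n_total} words (want 500-10,000)")
    return errors
```

Returning every problem at once, rather than stopping at the first, gives a fix pass the full list to work from.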

That two-stage check is the part of the pipeline an aggregator article cannot match. An aggregator does not resolve its citations; at its publishing tempo the cost would be prohibitive. A local pipeline can afford the verification because it produces one report on one machine, and you would rather wait an extra minute than read a fabricated arXiv ID.

Day-zero coverage of new model releases

When Transformers v5.8.0 added DeepSeek-V4, Gemma 4 Assistant, GraniteSpeechPlus, Granite4Vision, and EXAONE-4.5 in the same tag, nothing in Fazm's binary had to change for such models to reach the picker once a routing layer makes them available. The dynamic model list shipped in Fazm 2.4.0 on April 20, 2026 turned the picker from a static array baked into the binary into a list emitted live by the bridge subprocess, so a new model name surfaces on the next session with no App Store push. The 2.4.2 patch on April 26 then shipped a seven-line Swift function that silently migrated the persisted picker choice across an upstream rename, so users who had selected the older alias kept their choice.
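Moving from a static array to a live list amounts to rebuilding the picker from whatever the bridge reports each session while keeping the user's selection if it is still offered. A hypothetical sketch: the JSON shape ({"models": [{"id": ..., "label": ...}]}) and the fall-back-to-first-model rule are assumptions for illustration, not Fazm's actual bridge protocol.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    id: str
    label: str


def parse_model_list(payload: str) -> list[ModelOption]:
    """Parse a live model list emitted by a bridge process (hypothetical JSON shape)."""
    data = json.loads(payload)
    return [ModelOption(m["id"], m.get("label", m["id"])) for m in data["models"]]


def refresh_picker(live: list[ModelOption], saved_id: Optional[str]) -> tuple[list[ModelOption], str]:
    """Rebuild picker options from the live list; keep the saved choice if it is still offered."""
    if not live:
        raise ValueError("bridge reported no models")
    ids = {m.id for m in live}
    selected = saved_id if saved_id in ids else live[0].id  # assumed fallback rule
    return live, selected
```

Nothing in this path knows model names ahead of time, which is the property that lets a newly routed model appear without a new build.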

The same fast cadence shows in the May 2026 changelog: eight releases in eight days. 2.7.1 on May 1 added an experimental Codex backend (Settings, Advanced, Codex Backend) that routes GPT-5 family models through OpenAI's Codex CLI on a ChatGPT subscription, with the same MCP servers as Claude (fazm_tools, Playwright, macos-use, WhatsApp, Google Workspace) plus session resume across app restarts. 2.9.0 on May 7 cut CPU usage during chat streaming by restructuring floating-bar state into focused stores, so a single response no longer pegs a core. 2.9.1 on May 8 fixed a chat-database merge path that could destroy legacy data on failure; failed sources are now moved into a quarantine folder. The whole month is in CHANGELOG.json in the public repo.

What the report actually looks like on disk

Three files share a base name, written into a dedicated folder under your Documents directory. For 'AI news Hugging Face GitHub arXiv May 2026' the folder is named ~/Documents/AI_News_HF_GitHub_arXiv_May2026_Research_20260508/. Inside it you get research_report_20260508_ai_news_may2026.md (the primary source of truth), research_report_20260508_ai_news_may2026.html (rendered with the McKinsey-style template at ./templates/mckinsey_report_template.html, sharp corners, navy and gray, dense data tables, every citation wrapped in a tooltip span with source details), and research_report_20260508_ai_news_may2026.pdf (printable). The HTML and PDF open automatically in their default applications. A copy of the markdown is also placed at ~/.claude/research_output/ for internal tracking.
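The naming convention is regular enough to reproduce. A minimal sketch, assuming the folder pattern ~/Documents/[TopicName]_Research_[YYYYMMDD]/ and the research_report_[YYYYMMDD]_[slug] base name seen in the example above; how the skill derives TopicName and the slug from a query is not described, so both are passed in explicitly, and report_paths is a hypothetical helper.

```python
from datetime import date
from pathlib import Path


def report_paths(documents: Path, topic_name: str, slug: str, day: date) -> dict[str, Path]:
    """Folder and file paths following the [TopicName]_Research_[YYYYMMDD] convention."""
    stamp = day.strftime("%Y%m%d")
    folder = documents / f"{topic_name}_Research_{stamp}"
    base = f"research_report_{stamp}_{slug}"
    # Markdown is the source of truth; HTML and PDF share its base name.
    return {ext: folder / f"{base}.{ext}" for ext in ("md", "html", "pdf")}


if __name__ == "__main__":
    paths = report_paths(Path.home() / "Documents", "AI_News_HF_GitHub_arXiv_May2026",
                         "ai_news_may2026", date(2026, 5, 8))
    for ext, p in paths.items():
        print(ext, p)
```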

Every artifact is on the local filesystem, every citation resolves, and every claim is checkable. The HTML template is deliberately opinionated in the McKinsey style: dense over decorative, table-first, executive summary first, full text below. You can hand it to a colleague without it reading as unchecked LLM output, because the format is built to be skimmed and verified, not admired.

Frequently asked questions

What concretely shipped on Hugging Face, GitHub, and arXiv in the first week of May 2026?

Three primary sources, three concrete drops. Hugging Face: Transformers v5.8.0 was tagged on May 5, 2026 and added five model architectures in one release, DeepSeek-V4 (the next-generation MoE family with 1.6T total / 49B activated parameters in Pro and 284B / 13B in Flash), Gemma 4 Assistant (a small text-only speculative-decoding partner for Gemma 4 using multi-token prediction), GraniteSpeechPlus (a multimodal speech-to-text model with speaker annotation), Granite4Vision (IBM Research's enterprise document-and-chart VLM), and EXAONE-4.5 (LG AI Research's first open-weight VLM at 33B with 256K context). PP-FormulaNet for table structure recognition was added in the same release. arXiv: the cs.AI submissions on May 8, 2026 alone included AI Co-Mathematician (2605.06651), MASPO joint prompt optimization for multi-agent systems (2605.06623, accepted at ICML 2026), SkillOS for self-evolving agents (2605.06614), and 'Can RL Teach Long-Horizon Reasoning to LLMs?' (2605.06638). The full daily AI listing for that single day held over a thousand papers. GitHub: OpenClaw crossed roughly 280k stars and continued to dominate trending, n8n sat near 162k, and LangChain was over 100k. The breakouts the same week were pi-mono (an agent toolkit with a coding CLI, unified LLM API, TUI, web UI, and Slack bot) and Pixelle-Video (text-to-finished-video pipeline with script, visuals, voice, and music).

Why is the deep-research skill bundled with Fazm 856 lines long when the other bundled skills run roughly 50 to 500 lines?

Because it is the only one that ships a full citation-verification harness, an 8-phase pipeline, three output formats (Markdown, McKinsey-style HTML, PDF), and hard rules on parallel execution and source grounding. The other bundled skills do one job: ai-browser-profile.skill.md is 47 lines, find-skills.skill.md is 148 lines, social-autoposter.skill.md is 302 lines, travel-planner.skill.md is 534 lines. deep-research at 856 lines is roughly 19 percent of the entire bundled-skills payload (4,523 lines total). The size is what makes the output trustworthy on a moving target: line by line, the skill encodes 'Every factual claim MUST cite a specific source immediately', 'Verify before citing: if unsure whether source actually says X, do NOT fabricate citation', and 'When uncertain: Say No sources found for X rather than inventing references'. It also enforces parallelism explicitly: the wrong pattern (sequential WebSearch calls) is marked WRONG in the skill itself.

How does the verification step actually catch a fabricated arXiv citation?

Two scripts run in sequence after the report draft is written. First, scripts/verify_citations.py opens every numbered citation in the bibliography, attempts DOI resolution against doi.org, matches the reported title and year against the resolved metadata, and flags any 2024-or-later citation that has no DOI, no URL, or fails resolution. An invented arXiv paper does not have a real DOI; an arXiv ID typo (2605.06614 fat-fingered to 2606.06614) resolves to a different paper or fails to resolve, and the title-year match catches the difference. Second, scripts/validate_report.py runs eight structural checks: executive summary length 50 to 250 words, required sections present, citations formatted as [1] [2] [3], bibliography entries match in-text citations, no placeholder text (TBD, TODO), word count between 500 and 10,000, minimum 10 sources, no broken internal links. If validation fails, the skill auto-fixes once, attempts a manual correction pass once, and after two failures it stops and reports the issues instead of shipping a flawed report.
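The stop-after-two-failures rule is a bounded retry loop. A minimal sketch, assuming validate, auto_fix, and manual_fix are supplied by the caller (they are hypothetical names, not the skill's); it shows the control flow only: validate, try one automatic fix, try one manual pass, and if the report still fails, return the outstanding issues instead of shipping.

```python
from typing import Callable, NamedTuple

Validator = Callable[[str], list[str]]   # returns issues; an empty list means valid
Fixer = Callable[[str, list[str]], str]  # returns a revised report


class Outcome(NamedTuple):
    shipped: bool
    report: str
    issues: list[str]


def validate_with_fixes(report: str, validate: Validator,
                        auto_fix: Fixer, manual_fix: Fixer) -> Outcome:
    issues = validate(report)
    for fix in (auto_fix, manual_fix):   # at most one automatic pass, then one manual pass
        if not issues:
            break
        report = fix(report, issues)
        issues = validate(report)
    # After two failed fix attempts, stop and surface the issues rather than ship.
    return Outcome(shipped=not issues, report=report, issues=issues)
```

Capping the loop at two attempts keeps a stubborn failure from turning into an unbounded rewrite cycle.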

Why pull from Hugging Face, GitHub, and arXiv specifically instead of one aggregator newsletter?

Because the three are the actual primary sources, and an aggregator is at best a one-day-old summary. A model on Hugging Face is the model. A pushed tag on github.com/huggingface/transformers is the release moment, not a write-up about it. An arXiv ID resolves to the paper, not to a paragraph about the paper. Every aggregator that ranks for this topic is rewriting the same five trending posts, and they all link to each other before they link to the source. The deep-research skill in Fazm is set up to do the opposite: decompose the query into 5 to 10 search angles, launch them in parallel, triangulate against three or more sources per claim, and reject any 2024-or-later claim that does not resolve to a primary identifier (DOI, GitHub release tag, model card URL).
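"Resolves to a primary identifier" can be made concrete with a small classifier over citation strings. The patterns below are a hypothetical sketch, not the skill's rules: they accept a DOI (bare or as a doi.org URL), a GitHub release-tag URL, a Hugging Face model card URL (excluding a few non-model paths such as blog and docs), and, as an added assumption, an arXiv abs URL; anything else is treated as secondary.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; real identifiers have more edge cases.
PATTERNS = {
    "doi": re.compile(r"^(?:https?://(?:dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.I),
    "github_release": re.compile(r"^https://github\.com/[\w.-]+/[\w.-]+/releases/tag/\S+$"),
    "hf_model_card": re.compile(
        r"^https://huggingface\.co/(?!(?:datasets|spaces|docs|blog|papers)/)[\w.-]+/[\w.-]+/?$"),
    "arxiv": re.compile(r"^https://arxiv\.org/abs/\d{4}\.\d{4,5}(?:v\d+)?$"),
}


def primary_kind(identifier: str) -> Optional[str]:
    """Name of the primary-identifier pattern that matches, or None for a secondary source."""
    for kind, pattern in PATTERNS.items():
        if pattern.match(identifier.strip()):
            return kind
    return None
```

Classification only establishes that a citation has the shape of a primary identifier; whether it points at the right artifact is still the resolver's job.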

What does the parallel-search rule look like inside the bundled skill?

Phase 3 RETRIEVE is labelled Mandatory Parallel Search. The skill tells the agent to decompose the query into 5 to 10 independent search angles before issuing any searches, then launch all of them in a single message with multiple tool calls, then spawn 3 to 5 parallel Task agents for deep-dive investigations. The wrong pattern (WebSearch number 1, wait, WebSearch number 2, wait) is shown explicitly with a WRONG marker. For the May 2026 multi-source roundup that parallelism is the difference between a five-minute report and a thirty-minute report. On a daily news cadence the gap is the difference between useful and stale.

Where does the report end up on disk and what files do I get?

In a topic folder under your Documents directory, named ~/Documents/[TopicName]_Research_[YYYYMMDD]/. For 'AI news Hugging Face GitHub arXiv May 2026' you get a folder like ~/Documents/AI_News_HF_GitHub_arXiv_May2026_Research_20260508/. Inside it, three files share a base name: research_report_20260508_ai_news_may2026.md (the primary source of truth), research_report_20260508_ai_news_may2026.html (rendered with the McKinsey-style template at ./templates/mckinsey_report_template.html, with sharp corners, navy and gray, dense data tables, citations wrapped in tooltip spans), and research_report_20260508_ai_news_may2026.pdf (printable). The HTML and PDF open automatically in their default applications. A copy of the markdown is also placed at ~/.claude/research_output/ for internal tracking. Every artifact is on the local filesystem, every citation is resolvable, every claim is checkable.

How does Fazm pick up brand new models like the Transformers v5.8.0 batch on day zero?

Through the dynamic model list shipped in Fazm 2.4.0 on April 20, 2026. Before 2.4.0 the available-model picker was a static array baked into the binary and any new model needed an App Store push. After 2.4.0 the bridge subprocess emits the live model list from the agent SDK, the Swift handler ingests it on the next session, and a new model name surfaces in the picker without an app update. So when a model like DeepSeek-V4 lands in Transformers and a routing layer makes it accessible, Fazm can route through it without a release. The 2.4.2 patch on April 26 then shipped a seven-line Swift function that silently migrated the persisted 'opus' UserDefaults string to a new ACP alias, preserving each user's last picker choice across the rename. Two releases in six days, both visible in CHANGELOG.json, both transparent to the user.
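A migration like the 2.4.2 fix comes down to rewriting one persisted value, once. Fazm's version is a Swift function over UserDefaults; below is a hypothetical Python analogue over a plain dict, where the key name selected_model and the replacement alias are placeholders, since the article does not name the new alias.

```python
# Hypothetical placeholders: the real preference key and the new alias are not given in the article.
MODEL_KEY = "selected_model"
RENAMED = {"opus": "NEW_ACP_ALIAS_FOR_OPUS"}


def migrate_model_choice(prefs: dict[str, str]) -> bool:
    """Rewrite a persisted legacy alias in place; return True if anything changed."""
    new = RENAMED.get(prefs.get(MODEL_KEY))
    if new is None:
        return False          # nothing saved, or already on a current alias
    prefs[MODEL_KEY] = new    # keep the user's choice, just under its new name
    return True
```

Because the function only rewrites values found in the rename map, running it on every launch is harmless: the second run finds nothing to change.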

What is in the May 2026 Fazm changelog itself, in case the question is really 'how is Fazm moving this month'?

Eight releases in eight days. 2.7.1 on May 1 restored the on-screen 'Fazm is controlling your browser' indicator, kept the AI chat alive when the user clicked another window during tool calls, surfaced Heal cards under Discovered Tasks (a Mac Doctor system prompt that runs read-only diagnostics first), and shipped an experimental Codex backend (Settings, Advanced, Codex Backend) that routes GPT-5 family models through OpenAI's Codex CLI on a ChatGPT subscription. 2.7.2 on May 2 fixed pop-out chat recovery losing context after a second rate limit and Chrome extension detection in Playwright MCP for non-default profiles. 2.7.3 on May 3 fixed a poisoned SQLite connection that silently failed chat saves. 2.8.0 on May 4 removed a 60 second auto-cancel that was killing slow Opus and extended-thinking responses mid-think. 2.8.1 and 2.8.2 on May 5 fixed cross-scope chat contamination and a Stop-button-stuck case after hung tools. 2.9.0 on May 7 massively reduced CPU usage during chat streaming by restructuring floating-bar state into per-message granular observation. 2.9.1 on May 8 stopped a data-loss path where a failed chat database merge would still delete the legacy source.

Does the deep-research pipeline replace running a desktop agent, or is it part of one?

It is part of one. The same Mac process that drives your browser, edits documents, and operates Google Apps via accessibility APIs is what loads the deep-research skill, issues 5 to 10 parallel WebSearches, fans out 3 to 5 Task agents, runs DOI verification, and writes the three output files into your Documents folder. There is no separate cloud research service in the loop. That is also why the verification step is feasible at all: the DOI resolution and the 8-check validator run locally, can be inspected, and can be modified by the user. A hosted aggregator's research model has no equivalent step you can audit.

Do I need a developer account, an API key, or a CLI to use the pipeline?

No. Fazm is a Mac app, not a developer framework. The bundled skill, the verifier scripts, and the McKinsey HTML template are installed automatically on first launch: the SkillInstaller looks at every bundled .skill.md file, computes a SHA-256, compares it against the existing copy at ~/.claude/skills/[name]/SKILL.md, and copies the bundled version forward when the checksum has changed. End users never touch the terminal. Developers who want to inspect the skill can read the file at /Users/[username]/.claude/skills/deep-research/SKILL.md after first launch.