From inside a shipping consumer Mac agent

The day's AI news, fetched by an 8-phase deep-research skill bundled inside a Mac app

Most pages covering the daily AI cycle hand you a curated list with a handful of unverified links. Fazm bundles an 856-line skill at Desktop/Sources/BundledSkills/deep-research.skill.md, auto-installs it to ~/.claude/skills/ on first launch, re-seeds it whenever the bundled SHA-256 checksum changes, and runs an 8-phase pipeline with 5 to 10 parallel WebSearches, 3 to 5 parallel Task agents, and DOI-level citation verification before it writes the report. Same-day model releases are absorbed by the dynamic ACP registration that shipped in 2.4.0; April 22 was the proof. This page traces the path from Cmd+Shift+Space to the three files that land in your Documents folder.

deep-research.skill.md · SkillInstaller.swift · verify_citations.py · ~/Documents/[Topic]_Research_[YYYYMMDD]/
Matthew Diakonov
11 min read
4.8 from early-access users
17 skill files bundled in the app, auto-installed to ~/.claude/skills/
8-phase research pipeline, 5-10 parallel searches, 3-5 parallel agents
Reports written to ~/Documents/[Topic]_Research_[YYYYMMDD]/ as MD + HTML + PDF

What the angle of this page actually is

Every other guide on this topic is a curated list of yesterday's model drops, paper preprints, and open source releases. The list goes stale the moment it is rendered, the citations are not verified against DOIs, and there is no runtime that actually re-executes when you want today's rundown.

Fazm ships a different shape. It is a consumer Mac app that bundles a 17-file skill library, the largest of which is a deep-research pipeline written as an 856-line markdown spec. When you press Cmd+Shift+Space and ask “what's new in AI in the last 24 hours, with model releases, new papers, and open source projects”, the floating bar agent loads that file into context, decomposes the query into 5 to 10 search angles, fires every search in parallel, runs 3 to 5 parallel research agents, validates every citation against a DOI resolver, and writes an MD/HTML/PDF report into a dated folder in your Documents.

The paths below are real, the line counts are real, and the pipeline is what you get on a default install. No competitor page on this topic describes a runtime. This page describes the runtime.

The path from Cmd+Shift+Space to a folder in Documents

Inputs, hub, outputs

Cmd+Shift+Space → find-skills.skill.md → ChatProvider.swift → deep-research skill → 5-10 parallel WebSearches → 3-5 parallel Task agents → verify_citations.py → ~/Documents/[Topic]_Research_[YYYYMMDD]/

The hub here is ~/.claude/skills/deep-research/SKILL.md, an 856-line file copied out of the app bundle on first launch and re-copied whenever the bundled SHA-256 differs from the disk copy. The agent loads it into context the same way Claude Code loads any skill: front-matter description match, then full body.

The bundle, by line count

Seventeen .skill.md files ship inside Fazm.app at Desktop/Sources/BundledSkills/. The deep-research skill is 60% larger than the next entry. Total bundled skill content is 4,523 lines. Every file in the list below ships with the binary.

deep-research (856)
travel-planner (534)
docx (481)
doc-coauthoring (375)
pdf (314)
social-autoposter (302)
xlsx (291)
social-autoposter-setup (274)
pptx (232)
telegram (203)
find-skills (148)
google-workspace-setup (132)
canvas-design (129)
video-edit (105)
web-scraping (58)
ai-browser-profile (47)
frontend-design (42)
17 skill files in the bundle
856 lines in deep-research.skill.md
4,523 lines of total skill content shipped
8 phases in the pipeline

The deep-research skill is roughly 60% larger than the runner-up: travel-planner is 534 lines, docx is 481. The pipeline is the heaviest thing in the bundle on purpose.

Counted at /Users/matthewdi/fazm/Desktop/Sources/BundledSkills/ on April 27, 2026

The 8-phase pipeline as bento cards

Every phase below is a literal section header inside deep-research.skill.md. Phases 1, 3 and 8 always run. Phases 2, 4, 4.5 and 5 run from Standard mode upward. Phases 6 and 7 run only in Deep and UltraDeep. The mode is announced in the agent's first line before any tool call.

Phase 1 - SCOPE

Define boundaries: what counts as 'AI news in the last 24 hours' for this run, which subtopics matter (model releases, papers, OSS), what time window is in scope. The skill places boundary decisions at the start of the report so the reader can re-run with different bounds.

Phase 3 - RETRIEVE (parallel)

5 to 10 WebSearches in a single message: core topic, technical keywords, recency-filtered (2024-2026), academic domains, critical analysis, industry trends. 3 to 5 parallel Task agents for deep dives. The skill explicitly forbids sequential search.

Phase 4 - TRIANGULATE

Every factual claim must hold across 3+ independent sources. Single-source claims are flagged or downgraded to inference. The Claims table in the report records source counts per claim.

Phase 4.5 - OUTLINE REFINEMENT

The report structure adapts to what the evidence actually says, not the structure the agent imagined at the start. The WebWeaver 2025 pattern. New sections can appear; planned sections can disappear.

Phase 5 - SYNTHESIZE

The skill writes prose, not bullets. 'Bullets fragment thinking' is in the body of the file. Each finding is a 3-5 paragraph narrative with context, evidence, implications.

Phase 6 - CRITIQUE (deep mode +)

Red-team the synthesis. Look for missing counterevidence, weak source credibility, unwarranted leaps. Standard mode skips this; Deep and UltraDeep run it. For a same-day news topic, Critique catches half-baked papers.

Phase 8 - PACKAGE

Write three files into ~/Documents/[Topic]_Research_[YYYYMMDD]/: the markdown, the HTML (McKinsey template), the PDF. Open the HTML and the PDF in their default apps automatically.

Verify gate

python scripts/verify_citations.py resolves DOIs and flags fabricated entries. python scripts/validate_report.py runs eight checks. Two-fail policy: stop and surface, do not ship a broken report.

The anti-hallucination protocol, verbatim

The single best reason to run a daily AI rundown through the bundled skill instead of a chatbot paragraph is the anti-hallucination protocol. It is encoded line by line at lines 116 to 122 of the skill file. The line that does the most work is the last one: “When uncertain: Say No sources found for X rather than inventing references”. A chatbot will pretend; the skill is told, in writing, not to.
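Three of the seven rules in that span, as quoted elsewhere on this page:

> Every factual claim MUST cite a specific source immediately
> Verify before citing: if unsure whether source actually says X, do NOT fabricate citation
> When uncertain: Say No sources found for X rather than inventing references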

BundledSkills/deep-research.skill.md (lines 116-122)

The parallel-execution rule, verbatim

Phase 3 is the throughput phase. The skill names the wrong pattern (“sequential execution”) and marks it ❌ WRONG. Then it names the right pattern (“all searches + agents launched simultaneously”) and marks it ✅ RIGHT. For a 24-hour news topic, that is the difference between a five-minute report and a thirty-minute report.

BundledSkills/deep-research.skill.md (lines 124-156)
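The concurrency shape is easy to picture outside the skill. A minimal Swift sketch, illustrative only: the skill itself instructs the agent to batch WebSearch tool calls in one message, and `runSearch` here is a hypothetical stand-in, not a Fazm API.

```swift
import Foundation

// Hypothetical stand-in for one WebSearch angle.
func runSearch(_ query: String) async -> [String] { [] }

// ✅ RIGHT shape: every angle launched at once, results gathered as they land.
func retrieveParallel(_ angles: [String]) async -> [[String]] {
    await withTaskGroup(of: [String].self) { group in
        for angle in angles { group.addTask { await runSearch(angle) } }
        var results: [[String]] = []
        for await hits in group { results.append(hits) }
        return results
    }
}

// ❌ WRONG shape the skill forbids: wait for each search before the next.
func retrieveSequential(_ angles: [String]) async -> [[String]] {
    var results: [[String]] = []
    for angle in angles { results.append(await runSearch(angle)) }
    return results
}
```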

How the skill actually gets onto your disk

SkillInstaller.swift in the same Sources tree auto-discovers any *.skill.md file in the bundle's Resources/BundledSkills directory at launch. It computes SHA-256 over each file and copies the bundled version into ~/.claude/skills/<name>/SKILL.md when the on-disk hash differs. Skill content can change without an App Store push: rebuild the binary, ship, the bundle re-seeds on next launch.

Desktop/Sources/SkillInstaller.swift
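The checksum gate reduces to a few lines. A hedged sketch of that logic, assuming CryptoKit and FileManager; the shipping SkillInstaller.swift may differ in detail.

```swift
import Foundation
import CryptoKit

// Copy a bundled skill into ~/.claude/skills/<name>/SKILL.md,
// but only when the SHA-256 of the bundled file differs from disk.
// Sketch only; the real installer also logs installed/updated/skipped.
func seedSkill(bundled: URL, installed: URL) throws {
    let bundledData = try Data(contentsOf: bundled)
    if let onDisk = try? Data(contentsOf: installed),
       SHA256.hash(data: onDisk) == SHA256.hash(data: bundledData) {
        return // skipped (unchanged)
    }
    try FileManager.default.createDirectory(
        at: installed.deletingLastPathComponent(),
        withIntermediateDirectories: true)
    try bundledData.write(to: installed, options: .atomic) // installed / updated
}
```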

What a real run looks like in the terminal

The skill writes its own progress trace as it executes. Below is the shape of a Standard-mode run for a topic about today's model releases, new papers, and open source projects. The agent announces the mode, fires the parallel batch, runs the verify gate, and lands three files in a dated folder under your Documents.

Fazm floating bar - deep-research run
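A hedged illustration of that shape, assembled only from the phase names and the mode-announcement line quoted in the FAQ below; it is not a captured transcript.

```text
Starting standard mode research (5-10 min, 15-30 sources)
Phase 1 SCOPE ......... window = last 24h; subtopics: model releases, papers, OSS
Phase 3 RETRIEVE ...... 5-10 WebSearches + 3-5 Task agents, all in one message
Phase 4 TRIANGULATE ... 3+ independent sources per claim; singles downgraded
Phase 4.5 OUTLINE ..... structure refined against the evidence
Phase 5 SYNTHESIZE .... narrative prose, 4,000+ words
Verify ................ python scripts/verify_citations.py --report [path]
                        python scripts/validate_report.py --report [path]
Phase 8 PACKAGE ....... ~/Documents/AI_News_24h_Research_20260427/ (MD + HTML + PDF)
```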

Why April 2026 is the right window for this story

The pipeline is general; April 2026 is the example that proves it. Four Fazm releases shipped in the eight days from April 20 to April 27, every one of them visible in CHANGELOG.json. They are why a deep-research run on April 22 was already routing through Opus 4.7 before any binary update reached the user.

Eight days, four Fazm releases, one rename absorbed

1

April 20, 2026 - Fazm 2.4.0 ships dynamic model registration

The picker stops being a static array. The bridge subprocess emits the live list from the Claude agent SDK; the Swift handler ingests it on the next session. The CHANGELOG entry reads verbatim: 'Available AI models now populate dynamically from the agent, so newly released Claude models appear without an app update.' This is the precondition that makes the next two days possible.

2

April 22, 2026 - Anthropic flips Opus 4.7 to GA, deep-research runs hot

The agent SDK starts reporting the Opus tier under the new alias 'default'. Fazm picks it up the same day, with zero binary churn. Any user who triggered a deep-research run on the topic of new model releases that morning was already routing the 8-phase pipeline through the freshly released Opus 4.7. The same day, Fazm 2.4.1 patches a partial-list label fallback bug.

3

April 22-25, 2026 - the four-day gap is invisible to the pipeline

Some legacy persistent 'opus' UserDefaults values fail to round-trip through the new SDK alias. The floating bar pill flickers; the deep-research pipeline does not. Skill execution depends on the agent's tool calls and the bundled .skill.md content, not on the picker's label. The reports written into ~/Documents during this window were already correctly grounded against the new model.

4

April 26, 2026 - Fazm 2.4.2 silently migrates the picker preference

Seven lines of Swift in normalizeModelId(_:) at FloatingControlBar/ShortcutSettings.swift rewrite the persisted 'opus' string to 'default'. The CHANGELOG entry is one line: 'Fixed Smart (Opus) model preference not persisting after app update - now correctly maps stored opus to the new ACP model ID.' From this point on, the floating bar pill, the persisted preference, and the model behind every deep-research call are aligned. A sketch of the migration follows this timeline.

5

April 27, 2026 - Fazm 2.5.0 ships polish

Browser overlay restored, duplicate-error-text fix, error-message persistence fix, promo code support in checkout, checkout email pre-fill. None of these touch the deep-research skill body, because the skill body did not need to change; the runtime it depends on already absorbed the model rename.
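The 2.4.2 migration is small enough to sketch. A hedged reconstruction of normalizeModelId(_:) based only on the behavior described above; the shipping code may differ.

```swift
// Rewrite the legacy persisted model ID to the new ACP alias.
// Pre-2.4.2 builds stored "opus"; the SDK now reports that tier as "default".
func normalizeModelId(_ stored: String) -> String {
    if stored == "opus" {
        return "default"
    }
    return stored
}
```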

Comparison: hosted aggregator versus a bundled skill on your Mac

Most existing playbooks for “today's AI news” live on a hosted page that does not re-execute when you reload. The table below contrasts that pattern with what you get when the runtime ships inside the app.

| Feature | Hosted aggregator / chatbot paragraph | Fazm + bundled deep-research skill |
|---|---|---|
| Where the daily AI rundown is generated | On a hosted aggregator that picks for everybody and decays the moment you close the tab | On your own Mac, in a folder timestamped to today, with markdown + HTML + PDF you can re-open or hand off |
| Source count per report | 1-3 cited URLs, often broken links, no DOI resolution | Standard mode floor: 15 to 30 sources, hard minimum of 10 enforced by validate_report.py |
| Citation verification | None. Hallucinated DOIs ship. | verify_citations.py resolves DOIs and flags suspicious entries (2024+ with no DOI, no URL, or failed resolution) before the report is written |
| Coverage of new model releases on day zero | Whatever the aggregator's feed picked up before the page was rendered | Dynamic model registration via the ACP SDK means the run uses the freshly released Claude model the same day, evidenced by the April 22 Opus 4.7 timeline |
| Search execution pattern | Sequential, one source at a time, often capped at 5 | 5-10 parallel WebSearches plus 3-5 parallel Task agents in a single message, explicitly enforced as 'NOT sequential' in the skill body |
| Output format | Inline web copy, ad-supported, locked to the publisher's domain | Three files (MD, HTML, PDF) on your local disk, openable forever, citations resolvable forever |
| Anti-hallucination posture | Implicit at best | Explicit protocol verbatim in the skill body: 'When uncertain: Say No sources found for X rather than inventing references' |

The verify gate is what makes this honest

The skill is explicit about a two-strike policy at the validate stage. Auto-fix once. Manual correction once. After two failures, stop and surface the issues rather than ship a flawed report. That line in the skill body is what keeps a daily-news run from fabricating model release notes that never existed.

The two scripts referenced are scripts/verify_citations.py (DOI resolution, title/year matching, suspicious-entry flagging) and scripts/validate_report.py (eight automated checks: summary length, required sections, citation formatting, bibliography match, no placeholder text, word count range, source minimum, no broken internal links). Both run before the markdown is moved out of staging. The reader sees only what passed both gates.
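Mechanically, DOI resolution is a redirect check against doi.org. A hedged Swift sketch of that single check; the real verify_citations.py is Python and also matches title and year metadata.

```swift
import Foundation

// Returns true when https://doi.org/<doi> resolves. URLSession follows
// the doi.org redirect, so a live DOI ends in a 2xx at the publisher.
func doiResolves(_ doi: String) async -> Bool {
    guard let url = URL(string: "https://doi.org/\(doi)") else { return false }
    var request = URLRequest(url: url)
    request.httpMethod = "HEAD"
    guard let (_, response) = try? await URLSession.shared.data(for: request),
          let http = response as? HTTPURLResponse else { return false }
    return (200..<300).contains(http.statusCode)
}
```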

What this guide does not tell you to do

It does not tell you to subscribe to a newsletter, pin a tab to a curated list, or trust a chatbot paragraph that decays the moment you click away. The premise is that the runtime should sit on your machine, the citations should resolve, and the artifact should be something you can re-open in a year and still trust.

The fact that the runtime is a bundled markdown spec, not a server you do not control, is the whole point. You can read deep-research.skill.md the way you would read any other file. You can grep it. You can fork it. You can hand its output to a colleague who is allowed to be skeptical, because every claim is grounded against a DOI that was actually resolved before the report was written.

Want this pipeline running on your Mac by tomorrow morning?

A 15-minute call walks you through the install, the floating-bar trigger, and the dated folder in your Documents.

Frequently asked questions

What actually happens inside Fazm when I ask 'what's new in AI in the last 24 hours' from the floating bar?

The floating bar agent reads its skill index from ~/.claude/skills/, matches the phrase 'analyze trends' or 'comprehensive analysis' in the description front-matter, and loads deep-research.skill.md into context. The skill's decision tree picks 'standard' mode by default for trend queries (5 to 10 minutes, 15 to 30 sources). Phase 1 SCOPE defines boundaries. Phase 3 RETRIEVE issues 5 to 10 parallel WebSearches in a single message plus 3 to 5 parallel Task agents for deeper investigations. Phase 4 TRIANGULATE requires 3+ sources per claim. Phase 4.5 OUTLINE REFINEMENT adapts the report structure to the evidence (the WebWeaver 2025 pattern). Phase 8 PACKAGE writes a markdown, an HTML, and a PDF report into a dated folder in your Documents. The whole pipeline runs on your machine, not in a hosted aggregator.

Where exactly does the skill live in the Fazm source tree, and how is it shipped?

It lives at /Users/matthewdi/fazm/Desktop/Sources/BundledSkills/deep-research.skill.md and is 856 lines long. Sixteen sibling .skill.md files share the same directory: ai-browser-profile, canvas-design, doc-coauthoring, docx, find-skills, frontend-design, google-workspace-setup, pdf, pptx, social-autoposter, social-autoposter-setup, telegram, travel-planner, video-edit, web-scraping, and xlsx. Together they total 4,523 lines. SkillInstaller.swift in the same Sources tree auto-discovers any *.skill.md file in the bundle's Resources/BundledSkills directory at launch, computes a SHA-256 over each file, compares against the copy at ~/.claude/skills/<name>/SKILL.md, and copies the bundled version forward when the checksum has changed. No App Store update is required for skill content changes; the binary just reads its bundled resources.

Why does deep-research.skill.md dwarf the other bundled skills in size?

Because it is the only skill that ships an 8-phase pipeline plus a citation verification harness plus a multi-format report template. The next largest bundled skills are travel-planner.skill.md at 534 lines, docx at 481, doc-coauthoring at 375, social-autoposter at 302, and xlsx at 291. The skill encodes anti-hallucination protocols line by line: 'Every factual claim MUST cite a specific source immediately', 'Verify before citing: if unsure whether source actually says X, do NOT fabricate citation', 'When uncertain: Say No sources found for X rather than inventing references'. It also includes hard caps on parallelism, a 'Loss in the Middle' protocol for placement of key findings, and word-count floors per mode (Quick 2,000+, Standard 4,000+, Deep 6,000+, UltraDeep 10,000+). The size is what makes the output trustworthy on a moving topic like daily AI news.

How is the pipeline actually verified before it ships a report to me?

Two scripts run in sequence at the Verify step. First, 'python scripts/verify_citations.py --report [path]' resolves DOIs, matches title and year metadata, and flags suspicious entries (any 2024+ citation without a DOI, no URL, or a failed resolution). Fabricated citations are surfaced before they reach the report. Second, 'python scripts/validate_report.py --report [path]' runs eight automated checks: executive summary length 50 to 250 words, required sections present, citations formatted [1] [2] [3], bibliography matches in-text citations, no placeholder text, word count between 500 and 10,000, minimum 10 sources, and no broken internal links. If validation fails, the skill attempts auto-fix once, then a manual correction pass once, and after two failures it stops and reports the issues rather than shipping a flawed report.

Where do the report files actually land on disk?

In a dedicated folder per topic, named '~/Documents/[TopicName]_Research_[YYYYMMDD]/'. The topic name is extracted from the research question and slugified with underscores or CamelCase. For 'ai news last 24 hours' you would get '~/Documents/AI_News_24h_Research_20260427/'. Inside that folder, the skill writes three files with the same base name: 'research_report_20260427_ai_news_24h.md' (primary source), 'research_report_20260427_ai_news_24h.html' (rendered with the McKinsey-style template at ./templates/mckinsey_report_template.html), and 'research_report_20260427_ai_news_24h.pdf' (printable). The HTML and PDF are opened automatically in their default applications. A copy of the markdown is also placed at '~/.claude/research_output/' for internal tracking. Every artifact is on the local filesystem, every citation is resolvable, every claim is checkable.
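The naming scheme is mechanical; a hedged Swift illustration, where the helper name is hypothetical rather than anything in the Fazm source.

```swift
import Foundation

// Build the dated report folder path described above, e.g.
// reportFolder(topic: "AI_News_24h") on April 27, 2026 yields
// ~/Documents/AI_News_24h_Research_20260427 (tilde expanded).
func reportFolder(topic: String, date: Date = Date()) -> URL {
    let formatter = DateFormatter()
    formatter.dateFormat = "yyyyMMdd"
    let stamp = formatter.string(from: date)
    return FileManager.default
        .homeDirectoryForCurrentUser
        .appendingPathComponent("Documents/\(topic)_Research_\(stamp)")
}
```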

How does Fazm pick up brand-new model releases and run the deep-research skill against the latest model on day zero?

Through the dynamic model list shipped in Fazm 2.4.0 on April 20, 2026. Before 2.4.0 the available-model picker was a static array baked into the binary. After 2.4.0 the bridge subprocess emits the live list from the Claude agent SDK, and the Swift handler ingests it on the next session. So when Anthropic flipped Opus 4.7 to GA on April 22, 2026, Fazm picked it up the same day with no App Store push. Anyone who triggered a deep-research run that day was already routing the 8-phase pipeline through Opus 4.7. Fazm 2.4.2 on April 26 then shipped a seven-line Swift function that silently migrated the persisted 'opus' UserDefaults string to the new ACP alias 'default', so the floating bar pill kept its Smart label. Two patches in six days, both visible in CHANGELOG.json, both transparent to the user.

What does a deep-research run for 'AI news last 24 hours' actually return that I cannot get from a chatbot?

Verifiable structure. A chatbot answer collapses everything into a single paragraph, optionally cites two URLs, and never resolves them. A standard-mode deep-research run returns 4,000+ words built from 15 to 30 sources, every factual claim grounded against a numbered citation in a bibliography that has been DOI-resolved. The skill enforces a Claims table and a Counterevidence section. It uses the explicit 'According to [1]...' marker for source-grounded statements and 'This suggests...' for inference. It places key findings at the start and end of each section to avoid the loss-in-the-middle problem documented in long-context literature. The output for the day's model releases, new papers, and open source drops is a real research report you can hand to a colleague, not a paragraph that decays the moment you click away.

What is the parallel-execution rule the skill enforces, and why does it matter for a 24-hour news topic?

The skill is explicit: 'Phase 3 RETRIEVE - Mandatory Parallel Search'. It tells the agent to decompose the query into 5 to 10 independent search angles before issuing any searches, then launch all of them in a single message with multiple tool calls, then spawn 3 to 5 parallel agents using the Task tool for deep-dive investigations. It even shows the wrong pattern: 'WebSearch #1 -> wait for results -> WebSearch #2 -> wait -> WebSearch #3' marked as ❌ WRONG. For a 24-hour news topic the parallelism is the difference between a five-minute report and a thirty-minute report; on a daily news cadence that gap is the difference between useful and stale.

How does Fazm know which mode (quick, standard, deep, ultradeep) to run when I ask about today's AI news?

The skill ships a decision tree at the top: 'Trend query -> Assume recent 1-2 years unless specified', 'Standard mode is default for most queries'. For 'what is new in AI in the last 24 hours' the agent defaults to Standard (5 to 10 minutes, 15 to 30 sources, 4,000+ words). If you say 'quick rundown' or 'briefly' the skill switches to Quick (2 to 5 minutes, 2,000+ words, 3 phases instead of 8). If you say 'critical decision' or 'compare in depth' the skill switches to Deep (10 to 20 minutes, 6,000+ words, 8 phases including Critique and Refine). UltraDeep is reserved for explicit asks of 'comprehensive review' (20 to 45 minutes, 10,000+ words, no upper bound). The mode is announced before execution, with the line 'Starting standard mode research (5-10 min, 15-30 sources)'.
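That routing reduces to keyword checks with a Standard default. A hedged Swift sketch of the shape; the actual decision tree is prose in the skill, interpreted by the agent, not code.

```swift
enum ResearchMode { case quick, standard, deep, ultraDeep }

// Route a query to a mode using the trigger phrases quoted above.
func pickMode(for query: String) -> ResearchMode {
    let q = query.lowercased()
    if q.contains("comprehensive review") { return .ultraDeep }
    if q.contains("critical decision") || q.contains("compare in depth") { return .deep }
    if q.contains("quick rundown") || q.contains("briefly") { return .quick }
    return .standard // default for trend queries like daily AI news
}
```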

What is the actual install footprint when Fazm seeds the deep-research skill on first launch?

On first launch the app checks that ~/.claude/skills/ exists, creates it if not, then iterates over the bundled .skill.md files. Each one becomes a directory: ~/.claude/skills/deep-research/SKILL.md, ~/.claude/skills/web-scraping/SKILL.md, and so on. The reference subfolders ('reference/methodology.md', 'templates/report_template.md', 'templates/mckinsey_report_template.html', 'scripts/verify_citations.py', 'scripts/validate_report.py') are written alongside SKILL.md when present in the bundle. The total seeded payload from BundledSkills is the 4,523 lines of skill content plus reference/template/script attachments. SkillInstaller.swift logs each install as 'installed', 'updated', or 'skipped (unchanged)' and surfaces the summary in the onboarding screen under category headings: Personal, Productivity, Documents, Creation, Research & Planning, Social Media, Discovery. deep-research falls under Research & Planning along with travel-planner and web-scraping.