An AI picks up the phone at 2am. The lead still rots in your inbox by Monday.

Voice agents for small business after-hours calls are now a real category. The good ones answer your line at 2am, hold a polite conversation, and email you a clean transcript. They also stop the moment the caller hangs up. The desk-side work, the part where someone has to actually get the lead into HubSpot and reply by 9am, is still on you. This page covers what the phone-side category does today, where the gap is, and how a Mac-side voice agent (Fazm) closes the desk-side half on a schedule.

Matthew Diakonov
11 min read

Direct answer (verified 2026-05-01)

A voice agent for after-hours calls is an AI phone receptionist that answers your business line outside working hours, holds a natural-language conversation, captures the caller's name and reason, and either books into your calendar or routes urgent calls. The current category leaders are Smith.ai (from $95/mo, $2.40 per call), RingCentral AI Receptionist ($59/mo with 100 included minutes, $0.50/min overage), ElevenLabs Agents ($0.08 to $0.12/min), and Synthflow ($0.13 to $0.24/min all-in). All four live on the phone side and stop when the call ends. Fazm is a separate category: a Mac-side voice agent that runs scheduled routines on your laptop to handle the desk-side work after the call (CRM log, follow-up drafts, morning digest).

The phone half: what the existing tools do well

I have used most of these in real businesses, both my own and clients'. The 2026 generation of phone-side AI is genuinely good at three things, and you should buy one of them if you do not have one already.

One, picking up. Pick a vendor, give it your business name, services, hours, and a routing rule for urgent callers. From that day on, your line never goes to voicemail again. For a service business, the difference between a missed call and a captured lead at 2am is often four figures of recurring revenue.

Two, talking. The voice quality from ElevenLabs and the conversation quality from Smith.ai are both at the level where an unprompted caller does not usually realise it is AI.

Three, structured capture. The transcript that lands in your inbox is not a wall of text; it is a labelled note with caller name, callback number, reason, urgency, and any booking they made. That is the artefact that downstream automation can use.

Smith.ai entry plan: from $95/mo
RingCentral AI Receptionist: $59/mo
ElevenLabs Agents: $0.08 to $0.12/min
Synthflow all-in: $0.13 to $0.24/min

The numbers above were pulled from each vendor's pricing page on 2026-05-01: smith.ai/pricing/ai-receptionist, ringcentral.com/pricing/ai-receptionist.html, elevenlabs.io/pricing, synthflow.ai/pricing. They will shift; treat them as a snapshot of the current shape, not a quote.

The half nobody markets: what happens after the call ends

Pretend the phone half works perfectly. Friday night, Saturday, Sunday. By Monday at 7am you have 14 transcripts in transcripts@yourbusiness.com. Five are spam, three are existing customers asking for support, four are warm leads, two are time-sensitive (one is a callback request, one is a complaint). Now what?

Most small business owners I know do the same thing. They open the inbox at 7:30 with their first coffee, spend 40 minutes triaging, copy the four leads into the CRM by hand, send two replies, schedule one callback, and forget about the complaint until 11am when the customer follows up with a much angrier email. By lunchtime the after-hours coverage has cost you 90 minutes of your most expensive hours, plus the goodwill from the complaint that sat for four hours.

The phone agent did not fail. It did exactly what it was sold as. The failure is the absence of the desk-side counterpart. Nobody sells the desk-side counterpart, because the people building phone agents do not work on Macs all day and the people building desktop agents are usually building developer tools. The category sits empty.

Where Fazm slots in: a routine that fires at 6:55am

Fazm is a voice agent for macOS. You talk to it, it drives the apps you actually use through the accessibility APIs of those apps, and it can save what it does as a routine. A routine is a saved prompt plus a schedule. The schema and the runner are both in the public repo at github.com/m13v/fazm. If you want to verify any specific claim on this page, the four files are listed at the bottom.

The routines tables were added in migration fazmV7, at line 1033 of Desktop/Sources/AppDatabase.swift. There are two: cron_jobs holds the definitions, and cron_runs holds one row per fire with status, output, cost, tokens, and duration. A schedule string takes one of three formats, prefixed cron:, every:, or at:. The full schema is in that file.

You do not write any of this by hand. You create a routine by talking to Fazm: "every weekday at 6:55am, read last night's call transcripts from my Gmail label after-hours, dedupe them against my HubSpot, draft a short reply for each warm lead in Gmail, and send me a one-paragraph summary in iMessage." Fazm parses that into a cron_jobs row with prompt set to the work, schedule set to "cron:55 6 * * 1-5", and timezone set to your local timezone. From that point on, the launchd-driven runner at acp-bridge/src/cron-runner.mjs picks the row up at the scheduled time, spawns the agent, lets it complete the work against your real apps, and writes a cron_runs row with the result.
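To make the two tables concrete, here is a minimal sketch in Python with SQLite. It is not the migration from AppDatabase.swift. The table names and the prompt, schedule, timezone, status, output, error_message, cost_usd, input_tokens, output_tokens, and duration_ms columns come from this page; the column types, the id, job_id, and started_at columns, and the add_routine and record_run helpers are assumptions made for illustration.

```python
import sqlite3

# Hypothetical reconstruction: names come from this page, types and keys are guesses.
SCHEMA = """
CREATE TABLE cron_jobs (
    id        INTEGER PRIMARY KEY,
    prompt    TEXT NOT NULL,   -- the work, in plain language
    schedule  TEXT NOT NULL,   -- e.g. 'cron:55 6 * * 1-5'
    timezone  TEXT NOT NULL    -- e.g. 'America/Los_Angeles'
);
CREATE TABLE cron_runs (
    id             INTEGER PRIMARY KEY,
    job_id         INTEGER NOT NULL REFERENCES cron_jobs(id),
    started_at     TEXT NOT NULL,
    status         TEXT NOT NULL
                   CHECK (status IN ('ok', 'error', 'timeout', 'running')),
    output         TEXT,
    error_message  TEXT,
    cost_usd       REAL,
    input_tokens   INTEGER,
    output_tokens  INTEGER,
    duration_ms    INTEGER
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database holding both tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_routine(conn: sqlite3.Connection, prompt: str, schedule: str, tz: str) -> int:
    """Insert one routine definition and return its row id."""
    cur = conn.execute(
        "INSERT INTO cron_jobs (prompt, schedule, timezone) VALUES (?, ?, ?)",
        (prompt, schedule, tz),
    )
    conn.commit()
    return cur.lastrowid


def record_run(conn: sqlite3.Connection, job_id: int, started_at: str,
               status: str, output: str, duration_ms: int) -> int:
    """Insert one row per fire, whatever the outcome."""
    cur = conn.execute(
        "INSERT INTO cron_runs (job_id, started_at, status, output, duration_ms) "
        "VALUES (?, ?, ?, ?, ?)",
        (job_id, started_at, status, output, duration_ms),
    )
    conn.commit()
    return cur.lastrowid


conn = create_db()
job_id = add_routine(
    conn,
    "Read last night's call transcripts under the after-hours label, "
    "dedupe against HubSpot, draft replies, send an iMessage summary.",
    "cron:55 6 * * 1-5",
    "America/Los_Angeles",  # assumed example timezone
)
run_id = record_run(conn, job_id, "2026-05-04T06:55:00", "ok", "4 drafts created", 41000)
```

The point of the split is that definitions change rarely while runs accumulate, so the history of a routine can be read without touching its prompt.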

One scheduled run, end to end

[Diagram: 6:55am Monday, no human in the loop]

Inputs (everything that has to be in place before the routine fires): phone agent transcripts, HubSpot contacts, Calendar, the cron_jobs row.
Hub (the scheduled run itself): cron-runner.mjs.
Outputs (the artefacts the small business owner sees on Monday morning, in the apps where they already work): drafts in Gmail, new HubSpot contacts, calendar holds, iMessage digest.

The whole loop runs against the apps you already pay for. No new integration contract. No vendor seat for HubSpot connector access. The agent reaches into HubSpot the same way you do, by clicking through it via the accessibility tree, which is also why it works with the random web-based scheduling tool your bookkeeper made you adopt.

Why this works on the apps a small business actually uses

Most of the after-hours desk work for a small business touches at least one of: a CRM that does not have a free public API (or has one that rate-limits below what you need), a calendar that is technically Google Calendar but in practice is the web UI you have open all day, an inbox with quirky filters and labels, and a niche industry tool with no developer relations team at all. A phone-side AI that integrates via vendor APIs hits a wall on this list. A desktop-side AI that drives the apps through accessibility does not.

The choice between accessibility APIs and screenshot scraping is not cosmetic. Accessibility tree elements are addressable by role and identifier; the agent clicks "the Submit button on the lead form" rather than "the pixels that look like a Submit button right now." That is the difference between a routine that still works on Monday after a vendor pushed a UI tweak over the weekend, and a routine that silently fails because the button moved 12 pixels and the screenshot model does not recognise it any more. For an unattended after-hours fire, that reliability tax compounds fast.
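The difference can be shown with a toy model. The sketch below is not Fazm's code and does not call the macOS accessibility APIs; it builds a small in-memory tree whose nodes carry a role, a title, and a frame, then compares a lookup by role and title with a replay of a saved screen coordinate after the layout changes. The Element class, the helper names, and the layout numbers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Element:
    role: str
    title: str
    frame: Tuple[int, int, int, int]  # x, y, width, height
    children: List["Element"] = field(default_factory=list)


def find_by_role(root: Element, role: str, title: str) -> Optional[Element]:
    """Depth-first search for the first element with this role and title."""
    if root.role == role and root.title == title:
        return root
    for child in root.children:
        hit = find_by_role(child, role, title)
        if hit is not None:
            return hit
    return None


def element_at(root: Element, x: int, y: int) -> Optional[Element]:
    """Return the deepest element whose frame contains the point."""
    for child in root.children:
        hit = element_at(child, x, y)
        if hit is not None:
            return hit
    fx, fy, fw, fh = root.frame
    if fx <= x < fx + fw and fy <= y < fy + fh:
        return root
    return None


def lead_form(button_y: int) -> Element:
    """Build a toy lead form with the Submit button at a given height."""
    return Element("AXWindow", "New lead", (0, 0, 800, 600), [
        Element("AXTextField", "Email", (40, 100, 300, 30)),
        Element("AXButton", "Submit", (40, button_y, 120, 30)),
    ])


friday = lead_form(button_y=200)
monday = lead_form(button_y=260)   # the vendor moved the button over the weekend

saved_click = (100, 215)           # centre of the Submit button as seen on Friday

by_pixel_friday = element_at(friday, *saved_click)
by_pixel_monday = element_at(monday, *saved_click)
by_role_monday = find_by_role(monday, "AXButton", "Submit")

print(by_pixel_friday.title, "|", by_pixel_monday.title, "|", by_role_monday.frame)
```

On the Monday layout the replayed coordinate lands on the window background, while the role-and-title lookup still returns the button at its new position.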

Fazm is open source on GitHub at github.com/m13v/fazm. The accessibility integration, the routines schema, and the runner are all in the repo. If a specific app of yours behaves badly, the route to fix it is reading the actual code rather than waiting on a roadmap ticket.

How to actually pick a phone agent and pair it with a desktop one

Honest order of operations, assuming you currently have no after-hours coverage at all.

  1. Pick a phone-side vendor based on call volume, not feature charts. Under 50 after-hours calls per month, RingCentral AI Receptionist at $59/mo with 100 included minutes is the cheapest path to coverage. Between 50 and 200, Smith.ai per-call pricing is usually better since you skip the minutes meter. Above that, look at ElevenLabs or Synthflow with a flat per-minute rate. None of the vendors will be honest with you about which case is theirs; do the math on your own call log.
  2. Pipe the transcripts into a single inbox label. Every vendor will email you the post-call note. Set up a Gmail filter to put them under one label, after-hours. The label is what your desktop-side routine will read from on Monday morning.
  3. Install Fazm and create one routine. Talk to Fazm: "every weekday at 6:55am, look at last night's emails under after-hours, dedupe against HubSpot, draft a Gmail reply for each warm lead, send me a one-paragraph summary in iMessage." That is one cron_jobs row. Watch one or two fires before trusting the drafts to be sent automatically. Most owners eventually let it auto-send the warmest replies and keep the marginal ones in Drafts.
  4. Add a second routine for weekends only when you have proof the first one works. The temptation is to schedule six routines on day one. Don't. Run the Monday-morning routine for two weeks, read the cron_runs rows, fix the prompt where it gets the wrong leads into HubSpot, then add Saturday at 10am. Then add a Friday-evening routine that prepares for the weekend. Then stop, because anything more is probably a worse use of agent time than your own.
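The "do the math on your own call log" advice in step 1 can be sketched as a small calculator. It multiplies only the rates quoted on this page and ignores setup effort and anything else that shows up on a real invoice. Two assumptions are baked in: the Smith.ai $95 base is treated as covering the first 50 calls with $2.40 for each call after that, and the two per-minute vendors are priced at the top of their quoted ranges. Check both against the vendors' current pricing pages before relying on the result.

```python
def monthly_costs(calls: int, avg_minutes: float) -> dict:
    """Estimate one month's bill per vendor from call count and average call length."""
    minutes = calls * avg_minutes
    costs = {
        # $59/mo, 100 included minutes, $0.50/min overage (from this page)
        "RingCentral AI Receptionist": 59.00 + max(0.0, minutes - 100) * 0.50,
        # Assumption: $95 base covers the first 50 calls, then $2.40 per call
        "Smith.ai": 95.00 + max(0, calls - 50) * 2.40,
        # Top of the quoted $0.08 to $0.12/min range
        "ElevenLabs Agents": minutes * 0.12,
        # Top of the quoted $0.13 to $0.24/min range
        "Synthflow": minutes * 0.24,
    }
    return {name: round(cost, 2) for name, cost in costs.items()}


if __name__ == "__main__":
    # Replace these with numbers from your own call log.
    for calls, avg_minutes in [(40, 2.0), (120, 2.5), (400, 3.0)]:
        print(f"{calls} calls/month at {avg_minutes} min each")
        for name, cost in monthly_costs(calls, avg_minutes).items():
            print(f"  {name}: ${cost:,.2f}")
```

Average call length matters as much as call count here: the RingCentral and per-minute figures scale with minutes, while the Smith.ai figure scales with calls.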

A note on the local audio question

People ask whether an AI is listening to their phone calls. Yes, on the phone-vendor side. That is the feature. Smith.ai, RingCentral, ElevenLabs, and Synthflow all process the audio in their cloud to answer the call; they could not function otherwise. Pick the vendor whose data terms you can live with, read the retention clause, and assume the audio is not ephemeral.

On the Fazm side, the calculus is different. The microphone in your Mac is for talking to the agent while you are at the laptop. Fazm's transcription defaults to Deepgram but can be pointed at a local WhisperKit pipeline if you do not want any audio leaving the machine. The agent itself routes through any Anthropic-compatible endpoint, including a fully local model server, so the desk-side routines can run without sending the call transcripts (or anything else about your business) to a third party at all. That is the local-leaning posture, not a marketing slogan; the code paths are in the repo.

FAQ

What is a voice agent for after-hours calls, in one sentence?

An AI phone receptionist that answers your business line outside of working hours, holds a natural-language conversation with the caller, captures their name, reason, and callback details, and either books an appointment, routes urgent calls, or drops a transcript into your inbox by morning. Common providers as of May 2026 are Smith.ai, RingCentral AI Receptionist, ElevenLabs Agents, Synthflow, and Quo (formerly OpenPhone Sona). They all live on the phone side. None of them does the desk-side work that follows the call.

What does this kind of voice agent typically cost a small business?

Per-call pricing on Smith.ai starts at $95/month with calls priced at $2.40 each over the first 50, dropping to $2.10 over 500. RingCentral AI Receptionist is $59/month for 100 included minutes with $0.50/min overage. ElevenLabs Agents bills $0.08 to $0.12/minute on usage, with bundles starting at the Creator plan. Synthflow is roughly $0.13 to $0.24/minute all-in once you include the voice engine, model, and telephony components. Pricing was checked on each vendor's pricing page on 2026-05-01; the linked URLs are listed below the FAQ.

What are these phone agents actually good at?

Three things. First, picking up the phone with a polite greeting that uses your business name and a few facts about your services. Second, holding a 30-second to 3-minute conversation that gets the caller's name, what they want, and a callback number, then writing that into a structured note. Third, booking into a connected calendar or escalating an urgent caller via SMS or a live transfer. They can do this around the clock with consistent voice quality and no scheduling cost. The category has matured in the last 18 months: 2024 demos felt like Siri reading a script, 2026 production agents handle interruptions and accents reliably enough that most callers do not realise they are talking to AI.

What do they not do?

They stop at the call. The transcript lands in your inbox or a vendor dashboard. From there, the work is on you, by Monday morning. You re-read 14 voicemails and 6 transcripts. You copy the lead into HubSpot. You draft a reply to the urgent one. You add the consultation to your calendar even though the agent technically already did, because it picked the wrong time zone. You write the daily summary you wish your inbox had handed you at 7am. Phone-side AI ends where your laptop begins, and the laptop is where the actual lost time is.

How does Fazm fit into this?

Fazm is not a phone receptionist. Fazm is a voice agent on your Mac that handles the desk-side half of the same problem. The relevant feature is routines, added in migration fazmV7 in Desktop/Sources/AppDatabase.swift line 1033 of the public repo. A routine is a saved prompt plus a schedule. You create one by talking: 'every weekday at 7am, read last night's call transcripts from my inbox, dedupe them, log the new ones into HubSpot via my browser, and write me a one-paragraph summary in iMessage.' The schedule string is one of three formats: cron:0 7 * * 1-5, every:1800, or at:2026-05-02T07:00:00Z. A launchd-driven runner at acp-bridge/src/cron-runner.mjs spawns the bridge, executes the routine, writes the result to cron_runs, and exits.
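As an illustration of the three schedule formats, here is a minimal parser sketch. It is not the code in acp-bridge/src/schedule.mjs. The Schedule class and parse_schedule function are hypothetical names, and treating the number after every: as seconds is an assumption this page does not confirm.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Schedule:
    kind: str                             # 'cron', 'every', or 'at'
    cron_expr: Optional[str] = None       # five-field cron expression
    interval: Optional[timedelta] = None  # repeat interval
    run_at: Optional[datetime] = None     # one-shot timestamp


def parse_schedule(text: str) -> Schedule:
    """Split 'kind:value' and validate the value for that kind."""
    kind, sep, rest = text.partition(":")
    if not sep or not rest:
        raise ValueError(f"expected 'kind:value', got {text!r}")

    if kind == "cron":
        fields = rest.split()
        if len(fields) != 5:
            raise ValueError(f"cron needs 5 fields, got {len(fields)}")
        return Schedule("cron", cron_expr=" ".join(fields))

    if kind == "every":
        seconds = int(rest)  # assumption: the unit is seconds
        if seconds <= 0:
            raise ValueError("interval must be positive")
        return Schedule("every", interval=timedelta(seconds=seconds))

    if kind == "at":
        # Older Python versions do not accept a trailing 'Z' in fromisoformat.
        iso = rest[:-1] + "+00:00" if rest.endswith("Z") else rest
        return Schedule("at", run_at=datetime.fromisoformat(iso))

    raise ValueError(f"unknown schedule kind {kind!r}")


weekday_7am = parse_schedule("cron:0 7 * * 1-5")
half_hourly = parse_schedule("every:1800")
one_shot = parse_schedule("at:2026-05-02T07:00:00Z")
```

Splitting on the first colon only is what lets the at: form carry a timestamp that itself contains colons.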

Why use accessibility APIs instead of just calling vendor APIs?

Most after-hours desk work touches apps that either do not have a public API, charge to use it, or rate-limit it below what a small business actually needs. HubSpot, Pipedrive, Google Calendar, Notion, Apple Notes, the random web CRM your contractor set up. A desktop voice agent that drives those apps through the same accessibility tree the user sees works on all of them with no integration contract. Fazm's accessibility-API approach is the difference between 'works only with the four CRMs we built integrations for' and 'works with whatever app you actually use.' That is also the difference from screenshot-based desktop agents, which pay a latency and reliability tax on every click and miss elements that scroll out of frame.

Can I use both? An after-hours phone agent and Fazm?

Yes. That is the intended pairing. The phone agent answers the call, captures the structured note, and emails it to you at, say, transcripts@yourbusiness.com. A Fazm routine fires at 6:55am Monday through Friday and does the desk-side cleanup: dedupe, CRM log, draft three follow-up emails for the highest-intent leads (sitting in Drafts, not sent), update the calendar, and write a one-paragraph digest. You wake up to a clean inbox and three drafts to review for 60 seconds each. The phone agent does what it is good at. Fazm does what it is good at. Neither one tries to do the other half badly.
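The dedupe step is worth spelling out, because it decides which transcripts become CRM contacts. Fazm carries it out by driving your apps from the routine's prompt, so the following is only a sketch of the matching rule you might describe in that prompt. The callback_number field name, the helper names, and the choice to compare on the last ten digits are assumptions.

```python
import re
from typing import Dict, Iterable, List


def normalise_phone(raw: str) -> str:
    """Strip everything but digits and keep the last ten for comparison."""
    digits = re.sub(r"\D", "", raw)
    return digits[-10:]


def new_leads(notes: Iterable[Dict[str, str]], known_numbers: Iterable[str]) -> List[Dict[str, str]]:
    """Return notes whose callback number is neither in the CRM nor seen earlier in the batch."""
    seen = {normalise_phone(n) for n in known_numbers}
    fresh = []
    for note in notes:
        key = normalise_phone(note.get("callback_number", ""))
        if not key or key in seen:
            continue
        seen.add(key)
        fresh.append(note)
    return fresh


# Invented sample data using fictional 555-01xx numbers.
weekend_notes = [
    {"caller": "Dana", "callback_number": "(415) 555-0101", "reason": "quote"},
    {"caller": "Dana", "callback_number": "+1 415-555-0101", "reason": "quote, second call"},
    {"caller": "Lee", "callback_number": "415.555.0199", "reason": "support"},
]
crm_numbers = ["4155550199"]

leads = new_leads(weekend_notes, crm_numbers)
print([lead["caller"] for lead in leads])
```

A caller who rings twice over a weekend and a caller who is already a customer both drop out, which is the behaviour to check when you read the first few runs.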

Where can I read the routines source to verify any of this?

The migration is in Desktop/Sources/AppDatabase.swift starting at line 1033 (fazmV7), which creates cron_jobs and cron_runs with indices. The runner is acp-bridge/src/cron-runner.mjs. The schedule parser and three formats are in acp-bridge/src/schedule.mjs. The UI surface is Desktop/Sources/MainWindow/Pages/RoutinesSection.swift, which reads from the same SQLite tables in WAL mode. The repo is github.com/m13v/fazm. Open the four files, follow the references, and you can trace one scheduled run from launchd through the bridge to a cron_runs row in under twenty minutes.

Is the audio kept local? My after-hours calls are sometimes confidential.

On the phone-receptionist side, no. Smith.ai, RingCentral, ElevenLabs, and Synthflow all process the call audio in their cloud; that is how they answer the phone in the first place. Pick the vendor whose data terms you can live with. On the Mac side, Fazm's voice control is local-leaning: speech-to-text uses Deepgram by default but can be pointed at a local WhisperKit pipeline, and the agent itself runs against any Anthropic-compatible endpoint, including a local model server. The desk-side routines that read your inbox and update your CRM run on your machine, so call transcripts the phone vendor sent you are processed locally even though the original call was not.

What if the routine fails on a Saturday at 7am, when nobody is watching?

Every fire writes a row to cron_runs whether the run succeeded or not, with status one of ok, error, timeout, or running, plus error_message, cost_usd, input_tokens, output_tokens, and duration_ms. The Routines tab in the app reads that table and shows a chronological timeline. If a run errors or times out, the next morning you see the failed row and the error string. Logs land in ~/fazm/inbox/skill/logs/routines.log on disk. There is no silent failure mode where a missed weekend digest just disappears.
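If you would rather query the table than open the Routines tab, a check for failed fires might look like the sketch below. It runs against a throwaway in-memory table with invented rows, not against Fazm's real database file, whose location this page does not give. The status values and the error_message and duration_ms columns come from the answer above; the id column and its use for ordering are assumptions.

```python
import sqlite3


def failed_runs(conn: sqlite3.Connection, limit: int = 20) -> list:
    """Return the most recent runs that ended in error or timeout."""
    return conn.execute(
        """
        SELECT id, status, error_message, duration_ms
        FROM cron_runs
        WHERE status IN ('error', 'timeout')
        ORDER BY id DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()


# Demo table with invented rows; a real cron_runs table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cron_runs ("
    "id INTEGER PRIMARY KEY, status TEXT, error_message TEXT, duration_ms INTEGER)"
)
conn.executemany(
    "INSERT INTO cron_runs (status, error_message, duration_ms) VALUES (?, ?, ?)",
    [
        ("ok", None, 41000),
        ("timeout", "run exceeded time limit", 600000),
        ("error", "Gmail window not found", 3200),
    ],
)
conn.commit()

for run_id, status, message, duration_ms in failed_runs(conn):
    print(run_id, status, message, duration_ms)
```

Rows still marked running are left out on purpose; a row that stays in that state long after its scheduled time is worth a separate look.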