A sorter, not a listicle

On-device AI by what you need: four categories that don't overlap

Every "best on-device AI" list I have hit lately is a mix of four different products doing four different jobs, served as a single ranked list. If you arrive at one of those thinking "I want my Mac to do my work," you bounce off a chat app. Sorting by what you actually want done fixes that. Here are the rows, what each one actually is, and which named tools belong in each.

Matthew Diakonov
6 min read

Direct answer (verified 2026-05-12)

There are four kinds of on-device AI, and they solve different problems. If the verb is ask, pick a local chat app (Locally AI, Enclave AI, LM Studio, Ollama). If the verb is transcribe, pick a speech-to-text tool (WhisperKit, Parakeet, Superwhisper, EmberType). If the verb is do, send, finish, fill, pick a computer-use agent (Fazm; Perplexity Personal Computer launched 2026-05-07). If the verb is remember, pick an on-disk personal context layer (Apple Intelligence Personal Context, Fazm ai_user_profiles). These rows do not overlap, and the "best of" lists that mash them together are why you can buy the wrong thing.

The four rows, side by side

Each card below names what the row actually is at the system level, which tools fit it, and the verb in your sentence that should point you there. Rows 3 and 4 are accented because they are where Fazm lives; the other two are honestly served by different tools and I will not pretend otherwise.

1. Local chat with a model

An LLM you can talk to with no network. The model lives on disk, inference runs on the Neural Engine or GPU, the answer comes back in a window. Stops at the window.

Tools that fit

Locally AI, Enclave AI, LM Studio, Ollama, On Device AI, Jan

Verb

ask

2. Voice transcription

Audio in, text out. A local speech-to-text engine, sometimes with diarization. Never acts on the text it produces. Pair it with a row 1 chat app or a row 3 agent if you want the words to do something.

Tools that fit

WhisperKit, Parakeet (NVIDIA, MLX port), Superwhisper, EmberType, On Device AI (STT mode)

Verb

transcribe

3. Computer-use agent

Reads the accessibility tree of whatever app is open. Clicks, types, scrolls, and presses keys inside real Mac apps. The model can be local or cloud. The actions stay on your Mac.

Tools that fit

Fazm (open source, MIT, github.com/m13v/fazm), Perplexity Personal Computer (launched 2026-05-07, bundled with their Mac client)

Verb

do, send, finish, fill

How Fazm does it

9 action tools wrap AXUIElement: open, click, type, pressKey, scroll, setValue, pressAX, setSelected, refresh. Each call writes a text file where every row is the literal string [Role] "text" x:N y:N w:W h:H visible. Defined in mcp-server-macos-use/Sources/MCPServer/main.swift around lines 1482 and 1505.

4. On-disk personal context

An indexed view of you that other prompts read from. A profile of your files, languages, projects, installed apps, calendar. Lives in a local store. Used by an agent or a chat as the upstream context for every turn.

Tools that fit

Apple Intelligence Personal Context (Mail, Calendar, Messages, Notes, Photos surface, rolling out through 2026), Fazm ai_user_profiles (file system and installed apps surface)

Verb

remember, know about me

How Fazm does it

Four-table SQLite schema (ai_user_profiles, indexed_files, local_kg_nodes, local_kg_edges) at Desktop/Sources/AppDatabase.swift lines 896 through 928. An AI-generated paragraph summary is wrapped in <ai_user_profile> tags and concatenated into every chat turn before the model runs.
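To make the shape of that store concrete, here is a minimal sketch of the four-table layout. The table names come from the article; the columns are illustrative assumptions, not the actual schema in AppDatabase.swift.

```python
import sqlite3

# Sketch of the four-table personal-context store named above.
# Table names are from the article; columns are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ai_user_profiles (
    id INTEGER PRIMARY KEY,
    summary TEXT NOT NULL          -- AI-generated paragraph about the user
);
CREATE TABLE indexed_files (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,
    extension TEXT                 -- languages inferred by extension count
);
CREATE TABLE local_kg_nodes (
    id INTEGER PRIMARY KEY,
    label TEXT NOT NULL            -- e.g. a project, an app, a language
);
CREATE TABLE local_kg_edges (
    id INTEGER PRIMARY KEY,
    src INTEGER REFERENCES local_kg_nodes(id),
    dst INTEGER REFERENCES local_kg_nodes(id),
    relation TEXT
);
""")
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
```

The point of the split is that the profile paragraph is cheap to read on every turn, while the file index and the knowledge-graph tables only need to be touched when the profile is regenerated.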

Why row 1 and row 3 are not the same product

This is the confusion that costs the most. A local chat app and a computer-use agent both run on your Mac, both can run a model locally, both have a text interface. They are still different products with different surfaces. The comparison below shows what each one is actually wired into. Worth a second look before installing the wrong one.

The system surface a local chat app sees vs the surface a computer-use agent sees

What the chat app sees: a window, a model, a conversation log. The app talks to itself.

  • No system permissions beyond microphone (if it has voice in)
  • Cannot read the accessibility tree of other apps
  • Cannot click, type, or scroll inside Mail, Calendar, Slack, Notion
  • Cannot index your file system
  • Useful for offline Q and A, code help, brainstorming, translation

What the agent sees: the accessibility tree of whatever app is in front. The agent talks to the system.

  • Requires the Accessibility permission, granted once
  • Reads the AX tree of Mail, Calendar, Slack, Notion, whatever is open
  • Clicks, types, scrolls, and presses keys inside those apps
  • Can index the file system as upstream context for every turn
  • Useful when the verb is do, send, finish, fill

If you already have a clear picture of which row you want, skip the rest. If you do not, the next two sections give you the cheapest test that will tell you.

A 60-second test for which row you actually need

Write out the next three tasks you want an AI to do for you on your Mac. Verbatim, like a sentence. Then circle the verb in each sentence. The verb is the row.

Match the verb to the row

  • ask, answer, explain, summarise, translate -> row 1, local chat
  • transcribe, dictate, caption, record, diarize -> row 2, voice transcription
  • do, send, file, fill, schedule, post, edit, click, drag, automate -> row 3, computer-use agent
  • remember, recognise, know about me, personalise, recall -> row 4, on-disk personal context

If two rows show up across your three sentences, you need two tools. Most people who type "on-device AI" want row 3 with row 4 as the upstream context, then row 2 for hands-free input. Row 1 alone is not enough and never was.
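The 60-second test above can be sketched as a lookup. The verb lists mirror the bullets; the function name and everything else here are illustrative.

```python
# Sketch of the verb-to-row test: circle the verb, the verb is the row.
# Verb sets mirror the bullets above; the rest is illustrative.
ROWS = {
    1: ("local chat", {"ask", "answer", "explain", "summarise", "translate"}),
    2: ("voice transcription", {"transcribe", "dictate", "caption", "record", "diarize"}),
    3: ("computer-use agent", {"do", "send", "file", "fill", "schedule",
                               "post", "edit", "click", "drag", "automate"}),
    4: ("on-disk personal context", {"remember", "recognise", "personalise", "recall"}),
}

def rows_needed(sentences: list[str]) -> set[int]:
    """Return the set of rows implied by the verbs in the sentences."""
    needed = set()
    for sentence in sentences:
        for word in sentence.lower().split():
            for row, (_, verbs) in ROWS.items():
                if word.strip(".,") in verbs:
                    needed.add(row)
    return needed

# Two rows across your sentences means two tools.
rows_needed(["transcribe my standup notes", "send the summary to the team"])
```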

Where Fazm sits, in concrete file paths

Fazm covers rows 3 and 4 in one app. I would rather show you the files than make you take my word for it.

Row 3, computer-use agent

mcp-server-macos-use/Sources/MCPServer/main.swift:1482

// let allTools = [openAppTool, clickTool, typeTool, pressKeyTool,
//                 scrollTool, setValueTool, pressAXTool,
//                 setSelectedTool, refreshTool]

mcp-server-macos-use/Sources/MCPServer/main.swift:1505

// Each line in the .txt file has the format:
// [Role] "text" x:N y:N w:W h:H visible

Row 4, on-disk personal context

Desktop/Sources/AppDatabase.swift:896-928

// ai_user_profiles, indexed_files,
// local_kg_nodes, local_kg_edges

Routing, so the agent never confuses a desktop app with the browser

Desktop/Sources/Chat/ChatPrompts.swift:101

// - Desktop apps: macos-use tools (mcp__macos-use__*)
// - Browser: playwright tools ONLY for web pages inside Chrome

Everything in that block is in the open repo at github.com/m13v/fazm. Open the files, follow the references. The reason the sort is honest is that you can verify each row against a real file.

Strict on-device vs loose on-device, since it always comes up

Strict on-device means the model weights are on disk and inference runs locally. Loose on-device means the screen, microphone, file system, and personal data stay on the Mac, but the model output may come from a cloud endpoint. Row 1 chat apps are strict by default. Row 2 transcription is strict by default. Row 4 context is strict (the index is local) regardless of where the consuming chat runs.

Row 3 is the one with a choice. Fazm in default config is loose: local screen, local actions, local profile, cloud model. The one field that flips it strict is Settings > AI Chat > Custom API Endpoint pointed at a local Anthropic-Messages bridge in front of an MLX model. The agent loop, the 9 tools, the AX tree, the personal profile do not change. Only the URL changes. That is on purpose, because for most consumer tasks the latency of a frontier cloud model is better than the latency of a 13B local one, and the sensitive surface (your screen, your files) is already local either way.
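What "only the URL changes" means in practice: the request the agent sends is the same Anthropic-Messages-shaped body either way. This sketch builds such a body; the local bridge URL and model name are placeholder assumptions, and the payload shape follows Anthropic's public Messages API, not Fazm's internal code.

```python
import json

# Sketch of flipping loose to strict: the same Anthropic-Messages-shaped
# request, sent to 127.0.0.1 instead of the cloud. URL and model name
# below are placeholder assumptions.
LOCAL_BRIDGE = "http://127.0.0.1:8080/v1/messages"  # hypothetical local MLX bridge

def build_request(user_text: str, model: str = "local-mlx-model") -> dict:
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": user_text}],
    }

body = json.dumps(build_request("fill the form in the front window"))
# Only the endpoint URL changes between loose and strict; the payload does not.
```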

Trying to figure out which row you actually need?

If you want to walk through your three sentences with me and pick the right thing to install, I sit on calls for this. Faster than another listicle.

Frequently asked questions

Why is 'on-device AI' not one category?

Because four different system surfaces are all marketed under the same banner, and they do not overlap. A chat app like Locally AI or Enclave AI runs a model locally and answers questions in a window. It never touches Finder or Mail. A transcription tool like WhisperKit or Parakeet turns voice into text and stops there. A computer-use agent like Fazm reads the macOS accessibility tree of whatever app is open and clicks, types, scrolls inside it; the model can live on-device or in the cloud, the actions stay local. A personal context layer like Fazm's ai_user_profiles or Apple Intelligence's Personal Context indexes your files and identity into a local store that other prompts read from. A reader who needs the third one will not be helped by buying the first one, and most listicles do not draw the line.

What do I pick if I want an LLM I can talk to with no internet?

A local chat client. Locally AI and Enclave AI are the friendliest macOS apps; both ship a curated model list, both run on Apple Silicon, both have nothing to configure. LM Studio gives you the most control and a model browser tied to Hugging Face. Ollama is the CLI plus a thin GUI; if you already live in a terminal it is the cheapest path. On Device AI ships 170+ Llama, Gemma, Qwen, DeepSeek, Phi, and Mistral checkpoints across iPhone, iPad, Mac, and Vision Pro via GGUF and MLX. Any of those answers 'chat offline' fully. None of them clicks a button in Mail, edits a row in your CRM, or reads the file system to know what kind of work you do; that is the next row down.

What do I pick if I want voice in and clean text out?

A transcription tool, not a chat app. WhisperKit is Apple's on-device port of Whisper, used inside several Mac apps. Parakeet (NVIDIA, MLX port available) is faster than Whisper for English on Apple Silicon and is the right pick when you want low-latency dictation. Superwhisper and EmberType wrap one of those engines into a system-wide dictation experience with a hotkey, a menu bar app, and post-processing. None of them act on the text they produce. Pair one of them with a chat app or with a computer-use agent if you want the transcribed text to actually do something.

What do I pick if I want the AI to actually click around inside my real Mac apps?

A computer-use agent. The two consumer-facing options on macOS today are Fazm and Perplexity Personal Computer. They take different paths. Fazm bundles a Swift MCP server that wraps Apple's AXUIElement accessibility APIs and exposes exactly 9 action tools, each returning a text file where every row is '[Role] "text" x:N y:N w:W h:H visible'. The model greps that text for the button it wants and passes the bounding box straight back. Perplexity Personal Computer is closer to a chat-driven action layer over your active app. Fazm is consumer software you install from a DMG; you grant Accessibility once and run it. Perplexity Personal Computer launched May 7 2026 and is bundled with their broader Mac client. If you want something you can read the source of and run today with a local model behind it, Fazm is the more direct fit.

What do I pick if I want the model to know who I am and what I work on without me re-explaining every chat?

A personal context layer. There are two real surfaces here today. Apple Intelligence's Personal Context, rolling out gradually through 2026, ties into Apple's own indices: Mail, Calendar, Messages, Notes, Photos. It is correct for OS-level Siri queries. Fazm's local profile takes a different cut: an AI-generated paragraph profile of you written to a four-table SQLite schema (ai_user_profiles, indexed_files, local_kg_nodes, local_kg_edges) at first launch, by running 5 to 12 SELECT queries against an index of your home folder and installed apps. The profile gets wrapped in <ai_user_profile> tags and pasted into every chat turn before the model sees the message. If you want an agent that already knows the languages you code in by extension count, the apps you have installed, and the projects at the top of ~/, that is the row you need. The data is local. Whether the inference is local depends on the endpoint you point the agent at.
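The injection step described above is mechanical, and a short sketch makes it concrete. The tag name is quoted from the source; the function name and sample text are illustrative, not Fazm's.

```python
# Sketch of the profile-injection step: the stored paragraph is wrapped in
# <ai_user_profile> tags and prepended to the turn before the model sees it.
# Function name and sample text are illustrative assumptions.
def inject_profile(profile_summary: str, user_message: str) -> str:
    return (
        f"<ai_user_profile>{profile_summary}</ai_user_profile>\n\n"
        f"{user_message}"
    )

prompt = inject_profile(
    "Writes Swift and Python; top projects under ~/code; uses Slack and Notion.",
    "Draft a reply to the latest thread.",
)
```

Because the wrapping happens before every turn, the model never has to be re-told who you are; the cost is a fixed number of extra tokens per message.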

Are these four categories really mutually exclusive?

The categories are mutually exclusive in what the software does, not in what you might want to install. A realistic stack for a Mac user who is serious about local AI is a chat app for casual offline questions, a transcription tool for dictation, a computer-use agent for the actual work, and a personal context layer underneath the agent. Fazm covers the last two of those in one app and lets you point the model layer at any Anthropic-Messages-compatible endpoint, including a local MLX bridge. The other rows are still better filled by tools built for them. A computer-use agent is the wrong place to ask a casual question; an offline chat app is the wrong place to expect actions.

Does 'on-device' mean the model is also local, or just that nothing leaves the Mac?

Most people who type the phrase mean one of two things and confuse them. Strict on-device means the weights are on disk, inference runs on the Neural Engine or GPU, no network round-trip. That is what a chat app like Locally AI guarantees by default. Loose on-device means the screen, microphone, file system, and personal data stay on the Mac, but the model output may come from the cloud. Fazm in its default config is the second one: the agent loop, the accessibility tree, the SQLite store, the 9 macos-use action tools, and the personal profile are all local. The model is configurable. There is a single field, Custom API Endpoint in Settings > AI Chat, that you can point at any Anthropic-Messages-shaped server, including a local MLX bridge on 127.0.0.1. Sorting by strictness matters because most products in the third row work the loose way, and most products in the first row only work the strict way.

Where do I look at Fazm to verify any of this?

Three open files in github.com/m13v/fazm. Sources/MCPServer/main.swift around line 1482 of the mcp-server-macos-use submodule, which is where the 9 action tools (openAppTool, clickTool, typeTool, pressKeyTool, scrollTool, setValueTool, pressAXTool, setSelectedTool, refreshTool) are aggregated into the tool list the model sees, and line 1505 which documents the row format '[Role] "text" x:N y:N w:W h:H visible'. Desktop/Sources/AppDatabase.swift lines 896 through 928 for the four-table personal-context schema. Desktop/Sources/Chat/ChatPrompts.swift line 101 for the tool-routing rule that hands desktop apps to macos-use, the browser to playwright, and never confuses them.

If I only install one thing, what should it be?

Pick by the verb. If the verb in your sentence is 'ask', install a chat app. If the verb is 'transcribe', install a dictation tool. If the verb is 'do' or 'finish' or 'send' or 'fill', install a computer-use agent. If you do not know which verb yet, the computer-use agent is the row that subsumes the others later, because once you have an agent that can drive arbitrary apps, you can drive a transcription app or a local chat app from it. The other direction does not work: a chat app cannot grow into an agent without rewriting itself.
