A local AI assistant is defined by where your context lives, not where your model runs
Every other guide on this collapses the word "local" into a single meaning: the model weights sit on your hard drive. That framing leaves out the important part. Fazm keeps your entire long-term memory (the file index, the knowledge graph, every past conversation, and your user profile) in one SQLite file at ~/Library/Application Support/Fazm/users/{userId}/fazm.db, and the agent reads from it with a tool that takes one argument: a SELECT statement.
THE UNSTATED ASSUMPTION
Two meanings of "local" are getting smashed together
Almost every article about this topic takes one shape. It opens with the privacy argument, lists Ollama, Jan.ai, LM Studio, and GPT4All, explains how to download a 7B or 13B model, and tells you to ask it questions. That is a local inference runtime. It is useful. It is also not what most people mean when they sit down and think they want a local AI assistant.
What people actually want is an assistant that remembers them, reads their files, opens their apps, and does not hand a shadow copy of their life to a vendor. Only one of those four properties is about where the model runs. The other three are about where the context lives, which is a storage question, not an inference question.
Fazm separates the two. The model answering your question today is a capable cloud model, because the best model at any given moment is almost always a cloud model. The context the model sees is a single file on your disk that you can open in the sqlite3 CLI and read yourself. When a future local model becomes good enough to swap in, the agent loop and the on-disk layout do not change. Only the backend does.
TWO AXES, NOT ONE
The 2x2 the common advice misses
Cloud model + cloud context
The default ChatGPT or Gemini experience. The model runs in the cloud, and the vendor keeps a shadow copy of your uploaded files, drive connections, and conversation history in its own storage. No local surface.
Local model + cloud context
Rare and unhelpful. Nobody does this on purpose. You get the downsides of running a smaller model without any of the privacy upside because your files still got synced somewhere.
Local model + local context
The default picture of a local AI assistant. Ollama + a downloaded model, maybe wired to a note-taking plugin. Private, offline, usually too weak for real tasks, and has no path to your actual apps.
Cloud model + local context
Fazm. The model is cloud and capable. Your long-term memory, file index, and conversation history live in one SQLite file on your Mac. The agent reads them with execute_sql. Swappable to a local model later without changing the on-disk layout.
The fourth quadrant is the one that matches what most people actually want, and it is the one the common advice never names.
THE ANCHOR FACT
One file, seven tables, one query tool
The assistant's entire long-term memory of you is a single SQLite file. You can find it right now if Fazm is installed:
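A quick way to locate it from a terminal (the userId segment is your Firebase UID, or "anonymous" before you sign in; the sqlite3 line assumes a single user directory):

```shell
# The per-user data directory Fazm creates on first launch
FAZM_USERS="$HOME/Library/Application Support/Fazm/users"
ls "$FAZM_USERS" 2>/dev/null || echo "no Fazm install found yet"
# List every table in the database, including the FTS5 internals
sqlite3 "$FAZM_USERS"/*/fazm.db ".tables" 2>/dev/null || true
```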
The seven tables the agent actually reads from are indexed_files, local_kg_nodes, local_kg_edges, chat_messages, chat_messages_fts, observer_activity, and ai_user_profiles. The others you saw in the output (chat_messages_fts_data, _idx, _docsize) are FTS5 internals.
THE ARCHITECTURE
How one English sentence becomes one SELECT
Agent loop · local context layer
The shape is deliberately boring. The model writes SQL, the pool runs it, the pool hands back rows. No RPC to a remote index, no embedding service, no network call.
THE TOOL
execute_sql is forty lines long
The read path lives in Desktop/Sources/Providers/ChatToolExecutor.swift (starting around line 264). The whole API the model has against your personal data layer fits on one screen.
A few things worth noticing. First, the model writes real SQL, not a JSON query DSL. Second, the auto-appended LIMIT 200 is a default, not a choice the model made: it kicks in only when the query carries no LIMIT of its own, and the model is free to ask for less. Third, the return format is plain text with | separators, which is exactly what the model is already good at reading (it looks like a markdown table or psql output). Fourth, anything that is not SELECT / INSERT / UPDATE / DELETE is rejected before the pool ever sees it.
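Those behaviors are enough to reproduce the tool's contract. This is a Python stand-in, not the Swift source: it mirrors the verb whitelist, the auto-appended LIMIT 200, the 500-character cell cap, and the pipe-separated text format, under the rules named above.

```python
import sqlite3

ALLOWED = ("select", "insert", "update", "delete")

def execute_sql(conn: sqlite3.Connection, sql: str) -> str:
    """Illustrative stand-in for Fazm's execute_sql tool (not the Swift source)."""
    stmt = sql.strip().rstrip(";")
    verb = stmt.split(None, 1)[0].lower() if stmt else ""
    if verb not in ALLOWED:
        return "only SELECT, INSERT, UPDATE, DELETE statements are allowed"
    # Default row cap: applied only when the model wrote no LIMIT of its own
    if verb == "select" and " limit " not in f" {stmt.lower()} ":
        stmt += " LIMIT 200"
    cur = conn.execute(stmt)
    if verb == "select":
        header = " | ".join(col[0] for col in cur.description)
        rows = [" | ".join(str(v)[:500] for v in row) for row in cur.fetchall()]
        return "\n".join([header, *rows])
    conn.commit()
    return f"{cur.rowcount} row(s) affected"

# Demo against an in-memory table with the column names the article uses
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE indexed_files (filename TEXT, fileType TEXT)")
conn.execute("INSERT INTO indexed_files VALUES ('main.swift', 'code')")
print(execute_sql(conn, "SELECT fileType, COUNT(*) AS count FROM indexed_files GROUP BY fileType"))
# fileType | count
# code | 1
```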
WHAT AN ACTUAL QUERY LOOKS LIKE
Real SELECTs the agent writes against your disk
These follow the examples in the prompt in Desktop/Sources/Chat/ChatPrompts.swift (starting around line 446); the fragments below are reconstructed into complete statements, so the SELECT lists are representative rather than verbatim. The prompt teaches the model the shape of the tables, then lets it improvise.

```sql
-- What kinds of files are on this disk?
SELECT fileType, COUNT(*) AS count FROM indexed_files
GROUP BY fileType ORDER BY count DESC LIMIT 15;

-- Which languages does this person write?
SELECT fileExtension, COUNT(*) AS count FROM indexed_files
WHERE fileType = 'code'
GROUP BY fileExtension ORDER BY count DESC LIMIT 20;

-- Where are the real projects?
SELECT filename, modifiedAt FROM indexed_files
WHERE filename IN ('package.json', 'Cargo.toml',
                   'Podfile', 'go.mod', 'requirements.txt',
                   'pyproject.toml', 'Package.swift', 'Dockerfile')
LIMIT 40;

-- What was touched most recently?
SELECT filename, modifiedAt
FROM indexed_files
ORDER BY modifiedAt DESC LIMIT 20;

-- Full-text search over past conversations
SELECT rowid FROM chat_messages_fts
WHERE chat_messages_fts MATCH 'refund policy'
ORDER BY rank LIMIT 10;
```
None of these calls go through an embedding service, a vector index, or a third-party API. They all run against the file at ~/Library/Application Support/Fazm/users/{userId}/fazm.db and return in milliseconds.
THE NUMBERS
The few numbers that matter, read from source
The 500 MB cap lives in FileIndexerService.swift line 24 as maxFileSize. The depth-3 rule is on line 21. The auto-LIMIT 200 is in ChatToolExecutor.swift line 273. The seven reading tables are enumerated in ChatPrompts.swift lines 446 to 456 and 509 to 519. Nothing on this page is a guess.
Local context (Fazm) vs the usual local-LLM stack
| Feature | Ollama / Jan.ai / LM Studio | Fazm |
|---|---|---|
| Where the model runs | On your hardware | Cloud (swappable to local) |
| Where your file index lives | Not provided | ~/Library/Application Support/Fazm/users/{userId}/fazm.db |
| Where your chat history lives | Local JSON or nothing | chat_messages + chat_messages_fts (FTS5) |
| Personal knowledge graph | Not provided | local_kg_nodes, local_kg_edges tables |
| Agent query API | Chat prompt only | execute_sql tool, SELECT/INSERT/UPDATE/DELETE |
| Can drive other Mac apps | No, chat window only | Yes, via macos-use MCP (AX tree) |
| Vendor vector store / shadow copy | None (that is the selling point) | None |
| Open the data yourself | Varies by app | sqlite3 fazm.db |
HOW THE FILE GOT THERE
The path from first launch to your first query
You install Fazm and sign in
The app creates ~/Library/Application Support/Fazm/users/{firebaseUid}/ and opens a fresh fazm.db with a GRDB DatabasePool in WAL mode. Sidecar files fazm.db-wal and fazm.db-shm appear next to it, along with a .fazm_running flag so the next launch can detect unclean shutdowns.
Migrations run
GRDB walks registered migrations in order: initial tables, then fazmV3 (task_chat_messages -> chat_messages), fazmV4 (observer_activity), fazmV5 (add session_id, create chat_messages_fts virtual table, wire up insert/update/delete triggers). Your file ends up on the current schema even if you installed an older build last month.
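The FTS5 half of fazmV5 is worth seeing concretely. This is a minimal sketch of an external-content FTS5 table with the three sync triggers, written against Python's sqlite3 for portability; the real migration is Swift/GRDB, and every column beyond session_id and content is an assumption here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE chat_messages (id INTEGER PRIMARY KEY, session_id TEXT, content TEXT);

-- External-content FTS5 table: the index reads row text from chat_messages
CREATE VIRTUAL TABLE chat_messages_fts
    USING fts5(content, content='chat_messages', content_rowid='id');

-- Three triggers keep the index in sync with the base table
CREATE TRIGGER chat_messages_ai AFTER INSERT ON chat_messages BEGIN
    INSERT INTO chat_messages_fts(rowid, content) VALUES (new.id, new.content);
END;
CREATE TRIGGER chat_messages_ad AFTER DELETE ON chat_messages BEGIN
    INSERT INTO chat_messages_fts(chat_messages_fts, rowid, content)
        VALUES ('delete', old.id, old.content);
END;
CREATE TRIGGER chat_messages_au AFTER UPDATE ON chat_messages BEGIN
    INSERT INTO chat_messages_fts(chat_messages_fts, rowid, content)
        VALUES ('delete', old.id, old.content);
    INSERT INTO chat_messages_fts(rowid, content) VALUES (new.id, new.content);
END;
""")

conn.execute("INSERT INTO chat_messages (session_id, content) "
             "VALUES ('s1', 'our refund policy is 30 days')")
hits = conn.execute("SELECT rowid FROM chat_messages_fts "
                    "WHERE chat_messages_fts MATCH 'refund'").fetchall()
```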
The indexer scans your folders
FileIndexerService walks ~/Downloads, ~/Documents, ~/Desktop, ~/Developer, ~/Projects, ~/Code, ~/src, ~/repos, ~/Sites, /Applications, and ~/Applications up to depth 3. It skips .Trash, node_modules, .git, __pycache__, .venv, venv, .cache, .npm, .yarn, Pods, DerivedData, .build, build, dist, .next, .nuxt, target, vendor, Library, .local, .cargo, and .rustup because scanning those produces mostly noise. Files over 500 MB are not recorded. Results batch-insert in groups of 500.
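The walk itself is simple to sketch. This Python stand-in shows the three rules (fixed depth, skip list, size cap); the real FileIndexerService is a Swift actor that also treats package extensions like .app and .xcodeproj as leaves and batch-inserts in groups of 500, which this sketch omits.

```python
import os

# Abbreviated skip list; the full list is in FileIndexerService.swift
SKIP_DIRS = {".Trash", "node_modules", ".git", "__pycache__", ".venv", "venv",
             ".cache", "Pods", "DerivedData", ".build", "build", "dist",
             "target", "vendor"}
MAX_DEPTH = 3                       # FileIndexerService.swift line 21
MAX_FILE_SIZE = 500 * 1024 * 1024   # 500 MB cap, line 24

def scan(root: str):
    """Yield (path, size, depth) for files worth indexing, depth-limited."""
    root = os.path.expanduser(root)
    for dirpath, dirnames, filenames in os.walk(root):
        depth = dirpath[len(root):].count(os.sep)
        if depth >= MAX_DEPTH:
            dirnames[:] = []        # stop descending past the depth limit
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue            # unreadable files are simply not recorded
            if size <= MAX_FILE_SIZE:
                yield path, size, depth
```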
You ask something
The agent loop decides to call execute_sql with a specific SELECT. The DatabasePool runs it against your local file, rows come back as plain text with pipe separators, and the model uses that output to answer. Everything up to the final answer happens on your Mac.
The knowledge graph gets written
When you tell Fazm something new, it calls save_knowledge_graph with nodes and edges. Both go into local_kg_nodes and local_kg_edges in one transaction. Next time you ask a related question, the agent includes graph edges as part of the SELECT context.
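The single-transaction part is what matters: nodes and edges either both land or neither does. A minimal sketch in Python's sqlite3 (the column layouts here are assumptions for illustration, not the real schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE local_kg_nodes (id TEXT PRIMARY KEY, label TEXT);
CREATE TABLE local_kg_edges (source TEXT, target TEXT, relation TEXT);
""")

def save_knowledge_graph(conn, nodes, edges):
    """Both batch inserts run in one transaction, mirroring the tool's behavior."""
    with conn:  # commits on success, rolls back everything on any failure
        conn.executemany("INSERT OR REPLACE INTO local_kg_nodes VALUES (?, ?)", nodes)
        conn.executemany("INSERT INTO local_kg_edges VALUES (?, ?, ?)", edges)

save_knowledge_graph(conn,
    nodes=[("n1", "Fazm"), ("n2", "SQLite")],
    edges=[("n1", "n2", "stores_context_in")])
```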
CHECKLIST
What 'local' should mean, on this reading
- Your file index sits in a file you can open yourself
- Your chat history is searchable offline with an FTS5 index
- Your knowledge graph is nodes and edges in SQL, not a cloud graph service
- The agent's only path to your data is a named tool whose calls you can log
- Swapping to a local model later does not change the on-disk layout
- You can back the whole thing up with cp, or inspect it with sqlite3
ABOUT THE SEVEN TABLES
What each table is for, in one line
- indexed_files: metadata (path, name, extension, size, timestamps) for every file the scanner visits
- local_kg_nodes: concept nodes of your personal knowledge graph
- local_kg_edges: typed edges between those nodes, with source file ids
- chat_messages: every conversation you have had with the assistant
- chat_messages_fts: the FTS5 shadow table that makes chat history searchable offline
- observer_activity: background insight cards and auto-drafted skills
- ai_user_profiles: a writable profile the agent updates on its own
A REAL SESSION
Inspect the database yourself
Nothing about this file is hidden from you. If Fazm is installed, you can open a terminal and ask the same questions the agent asks. Quit the app first so the WAL gets checkpointed cleanly; otherwise you will see slightly older numbers.
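You can rehearse the commands before pointing them at the real file. This stand-in session creates a throwaway database with the same table name and a toy schema; on your own Mac, substitute ~/Library/Application\ Support/Fazm/users/{userId}/fazm.db for "$db":

```shell
# Throwaway database with a toy indexed_files table (schema abbreviated)
db=$(mktemp -u).db
sqlite3 "$db" "CREATE TABLE indexed_files (filename TEXT, fileType TEXT, fileExtension TEXT);
INSERT INTO indexed_files VALUES ('main.swift','code','swift'),('notes.md','document','md');"

# The same questions the agent asks, run by hand
sqlite3 "$db" ".tables"
sqlite3 "$db" "SELECT fileType, COUNT(*) FROM indexed_files GROUP BY fileType;"
```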
The counts above are illustrative. Your own file will show your own numbers. If they surprise you, it is the first time your file system has looked back at you.
WHERE TO GO NEXT
Related reading on this site
Keep reading
Local AI app for Mac, beyond chat wrappers
The macOS-automation side of Fazm. Instead of screenshots, the agent reads the accessibility tree of every running app.
AI browser automation you can actually see
How Fazm injects a visible overlay into your real Chrome so the agent never drives a tab without the user knowing.
Run vLLM locally on Mac with an AI agent
The inference half of the local-AI-assistant picture. What it takes to swap in a local model end to end.
Want the local context layer, not just a local model?
Book a 20-minute call and we will walk through the SQLite layout on your own Mac, show what is indexed, and wire it to your workflow.
Questions this framing tends to raise
What does 'local AI assistant' actually mean if the model is still running in the cloud?
Most guides to local AI assistants conflate two very different things: where the model weights run (the inference layer) and where your personal data lives (the context layer). Fazm treats them as independent. The model that writes the chain of tool calls can be whichever one is best today, but the context the model reads from is a SQLite file on your disk at ~/Library/Application Support/Fazm/users/{userId}/fazm.db. Your file index, your conversation history, your knowledge graph, your onboarding notes, and your user profile never get uploaded to a vendor vector store. The agent queries them in place with SELECT statements. That separation is the part of 'local' that actually affects your privacy and your latency.
Where is the database on my Mac, and who opens it?
The path is ~/Library/Application Support/Fazm/users/{userId}/fazm.db, built in Desktop/Sources/AppDatabase.swift by a function called userBaseDirectory (lines 238 to 248). The file is opened as a GRDB DatabasePool with WAL mode enabled, which is why you also see fazm.db-wal and fazm.db-shm sidecar files next to it. A per-user flag file called .fazm_running tracks unclean shutdowns so the next launch can run an integrity check if the previous session did not mark a clean shutdown. No other process touches the file; the Mac app itself is the only writer.
What exactly is in the database? Which tables?
Seven tables matter for day-to-day context. indexed_files stores metadata for every file the scanner visits under ~/Downloads, ~/Documents, ~/Desktop, ~/Developer, ~/Projects, ~/Code, ~/src, ~/repos, ~/Sites, and /Applications, up to depth 3, skipping files larger than 500 MB and about twenty common build folders like node_modules, .git, __pycache__, DerivedData, and target. local_kg_nodes and local_kg_edges hold a personal knowledge graph (concept nodes, typed edges, source file ids). chat_messages stores every conversation with the assistant, with chat_messages_fts shadowing it as an FTS5 virtual table so full-text search is O(log n) instead of a full scan. observer_activity tracks background insight cards and auto-drafted skills. ai_user_profiles is a writable profile the agent updates on its own. Each table has a short description in ChatPrompts.swift that the model reads at query time so it can write correct SQL.
How does the assistant actually pull data out of the file? Is there an API in front of it?
No. There is one tool called execute_sql, defined in Desktop/Sources/Providers/ChatToolExecutor.swift, that accepts a raw SQL string. The tool is restricted to SELECT, INSERT, UPDATE, and DELETE (anything else returns 'only SELECT, INSERT, UPDATE, DELETE statements are allowed'). If the SELECT query has no LIMIT clause, the executor auto-appends LIMIT 200 to prevent the model from pulling half your disk into its context window. Results come back as a plain text table with ' | ' separators, truncated at 500 characters per cell. That simplicity is deliberate: it means the model talks to your data the same way a human would in a psql prompt, and you can read what it did by tailing the app log.
How is this different from Ollama, Jan.ai, LM Studio, or GPT4All?
Those are inference runtimes. They give you a chat UI and a local model. They do not know anything about your files, your calendar, or what you were doing in Finder an hour ago, and they cannot do anything in an app on your Mac. Fazm is the other half of the stack: it keeps your context on your Mac in one SQL-queryable file and drives the apps that already have that context open. You can point Fazm at a cloud model or, in principle, at a local model over the same agent loop; the on-disk layout does not change. The usual 'local vs cloud' framing treats these as competitors; in practice they solve different problems.
Why SQLite specifically? Why not a vector database?
Because the agent can write the query. Vector databases are excellent when the consumer of the data is another program running similarity search; they are awkward when the consumer is a language model that can write exact SQL. A single SQLite file works with the same mental model the model already has from reading millions of lines of application code, supports full-text search via the FTS5 module for free (see the chat_messages_fts virtual table and its three triggers in the fazmV5 migration), runs offline with zero setup, lives on one disk, and can be backed up with cp. Embedding search is still available where it helps, but flat SQL is the default because it is the most reliable way to ask a precise question about your own stuff.
Does that mean Fazm is slower than a RAG-style assistant?
The opposite. Retrieval-augmented chatbots spend most of their time turning your question into an embedding, comparing it against a remote index, and streaming back the top k chunks. Fazm just runs a SELECT on a local file. For a question like 'show me my Swift projects,' the path is: execute_sql → SELECT fileExtension, COUNT(*) FROM indexed_files WHERE fileType = 'code' GROUP BY fileExtension ORDER BY count DESC → text table back. That entire round trip is a single function call on local storage, not a network round trip to a vector service.
What stops the model from deleting my data with a bad DELETE query?
The executor allows writes but logs every one (see the log line 'Tool execute_sql write: \(changes) row(s) affected' at line 330 of ChatToolExecutor.swift), and the tables the model is encouraged to write to are narrow: ai_user_profiles for profile updates, and the knowledge graph via a separate save_knowledge_graph tool that batches nodes and edges into a single transaction. The core chat_messages and indexed_files tables are read almost exclusively. The guardrail is that the model knows what the table is for (every table has a human-readable description attached to the prompt) and that the tool name makes the operation legible in the tool-call log. There is no silent 'clear memory' path.
How does the file index actually get populated?
There is an actor called FileIndexerService in Desktop/Sources/FileIndexing/FileIndexerService.swift. On boot and on an incremental schedule, it walks a fixed list of folders (~/Downloads, ~/Documents, ~/Desktop, ~/Developer, ~/Projects, ~/Code, ~/src, ~/repos, ~/Sites, /Applications, and ~/Applications), up to a max depth of 3, batching 500 records at a time into indexed_files. It skips folders like .Trash, node_modules, .git, .venv, DerivedData, and target because scanning those produces mostly noise. Package extensions like .app, .framework, .xcodeproj, and .xcworkspace are treated as leaves, not walked into. It skips files larger than 500 MB. The scanner does not read file contents; it only stores metadata (path, name, extension, size, folder, depth, timestamps).
What stays on my Mac and what leaves?
The SQLite file never leaves. The accessibility tree the agent reads from other apps (from the macos-use MCP server) never leaves as bulk data. What leaves is exactly what the model needs to decide the next action: the current question, whatever rows came back from your execute_sql call, and whatever part of the AX tree the agent chose to quote. That is a fundamentally different shape from cloud assistants that index your Drive, your Gmail, or your calendar into their own vector store and keep a shadow copy. Here the shadow copy is the SQLite file, and it is in your home directory, owned by you.
Can I read the database myself with the standard sqlite3 CLI?
Yes. Open Terminal, run sqlite3 ~/Library/Application\ Support/Fazm/users/{userId}/fazm.db, then .tables to see the list, and e.g. SELECT fileType, COUNT(*) FROM indexed_files GROUP BY fileType to see what it knows about your disk. If you want to export it, .dump or .backup work. Quit the Fazm app first so the WAL is checkpointed cleanly. The schema is identical to what the agent sees, which is the whole point: the assistant has no private view of your data.
How does Fazm find the right userId directory?
When you sign in, the Firebase UID becomes the directory name. Before sign-in, Fazm writes to the subdirectory 'anonymous'. A migration step in AppDatabase.swift (migrateFromLegacyUserDirectory) moves the legacy anonymous or device-UUID database into the correct per-user folder once you authenticate, so your indexed files and conversation history follow the account rather than the device session. If you switch users on the same machine, the code calls switchUser(to:), which closes the pool, reconfigures for the new userId, and reopens a fresh database file.