Personal AI agent on device, the way it actually has to work.

Most products marketed as a personal AI on device ship one piece of the puzzle. A local model. A chat-side memory. A voice-controlled launcher. The full thing requires three at once, and the third one (per-turn prompt injection from a local database) is the part everyone skips. This page walks through how Fazm puts all three together on a Mac, with the file paths and the line of Swift that does the wrapping.

Matthew Diakonov
9 min read
Four-table local SQLite schema for personal context
Profile wrapped into every chat turn at ChatProvider.swift line 1593
Works with a local or cloud model, same data path

Direct answer (verified 2026-05-01)

A personal AI agent on device is one that (1) runs on your own machine, (2) stores facts about you in a local database that the agent reads at runtime, and (3) injects those facts into every model prompt without a network round trip. Most products that use the phrase only ship one of those three pieces. Fazm ships all three on macOS through a four-table SQLite schema in Desktop/Sources/AppDatabase.swift and one wrapping function in Desktop/Sources/Providers/ChatProvider.swift line 1593.

Why most pages on this topic miss the point

I read the pages that currently rank for this question and they split into two camps. The first camp is mobile assistants that promise to do things on your phone, where the on-device claim mostly means a small embedded model and the rest of the agent still calls a server. The second camp is generic listicles comparing Pi, Personal.ai, ChatGPT memory, and a handful of self-hosted bots. Neither answers the question someone actually has when they search for this: what does a real personal agent running on my computer look like in the place it has to work, which is the database file the model reads from before every turn.

The model is not the personal part. A 70B-parameter local model with no idea who you are is not a personal AI agent. It is a local LLM. The personal part is a tiny SQLite row that the chat layer wraps into a system prompt. Without it, the model is indistinguishable from a fresh ChatGPT session. With it, the model knows your name, your stack, your active projects, and the apps you actually use, before you have typed a word.

1 SQL row

The personal part of a personal AI agent is not the model. It is the local database row that gets wrapped into every prompt before the model runs.


The three things that have to be true at the same time

Pick any product with a personal AI on device claim and you can grade it against three conditions. None of them is hard on its own. The trick is doing all three.

Condition 1

Local data ingestion

The agent reads facts about you off your machine on its own. In Fazm this is the file indexer that scans your primary folders during onboarding and writes one row per file into the indexed_files table.
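To make the shape of one `indexed_files` row concrete, here is an illustrative sketch (not Fazm's actual indexer): the per-file fields the indexer collects before writing a row. `IndexedFileRow` and `makeRow` are hypothetical names for this example; the column names match the schema later on this page.

```swift
import Foundation

// Illustrative sketch, not Fazm's indexer: the per-file fields collected
// before one row is written into indexed_files.
struct IndexedFileRow {
    let path: String
    let filename: String
    let fileExtension: String?
    let sizeBytes: Int
    let modifiedAt: Date?
}

func makeRow(for url: URL) -> IndexedFileRow {
    // Resource values come from the file system, not from file contents.
    let values = try? url.resourceValues(forKeys: [.fileSizeKey, .contentModificationDateKey])
    return IndexedFileRow(
        path: url.path,
        filename: url.lastPathComponent,
        fileExtension: url.pathExtension.isEmpty ? nil : url.pathExtension,
        sizeBytes: values?.fileSize ?? 0,
        modifiedAt: values?.contentModificationDate
    )
}
```

The point is that indexing is metadata-only: path, name, extension, size, dates. Nothing in this row requires reading the file's contents.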

Condition 2

Local profile store

Those raw facts get summarised into something the model can actually use in a single prompt. In Fazm a parallel exploration session writes 3 to 5 paragraphs to ai_user_profiles.profileText, and a knowledge graph of 30 to 50 nodes lands in local_kg_nodes and local_kg_edges.
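A hedged sketch of the write that ends that exploration session, assuming GRDB's Codable record API (the real path goes through AIUserProfileService). The struct name and `saveProfile` are illustrative; the table and column names come from the fazmV1 migration.

```swift
import Foundation
import GRDB

// Hedged sketch, not the real AIUserProfileService: persisting the summary
// via GRDB's Codable record protocols.
struct AIUserProfile: Codable, FetchableRecord, PersistableRecord {
    static let databaseTableName = "ai_user_profiles"
    var id: Int64?
    var profileText: String   // the 3 to 5 paragraph summary
    var dataSourcesUsed: Int  // e.g. indexed_files + knowledge graph
    var backendSynced: Bool
    var generatedAt: Date
}

func saveProfile(_ summary: String, in dbQueue: DatabaseQueue) throws {
    try dbQueue.write { db in
        try AIUserProfile(id: nil, profileText: summary, dataSourcesUsed: 2,
                          backendSynced: false, generatedAt: Date()).insert(db)
    }
}
```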

Condition 3

Per-turn prompt injection

On every turn the model runs, the cached profile is wrapped in an XML tag and concatenated into the system context block (ChatProvider.swift line 1593, formatAIProfileSection()). One line of Swift, no network call.

The four-table local schema, by file and line

The schema is created in a single GRDB migration named fazmV1 in Desktop/Sources/AppDatabase.swift, starting at line 882. Four tables anchor the personal context layer.

// Desktop/Sources/AppDatabase.swift, line 882
migrator.registerMigration("fazmV1") { db in
    try db.create(table: "ai_user_profiles") { t in
        t.autoIncrementedPrimaryKey("id")
        t.column("profileText", .text).notNull()
        t.column("dataSourcesUsed", .integer).notNull()
        t.column("backendSynced", .boolean).notNull().defaults(to: false)
        t.column("generatedAt", .datetime).notNull()
    }
    try db.create(table: "indexed_files") { t in
        t.autoIncrementedPrimaryKey("id")
        t.column("path", .text).notNull()
        t.column("filename", .text).notNull()
        t.column("fileExtension", .text)
        t.column("fileType", .text).notNull()
        // ... sizeBytes, folder, depth, createdAt, modifiedAt, indexedAt
    }
    try db.create(table: "local_kg_nodes") { t in
        t.autoIncrementedPrimaryKey("id")
        t.column("nodeId", .text).notNull().unique()
        t.column("label", .text).notNull()
        t.column("nodeType", .text).notNull()
        // person, organization, place, thing, or concept
    }
    try db.create(table: "local_kg_edges") { t in
        t.autoIncrementedPrimaryKey("id")
        t.column("edgeId", .text).notNull().unique()
        t.column("sourceNodeId", .text).notNull()
        t.column("targetNodeId", .text).notNull()
        t.column("label", .text).notNull()
    }
}

Five datetime indices, two unique indices on path and node identifiers, and the whole personal context layer fits in those four tables. Everything else (chat history, observer cards, cron jobs) lives in later migrations and does not feed the profile section directly.
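The index DDL implied above can be sketched like this, using GRDB's `create(index:)` API. The index names here are illustrative, not the ones in AppDatabase.swift.

```swift
import GRDB

// Hedged sketch of the unique and datetime indices described above;
// index names are made up for this example.
func createIndices(_ db: Database) throws {
    try db.create(index: "idx_indexed_files_path", on: "indexed_files",
                  columns: ["path"], options: [.unique])
    try db.create(index: "idx_kg_nodes_node_id", on: "local_kg_nodes",
                  columns: ["nodeId"], options: [.unique])
    try db.create(index: "idx_profiles_generated_at", on: "ai_user_profiles",
                  columns: ["generatedAt"])
}
```

The unique indices on path and node identifiers are what make re-indexing idempotent: a second scan upserts rather than duplicates.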

The literal line that makes a chat turn personal

The wrap happens in Desktop/Sources/Providers/ChatProvider.swift around line 1577. The cached profile is a private string on the provider; loadAIProfileIfNeeded() fills it once from the latest ai_user_profiles row. formatAIProfileSection() returns the wrapped fragment that gets concatenated into the system context block right before the model is called.

// Desktop/Sources/Providers/ChatProvider.swift, line 1577
private func loadAIProfileIfNeeded() async {
    guard !aiProfileLoaded else { return }
    if let profile = await AIUserProfileService.shared.getLatestProfile() {
        cachedAIProfile = profile.profileText
        log("ChatProvider loaded AI profile (generated \(profile.generatedAt))")
    }
    aiProfileLoaded = true
}

private func formatAIProfileSection() -> String {
    guard !cachedAIProfile.isEmpty else { return "" }
    return "\n<ai_user_profile>\n\(cachedAIProfile)\n</ai_user_profile>"
}

That is the whole mechanism. One read of the most recent profile row at session start, one string interpolation per turn, and the model sees a paragraph of you tagged <ai_user_profile> before it sees the user message. No network call, no token round trip to a remote memory service, no plugin to install. The data was on disk before the chat session opened.

How a single chat turn flows through the layer

Inputs on the left are everything that contributed to the profile during onboarding. The hub in the middle is the local SQLite database. Outputs on the right are everything the agent sees on the way to the model.

One chat turn, no network calls until the model itself:

indexed_files, the onboarding session, save_knowledge_graph, and the Chat Observer → local SQLite → loadAIProfileIfNeeded() → formatAIProfileSection() → system context block → local or cloud model

Two patterns side by side

Where the personal context lives, who reads it, and what happens to it. The point of this table is not to argue Fazm is better at every cell. It is to make the trade-offs honest for someone deciding which pattern fits their work.

| Feature | Cloud chat memory | Fazm (this page) |
| --- | --- | --- |
| Where personal context lives | Provider servers, scoped per account | Local SQLite in app sandbox |
| Source of truth about you | What you have typed into chats | indexed_files, kg, profile summary |
| Network call to load context | One per session | Zero, file read from disk |
| Works with any model | Tied to provider that owns memory | Yes, custom API endpoint |
| Reaches your file system | No, only what you copy in | Yes, indexer scans your folders |
| Reset by deleting one file | Account settings flow | Yes, the SQLite DB or one row |
| Open source | No | Yes, github.com/m13v/fazm |
The personal AI agent on-device pitch only earns the word personal if there is a row in a local database the model gets to read before you type. Everything else is a chat with extra steps.
Matthew Diakonov
Building Fazm

Want to read your own profile after onboarding?

Fifteen minutes on a call. Bring your Mac, install Fazm, watch the parallel exploration session write your first ai_user_profiles row, then decide if the layer is worth keeping on.

Questions people ask before installing a personal AI agent on a Mac

What does 'personal AI agent on device' actually mean?

Three things at once. First, it runs on your computer rather than calling out to a cloud chat endpoint with every message. Second, it stores facts about you locally, in a database file in your home folder, instead of relying on whatever the model remembers from your last few prompts. Third, it injects those facts into every prompt before the model runs, without a network hop. Most products marketed as 'personal AI on device' ship one of those three pieces. A local model is not enough on its own. Cloud memory layered onto a chat is not enough either. The full version requires the data layer.

Where is the data stored on disk in Fazm?

A single SQLite file in the app sandbox, opened through GRDB. The schema is created in Desktop/Sources/AppDatabase.swift starting at line 882, and the four tables that anchor the personal context layer are ai_user_profiles, indexed_files, local_kg_nodes, and local_kg_edges. ai_user_profiles holds the AI-generated profile summary as plain text. indexed_files is a row per file the indexer found on your Mac during onboarding (path, filename, file type, size, modified date). local_kg_nodes and local_kg_edges hold the knowledge graph of you as a person and the tools, languages, projects, and applications connected to you.

How does the agent see the profile during a normal chat turn?

The injection point is at Desktop/Sources/Providers/ChatProvider.swift line 1593. The function formatAIProfileSection() wraps the cached profile text in an XML tag and concatenates it into the system context block before the user's message hits the model. The exact line is `return "\n<ai_user_profile>\n\(cachedAIProfile)\n</ai_user_profile>"`. The profile is loaded once per session via loadAIProfileIfNeeded() and read from the latest row in ai_user_profiles ordered by generatedAt desc.
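The "latest row" read described above can be sketched in raw GRDB SQL. The real read goes through AIUserProfileService.getLatestProfile(); this standalone version just shows the query shape, and the function name is illustrative.

```swift
import GRDB

// Hedged sketch of the session-start read: newest profile row wins.
func latestProfileText(from dbQueue: DatabaseQueue) throws -> String? {
    try dbQueue.read { db in
        try String.fetchOne(db, sql: """
            SELECT profileText
            FROM ai_user_profiles
            ORDER BY generatedAt DESC
            LIMIT 1
            """)
    }
}
```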

How is the profile generated in the first place?

During onboarding, after scan_files indexes your Mac into indexed_files, Fazm runs a parallel background session whose system prompt is at Desktop/Sources/Chat/ChatPrompts.swift line 526. That session has one tool, execute_sql, and runs 5 to 12 SELECT queries against indexed_files (file type distribution, programming languages by extension, project indicator files like package.json or Cargo.toml, recently modified files, installed apps from /Applications, document types). It then writes a 3-5 paragraph third-person profile summary to ai_user_profiles. The whole loop happens locally with no user interaction.
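The queries the exploration session issues through execute_sql are model-generated at runtime, so the following are representative examples of each category named above, not queries copied from a transcript.

```swift
// Illustrative examples of the SELECTs an exploration session can run
// against indexed_files; exact queries are model-generated, not fixed.
let explorationQueries: [String] = [
    // file type distribution
    "SELECT fileType, COUNT(*) AS n FROM indexed_files GROUP BY fileType ORDER BY n DESC",
    // programming languages by extension count
    "SELECT fileExtension, COUNT(*) AS n FROM indexed_files WHERE fileExtension IS NOT NULL GROUP BY fileExtension ORDER BY n DESC LIMIT 10",
    // project indicator files
    "SELECT path FROM indexed_files WHERE filename IN ('package.json', 'Cargo.toml', 'Package.swift')",
    // recently modified files
    "SELECT path, modifiedAt FROM indexed_files ORDER BY modifiedAt DESC LIMIT 20",
]
```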

How is this different from Apple Intelligence's personal context?

Different surface. Apple Intelligence's personal context, rolling out gradually through 2026, is bound to Apple's own indices: Mail, Calendar, Messages, Notes, Photos. Fazm's local profile is bound to your file system, your installed apps, and the tools and projects an AI summarised from those. Apple's surface is correct for an OS-level Siri use case. Fazm's surface is correct for an agent that has to act inside arbitrary apps with your context already loaded. The two do not overlap: an Apple Intelligence query about a flight time draws on Mail, a Fazm chat about 'my main project' draws on indexed_files plus the local knowledge graph.

How is this different from ChatGPT memory or Claude memory?

Cloud memory is a chat-side feature. The model writes notes about you across sessions and prepends them. It works for what you typed. It cannot see the layout of your home folder, the languages you actually code in by extension count, or the projects you have at the top level of ~/. Fazm reads those directly from indexed_files and turns them into a profile summary that is on disk before any chat starts. The trade-off is that the data is real and sensitive, which is why it never leaves the SQLite file unless you explicitly point a chat turn at a cloud model and that turn happens to need a slice of the profile.

Is the local model required, or can it route to a cloud model?

Both work, and the data layer is unchanged. Fazm has a custom API endpoint setting that lets you route to any Anthropic-compatible gateway, including a fully local server, your own corporate proxy, or a hosted provider. The personal context layer is upstream of that decision. The profile gets wrapped into <ai_user_profile> before the prompt is built; the prompt then goes to whichever endpoint you have configured. If you want end-to-end on-device, point the endpoint at a local model. If you accept cloud inference, the slice still comes from your local SQLite.
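Because the wrap happens before the prompt is built, endpoint choice reduces to a base URL. A hedged sketch of that shape, assuming an Anthropic-style messages payload; the function and parameter names here are illustrative, not Fazm's networking code.

```swift
import Foundation

// Hedged sketch: the same request body goes to whichever base URL is
// configured; the profile section is already inside systemBlock.
func makeChatRequest(baseURL: URL, systemBlock: String, userMessage: String) -> URLRequest {
    var request = URLRequest(url: baseURL.appendingPathComponent("v1/messages"))
    request.httpMethod = "POST"
    request.setValue("application/json", forHTTPHeaderField: "Content-Type")
    let body: [String: Any] = [
        "system": systemBlock,  // already contains <ai_user_profile>...</ai_user_profile>
        "messages": [["role": "user", "content": userMessage]],
    ]
    request.httpBody = try? JSONSerialization.data(withJSONObject: body)
    return request
}
```

Pointing baseURL at localhost gives end-to-end on-device; pointing it at a hosted gateway changes nothing upstream of this function.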

Where can I read the source to verify any of this?

Three concrete paths in the public repo at github.com/m13v/fazm. Desktop/Sources/AppDatabase.swift lines 882 through 936 for the four-table schema migration. Desktop/Sources/FileIndexing/AIUserProfileService.swift for the 156-line GRDB service that reads, writes, and updates ai_user_profiles. Desktop/Sources/Providers/ChatProvider.swift around line 1583 for loadAIProfileIfNeeded() and line 1593 for the wrap. Desktop/Sources/Chat/ChatPrompts.swift line 526 for the onboarding exploration system prompt. Open the files, follow the references, and you can trace one chat turn from keypress to local SQLite read in under twenty minutes.

What does the user see during all this?

On first launch, the file indexer runs in the background while the user finishes onboarding chat. The parallel exploration session writes the first profile silently. From that point on, every chat turn benefits from the wrapped profile section without the user noticing. There is no 'personal context' settings panel to fill in. The profile is regenerated when the database changes meaningfully, and the user can delete it (deleteAllProfiles in AIUserProfileService) to reset. The visible UX is just a chat that already knows who you are when you ask it to do something with 'my project' or 'the usual address'.