Code That Cannot Phone Home - AI Agents for Air-Gapped Systems
Some environments have no internet connection by design: military systems, secure research labs, financial trading floors, medical devices, and legally mandated air gaps. These environments cannot use cloud AI APIs. Everything must happen on the local machine with locally available resources.
This eliminates most current AI agent architectures, which depend on cloud LLMs for reasoning and decision-making. But it does not eliminate the need for automation. People working in air-gapped environments still have repetitive tasks, still need to navigate complex applications, and still benefit from intelligent assistance.
What Air-Gapped Actually Means
An air-gapped system means zero outbound network connections - no API calls to OpenAI, no screenshots sent to a cloud vision model, no telemetry. Data transfer happens only via physical media (USB, optical disk) after security scanning. The enforcement is often at the network hardware level, not software.
The compliance requirements that drive air-gapping include GDPR, HIPAA, and various government security classifications. In the enterprise AI code assistant space, this has become a real market segment - teams at secure financial institutions cannot use standard GitHub Copilot or Claude because of network policies, not preference.
The Local Stack for Air-Gapped Agents
Getting an agent to work without cloud dependencies requires replacing every networked component:
For screen understanding: The accessibility API does not require an internet connection. AXUIElement on macOS provides the complete UI tree using only local system APIs. An agent can read every button label, text field value, and element state without any network call. This is the same API used by VoiceOver and other assistive technologies - it is a stable, documented macOS interface with no external dependencies.
For vision tasks: When accessibility metadata is incomplete (roughly 33% of macOS apps have partial accessibility support), a local vision model handles the gap. MLX-based models on Apple Silicon deliver 20-30% better performance than llama.cpp for vision inference. A 7B parameter multimodal model running locally handles "what does this UI element do" queries without network access.
For reasoning and planning: On-device models served through Ollama or llama.cpp fill the gap left by cloud LLMs; both run fully air-gapped once the model files are on disk and are the primary tools for completely offline deployments. A 7B to 13B parameter model handles task classification, instruction following, and simple planning reliably.
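The vision fallback described above can be sketched against Ollama's local generate endpoint, which accepts base64-encoded screenshots in an `images` field. This is a sketch, not a definitive implementation: the function name and prompt are illustrative, and the `llava:7b` tag assumes a llava-style vision model was pulled before the air gap.

```python
import base64


def build_vision_request(screenshot_path: str, question: str,
                         model: str = "llava:7b") -> dict:
    """Build an Ollama generate-API payload that asks a local
    multimodal model about a UI screenshot.

    The image is base64-encoded into the `images` field; nothing
    here touches the network.
    """
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "prompt": question,
        "images": [image_b64],
        "stream": False,  # one complete answer, no streaming
    }
```

The resulting dict is POSTed to Ollama's local endpoint on `localhost:11434`, so the screenshot never leaves the machine.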
Setting Up the Air-Gapped Stack
A working local agent stack for an air-gapped macOS environment:
```shell
# Install Ollama (transfer the installer via USB after downloading externally)
# Ollama runs fully offline once models are pulled

# Pull a capable model (do this before the air gap, or transfer the model files)
ollama pull llama3.1:8b

# Verify it runs without network access: disconnect the network, then
ollama run llama3.1:8b "Summarize this text: [your text]"
```
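Once the model is pulled, the reasoning layer can talk to Ollama's local HTTP API on `localhost:11434` using only the standard library. A minimal sketch for the task-classification use case; the label vocabulary (`DATA_ENTRY`, `REPORT`, `MONITORING`) is a hypothetical example of the narrow task framing discussed later, not part of Ollama's API.

```python
import json
import urllib.request

# Ollama's default local endpoint; no traffic leaves the machine
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(task_description: str, model: str = "llama3.1:8b") -> dict:
    """Build a non-streaming request asking the model to classify a task.

    The label set is a hypothetical example of the narrow, well-defined
    vocabulary an 8B model handles reliably.
    """
    prompt = (
        "Classify the following task as exactly one of: "
        "DATA_ENTRY, REPORT, MONITORING. Reply with the label only.\n\n"
        f"Task: {task_description}"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def classify_task(task_description: str) -> str:
    """POST the request to the local Ollama server and return the label."""
    payload = json.dumps(build_request(task_description)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```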
For the AXUIElement layer in Python using pyobjc:
```python
import AppKit
import ApplicationServices as AS


def _copy_attr(element, attribute):
    """Read a single accessibility attribute; returns None if unavailable."""
    err, value = AS.AXUIElementCopyAttributeValue(element, attribute, None)
    return value if err == 0 else None  # 0 == kAXErrorSuccess


def get_ui_tree(app_name: str) -> dict:
    """Get the full accessibility tree for a running application.

    Requires the Accessibility permission (System Settings > Privacy
    & Security > Accessibility) for the process running this script.
    """
    workspace = AppKit.NSWorkspace.sharedWorkspace()
    target_app = None
    for app in workspace.runningApplications():
        name = app.localizedName()
        if name and app_name.lower() in name.lower():
            target_app = app
            break
    if target_app is None:
        return {"error": f"App '{app_name}' not found"}
    ax_app = AS.AXUIElementCreateApplication(target_app.processIdentifier())
    return _traverse_element(ax_app, depth=0, max_depth=5)


def _traverse_element(element, depth: int, max_depth: int) -> dict:
    if depth > max_depth:
        return {}
    node = {}
    # Role, title, and value come from local system APIs -
    # no network call is involved at any point
    role = _copy_attr(element, "AXRole")
    title = _copy_attr(element, "AXTitle")
    value = _copy_attr(element, "AXValue")
    if role:
        node["role"] = str(role)
    if title:
        node["title"] = str(title)
    if value:
        node["value"] = str(value)
    # Recurse into children, bounded by max_depth
    children = _copy_attr(element, "AXChildren")
    if children:
        node["children"] = [
            _traverse_element(child, depth + 1, max_depth)
            for child in children
        ]
    return node
```
This runs entirely locally. No network, no external process, no telemetry.
Local Model Selection for Agents
The choice of local model affects capability significantly. Based on current benchmarks for air-gapped agent use:
| Model | Size | Instruction Following | Tool Calling | Apple Silicon Speed |
|---|---|---|---|---|
| Llama 3.1 8B | 4.7GB | Good | Yes | ~40 tok/s on M3 |
| Qwen2.5 7B | 4.4GB | Very good | Yes | ~45 tok/s on M3 |
| Mistral 7B | 4.1GB | Good | Partial | ~45 tok/s on M3 |
| Llama 3.1 70B | 40GB | Excellent | Yes | ~12 tok/s on M3 Ultra |
For most air-gapped automation tasks, an 8B model is sufficient. The tasks tend to be specific and well-structured - data entry across applications, report generation from multiple sources, system monitoring. These do not require frontier reasoning; they require reliable execution of well-defined steps.
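One way to keep an 8B model inside that well-structured envelope is to have it emit one JSON step at a time and validate each step against a small action vocabulary before executing anything. A minimal sketch, assuming a hypothetical three-action vocabulary (the action names are illustrative, not from the original):

```python
import json

# Hypothetical action vocabulary for a narrow desktop automation
ALLOWED_ACTIONS = {"read_field", "enter_value", "click_button"}


def parse_step(model_output: str) -> dict:
    """Validate one JSON step emitted by the local model.

    Anything outside the small action vocabulary is rejected before
    execution - this constraint is what keeps a 7-13B planner reliable.
    """
    step = json.loads(model_output)
    if step.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {step.get('action')!r}")
    if "target" not in step:
        raise ValueError("step is missing a 'target'")
    return step
```

The executor only ever sees steps that passed validation, so a hallucinated or malformed plan fails closed rather than acting on the UI.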
The Trade-Offs Are Real
Local models are less capable than frontier cloud models. A 7B model handles well-defined tasks reliably but struggles with ambiguous instructions or complex multi-step reasoning that requires world knowledge. The solution is constraining the agent's scope - narrow, well-defined automations rather than general-purpose assistance.
In practice, this constraint matches what air-gapped environments need anyway. The tasks are specific and repeatable. You are not asking a trading floor automation to reason about geopolitics - you are asking it to extract numbers from one application and enter them into another. That is well within the capability of a local model running on modern hardware.
The other trade-off is latency. A local 8B model on an M3 MacBook Pro generates around 40 tokens per second. For a task that requires 500 tokens of reasoning and 200 tokens of output, that is about 17 seconds per step. For automation tasks that run in the background, this is acceptable. For interactive tasks, it requires designing the agent to show progress rather than making the user wait on a spinner.
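The arithmetic behind that estimate is simple enough to make explicit, since it drives the background-vs-interactive design decision (a sketch using the figures from the text):

```python
def step_latency_seconds(reasoning_tokens: int, output_tokens: int,
                         tokens_per_second: float) -> float:
    """Rough per-step latency for a local model at a given generation speed."""
    return (reasoning_tokens + output_tokens) / tokens_per_second


# 500 reasoning tokens + 200 output tokens at ~40 tok/s on an M3:
# (500 + 200) / 40 = 17.5 seconds per step
```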
Air-Gapped AI Agents Are Not a Compromise
The framing of "local model as the inferior fallback to cloud" misses the point for this use case. The security property is the primary requirement, not a constraint. An agent that processes sensitive contracts locally, with no network egress, is more suitable for a law firm than a cloud agent - regardless of how much smarter the cloud model is.
The stack exists. The models are capable enough for the tasks. The accessibility APIs provide the interface layer. Air-gapped AI agents are not a future thing waiting on better local models - they are deployable today with the current generation of 7-13B parameter models.
Fazm is an open source macOS AI agent, available on GitHub.