Code That Cannot Phone Home - AI Agents for Air-Gapped Systems
Some environments have no internet connection by design: military systems, secure research labs, financial trading floors, medical devices, and legally mandated air gaps. These environments cannot use cloud AI APIs. Everything must happen on the local machine with locally available resources.
This eliminates most current AI agent architectures, which depend on cloud LLMs for reasoning and decision-making. But it does not eliminate the need for automation. People working in air-gapped environments still have repetitive tasks, still need to navigate complex applications, and still benefit from intelligent assistance.
What Air-Gapped Actually Means
An air-gapped system means zero outbound network connections - no API calls to OpenAI, no screenshots sent to a cloud vision model, no telemetry. Data transfer happens only via physical media (USB, optical disk) after security scanning. The enforcement is often at the network hardware level, not software.
The compliance requirements that drive air-gapping include GDPR, HIPAA, and various government security classifications. In the enterprise AI code assistant space, this has become a real market segment - teams at secure financial institutions cannot use standard GitHub Copilot or Claude because of network policies, not preference.
The Local Stack for Air-Gapped Agents
Getting an agent to work without cloud dependencies requires replacing every networked component:
For screen understanding: The accessibility API does not require an internet connection. AXUIElement on macOS provides the complete UI tree using only local system APIs. An agent can read every button label, text field value, and element state without any network call. This is the same API used by VoiceOver and other assistive technologies - it is a stable, documented macOS interface with no external dependencies.
For vision tasks: When accessibility metadata is incomplete (roughly 33% of macOS apps have partial accessibility support), a local vision model handles the gap. MLX-based models on Apple Silicon deliver 20-30% better performance than llama.cpp for vision inference. A 7B parameter multimodal model running locally handles "what does this UI element do" queries without network access.
For reasoning and planning: On-device models served through Ollama or llama.cpp fill the gap left by cloud LLMs; both run fully air-gapped once the model files are on disk and are the primary tools for completely offline deployments. A 7B to 13B parameter model handles task classification, instruction following, and simple planning reliably.
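The vision fallback described above can be sketched against Ollama's local generate endpoint, which accepts base64-encoded screenshots in an `images` field. This is a sketch, not a definitive implementation: the function name and prompt are illustrative, and the `llava:7b` tag assumes a llava-style vision model was pulled before the air gap.

```python
import base64


def build_vision_request(screenshot_path: str, question: str,
                         model: str = "llava:7b") -> dict:
    """Build an Ollama generate-API payload that asks a local
    multimodal model about a UI screenshot.

    The image is base64-encoded into the `images` field; nothing
    here touches the network.
    """
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "prompt": question,
        "images": [image_b64],
        "stream": False,  # one complete answer, no streaming
    }
```

The resulting dict is POSTed to Ollama's local endpoint on `localhost:11434`, so the screenshot never leaves the machine.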
Setting Up the Air-Gapped Stack
A working local agent stack for an air-gapped macOS environment:
```shell
# Install Ollama (transfer the installer via USB after downloading externally)
# Ollama runs fully offline once models are pulled

# Pull a capable model (do this before the air gap, or transfer the model files)
ollama pull llama3.1:8b

# Verify it runs without network access: disconnect the network, then
ollama run llama3.1:8b "Summarize this text: [your text]"
```
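Once the model is pulled, the reasoning layer can talk to Ollama's local HTTP API on `localhost:11434` using only the standard library. A minimal sketch for the task-classification use case; the label vocabulary (`DATA_ENTRY`, `REPORT`, `MONITORING`) is a hypothetical example of the narrow task framing discussed later, not part of Ollama's API.

```python
import json
import urllib.request

# Ollama's default local endpoint; no traffic leaves the machine
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(task_description: str, model: str = "llama3.1:8b") -> dict:
    """Build a non-streaming request asking the model to classify a task.

    The label set is a hypothetical example of the narrow, well-defined
    vocabulary an 8B model handles reliably.
    """
    prompt = (
        "Classify the following task as exactly one of: "
        "DATA_ENTRY, REPORT, MONITORING. Reply with the label only.\n\n"
        f"Task: {task_description}"
    )
    return {"model": model, "prompt": prompt, "stream": False}


def classify_task(task_description: str) -> str:
    """POST the request to the local Ollama server and return the label."""
    payload = json.dumps(build_request(task_description)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"].strip()
```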
For the AXUIElement layer in Python using pyobjc:
```python
import AppKit
import ApplicationServices as AS


def _copy_attr(element, attribute):
    """Read a single accessibility attribute; returns None if unavailable."""
    err, value = AS.AXUIElementCopyAttributeValue(element, attribute, None)
    return value if err == 0 else None  # 0 == kAXErrorSuccess


def get_ui_tree(app_name: str) -> dict:
    """Get the full accessibility tree for a running application.

    Requires the Accessibility permission (System Settings > Privacy
    & Security > Accessibility) for the process running this script.
    """
    workspace = AppKit.NSWorkspace.sharedWorkspace()
    target_app = None
    for app in workspace.runningApplications():
        name = app.localizedName()
        if name and app_name.lower() in name.lower():
            target_app = app
            break
    if target_app is None:
        return {"error": f"App '{app_name}' not found"}
    ax_app = AS.AXUIElementCreateApplication(target_app.processIdentifier())
    return _traverse_element(ax_app, depth=0, max_depth=5)


def _traverse_element(element, depth: int, max_depth: int) -> dict:
    if depth > max_depth:
        return {}
    node = {}
    # Role, title, and value come from local system APIs -
    # no network call is involved at any point
    role = _copy_attr(element, "AXRole")
    title = _copy_attr(element, "AXTitle")
    value = _copy_attr(element, "AXValue")
    if role:
        node["role"] = str(role)
    if title:
        node["title"] = str(title)
    if value:
        node["value"] = str(value)
    # Recurse into children, bounded by max_depth
    children = _copy_attr(element, "AXChildren")
    if children:
        node["children"] = [
            _traverse_element(child, depth + 1, max_depth)
            for child in children
        ]
    return node
```
This runs entirely locally. No network, no external process, no telemetry.
Local Model Selection for Agents
The choice of local model affects capability significantly. Based on current benchmarks for air-gapped agent use:
| Model | Size | Instruction Following | Tool Calling | Apple Silicon Speed |
|---|---|---|---|---|
| Llama 3.1 8B | 4.7GB | Good | Yes | ~40 tok/s on M3 |
| Qwen2.5 7B | 4.4GB | Very good | Yes | ~45 tok/s on M3 |
| Mistral 7B | 4.1GB | Good | Partial | ~45 tok/s on M3 |
| Llama 3.1 70B | 40GB | Excellent | Yes | ~12 tok/s on M3 Ultra |
For most air-gapped automation tasks, an 8B model is sufficient. The tasks tend to be specific and well-structured - data entry across applications, report generation from multiple sources, system monitoring. These do not require frontier reasoning; they require reliable execution of well-defined steps.
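One way to keep an 8B model inside that well-structured envelope is to have it emit one JSON step at a time and validate each step against a small action vocabulary before executing anything. A minimal sketch, assuming a hypothetical three-action vocabulary (the action names are illustrative, not from the original):

```python
import json

# Hypothetical action vocabulary for a narrow desktop automation
ALLOWED_ACTIONS = {"read_field", "enter_value", "click_button"}


def parse_step(model_output: str) -> dict:
    """Validate one JSON step emitted by the local model.

    Anything outside the small action vocabulary is rejected before
    execution - this constraint is what keeps a 7-13B planner reliable.
    """
    step = json.loads(model_output)
    if step.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {step.get('action')!r}")
    if "target" not in step:
        raise ValueError("step is missing a 'target'")
    return step
```

The executor only ever sees steps that passed validation, so a hallucinated or malformed plan fails closed rather than acting on the UI.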
The Trade-Offs Are Real
Local models are less capable than frontier cloud models. A 7B model handles well-defined tasks reliably but struggles with ambiguous instructions or complex multi-step reasoning that requires world knowledge. The solution is constraining the agent's scope - narrow, well-defined automations rather than general-purpose assistance.
In practice, this constraint matches what air-gapped environments need anyway. The tasks are specific and repeatable. You are not asking a trading floor automation to reason about geopolitics - you are asking it to extract numbers from one application and enter them into another. That is well within the capability of a local model running on modern hardware.
The other trade-off is latency. A local 8B model on an M3 MacBook Pro generates around 40 tokens per second. For a task that requires 500 tokens of reasoning and 200 tokens of output, that is about 17 seconds per step. For automation tasks that run in the background, this is acceptable. For interactive tasks, it requires designing the agent to show progress rather than making the user wait on a spinner.
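The arithmetic behind that estimate is simple enough to make explicit, since it drives the background-vs-interactive design decision (a sketch using the figures from the text):

```python
def step_latency_seconds(reasoning_tokens: int, output_tokens: int,
                         tokens_per_second: float) -> float:
    """Rough per-step latency for a local model at a given generation speed."""
    return (reasoning_tokens + output_tokens) / tokens_per_second


# 500 reasoning tokens + 200 output tokens at ~40 tok/s on an M3:
# (500 + 200) / 40 = 17.5 seconds per step
```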
Air-Gapped AI Agents Are Not a Compromise
The framing of "local model as the inferior fallback to cloud" misses the point for this use case. The security property is the primary requirement, not a constraint. An agent that processes sensitive contracts locally, with no network egress, is more suitable for a law firm than a cloud agent - regardless of how much smarter the cloud model is.
The stack exists. The models are capable enough for the tasks. The accessibility APIs provide the interface layer. Air-gapped AI agents are not a future thing waiting on better local models - they are deployable today with the current generation of 7-13B parameter models.
Fazm is an open source macOS AI agent, available on GitHub.