Optimization
15 articles about optimization.
100M Tokens Tracked: 99.4% Were Input and Parallel Agents Make It Worse
After tracking 100M tokens, 99.4% were input tokens. Running parallel Claude Code agents multiplies the input cost problem. Here is how CLAUDE.md scoping helps.
Accessibility Tree Dumps Overflow LLM Context Windows - How to Fix It
Raw accessibility tree data can consume 24KB or more per dump, flooding AI agent context windows. The fix: write to temp files and return concise summaries instead.
Running 5 Parallel AI Agents Is Making My API Bill a Second Rent Payment
Running multiple Claude Code agents in parallel on a macOS app. The API costs add up fast. Model routing, context pruning, and local models all help reduce the bill.
Why the Accessibility Tree Beats Screenshots for Desktop Automation: Lessons From Amazon Checkout
We use the accessibility tree instead of screenshots for desktop automation. Here is why the AXUIElement hierarchy is faster, cheaper, and more reliable - with lessons from automating Amazon checkout.
Browser Automation: Accessibility Snapshots vs Screenshots - Saving Tokens by Skipping Pixels
Switching from screenshots to accessibility snapshots for browser automation saved us massive token costs. Here is why structured data beats pixel analysis for AI agents.
MCP Tool Responses Are the Biggest Context Hog - How to Compress Them
MCP server tool responses silently eat your context window. Here is how to compress accessibility tree data and other MCP outputs before they fill your token budget.
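One way to compress a tool response before it hits the context window is to prune and truncate the JSON tree. A rough sketch, assuming the response is a nested dict with `role`, `children`, and text fields (all illustrative, not the article's actual schema):

```python
# Drop empty structural wrapper nodes and truncate long text fields before
# the response is handed to the model. "role"/"children" keys and the
# drop_roles list are assumptions for illustration.
def prune_tree(node: dict, max_text: int = 80,
               drop_roles: tuple = ("group", "generic")):
    if node.get("role") in drop_roles and not node.get("children"):
        return None  # structural wrapper with no content: drop entirely
    out = {}
    for key, val in node.items():
        if key == "children":
            kept = [c for c in (prune_tree(c, max_text, drop_roles)
                                for c in val) if c]
            if kept:
                out["children"] = kept
        elif isinstance(val, str) and len(val) > max_text:
            out[key] = val[:max_text] + "…"  # truncate long text fields
        else:
            out[key] = val
    return out
```

Run over a typical accessibility dump, this kind of pass removes the wrapper nodes that carry no information and caps the per-field text that would otherwise dominate the token count.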
What Half a Million Desktop Agent Actions Taught Us About Failure
Lessons from analyzing 500K desktop agent actions - the most common failures, successes, and what to optimize first.
Inference Optimization Is a Distraction for AI Agent Builders
Why optimizing API call speed barely matters for AI agents - the real bottleneck is action execution, not model inference.
How Much Are You Actually Spending on LLMs Every Month?
A breakdown of typical developer LLM spending, where the money goes, and how local models and context pruning can cut costs dramatically.
How to Cut AI Agent Costs 50-70% with Model Routing
Route simple tasks to local Ollama models, complex ones to Claude. Combine that with aggressive state summarization and context pruning to keep token usage down without losing important information.
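The routing decision itself can be as simple as a heuristic gate in front of the model call. A minimal sketch: the model identifiers, keyword list, and length threshold below are all assumptions for illustration, not the article's configuration.

```python
# Route cheap, well-bounded tasks to a local model and everything else to a
# hosted one. Model ids and the heuristic are hypothetical placeholders.
LOCAL_MODEL = "ollama/llama3"    # assumed local Ollama model id
HOSTED_MODEL = "claude-sonnet"   # assumed hosted model id

SIMPLE_KEYWORDS = {"summarize", "classify", "extract", "rename"}

def route(task: str, max_simple_len: int = 200) -> str:
    """Return the model id that should handle this task."""
    words = set(task.lower().split())
    if len(task) <= max_simple_len and words & SIMPLE_KEYWORDS:
        return LOCAL_MODEL
    return HOSTED_MODEL
```

In practice the gate can be smarter (a small classifier, or routing on tool-call history), but even a keyword heuristic like this moves the high-volume, low-stakes calls off the metered API.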
The Engineer's Trap - Optimizing Everything Like Debugging Code
Software engineers try to optimize meditation, relationships, and life like debugging code. Sometimes the best approach is to stop optimizing and let things work.
Why Removing Unused MCP Servers Speeds Up Claude Code More Than Removing Skills
Trimming unused MCP servers made way more difference than removing skills. MCP servers are actual processes that all have to handshake on startup.
Real-Time AI Agent Performance - Fixing the Screenshot Pipeline
Your AI agent is slow because of screenshot capture, not LLM inference. Here are practical techniques to speed up the capture pipeline.
Fixing SwiftUI LazyVGrid Performance Issues on macOS
LazyVGrid jitter and stuttering on macOS come from view identity instability. Here are practical fixes: stable .id() values, extracted cell views, async image loading, and avoiding inline closures.
I Installed 20 MCP Servers and Everything Got Worse - Why Fewer Is Better
More MCP servers means hundreds of tool definitions competing for attention. Stripping down to 3 servers made Claude pick the right tool on the first try.