Optimization
29 articles about optimization.
Parallel API Pricing: What Concurrent Calls Actually Cost
Parallel API pricing breaks down differently than sequential usage. Here is what running concurrent LLM calls costs, how providers charge, and how to optimize spend.
A/B Testing Claude Code Hooks - Optimizing Token Usage
Cache read jumps show that hooks front-load context effectively. How to A/B test Claude Code hooks for performance and measure the impact on token consumption.
What Does Remember Mean for an Agent? Store Everything, Prune 80%
We stored everything for 3 weeks then pruned 80%. Agent responses got sharper. Memory is not about storing more - it is about keeping less of the right things.
Agents Can Overload Their Own Context - Use Separate Context with Shared Log
When agents share context, they overload it with each other's noise. Separate context per agent with a shared append-only log keeps each agent focused while preserving a shared record of what happened.
AI Agents That Optimize Themselves Instead of Doing the Actual Task
Your AI agent spent 3 hours optimizing its own memory system instead of building features. The self-optimization trap and how to keep agents focused on real work.
Use Sonnet for Grunt Work, Opus for Architecture
Most developers use the same AI model tier for everything and burn through their subscription. Matching model capability to task complexity cuts costs.
Rolling Your Own Agent Logging - SQLite Locally, Postgres in the Cloud
Building custom logging for a desktop agent revealed that 40% of token spend went to retries from the model misunderstanding accessibility tree data.
MCP Server Context Window Bloat and Why You Need a Toggle
Too many MCP servers trash your context window with tool definitions. A toggle approach lets you activate only the servers you need for each task.
Tokens Used Loading MCP Tools - Measuring and Reducing the Overhead
31 MCP tools can eat 3-5k tokens just loading schemas. Here is how to measure and optimize MCP tool token overhead in Cursor, Claude Code, and other AI coding tools.
The Hidden Token Cost of MCP Tools in Cursor and How to Fix It
31 Atlassian MCP tools burn 2-3k tokens per request just from schema definitions. A 400-tool enterprise server can exceed Claude's entire context window before you ask anything. Here's how to cut tool overhead by 85-100x.
How to Choose Which Model for Each Task in AI Agents
Tiered model routing sounds smart but adds complexity. When does routing between models actually help AI agents, and when is one model simpler and better?
Personality Is a Luxury Tax on AI Agents - How Trimming CLAUDE.md Improved Output
Personality is a luxury tax. Trimming CLAUDE.md personality instructions improved code output quality by reducing token waste and keeping the agent focused on the task.
The Sanitization Tax
Raw accessibility tree data is messy but information-rich. The tradeoff between sanitizing it for cleanliness and keeping tokens low is harder than it looks.
Stripping Personality from AI Agent Config for 7 Days - The Token Cost of Personality
We removed all personality instructions from our AI agent for a week. The token savings were significant. Personality is a luxury tax on every single agent call.
100M Tokens Tracked: 99.4% Were Input and Parallel Agents Make It Worse
After tracking 100M tokens, 99.4% were input tokens. Running parallel Claude Code agents multiplies the input cost problem. Here is how CLAUDE.md scoping, prompt caching, and context architecture help.
Accessibility Tree Dumps Overflow LLM Context Windows - How to Fix It
Raw accessibility tree data can consume 24KB or more per dump, flooding AI agent context windows. The fix: write to temp files and return concise summaries.
Running 5 Parallel AI Agents Is Making My API Bill a Second Rent Payment
Running multiple Claude Code agents in parallel on a macOS app. The API costs add up fast. Model routing, context pruning, and local models all help reduce the bill.
Why the Accessibility Tree Beats Screenshots for Desktop Automation: Lessons From Amazon Checkout
Screenshots cost thousands of tokens and fail on layout changes. The macOS AXUIElement accessibility tree delivers structured UI data in 200-500 tokens with 90%+ task success rates. Here is the implementation.
Browser Automation: Accessibility Snapshots vs Screenshots - Saving Tokens by Skipping Pixels
Switching from screenshots to accessibility snapshots for browser automation saved us massive token costs. Here is why structured data beats pixel analysis.
MCP Tool Responses Are the Biggest Context Hog - How to Compress Them
MCP server tool responses silently eat your context window. Here is how to compress accessibility tree data and other MCP outputs before they fill it up.
What Half a Million Desktop Agent Actions Taught Us About Failure
Lessons from analyzing 500K desktop agent actions - the most common failures, successes, and what to optimize first.
Inference Optimization Is a Distraction for AI Agent Builders
Why optimizing API call speed barely matters for AI agents - the real bottleneck is action execution, not model inference.
How Much Are You Actually Spending on LLMs Every Month?
A breakdown of typical developer LLM spending, where the money goes, and how local models and context pruning can cut costs dramatically.
How to Cut AI Agent Costs 50-70% with Model Routing
Route simple tasks to local Ollama models, complex ones to Claude. Combine that with aggressive state summarization and context pruning to keep token usage down.
The Engineer's Trap - Optimizing Everything Like Debugging Code
Software engineers try to optimize meditation, relationships, and life like debugging code. Sometimes the best approach is to stop optimizing and let things be.
Why Removing Unused MCP Servers Speeds Up Claude Code More Than Removing Skills
Trimming unused MCP servers made way more difference than removing skills. MCP servers are actual processes that all have to handshake on startup.
Real-Time AI Agent Performance - Fixing the Screenshot Pipeline
Your AI agent is slow because of screenshot capture, not LLM inference. Here are practical techniques to speed up the capture pipeline.
Fixing SwiftUI LazyVGrid Performance Issues on macOS
LazyVGrid jitter and stuttering on macOS comes from view identity instability. Here are practical fixes: stable .id() values, extracted cell views, and async loading.
I Installed 20 MCP Servers and Everything Got Worse - Why Fewer Is Better
More MCP servers means hundreds of tool definitions competing for attention. Stripping down to 3 servers made Claude pick the right tool on the first try.