Optimization

29 articles about optimization.

Parallel API Pricing: What Concurrent Calls Actually Cost

12 min read

Parallel API pricing breaks down differently than sequential usage. Here is what running concurrent LLM calls costs, how providers charge, and how to optimize spend.

parallel-api, pricing, api-costs, concurrency, ai-agent, llm, optimization

A/B Testing Claude Code Hooks - Optimizing Token Usage

2 min read

Cache read jumps show that hooks front-load context effectively. How to A/B test Claude Code hooks for performance and measure the impact on token consumption.

claude-code, hooks, optimization, tokens, performance

What Does Remember Mean for an Agent? Store Everything, Prune 80%

2 min read

We stored everything for 3 weeks then pruned 80%. Agent responses got sharper. Memory is not about storing more - it is about keeping less of the right things.

agent-memory, pruning, context, ai-agents, optimization

Agents Can Overload Their Own Context - Use Separate Context with Shared Log

2 min read

When agents share context, they overload it with each other's noise. Separate context per agent with a shared append-only log keeps each agent focused while still allowing coordination.

context-window, multi-agent, shared-log, coordination, optimization

AI Agents That Optimize Themselves Instead of Doing the Actual Task

2 min read

Your AI agent spent 3 hours optimizing its own memory system instead of building features. The self-optimization trap and how to keep agents focused on real work.

ai-agent, productivity, self-improvement, memory, optimization

Use Sonnet for Grunt Work, Opus for Architecture

2 min read

Most developers use the same AI model tier for everything and burn through their subscription. Matching model capability to task complexity cuts costs.

ai-costs, model-selection, sonnet, opus, subscription, optimization

Rolling Your Own Agent Logging - SQLite Locally, Postgres in the Cloud

2 min read

Building custom logging for a desktop agent revealed that 40% of token spend went to retries from the model misunderstanding accessibility tree data.

logging, observability, token-costs, sqlite, optimization, sideproject

MCP Server Context Window Bloat and Why You Need a Toggle

2 min read

Too many MCP servers trash your context window with tool definitions. A toggle approach lets you activate only the servers you need for each task.

mcp, context-window, developer-tools, ai-agents, optimization

Tokens Used Loading MCP Tools - Measuring and Reducing the Overhead

2 min read

31 MCP tools can eat 3-5k tokens just loading schemas. Here is how to measure and optimize MCP tool token overhead in Cursor, Claude Code, and other AI tools.

mcp, tokens, optimization, cursor, claude-code, ai-tools

The Hidden Token Cost of MCP Tools in Cursor and How to Fix It

5 min read

31 Atlassian MCP tools burn 2-3k tokens per request just from schema definitions. A 400-tool enterprise server can exceed Claude's entire context window before you ask anything. Here's how to cut tool overhead by 85-100x.

mcp, tokens, cursor, optimization, developer-tools
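The per-request overhead described above is easy to estimate yourself: serialize each tool definition to JSON and apply the rough 4-characters-per-token heuristic. A minimal sketch, assuming hypothetical tool definitions (the names, shapes, and the heuristic are illustrations, not the provider's real tokenizer):

```python
import json

def estimate_schema_tokens(tools):
    """Rough token estimate for a set of MCP tool definitions,
    using the common ~4 characters per token approximation."""
    total = 0
    for tool in tools:
        text = json.dumps(tool, separators=(",", ":"))
        total += len(text) // 4
    return total

# Hypothetical Atlassian-style tool definitions for illustration only.
tools = [
    {
        "name": f"jira_tool_{i}",
        "description": "Example Atlassian-style tool description. " * 4,
        "inputSchema": {
            "type": "object",
            "properties": {
                "issueKey": {"type": "string"},
                "fields": {"type": "array"},
            },
        },
    }
    for i in range(31)
]

print(estimate_schema_tokens(tools), "tokens consumed before any request")
```

Running this kind of estimate against your real server's tool list shows where the budget goes before the first user message is even sent.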

How to Choose Which Model for Each Task in AI Agents

2 min read

Tiered model routing sounds smart but adds complexity. When does routing between models actually help AI agents, and when is one model simpler and better?

model-routing, ai-agents, llm-selection, optimization, architecture, webdev

Personality Is a Luxury Tax on AI Agents - How Trimming CLAUDE.md Improved Output

2 min read

Personality is a luxury tax. Trimming CLAUDE.md personality instructions improved code output quality by reducing token waste and keeping the agent focused.

claude-md, ai-agent, prompt-engineering, code-quality, optimization

The Sanitization Tax

2 min read

Raw accessibility tree data is messy but information-rich. The tradeoff between sanitizing it for cleanliness and keeping tokens low is harder than it looks.

accessibility-tree, sanitization, tokens, desktop-agent, optimization

Stripping Personality from AI Agent Config for 7 Days - The Token Cost of Personality

2 min read

We removed all personality instructions from our AI agent for a week. The token savings were significant. Personality is a luxury tax on every single agent call.

ai-agent, token-cost, optimization, personality, prompt-engineering

100M Tokens Tracked: 99.4% Were Input and Parallel Agents Make It Worse

13 min read

After tracking 100M tokens, 99.4% were input tokens. Running parallel Claude Code agents multiplies the input cost problem. Here is how CLAUDE.md scoping, prompt caching, and context architecture helps.

tokens, api-costs, parallel-agents, claude-code, claude-md, optimization

Accessibility Tree Dumps Overflow LLM Context Windows - How to Fix It

3 min read

Raw accessibility tree data can consume 24KB or more per dump, flooding AI agent context windows. The fix: write to temp files and return concise summaries instead.

accessibility-tree, context-window, llm, macos, optimization, desktop-agent
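The temp-file pattern that teaser describes can be sketched in a few lines: dump the raw tree to disk and hand the model a one-line summary instead of the full payload. The tree shape here (nested dicts with a `role` and a `children` list) is a hypothetical stand-in for whatever your accessibility API returns:

```python
import json
import tempfile

def summarize_ax_dump(tree: dict) -> str:
    """Write a raw accessibility-tree dump to a temp file and return
    a short summary for the agent's context instead of the full dump."""
    def count(node):
        # Recursively count elements in the (assumed) nested-dict shape.
        return 1 + sum(count(c) for c in node.get("children", []))

    raw = json.dumps(tree)
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".json", delete=False
    ) as f:
        f.write(raw)
        path = f.name
    return (f"accessibility dump: {count(tree)} elements, "
            f"{len(raw)} bytes, full data at {path}")

# A large dump collapses to a one-line summary in context;
# the agent can grep or re-read the file only when it needs detail.
tree = {"role": "window",
        "children": [{"role": "button", "children": []}] * 500}
print(summarize_ax_dump(tree))
```

The agent then reads back only the slices of the file it actually needs, rather than carrying the whole dump in every turn.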

Running 5 Parallel AI Agents Is Making My API Bill a Second Rent Payment

2 min read

Running multiple Claude Code agents in parallel on a macOS app. The API costs add up fast. Model routing, context pruning, and local models all help reduce the bill.

api-costs, parallel-agents, claude-code, budget, optimization

Why the Accessibility Tree Beats Screenshots for Desktop Automation: Lessons From Amazon Checkout

6 min read

Screenshots cost thousands of tokens and fail on layout changes. The macOS AXUIElement accessibility tree delivers structured UI data in 200-500 tokens with 90%+ task success rates. Here is the implementation.

accessibility-tree, desktop-automation, macos, axuielement, optimization

Browser Automation: Accessibility Snapshots vs Screenshots - Saving Tokens by Skipping Pixels

2 min read

Switching from screenshots to accessibility snapshots for browser automation saved us massive token costs. Here is why structured data beats pixel analysis.

browser-automation, accessibility, tokens, optimization, playwright

MCP Tool Responses Are the Biggest Context Hog - How to Compress Them

3 min read

MCP server tool responses silently eat your context window. Here is how to compress accessibility tree data and other MCP outputs before they fill your context window.

mcp, context-window, accessibility-api, optimization, token-management, claudecode

What Half a Million Desktop Agent Actions Taught Us About Failure

2 min read

Lessons from analyzing 500K desktop agent actions - the most common failures, successes, and what to optimize first.

telemetry, analytics, desktop-agent, failure-modes, optimization

Inference Optimization Is a Distraction for AI Agent Builders

2 min read

Why optimizing API call speed barely matters for AI agents - the real bottleneck is action execution, not model inference.

inference, optimization, distraction, bottleneck, performance

How Much Are You Actually Spending on LLMs Every Month?

2 min read

A breakdown of typical developer LLM spending, where the money goes, and how local models and context pruning can cut costs dramatically.

llm-costs, api-spending, optimization, local-models, budget

How to Cut AI Agent Costs 50-70% with Model Routing

2 min read

Route simple tasks to local Ollama models, complex ones to Claude. Combine that with aggressive state summarization and context pruning to keep token usage down.

model-routing, cost-reduction, ollama, claude, optimization, artificial-intelligence
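The routing idea above reduces to a cheap classifier in front of the model call: simple-looking tasks go to a local Ollama model, everything else goes to Claude. A minimal sketch, where the keyword list, thresholds, and model names are all assumptions for illustration:

```python
# Keywords that (in this sketch) mark a task as grunt work.
SIMPLE_KEYWORDS = {"rename", "format", "summarize", "extract"}

def route_model(task: str) -> str:
    """Return a model identifier for a task description.
    The heuristic and model names are illustrative, not a recommendation."""
    words = task.lower().split()
    looks_simple = (len(words) < 12
                    and any(w in SIMPLE_KEYWORDS for w in words))
    # Local model for grunt work, hosted model for everything else.
    return "ollama/llama3" if looks_simple else "claude-sonnet"

print(route_model("summarize this changelog"))
print(route_model("design a migration plan for our multi-tenant schema"))
```

In practice the heuristic matters less than having the routing seam at all: once every call goes through one function, you can swap in better classifiers (or a tiny model) without touching call sites.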

The Engineer's Trap - Optimizing Everything Like Debugging Code

2 min read

Software engineers try to optimize meditation, relationships, and life like debugging code. Sometimes the best approach is to stop optimizing and let things be.

engineer-mindset, optimization, productivity, debugging, automation

Why Removing Unused MCP Servers Speeds Up Claude Code More Than Removing Skills

3 min read

Trimming unused MCP servers made way more difference than removing skills. MCP servers are actual processes that all have to handshake on startup.

claude-code, mcp, performance, developer-tools, optimization

Real-Time AI Agent Performance - Fixing the Screenshot Pipeline

2 min read

Your AI agent is slow because of screenshot capture, not LLM inference. Here are practical techniques to speed up the capture pipeline.

real-time-ai, performance, screenshot-pipeline, optimization, macos

Fixing SwiftUI LazyVGrid Performance Issues on macOS

2 min read

LazyVGrid jitter and stuttering on macOS comes from view identity instability. Here are practical fixes: stable .id() values, extracted cell views, and async loading.

swiftui, lazyvgrid, performance, macos, optimization

I Installed 20 MCP Servers and Everything Got Worse - Why Fewer Is Better

2 min read

More MCP servers means hundreds of tool definitions competing for attention. Stripping down to 3 servers made Claude pick the right tool on the first try.

mcp, claude-code, developer-tools, optimization, best-practices
