
How Much Are You Actually Spending on LLMs Every Month?

Fazm Team · 2 min read
llm-costs · api-spending · optimization · local-models · budget


I polled 50 developers about their monthly LLM API spending. The range was $30 to $600, with most landing between $100 and $300. Almost nobody tracked where the money actually went. When they did, the breakdown was surprising.

Where the Money Goes

The biggest cost driver is not the useful work. It is repeated context loading. Every time you start a new conversation, the entire system prompt, file contents, and instructions get re-sent as input tokens. For agents that work with codebases, this context window can be 50K-100K tokens per request - and the first request in every session is almost entirely context, not new work.
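The arithmetic here is easy to sketch. The price below is an illustrative assumption, not any provider's actual rate, but the shape of the calculation is the same for all of them:

```python
# Back-of-envelope cost of repeated context loading.
# The per-token price is an assumption for illustration only.
INPUT_PRICE_PER_MTOK = 3.00  # dollars per million input tokens (assumed)

def session_context_cost(context_tokens: int, sessions_per_day: int, days: int = 30) -> float:
    """Cost of re-sending the same context at the start of every session."""
    total_tokens = context_tokens * sessions_per_day * days
    return total_tokens / 1_000_000 * INPUT_PRICE_PER_MTOK

# 80K tokens of context, 20 fresh sessions a day, for a month:
print(f"${session_context_cost(80_000, 20):.2f}")  # $144.00
```

Note that none of that $144 buys new work: it is the same context, billed again at the start of every session.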

The second biggest cost is retries. When an API call fails or the model produces bad output, the agent retries with the same large context. Three retries on an 80K-token request means you paid for 320K input tokens to get one useful response.
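The retry multiplier is linear in the context size, which is why large-context agents feel it hardest. A one-line sketch:

```python
def tokens_paid_with_retries(request_tokens: int, retries: int) -> int:
    """Total input tokens billed: the original attempt plus each retry,
    since every retry re-sends the full context."""
    return request_tokens * (1 + retries)

print(tokens_paid_with_retries(80_000, 3))  # 320000
```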

The Simple Optimizations

Aggressive context pruning makes the biggest difference. Instead of loading entire files, load only the relevant functions. Instead of including full conversation history, summarize previous exchanges. These changes can cut input tokens by 60-70% with minimal quality loss.

Model routing is the second lever. Use a smaller, cheaper model for simple tasks - classification, formatting, extraction - and reserve the expensive frontier model for complex reasoning. Most agent frameworks now support routing rules based on task type.
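A routing rule can be as simple as a lookup table keyed by task type. The model names below are placeholders, not real model identifiers:

```python
# Cheap model for mechanical tasks, frontier model for open-ended
# reasoning. Model names here are placeholders, not real identifiers.
CHEAP_TASKS = {"classification", "formatting", "extraction"}

def pick_model(task_type: str) -> str:
    return "small-cheap-model" if task_type in CHEAP_TASKS else "frontier-model"

print(pick_model("formatting"))   # small-cheap-model
print(pick_model("code-review"))  # frontier-model
```

Even this crude split pays off quickly, because the mechanical tasks are usually the most frequent ones.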

The Local Model Option

For tasks that do not need frontier intelligence, local models running on Apple Silicon cost nothing per token. File organization, simple code generation, data extraction, template filling - a 7B parameter model handles these fine. Reserve API calls for tasks where model quality directly impacts output quality.

Track Before You Optimize

Most developers have no idea what they spend because they never look at the usage dashboard. Check your OpenAI or Anthropic billing page right now. Sort by day. Find the spikes. Those spikes are usually repeated context loading or retry loops - both fixable without changing your workflow.
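If you export daily spend from the billing page, spike-finding is a few lines. The data format below is a made-up example, not any provider's export schema:

```python
from statistics import mean

def find_spend_spikes(daily_spend: dict[str, float], factor: float = 2.0) -> list[str]:
    """Flag days whose spend exceeds `factor` times the daily average.

    `daily_spend` maps date strings to dollars; the dict below is
    invented sample data, not a real billing export.
    """
    avg = mean(daily_spend.values())
    return [day for day, amount in daily_spend.items() if amount > factor * avg]

usage = {"2024-05-01": 4.2, "2024-05-02": 3.9, "2024-05-03": 19.5, "2024-05-04": 4.0}
print(find_spend_spikes(usage))  # ['2024-05-03']
```

Once you have the spike days, cross-reference them against your agent logs: a single runaway retry loop or an oversized system prompt usually explains the whole outlier.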

Fazm is an open-source macOS AI agent, available on GitHub.
