Why Don't We Have a Proper Control Plane for LLM Usage?
Every cloud service has a control plane - a management layer for provisioning, monitoring, and governing resource usage. Databases have them. Compute has them. Networking has them. LLM usage somehow does not.
What a Control Plane Would Do
A proper LLM control plane would provide:
- Rolling budgets - set a weekly or monthly spend limit that automatically throttles usage as you approach it
- Automatic model downgrade - when the budget gets tight, route requests to cheaper models instead of failing
- Per-user and per-project quotas - allocate budget across teams and projects with visibility into who is using what
- Usage analytics - understand which tasks consume the most tokens, which models are being used for which purposes, and where waste is happening
- Rate limiting with queuing - instead of hard rate limit errors, queue requests and process them when capacity is available
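The last point, queuing instead of hard-failing, can be sketched as a token-bucket limiter whose acquire call blocks until capacity frees up rather than raising a rate-limit error. This is a minimal illustrative sketch; the rate and burst numbers are assumptions, not anything a provider ships today.

```python
import time


class QueueingRateLimiter:
    """Token-bucket limiter that waits for capacity instead of raising.

    Illustrative sketch of "rate limiting with queuing": callers are
    delayed, never rejected. Rate/burst values below are assumptions.
    """

    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.updated = time.monotonic()

    def acquire(self) -> float:
        """Block until a request slot is free; return seconds waited."""
        waited = 0.0
        while True:
            now = time.monotonic()
            # Refill tokens proportionally to elapsed time, capped at burst.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return waited
            # Not enough capacity: sleep just long enough for one token.
            shortfall = (1 - self.tokens) / self.rate
            time.sleep(shortfall)
            waited += shortfall
```

A real gateway would run this per-tenant and add a queue-depth limit so a stalled backlog eventually sheds load instead of growing unboundedly.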
Why It Does Not Exist Yet
The LLM API market is still young. Providers are competing on model capability, not infrastructure maturity. And most teams underestimate their LLM spend until they get a surprise bill.
Today, each team builds its own ad-hoc solution: a proxy server with rate limiting, a spreadsheet tracking API costs, manual budget reviews. This is where every cloud service was before proper control planes emerged.
Building Your Own (For Now)
Until standard solutions exist:
- Proxy all LLM calls through a single gateway
- Log every request with token count, model used, latency, and requesting service
- Set budget alerts at 50%, 80%, and 100% of your monthly target
- Implement automatic downgrade - if spend exceeds a threshold, route to a cheaper model
- Review weekly to identify waste and optimize routing
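The middle three steps fit in one small gateway class: every call is logged with model, tokens, cost, latency, and requesting service; alerts fire at the 50/80/100% thresholds; and once spend crosses 80%, requests are rerouted to a cheaper model. A minimal sketch, assuming hypothetical model names and per-1K-token prices:

```python
import time

# Hypothetical per-1K-token prices; real prices vary by provider and model.
PRICES_PER_1K_TOKENS = {"large-model": 0.01, "small-model": 0.001}

ALERT_THRESHOLDS = (0.5, 0.8, 1.0)


class Gateway:
    """Single choke point for LLM calls: logs usage, alerts, downgrades."""

    def __init__(self, monthly_budget_usd: float):
        self.budget = monthly_budget_usd
        self.spent = 0.0
        self.log: list[dict] = []
        self.alerted: set[float] = set()

    def call(self, model: str, tokens: int, service: str, llm_fn):
        # Automatic downgrade: past 80% of budget, route to the cheap model.
        if self.spent / self.budget >= 0.8:
            model = "small-model"
        start = time.monotonic()
        response = llm_fn(model, tokens)  # the actual provider call
        cost = tokens / 1000 * PRICES_PER_1K_TOKENS[model]
        self.spent += cost
        # Log every request with the fields the checklist above names.
        self.log.append({
            "service": service, "model": model, "tokens": tokens,
            "cost_usd": cost, "latency_s": time.monotonic() - start,
        })
        # Fire each budget alert exactly once as thresholds are crossed.
        for t in ALERT_THRESHOLDS:
            if self.spent / self.budget >= t and t not in self.alerted:
                self.alerted.add(t)
                print(f"ALERT: {t:.0%} of monthly budget used")
        return response
```

The weekly review then becomes a query over `gateway.log` - group by service and model to find where tokens are going. In production you would persist the log and counters outside the process, but the shape of the control point is the same.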
For desktop AI agents, this control plane is especially important. An agent running autonomously can burn through API budget fast if left unchecked. Budget-aware agents that automatically adjust their model usage based on remaining budget are significantly more sustainable.
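One way a budget-aware agent loop could look: before each step, check the fraction of budget remaining, pick a model tier accordingly, and stop cleanly when the budget is gone. Everything here is a hypothetical sketch - the model names, tier thresholds, and `step_cost` table are assumptions for illustration.

```python
def run_agent(task_steps, budget_usd: float, step_cost: dict):
    """Illustrative budget-aware agent loop.

    `step_cost` maps a model name to an assumed flat per-step cost;
    model names and the 50%/10% tier thresholds are hypothetical.
    """
    remaining = budget_usd
    trace = []
    for step in task_steps:
        # Choose the model tier from the fraction of budget left.
        frac = remaining / budget_usd
        if frac > 0.5:
            model = "large-model"
        elif frac > 0.1:
            model = "small-model"
        else:
            trace.append((step, "stopped: budget exhausted"))
            break
        remaining -= step_cost[model]
        trace.append((step, model))
    return trace
```

The point is not the exact thresholds but the control flow: the agent degrades gracefully through cheaper models instead of burning the full budget at top-tier pricing and then failing mid-task.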
Fazm is an open-source macOS AI agent, available on GitHub.