Where Engineering Time Actually Goes in Production Agents
The token keeps him honest. That phrase stuck with me because it captures the reality of production agent work: the unglamorous plumbing that consumes roughly 80% of engineering time.
The Core Logic Is Easy
Building an agent that does the thing - sends the email, files the ticket, updates the spreadsheet - takes a day. Maybe two. The LLM call, the tool integration, the happy path. Done.
Then you deploy it and discover that the happy path is maybe 60% of actual usage.
Where the Time Goes
Token management alone is a rabbit hole. You need to track usage per request, enforce budgets, handle context window limits gracefully, and decide what to truncate when the context gets too long. Each of these sounds simple. Each has edge cases that take days to handle properly.
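To make two of these pieces concrete, here is a minimal Python sketch of a per-request budget and a truncation policy. It assumes a crude four-characters-per-token estimate (`estimate_tokens` is a hypothetical helper; a real agent should use the provider's tokenizer or the usage counts the API returns) and one possible policy among many: keep the system prompt and drop the oldest messages first. `TokenBudget`, `BudgetExceeded`, and `fit_context` are illustrative names, not from any library.

```python
from dataclasses import dataclass


def estimate_tokens(text: str) -> int:
    # Assumption: ~4 characters per token. Replace with the provider's
    # tokenizer or the usage numbers the API reports.
    return max(1, len(text) // 4)


class BudgetExceeded(Exception):
    """Raised when a request would push usage past its budget."""


@dataclass
class TokenBudget:
    limit: int
    used: int = 0

    def charge(self, tokens: int) -> None:
        # Check before recording, so a rejected charge leaves usage unchanged.
        if self.used + tokens > self.limit:
            raise BudgetExceeded(f"{self.used + tokens} tokens > limit {self.limit}")
        self.used += tokens


def fit_context(system: str, messages: list[str], max_tokens: int) -> list[str]:
    """Keep the system prompt plus as many of the newest messages as fit."""
    remaining = max_tokens - estimate_tokens(system)
    if remaining < 0:
        raise ValueError("system prompt alone exceeds the context window")
    kept: list[str] = []
    # Walk from newest to oldest; stop at the first message that does not fit
    # so the kept history stays contiguous.
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if cost > remaining:
            break
        kept.append(msg)
        remaining -= cost
    kept.reverse()  # restore chronological order
    return [system] + kept
```

Stopping at the first message that does not fit, rather than skipping it and continuing, is a deliberate choice: it avoids a history with holes in the middle, which can confuse the model more than a shorter one.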
Rate limits from API providers hit at unpredictable times. You need exponential backoff, queue management, and fallback strategies. When Claude is rate-limited, do you wait or switch to a different model? What if the fallback model does not support the same tools?
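A sketch of backoff plus fallback, under stated assumptions: `RateLimitError` is a hypothetical stand-in for whatever exception your client raises on HTTP 429, and `call` is a hypothetical function that sends the request to a given model. Each model is retried with exponential backoff and full jitter; when its retries run out, the next model in the list is tried.

```python
import random
import time
from typing import Callable, Sequence


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's HTTP 429 error."""


def call_with_backoff(
    models: Sequence[str],
    call: Callable[[str], str],
    max_retries: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order; retry rate-limited calls with exponential
    backoff and full jitter before falling back to the next model."""
    for model in models:
        for attempt in range(max_retries):
            try:
                return call(model)
            except RateLimitError:
                if attempt == max_retries - 1:
                    break  # out of retries for this model; fall back
                # Full jitter: wait a random time in [0, min(cap, base * 2**attempt)].
                delay = min(max_delay, base_delay * 2 ** attempt)
                sleep(random.uniform(0, delay))
    raise RateLimitError(f"all models rate-limited: {list(models)}")
```

Listing models in order of preference makes the wait-or-switch decision explicit, and injecting `sleep` keeps the function testable without real waits. The tool-compatibility question is not solved here: the sketch assumes every model in `models` supports the tools the request uses, so the list should be filtered first. A production client would also honor a `Retry-After` header when the provider sends one.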
Retry logic seems straightforward until you realize that retrying a partially completed action can cause duplicate side effects. Sending an email twice. Creating two tickets. Transferring money twice. Now you need idempotency tokens, state checkpoints, and rollback mechanisms.
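Here is a minimal sketch of idempotency keys backed by a checkpoint, assuming each side effect can be identified by a task ID, a step name, and its payload. `CheckpointStore`, `idempotency_key`, and `run_once` are hypothetical names, and the JSON file stands in for a real durable store. It covers the idempotency and checkpoint parts only, not rollback.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, Dict, Optional


class CheckpointStore:
    """Remembers results of completed side effects so retries can skip them.

    A JSON file is used for illustration only; a real system would want a
    durable store with atomic writes.
    """

    def __init__(self, path: Path) -> None:
        self.path = path
        self.done: Dict[str, str] = json.loads(path.read_text()) if path.exists() else {}

    def get(self, key: str) -> Optional[str]:
        return self.done.get(key)

    def record(self, key: str, result: str) -> None:
        self.done[key] = result
        self.path.write_text(json.dumps(self.done))


def idempotency_key(task_id: str, step: str, payload: dict) -> str:
    # Same task, step, and payload always yield the same key, across retries
    # and process restarts. sort_keys makes dict key order irrelevant.
    raw = json.dumps([task_id, step, payload], sort_keys=True)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def run_once(
    store: CheckpointStore,
    task_id: str,
    step: str,
    payload: dict,
    action: Callable[[str, dict], str],
) -> str:
    key = idempotency_key(task_id, step, payload)
    previous = store.get(key)
    if previous is not None:
        return previous  # completed on an earlier attempt: skip the side effect
    # Pass the key along so the downstream service can deduplicate too,
    # if it supports idempotency keys.
    result = action(key, payload)
    store.record(key, result)
    return result
```

One gap remains: if the process crashes after the action succeeds but before `record` runs, a retry will repeat the side effect. Passing the key to the downstream service closes that gap only when the service itself deduplicates on it, which is why some APIs accept an idempotency key on requests.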
The Honest Breakdown
For a production agent running real tasks:
- 20% of time goes to the core agent logic
- 30% goes to error handling and recovery
- 25% goes to monitoring and observability
- 25% goes to edge cases you discover after deployment
The last category never ends. Every week brings a new edge case you did not anticipate. A user with Unicode characters in their name. An API that returns HTML instead of JSON on Tuesdays. A file that is technically valid but has a null byte in the middle.
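Edge cases like these tend to end up as small defensive helpers at the system's boundaries. A sketch of two, with hypothetical names: `parse_json_response` refuses to parse a body that is labeled or shaped like HTML (assuming the HTTP client hands over raw bytes and the `Content-Type` header), and `clean_text` normalizes Unicode to NFC and strips NUL characters before text is stored or compared.

```python
import json
import unicodedata


class UnexpectedResponse(Exception):
    """Raised when a response that should be JSON is not."""


def parse_json_response(body: bytes, content_type: str) -> object:
    # Catch both a wrong header and a JSON header on an HTML body
    # (for example, an error page served by a proxy).
    looks_like_html = body.lstrip()[:1] == b"<"
    if "json" not in content_type.lower() or looks_like_html:
        snippet = body[:80].decode("utf-8", errors="replace")
        raise UnexpectedResponse(f"expected JSON, got {content_type!r}: {snippet!r}")
    return json.loads(body.decode("utf-8"))


def clean_text(value: str) -> str:
    # NFC normalization makes "é" as one code point and "e" plus a
    # combining accent compare equal.
    value = unicodedata.normalize("NFC", value)
    # Drop NUL characters, which many downstream tools and databases reject.
    return value.replace("\x00", "")
```

Silently stripping NUL is itself a judgment call; for files, where a stray null byte can signal corruption, rejecting the input and logging it may be the safer default.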
Fazm is an open source macOS AI agent, available on GitHub.