How to Use Custom API Endpoints with AI Desktop Agents
AI agent API bills add up fast, especially when you run multiple agents in parallel. But most teams already have access to LLM capacity they are not using: GitHub Copilot Business seats, corporate API gateways, or self-hosted models. This guide covers how to route AI desktop agents through custom endpoints so you can cut costs, stay compliant, and keep your existing workflow.
“A user built a local proxy translating Anthropic-format requests into GitHub Copilot Business calls. Pointed their agent at it and it just worked, no config changes needed.”
Community report
1. Why Custom API Endpoints Matter for AI Agents
Desktop AI agents that control your browser, write code, and automate workflows make a lot of API calls. Each action the agent takes (reading the screen, deciding what to click, verifying the result) involves at least one round trip to an LLM. A single complex task can burn through hundreds of thousands of tokens.
At standard API pricing, that adds up. Running an agent for a few hours a day on tasks like CRM updates, document processing, or browser automation can easily cost $50 to $200 per month in API fees alone. For teams running multiple agents, that number multiplies fast.
Custom API endpoints solve this by letting you swap the backend without changing the agent. Your team might already have GitHub Copilot Business seats (which include API access), a corporate LLM gateway with negotiated pricing, or spare GPU capacity running open source models. The agent does not care where the completions come from, as long as the endpoint speaks the same protocol.
Key insight: The value of an AI agent comes from its ability to perceive and interact with your desktop, not from which specific LLM endpoint it calls. Decoupling the agent from the API provider gives you flexibility on cost, compliance, and latency.
2. Types of Custom Endpoints You Can Use
There are three main categories of custom endpoints that work with Anthropic-compatible AI agents:
Existing subscription proxies
Services like GitHub Copilot Business include API access as part of the subscription. A local proxy translates requests from Anthropic format to the provider's format. You pay nothing extra because the capacity is already included in your seat license.
Corporate API gateways
Many enterprises run centralized LLM gateways (AWS Bedrock, Azure OpenAI, LiteLLM, or custom proxies) that handle authentication, rate limiting, and audit logging. Routing your agent through these keeps everything compliant and on the company credit card.
Self-hosted models
Running Llama, Mistral, or other open source models on your own hardware (or rented GPUs) gives you a fixed-cost endpoint with no per-token charges. Tools like vLLM, Ollama, and text-generation-inference expose OpenAI-compatible APIs that work with most agents.
Already paying for Copilot or Bedrock?
Fazm supports custom API endpoints out of the box. Point it at your existing infrastructure and start automating without extra API costs.
Try Fazm Free

3. The GitHub Copilot Business Proxy Approach
This is one of the more creative setups people have built. GitHub Copilot Business ($19/seat/month) includes access to Claude and GPT models through the Copilot API. By running a lightweight local proxy that accepts Anthropic-format requests and translates them into Copilot API calls, you can route an entire desktop agent through your existing Copilot subscription.
The setup is straightforward: a small Node.js or Python proxy listens on localhost, accepts requests in the Anthropic messages format, translates them to Copilot's format, and returns the response. The agent connects to localhost:PORT instead of api.anthropic.com and everything works as normal.
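The core of such a proxy is the translation step. Here is a minimal sketch of it: converting an Anthropic Messages API request body into an OpenAI-style chat completion body, which is the shape Copilot-style backends commonly accept. The field handling on the Anthropic side follows the public Messages API; the fallback values and the simplification of string-only message content are assumptions for illustration.

```python
def anthropic_to_openai(payload: dict) -> dict:
    """Translate an Anthropic Messages request body to OpenAI chat format.

    Simplified: assumes message content is a plain string, not a list of
    content blocks as the full Anthropic API allows.
    """
    messages = []
    # Anthropic carries the system prompt as a top-level field;
    # OpenAI-style APIs expect it as the first message.
    if "system" in payload:
        messages.append({"role": "system", "content": payload["system"]})
    messages.extend(payload.get("messages", []))
    return {
        "model": payload.get("model", "gpt-4o"),  # fallback model is illustrative
        "messages": messages,
        "max_tokens": payload.get("max_tokens", 1024),
        "temperature": payload.get("temperature", 1.0),
    }
```

A real proxy wraps this in a small HTTP server, forwards the translated body to the Copilot endpoint with your credentials, and translates the response back the same way.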
Fair use note: Check your GitHub Copilot terms of service before relying on this at scale. Copilot Business is licensed for development assistance, and high-volume automation through a proxy may or may not fall within fair use depending on your agreement. Some teams use this for experimentation and switch to a dedicated API plan once they validate the workflow.
The real takeaway is less about Copilot specifically and more about the pattern: any Anthropic-compatible endpoint works. If your agent supports a custom base URL setting, you can point it at whatever backend makes sense for your situation.
4. Corporate API Gateways and Compliance
For teams in regulated industries (finance, healthcare, government), sending desktop automation data to a third-party API is often a non-starter. Corporate API gateways solve this by keeping all LLM traffic on approved infrastructure.
Common setups include:
- AWS Bedrock with Claude models, data stays in your VPC, integrates with existing IAM and CloudTrail
- Azure OpenAI Service with data residency guarantees and enterprise authentication
- LiteLLM as a unified proxy that normalizes different provider APIs behind a single endpoint, with built-in spend tracking and rate limiting
- Custom proxy servers that add audit logging, PII redaction, or content filtering before requests reach the model
The point is the same: the agent does not need to know about your compliance requirements. It just talks to an endpoint. Your gateway handles the rest.
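As a concrete example of the kind of pre-filter a custom gateway might apply, here is a hypothetical PII-redaction pass run before a request leaves approved infrastructure. The regex patterns and placeholder tokens are illustrative only, not a complete PII policy.

```python
import re

# Illustrative patterns: emails and US-style SSNs. A production gateway
# would use a vetted PII detection library, not two regexes.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)

def filter_request(payload: dict) -> dict:
    """Return a copy of the request with string message content redacted."""
    cleaned = dict(payload)
    cleaned["messages"] = [
        {**m, "content": redact(m["content"]) if isinstance(m["content"], str) else m["content"]}
        for m in payload.get("messages", [])
    ]
    return cleaned
```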
5. Self-Hosted and Open Source Model Endpoints
If you have access to GPUs (cloud or on-prem), running an open source model gives you a fixed monthly cost regardless of token volume. The tradeoff is that current open source models are less capable than Claude or GPT-4o for complex multi-step agent tasks, but they work well for simpler automation.
| Tool | API Compatibility | Best For |
|---|---|---|
| vLLM | OpenAI-compatible | High-throughput serving, multi-GPU |
| Ollama | OpenAI-compatible | Local development, single-machine setups |
| text-generation-inference | OpenAI-compatible | Production serving with quantization |
| LiteLLM | Multi-provider proxy | Unifying multiple backends |
A common middle ground is running a small open source model for simple, high-frequency tasks (form filling, data extraction) while routing complex reasoning tasks to Claude or GPT-4o. A proxy that routes by task complexity can cut costs significantly without sacrificing quality on the tasks that matter.
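A complexity router can be as simple as a heuristic over the prompt. The sketch below is illustrative: the model names, keyword list, and token threshold are placeholders a real deployment would tune per workload.

```python
# Mechanical verbs that tend to indicate simple, high-frequency tasks.
SIMPLE_KEYWORDS = {"extract", "fill", "copy", "format"}

def pick_backend(task_description: str, context_tokens: int) -> str:
    """Route short, mechanical tasks to a local model, the rest to a hosted one."""
    words = set(task_description.lower().split())
    if context_tokens < 2000 and words & SIMPLE_KEYWORDS:
        return "local/llama"    # placeholder for an Ollama/vLLM endpoint
    return "hosted/claude"      # placeholder for the frontier model
```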
6. Cost and Latency Comparison
Here is a rough comparison for a team running a desktop agent that makes approximately 500 API calls per day (a typical workload for moderate browser automation and document processing):
| Endpoint Type | Estimated Monthly Cost | Latency | Setup Effort |
|---|---|---|---|
| Direct Anthropic API | $50 to $200+ | Low (direct) | Minimal |
| Copilot Business proxy | $19/seat (flat) | Low to medium | Moderate (proxy setup) |
| AWS Bedrock | $30 to $150+ | Low | Moderate (IAM, VPC) |
| Self-hosted (Ollama) | $0 (own hardware) | Varies by hardware | High (GPU, maintenance) |
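A back-of-envelope check on the direct-API row, under stated assumptions: roughly 500 calls per day, about 1,500 input and 300 output tokens per call, a 30-day month, and illustrative per-million-token prices of $3 (input) and $15 (output). The token counts and prices are assumptions, not quoted rates.

```python
CALLS_PER_DAY = 500
IN_TOK, OUT_TOK = 1_500, 300          # assumed tokens per call
PRICE_IN, PRICE_OUT = 3.0, 15.0       # assumed USD per million tokens

calls_per_month = CALLS_PER_DAY * 30
monthly_in = calls_per_month * IN_TOK / 1_000_000    # millions of input tokens
monthly_out = calls_per_month * OUT_TOK / 1_000_000  # millions of output tokens
monthly_cost = monthly_in * PRICE_IN + monthly_out * PRICE_OUT
print(round(monthly_cost, 2))  # → 135.0, squarely inside the $50 to $200 range
```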
The right choice depends on your existing infrastructure. If your team already has Copilot Business seats, the proxy approach is essentially free. If you are in a regulated environment, Bedrock or Azure is likely the path of least resistance. If you are cost-optimizing aggressively and have GPU access, self-hosting gives you the lowest marginal cost.
7. Getting Started
The simplest way to test custom endpoints with a desktop AI agent:
- Pick your endpoint. Start with whatever you already have access to: Copilot Business, an existing Azure deployment, or even Ollama running locally.
- Set up translation if needed. If the endpoint does not speak Anthropic's messages format natively, run a lightweight proxy (LiteLLM works well for this).
- Configure your agent. Most agents that support custom endpoints let you set a base URL. Point it at your proxy or gateway instead of the default API.
- Test with a simple task. Run something straightforward (opening a browser, filling a form) to verify the endpoint works before tackling complex workflows.
- Monitor token usage. Compare costs between your custom endpoint and direct API access over a week to validate the savings.
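The steps above boil down to one change: the same request body, sent somewhere else. The sketch below builds (but does not send) an Anthropic-style request against a custom base URL; the localhost address, port, and model name are placeholders for whatever proxy or gateway you run.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"   # assumed local proxy, not a real default

body = json.dumps({
    "model": "claude-sonnet-4",      # model name is illustrative
    "max_tokens": 512,
    "messages": [{"role": "user", "content": "Open the browser."}],
}).encode()

req = urllib.request.Request(
    f"{BASE_URL}/v1/messages",       # Anthropic-style messages path
    data=body,
    headers={"content-type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req) would send it; omitted so the sketch stays
# runnable without a live endpoint.
print(req.full_url)
```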
Tools like Fazm have built-in support for custom API endpoints, so you can change the backend in settings without touching any code. Other agents may require environment variables or config file changes, but the pattern is the same across the board.
Route your agent through any API
Fazm supports custom API endpoints natively. Use your Copilot subscription, corporate gateway, or self-hosted models. Free, open source, runs locally on macOS.
Try Fazm Free

No credit card required. Download and start automating in minutes.