Use GitHub Copilot as an AI Agent Backend (No Extra API Costs)
A developer built a local proxy that translates Anthropic API calls into GitHub Copilot Business calls, pointed their AI agent at it via a custom base URL setting, and it just worked. The agent had no idea it was talking to Copilot. The approach proved so useful that support for custom API endpoints is now a first-class feature in desktop AI agents. This guide explains how to set it up - whether you are using Copilot, a corporate gateway, or a local model.
“Fazm uses real accessibility APIs instead of screenshots, so it interacts with any app on your Mac reliably and fast. Free to start, fully open source.”
fazm.ai
1. The API Cost Problem for AI Agents
AI agents are not like chatbots. A chatbot has one exchange per user interaction. An agent running a workflow makes dozens of API calls per task: reading the screen, planning the next step, executing an action, verifying the result, handling errors. Each step is a round trip to an LLM.
For browser automation, document processing, or CRM updates, a single session can burn through hundreds of thousands of tokens. At standard Anthropic API pricing, a developer running agents throughout the workday can easily spend $50 to $200 per month just in inference costs - before any other tooling.
For teams, this multiplies. Ten developers each running agents at moderate intensity can push monthly API bills into four figures. That math gets uncomfortable fast, especially for teams that already pay for GitHub Copilot Business or have access to enterprise AI infrastructure.
The core problem: Most teams are paying twice. They have existing LLM access through Copilot subscriptions or corporate AI gateways, and they are also paying separate API fees for their agents. Custom endpoints fix this by routing everything through infrastructure you already pay for.
2. How Proxy and Gateway Approaches Work
The key insight is that an AI agent does not care which LLM it talks to. It cares about getting a valid response in the format it expects. If you can run something in the middle that translates between protocols, the agent works identically regardless of what is on the other end.
A proxy sits between the agent and the LLM provider:
- Agent sends a request in Anthropic format to `localhost:PORT`
- Proxy receives the request and translates it to the target provider's format (Copilot, OpenAI, Bedrock, etc.)
- Proxy forwards to the real API, receives the response, translates it back to Anthropic format
- Agent receives a response that looks exactly like what it would get from the real Anthropic API
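The translation step at the heart of this loop can be sketched as a pair of pure functions. This is a minimal sketch assuming text-only messages and the common shapes of the Anthropic Messages format and an OpenAI-style chat completions format - real requests also carry system prompts, tool calls, and streaming chunks:

```python
# Minimal sketch of a proxy's translation layer (not a complete mapping).
# Assumes simple text-only messages in both directions.

def anthropic_to_openai(req: dict) -> dict:
    """Translate an Anthropic /v1/messages request body to OpenAI chat format."""
    return {
        "model": req["model"],
        "max_tokens": req.get("max_tokens", 1024),
        "messages": [
            {"role": m["role"], "content": m["content"]}
            for m in req["messages"]
        ],
    }

def openai_to_anthropic(resp: dict) -> dict:
    """Translate an OpenAI chat completion back into Anthropic response shape."""
    choice = resp["choices"][0]
    finish = choice["finish_reason"]
    return {
        "type": "message",
        "role": "assistant",
        "content": [{"type": "text", "text": choice["message"]["content"]}],
        "stop_reason": "end_turn" if finish == "stop" else finish,
    }
```

The proxy's job is just to wrap these two functions in an HTTP server: receive, translate, forward, translate back.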
This pattern is not new. LiteLLM has been doing it across dozens of providers for years. What is newer is that AI agents started being built with configurable base URLs, making it trivial to point them at any compatible endpoint without code changes.
Why this works reliably
The Anthropic Messages API format has become a de facto standard. OpenAI-compatible endpoints, AWS Bedrock, Azure OpenAI, and most serious LLM hosting solutions either support it natively or have translation layers available. The ecosystem has converged enough that switching backends is mostly a configuration change.
3. ANTHROPIC_BASE_URL and Custom Endpoints
The standard way to redirect Anthropic API traffic is the `ANTHROPIC_BASE_URL` environment variable. When set, the Anthropic SDK sends all requests to that URL instead of `https://api.anthropic.com`.
```shell
# Point the agent at a local proxy
export ANTHROPIC_BASE_URL=http://localhost:8080

# Or a corporate gateway
export ANTHROPIC_BASE_URL=https://ai-gateway.yourcompany.com

# Or a specific regional endpoint
export ANTHROPIC_BASE_URL=https://bedrock-proxy.us-east-1.example.com
```

Applications that use the official Anthropic SDK pick this up automatically - no code changes needed. The SDK simply sends requests to the configured URL and expects responses in the same format.
For desktop AI agent apps, many now expose this as a UI setting rather than requiring an environment variable. You fill in a base URL field in preferences, and the app handles the rest. This makes the feature accessible to users who are not comfortable with terminal configuration.
Authentication note: When using a custom endpoint, your API key goes to your proxy, not to Anthropic directly. Make sure your proxy handles authentication appropriately for the backend it is connecting to. Some proxies require you to set a different key (for Copilot, for example, you would use your GitHub token rather than an Anthropic key).
Already paying for Copilot or a corporate AI gateway?
Fazm supports custom API endpoints as a built-in setting. Route your macOS automation agent through your existing infrastructure with no extra costs.
Try Fazm Free
4. GitHub Copilot as an AI Agent Backend
GitHub Copilot Business ($19/seat/month) includes access to Claude and GPT-4o through the Copilot API. This capacity is included in your seat license - you are already paying for it whether you use it for agents or not.
A developer on a community forum described exactly this setup: they built a lightweight local proxy in Node.js that accepted requests in Anthropic format, translated them to Copilot API calls using their GitHub token, and returned the response. They pointed their AI agent at it via a custom base URL setting. The agent ran normally. No separate API bill. The proxy code was around 150 lines.
The community response was strong enough that it became an official consideration: AI agent tools that had previously hardcoded the Anthropic endpoint started adding custom base URL settings specifically to support this pattern.
How to set up a Copilot proxy
- Get your GitHub token. You need a token with Copilot API access. For Copilot Business, this is your standard GitHub personal access token with the `copilot` scope.
- Run a translation proxy. Several open source projects implement this - search for "copilot to anthropic proxy" on GitHub. Alternatively, LiteLLM supports Copilot as a backend and handles the translation automatically.
- Configure your agent. Set the base URL to your proxy address (typically `http://localhost:4000` for LiteLLM). Set any required API key to your GitHub token or a placeholder, depending on how the proxy handles auth.
- Verify the connection. Run a simple test task to confirm requests are flowing through the proxy to Copilot. Check the proxy logs to see the traffic.
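With LiteLLM, the proxy step reduces to a small config file. The fragment below is illustrative only - it assumes your LiteLLM version ships a GitHub Copilot provider, and the exact provider prefix and model strings change between releases, so verify them against the current LiteLLM documentation:

```yaml
# litellm_config.yaml -- illustrative; check provider/model names
# against your LiteLLM version's docs before relying on them.
model_list:
  - model_name: claude-sonnet                 # the name your agent requests
    litellm_params:
      model: github_copilot/claude-3.5-sonnet # assumed provider prefix
```

Start the proxy with `litellm --config litellm_config.yaml --port 4000`, then point the agent's base URL at `http://localhost:4000`.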
Terms of service consideration: GitHub Copilot is licensed for development assistance. Using it as a general AI agent backend for high-volume automation may fall outside the intended use. Check your organization's GitHub agreement before relying on this for production workloads. Many teams use this pattern for development and internal tooling where the line between "coding assistance" and "automation" is blurry.
5. Other Compatible Gateways
GitHub Copilot is one option, but the same pattern applies to anything that speaks Anthropic format (or can translate to it):
Corporate AI gateways
Enterprises running AWS Bedrock, Azure OpenAI, or tools like LiteLLM as a centralized proxy can route agent traffic through their existing infrastructure. This keeps data in approved systems, applies spend limits, and generates audit logs for compliance. The agent does not know or care that it is going through a gateway.
Local models via Ollama or vLLM
For teams with GPU access, running Llama 3, Qwen, or other open source models locally gives a fixed-cost endpoint. Ollama exposes an OpenAI-compatible API on localhost. Combine it with a thin translation layer and your agent can run entirely offline with no per-token costs. Quality is lower than frontier models on complex tasks, but it is fast and free for simpler automation.
OpenRouter and aggregator services
OpenRouter and similar services provide a single endpoint that routes to dozens of different models. You can switch between Claude, GPT-4o, Gemini, and open source models by changing the model name in your request - without changing the endpoint URL. Useful for cost optimization by routing different task types to cheaper models.
Custom Anthropic-compatible gateways
Some teams build their own proxy servers that add capabilities the standard API does not offer: PII redaction before requests hit the model, response caching for repeated queries, rate limiting per user, or logging for compliance. Because the agent just sees a standard endpoint, all of this is transparent.
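As an illustration of the PII-redaction case, a gateway-side scrub over the request body might look like this. This is a toy sketch - real redaction needs far more than two regexes, and production systems typically use a dedicated PII-detection library:

```python
import re

# Toy patterns -- production redaction would use a proper PII library.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(text: str) -> str:
    """Replace obvious PII with placeholders before the request leaves the gateway."""
    text = EMAIL.sub("[EMAIL]", text)
    return SSN.sub("[SSN]", text)

def scrub_request(req: dict) -> dict:
    """Apply redaction to every plain-text message in an Anthropic-format request."""
    for m in req.get("messages", []):
        if isinstance(m.get("content"), str):
            m["content"] = redact(m["content"])
    return req
```

Because the scrub happens inside the gateway, the agent needs no changes: it sends normal requests and the model never sees the raw values.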
6. Setting Up a Custom Endpoint in Desktop AI Agents
How you configure a custom endpoint depends on the agent. There are three common patterns:
Environment variable
Set `ANTHROPIC_BASE_URL` before launching the agent. Works for any agent built on the official Anthropic SDK. No UI required.

```shell
ANTHROPIC_BASE_URL=http://localhost:4000 ./your-agent
```

Settings UI
Agents like Fazm expose this as a settings field. Open preferences, find the API configuration section, and enter your base URL. No terminal required. The change takes effect immediately and persists across restarts.
Config file
Some agents use a config file (often JSON or YAML). Look for a `baseUrl` or `apiEndpoint` key. Setting it there overrides the default.
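A hypothetical config fragment - the key names vary by agent, so check its documentation for the ones it actually reads:

```json
{
  "apiEndpoint": "http://localhost:4000",
  "apiKey": "<your GitHub token or a placeholder>"
}
```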
Fazm, for example, includes a dedicated field for custom API endpoints in its settings panel. You enter the base URL for your proxy or gateway, and all inference calls go through it. The feature was added specifically because users were running the Copilot proxy pattern and wanted a clean way to configure it without environment variables.
Once configured, there is nothing else to change. The agent uses the same tool calls, the same automation logic, and the same interaction patterns - it just routes through your chosen backend instead of the default Anthropic API.
Free and open source. Custom endpoint support included.
7. Cost Comparison
Here is a rough breakdown for a single developer running an AI desktop agent at moderate intensity: approximately 500 API calls per day across browser automation, document work, and coding tasks.
| Backend | Monthly Cost | Model Quality | Setup Effort |
|---|---|---|---|
| Direct Anthropic API (Claude Sonnet) | $50 to $200+ | Excellent | Minimal |
| GitHub Copilot Business proxy | $0 extra ($19/seat already paid) | Good (Claude + GPT-4o) | Moderate (proxy setup) |
| AWS Bedrock (Claude via VPC) | $30 to $150+ | Excellent | Moderate (IAM, VPC) |
| OpenRouter (mixed models) | $10 to $80+ | Good to excellent | Low (one URL change) |
| Local model via Ollama (Llama 3) | $0 (own hardware) | Fair (weaker on complex tasks) | High (GPU, maintenance) |
For most individual developers, the Copilot proxy delivers the best cost-to-effort ratio if you already have a seat. The model quality is solid - Copilot Business gives you access to Claude and GPT-4o, which are the same models you would be using through the direct API.
For teams or regulated environments, Bedrock or a corporate LiteLLM deployment is a more defensible choice. The cost savings are smaller than the Copilot approach, but you get proper audit trails, data residency controls, and enterprise support.
OpenRouter is worth considering if you want flexibility without running your own proxy. The single endpoint approach means you can route cheap tasks to cheaper models and expensive reasoning to frontier models, with no infrastructure to maintain.
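To sanity-check the direct-API row in the table, here is a back-of-envelope cost model. Every number in it is an illustrative assumption - token counts per call depend heavily on how much context the agent resends each step, and prices change, so substitute your own figures:

```python
# Illustrative cost model for direct API usage at ~500 calls/day.
# All constants are assumptions; adjust to your usage and current pricing.
CALLS_PER_DAY = 500
DAYS_PER_MONTH = 22            # working days
INPUT_TOKENS_PER_CALL = 2_000  # screen context, history, tool schemas
OUTPUT_TOKENS_PER_CALL = 300   # plan + tool call
INPUT_PRICE_PER_M = 3.00       # USD per million input tokens (assumed)
OUTPUT_PRICE_PER_M = 15.00     # USD per million output tokens (assumed)

calls = CALLS_PER_DAY * DAYS_PER_MONTH
input_cost = calls * INPUT_TOKENS_PER_CALL / 1_000_000 * INPUT_PRICE_PER_M
output_cost = calls * OUTPUT_TOKENS_PER_CALL / 1_000_000 * OUTPUT_PRICE_PER_M
print(f"~${input_cost + output_cost:.0f}/month")
```

Under these assumptions the total lands a little over $100/month; agents that resend large context windows on every step push toward the upper end of the table's range.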
Route your agent through any API endpoint
Fazm supports custom API endpoints as a built-in setting. Use your GitHub Copilot subscription, corporate gateway, or local models. Free, open source, runs locally on macOS.
Try Fazm Free
No credit card required. Download and start automating in minutes.