Custom API Endpoints for AI Agents: Cut Costs with Proxy Routing
Most developers running AI agents pay full price for every API call because they use the default endpoint. But a single environment variable, ANTHROPIC_BASE_URL, lets you route agent traffic through GitHub Copilot, OpenRouter, corporate gateways, or any OpenAI-compatible proxy. The result is the same model quality at 50-80% lower cost. Here is how to set it up, which providers to consider, and when each option makes sense.
“Fazm has a built-in settings field for custom API endpoints, so you can route through any provider without editing config files.”
fazm.ai
1. How ANTHROPIC_BASE_URL Works
When an AI agent makes an API call to Claude, it sends the request to a base URL. By default this is https://api.anthropic.com. The ANTHROPIC_BASE_URL environment variable overrides this default, sending every request to whatever URL you specify instead.
The proxy at that URL receives the request, potentially modifies headers or adds authentication, and forwards it to the actual model provider. The response flows back through the same path. From the agent's perspective, nothing changes. It sends the same request format and receives the same response format. The routing happens transparently.
Setting it up is a single line in your shell configuration:
export ANTHROPIC_BASE_URL=https://your-proxy.example.com/v1
Some agents also support OPENAI_BASE_URL for OpenAI-compatible endpoints. The pattern is the same: override the base URL to redirect traffic through a cheaper or more controlled path.
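The override-with-fallback behavior can be sketched in a few lines of shell, using the same parameter-expansion pattern most clients apply internally (the proxy URL is a placeholder):

```shell
#!/bin/sh
# With no override set, the lookup falls back to the official endpoint.
unset ANTHROPIC_BASE_URL
echo "default:  ${ANTHROPIC_BASE_URL:-https://api.anthropic.com}"

# Once the variable is exported, the same lookup resolves to the proxy.
export ANTHROPIC_BASE_URL=https://your-proxy.example.com/v1
echo "override: ${ANTHROPIC_BASE_URL:-https://api.anthropic.com}"
```

Running this prints the default endpoint first, then the proxy URL, which is exactly the switch the agent performs on every request.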
2. Routing Through GitHub Copilot
GitHub Copilot Business and Enterprise subscriptions include API access to Claude and other models as part of the flat monthly fee. If your organization already pays for Copilot, you can route your AI agent traffic through Copilot's endpoint and pay nothing extra for the API calls.
The setup involves pointing ANTHROPIC_BASE_URL at Copilot's API gateway and using your Copilot token for authentication. The exact URL format depends on your organization's Copilot configuration, but the general pattern is:
export ANTHROPIC_BASE_URL=https://api.githubcopilot.com
export ANTHROPIC_API_KEY=your-copilot-token
The trade-off is rate limiting. Copilot endpoints typically have lower rate limits than direct API access. For development workflows where you make a few dozen calls per hour, this is not an issue. For production workloads with hundreds of concurrent requests, direct API access may still be necessary.
Check your Copilot plan's terms of service to confirm that routing agent traffic through the endpoint is permitted. Some plans restrict usage to specific tooling integrations.
3. OpenRouter and LiteLLM: Multi-Provider Proxies
OpenRouter and LiteLLM sit between your agent and multiple model providers, giving you a single endpoint that can route to any model. The advantage is flexibility: you can switch models, balance costs, and add fallbacks without changing your agent configuration.
OpenRouter
OpenRouter is a hosted service that aggregates access to Claude, GPT-4, Gemini, and dozens of other models through a single API. You set your base URL to OpenRouter's endpoint and specify which model you want in the request. Pricing is per-token with a small markup over direct API prices, but you get unified billing, automatic fallbacks, and access to models you might not have direct API access to.
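A minimal setup sketch for an OpenAI-compatible agent, assuming an OpenRouter account; the model slug is illustrative, so check OpenRouter's model list for the exact name:

```shell
# Point the agent at OpenRouter's OpenAI-compatible endpoint.
export OPENAI_BASE_URL=https://openrouter.ai/api/v1
export OPENAI_API_KEY=$OPENROUTER_API_KEY   # key issued by openrouter.ai

# The model is selected per request, e.g. "anthropic/claude-sonnet-4"
# (illustrative slug), which is how one endpoint reaches many providers.
```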
LiteLLM
LiteLLM is an open-source proxy you run yourself. It translates between different API formats (Anthropic, OpenAI, Bedrock, Vertex) so you can use any model with any tool that expects a specific API format. It also handles load balancing, cost tracking, and rate limit management.
# LiteLLM config example: two entries with the same model_name
# let the proxy load-balance between Anthropic and Bedrock
model_list:
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: sk-ant-xxx
  - model_name: claude-sonnet
    litellm_params:
      model: bedrock/claude-sonnet-4-20250514
      aws_access_key_id: xxx

With LiteLLM running locally, you point ANTHROPIC_BASE_URL at localhost and the proxy handles all the routing, failover, and cost optimization transparently.
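Launching the proxy and pointing an agent at it looks roughly like this, assuming the config above is saved as litellm_config.yaml and the default port is free:

```shell
# Start the LiteLLM proxy (flags per LiteLLM's documented CLI);
# it listens on port 4000 here.
litellm --config litellm_config.yaml --port 4000 &

# Route the agent's Anthropic-format traffic through the local proxy.
export ANTHROPIC_BASE_URL=http://localhost:4000
```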
Custom endpoint support built in
Fazm has a dedicated settings field for custom API endpoints. Point it at any proxy, gateway, or provider without editing environment variables.
Try Fazm Free

4. Corporate API Gateways and Compliance
Enterprise teams often route AI traffic through internal API gateways for compliance, logging, and cost allocation. Products like Azure API Management, AWS API Gateway, and Kong can sit between your agents and the model provider, adding authentication, rate limiting, usage tracking, and data loss prevention.
The typical setup involves deploying a gateway that accepts Anthropic-formatted requests, logs them for compliance, strips or masks sensitive data, and forwards the cleaned request to the model provider. Responses flow back through the same path, getting logged on the return trip.
For teams with strict data residency requirements, routing through Amazon Bedrock or Google Vertex AI ensures that model inference happens within a specific cloud region. You set ANTHROPIC_BASE_URL to your Bedrock or Vertex endpoint, typically alongside the provider's own credentials (AWS keys or a Google service account), and get the same Claude models with the data handling guarantees your compliance team requires.
Cost allocation is another benefit. When all AI traffic flows through a central gateway, you can tag requests by team, project, or feature, making it straightforward to attribute costs and set per-team budgets.
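A tagging sketch: the gateway hostname and the X-Cost-Center header are assumptions for illustration (real gateways define their own tagging mechanism), while anthropic-version is the standard Anthropic API header:

```shell
# Hypothetical request through an internal gateway, tagged for cost allocation.
curl -s https://ai-gateway.internal.example.com/v1/messages \
  -H "x-api-key: $GATEWAY_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "X-Cost-Center: team-growth" \
  -d '{"model": "claude-sonnet-4-20250514", "max_tokens": 64,
       "messages": [{"role": "user", "content": "ping"}]}'
```

The gateway reads the tag, records it against the request's token usage, and forwards the call upstream unchanged.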
5. Cost Comparison: Direct vs. Proxy Routing
The cost difference between direct API access and proxy routing depends on your usage pattern and which proxy you choose. Here is a rough comparison for a developer running an AI agent for about 4 hours per day:
| Provider | Monthly cost (est.) | Rate limits | Setup effort |
|---|---|---|---|
| Direct Anthropic API | $100-300 | High | Minimal |
| GitHub Copilot routing | $19-39 (flat fee) | Lower | Low |
| OpenRouter | $80-250 | High | Low |
| LiteLLM (self-hosted) | $60-200 | You control | Moderate |
| AWS Bedrock | $80-250 | High | Moderate |
The biggest savings come from routing through a subscription you already pay for (like Copilot) or from using LiteLLM to automatically select the cheapest provider for each request. For teams already on Copilot Business, the marginal cost of agent traffic is effectively zero.
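To see where the direct-API figure in the table comes from, here is a back-of-envelope calculation; the token volumes and the $3/$15 per million input/output token prices are illustrative assumptions for a Sonnet-class model:

```shell
# Rough monthly cost for ~4 hours/day of agent use (assumed volumes).
awk 'BEGIN {
  in_mtok_day  = 1.5   # assumed million input tokens per day
  out_mtok_day = 0.2   # assumed million output tokens per day
  daily = in_mtok_day * 3 + out_mtok_day * 15
  printf "about $%.2f/day, about $%.0f/month over 22 workdays\n", daily, daily * 22
}'
```

That lands around $165/month, inside the $100-300 range above; heavier usage or longer contexts push toward the top of the range.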
6. When to Use Which Provider
The best routing strategy depends on your situation:
You have a Copilot subscription: Route through Copilot for development workloads. Use direct API access only for production workloads that need higher rate limits.
You need multiple models: Use OpenRouter for hosted convenience or LiteLLM for self-hosted control. Both let you switch models without changing your agent configuration.
You have compliance requirements: Use Bedrock or Vertex for data residency, or deploy a corporate gateway for logging and DLP.
You want maximum savings: Run LiteLLM with model routing that sends simple tasks to cheaper models and only uses frontier models for complex reasoning. This can reduce costs by 60-80% compared to sending everything to the most expensive model.
You want simplicity: Stick with direct API access. The cost is higher, but the setup is trivial and there are no intermediate systems to maintain or debug.
7. Setting Up Your AI Agent with Custom Endpoints
Most AI agents and coding tools support custom endpoints either through environment variables or settings files. The general pattern is:
# For Claude-based agents
export ANTHROPIC_BASE_URL=https://your-proxy.example.com

# For OpenAI-compatible agents
export OPENAI_BASE_URL=https://your-proxy.example.com/v1

# For agents that support both
export ANTHROPIC_BASE_URL=https://proxy-a.example.com
export OPENAI_BASE_URL=https://proxy-b.example.com
Some tools have dedicated UI for this. Fazm, for example, has a built-in settings field where you enter your custom endpoint URL and API key directly in the app, without editing environment variables or config files. This makes it straightforward to switch between providers or test different routing configurations.
When setting up custom endpoints, test the connection before running expensive workloads. Send a simple prompt and verify that the response format matches what your agent expects. Proxy-related errors are often subtle: the request succeeds but the response is missing fields or uses a slightly different format.
Keep your direct API key as a fallback. If your proxy goes down or hits rate limits, you want the ability to switch back to direct access quickly. Store both configurations and swap between them with a single environment variable change.
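One way to keep both configurations on hand is a pair of shell functions in your profile; the URLs and key variables below are placeholders to substitute with your own:

```shell
# Switch agent traffic to the proxy (placeholder URL and key variable).
use_proxy() {
  export ANTHROPIC_BASE_URL=https://your-proxy.example.com/v1
  export ANTHROPIC_API_KEY=$PROXY_API_KEY
  echo "routing via proxy"
}

# Fall back to direct access: unsetting the base URL restores the default.
use_direct() {
  unset ANTHROPIC_BASE_URL
  export ANTHROPIC_API_KEY=$DIRECT_API_KEY
  echo "routing direct"
}
```

If the proxy goes down mid-session, recovery is one command: use_direct.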
AI agent with built-in endpoint routing
Fazm is a free, open-source AI agent for macOS that controls your apps natively through accessibility APIs. Custom endpoint support is built right into the settings.
Try Fazm Free

Free to start. Fully open source. Runs locally on your Mac.