Field notes for Python developers

Mocking the Anthropic Python SDK for local development with ANTHROPIC_BASE_URL

The official anthropic Python library accepts a base_url argument and reads the ANTHROPIC_BASE_URL environment variable when you do not pass one. Either route points every request off api.anthropic.com and onto a host you control, which is exactly what you want for local development and tests. The part the existing guides leave out is a small URL validation rule that, when broken, will silently brick every call you make. This page covers the resolution order, four mocking approaches that work, the pytest pattern, and the production validation guard that exists because the broken case bit a real release.

Matthew Diakonov, Written with AI

Published May 27, 2026 · 9 min read

Direct answer (verified 2026-05-27)

Construct the client with Anthropic(api_key="test-key", base_url="http://127.0.0.1:8000") and run any HTTP server on that port that returns a Messages API response. The same effect comes from exporting ANTHROPIC_BASE_URL=http://127.0.0.1:8000 before your process starts. The value MUST be a full URL with the http:// or https:// scheme. A bare host and port like localhost:8000 is accepted by the constructor and then throws Invalid URL on the first request. Source on the resolution behavior: anthropics/anthropic-sdk-python.

How base_url actually resolves in the Python SDK

The constructor signature on anthropic.Anthropic declares base_url: str | httpx.URL | None = None. When the argument is None, the SDK falls through to the environment lookup. When the environment is also empty, it uses the built-in production default. The same chain runs for AsyncAnthropic. Three states, one resolution order.

base_url resolution order

1. Explicit: base_url= passed to the constructor. Wins over everything else.
2. Env var: ANTHROPIC_BASE_URL is read with os.environ.get when the argument is None.
3. Default: the built-in api.anthropic.com endpoint when both of the above are unset.
4. Frozen: the resolved value is stored on the client instance and reused for each request. A later env change does not retro-update an existing client.

The frozen step is the one that matters for tests. A pytest fixture that calls monkeypatch.setenv AFTER constructing the client will not change the base URL the client uses. You either construct the client inside the patched scope, or you pass the mock URL explicitly through the base_url argument. Both are clean. The mistake to avoid is patching at module level and assuming a module-scoped client picked it up.
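The resolution order and the frozen rule can be sketched without the SDK installed. This is a stand-in under stated assumptions, not the SDK's own code: resolve_base_url, FakeClient, and DEFAULT_BASE_URL are hypothetical names, and treating only an unset variable as "empty" is a simplification.

```python
import os
from typing import Optional

# Assumption: stands in for the SDK's built-in production default.
DEFAULT_BASE_URL = "https://api.anthropic.com"


def resolve_base_url(explicit: Optional[str] = None) -> str:
    """Hypothetical helper: explicit argument, then env var, then default."""
    if explicit is not None:
        return explicit
    from_env = os.environ.get("ANTHROPIC_BASE_URL")
    if from_env is not None:
        return from_env
    return DEFAULT_BASE_URL


class FakeClient:
    """Stand-in client: resolves once in __init__ and keeps the result."""

    def __init__(self, base_url: Optional[str] = None) -> None:
        self.base_url = resolve_base_url(base_url)
```

A FakeClient built before the environment changes keeps its original value, while one built afterwards sees the new one. That is the behavior to expect from a module-scoped real client in a test suite.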

What a mocked request actually looks like

Nothing on the wire changes. The SDK still sends POST /v1/messages with x-api-key and anthropic-version headers, still expects a Messages API JSON body or an SSE stream. The mock just has to answer that. The host changes, the shape does not.

anthropic Python SDK against a local mock

The host in that request is whatever you set in base_url or ANTHROPIC_BASE_URL. Mock that host and the SDK never knows the difference. The auth header is still sent even though it is meaningless to a mock, so your stub server can ignore it or assert on it depending on what you want to test.
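To see that request shape from the mock's side, here is a minimal standard-library sketch: a stub that records the path and headers of each POST, plus a helper that sends a request shaped like the one described above. The names RecordingStub, start_stub, and post_like_sdk are hypothetical, and the helper uses urllib rather than the SDK, so it only approximates what the SDK sends.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

CANNED = {
    "id": "msg_test", "type": "message", "role": "assistant",
    "model": "claude-test", "content": [{"type": "text", "text": "ok"}],
    "stop_reason": "end_turn", "usage": {"input_tokens": 1, "output_tokens": 1},
}
seen = []  # (path, lower-cased headers) for every request the stub receives


class RecordingStub(BaseHTTPRequestHandler):
    def do_POST(self):
        self.rfile.read(int(self.headers.get("Content-Length", 0)))  # drain body
        seen.append((self.path, {k.lower(): v for k, v in self.headers.items()}))
        payload = json.dumps(CANNED).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):
        pass  # keep test output quiet


def start_stub():
    """Start the stub on a free port; return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), RecordingStub)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


def post_like_sdk(base_url):
    """Send POST /v1/messages with the two headers named in the text."""
    body = {"model": "claude-test", "max_tokens": 8,
            "messages": [{"role": "user", "content": "hi"}]}
    req = Request(base_url + "/v1/messages", data=json.dumps(body).encode(),
                  headers={"x-api-key": "test-key",
                           "anthropic-version": "2023-06-01",
                           "content-type": "application/json"},
                  method="POST")
    with urlopen(req) as resp:
        return json.loads(resp.read())
```

After one call, seen[0] holds the path and headers, which is enough to assert on the auth header or ignore it.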

Four ways to mock that actually work

You can stop at the cheapest one that answers the question your tests are asking. They are not interchangeable.

Prism mock server

What the anthropic-sdk-python repo itself uses for its test suite. Runs the OpenAPI spec on localhost:4010 and returns schema-valid responses without any of your own code. Closest fidelity to the real API. Best when you want to check that your request shape passes the spec.

fakellm

Single-process Python mock from PyPI. import fakellm, set ANTHROPIC_BASE_URL to its local port, get deterministic canned responses. Lightest setup. Best for unit tests where you do not care about the response content.

AI-Mocks (mokksy.dev)

Kotlin-flavored mock with Anthropic-specific helpers and fluent assertions. Good when you want to assert on the exact request body the SDK builds. Heavier dependency.

Custom FastAPI stub

Thirty lines of FastAPI or aiohttp. Hand-write the responses, stream specific token sequences, simulate rate limits and 529s, replay exact failure modes. Best when off-the-shelf mocks do not give you the scenario you need.
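As a sketch of the scripted-failure idea, the stub below replays a list of (status, body) pairs, one per request, so a test can see a 529 followed by a success. It uses the standard library's http.server instead of FastAPI to stay dependency-free. start_scripted_stub and post_messages are hypothetical names, and the overloaded_error body is an assumed approximation of the real error payload.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumed error shape for an overloaded (529) response.
OVERLOADED = {"type": "error",
              "error": {"type": "overloaded_error", "message": "Overloaded"}}
OK = {"id": "msg_test", "type": "message", "role": "assistant",
      "model": "claude-test", "content": [{"type": "text", "text": "ok"}],
      "stop_reason": "end_turn", "usage": {"input_tokens": 1, "output_tokens": 1}}


def start_scripted_stub(script):
    """Serve one (status, body) pair per request; the last pair repeats."""
    remaining = list(script)

    class Scripted(BaseHTTPRequestHandler):
        def do_POST(self):
            self.rfile.read(int(self.headers.get("Content-Length", 0)))
            status, body = remaining.pop(0) if len(remaining) > 1 else remaining[0]
            payload = json.dumps(body).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, *args):
            pass  # silence per-request logging

    server = HTTPServer(("127.0.0.1", 0), Scripted)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, f"http://127.0.0.1:{server.server_port}"


def post_messages(base_url):
    """POST an empty JSON body and return (status, parsed_body)."""
    req = Request(base_url + "/v1/messages", data=b"{}", method="POST",
                  headers={"content-type": "application/json"})
    try:
        with urlopen(req) as resp:
            return resp.status, json.loads(resp.read())
    except HTTPError as err:
        return err.code, json.loads(err.read())
```

Pointing a real client's base_url at the returned URL would exercise your retry and error-handling code against the same sequence.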

The pytest pattern, end to end

Two snippets: a fixture for conftest.py and a test that uses it. The first one boots a mock server in a thread and exposes its URL through a fixture. The second one constructs a client against that fixture and runs an assertion. The pattern works with Prism, fakellm, or a hand-rolled stub by swapping out the boot step.

# conftest.py
import socket
import threading
import time

import pytest
import uvicorn
from fastapi import FastAPI

def _free_port() -> int:
    # Bind to port 0 so the OS picks an unused port, then release it.
    with socket.socket() as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

@pytest.fixture(scope="session")
def anthropic_mock_url():
    app = FastAPI()

    @app.post("/v1/messages")
    def messages():
        # Minimal non-streaming Messages API response.
        return {
            "id": "msg_test",
            "type": "message",
            "role": "assistant",
            "model": "claude-test",
            "content": [{"type": "text", "text": "ok"}],
            "stop_reason": "end_turn",
            "usage": {"input_tokens": 1, "output_tokens": 1},
        }

    port = _free_port()
    cfg = uvicorn.Config(app, host="127.0.0.1", port=port, log_level="warning")
    server = uvicorn.Server(cfg)
    thread = threading.Thread(target=server.run, daemon=True)
    thread.start()
    # Wait for startup without spinning the CPU; give up after 5 seconds.
    deadline = time.monotonic() + 5
    while not server.started:
        if time.monotonic() > deadline:
            raise RuntimeError("mock server did not start")
        time.sleep(0.01)
    yield f"http://127.0.0.1:{port}"
    # Ask uvicorn to shut down, then wait briefly for the thread to finish.
    server.should_exit = True
    thread.join(timeout=2)

# test_my_app.py
from anthropic import Anthropic

def test_my_handler(anthropic_mock_url):
    client = Anthropic(api_key="test-key", base_url=anthropic_mock_url)
    msg = client.messages.create(
        model="claude-test",
        max_tokens=8,
        messages=[{"role": "user", "content": "hi"}],
    )
    assert msg.content[0].text == "ok"

Pass base_url explicitly so the test does not depend on whatever ANTHROPIC_BASE_URL happens to be set in CI. If you prefer the env-var form, use monkeypatch.setenv inside the test and construct a fresh client inside the patched block so the frozen-base-url rule from the resolution section does not bite you.

The footgun no Python tutorial mentions

The SDK does not validate base_url at construction time. It stores the string, hands it to httpx on the first request, and httpx is the layer that decides whether the string is a real URL. The values that bite:

localhost:8000 (no scheme). httpx parses localhost as the scheme and 8000 as the path, so there is no usable host and the request fails.
127.0.0.1:8000 (no scheme). Same outcome. It looks valid because that is exactly how every README writes a host and port, but the SDK needs the scheme.
http:// with no host. Sometimes ends up in env files because of a botched template substitution. The scheme is present but the host is empty, so httpx rejects it.
Trailing whitespace. A value copy-pasted from a wiki can end with a newline, and httpx is strict about it. Always trim before forwarding a user-provided value.

Anchor fact (from production code)

Fazm wraps Claude Code via ACP and exposes a Custom API Endpoint setting. The bridge that spawns the agent forwards the user value as ANTHROPIC_BASE_URL only when it survives a validation guard. The guard lives in Desktop/Sources/Chat/ACPBridge.swift around line 654. In plain terms it does this: trim whitespace, parse the value with the standard URL parser, require an http or https scheme, require a non-empty host. Any other shape gets logged and the bridge falls back to the default Anthropic endpoint so chat keeps working.

That guard exists because a single user-supplied value of localhost:8766 (no scheme) once passed through the bridge unchecked and silently bricked every chat turn. The SDK accepted the string at construction. The first request threw API Error: Invalid URL. The retry-with-resume path swallowed the error into an empty assistant turn, so the user saw a chat that responded with nothing and no visible error. We added the guard and have not trusted a raw base URL string since.

The Python equivalent of that guard is twelve lines. If you read a base URL from a settings file or an env var, validate it once at startup rather than at the first request:

import os
from urllib.parse import urlparse

from anthropic import Anthropic

def safe_base_url(value: str | None) -> str | None:
    if not value:
        return None
    value = value.strip()
    parsed = urlparse(value)
    # Only http and https are usable; a bare host:port has neither.
    if parsed.scheme not in ("http", "https"):
        return None
    # hostname is None for values like "http://" or "http://:8000".
    if not parsed.hostname:
        return None
    return value

# usage: pass None to let the SDK fall back to its built-in default
base_url = safe_base_url(os.environ.get("ANTHROPIC_BASE_URL"))
client = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"],
                   base_url=base_url)
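The guard above falls back silently, which suits a desktop app. For a script or a CI job, a variant that fails loudly at startup and names the offending setting may be more useful. The sketch below is one way to do that. require_base_url is a hypothetical helper, not part of the SDK, and it applies the same scheme-and-host rule.

```python
from urllib.parse import urlparse


def require_base_url(setting_name: str, value: str) -> str:
    """Return the trimmed URL, or raise ValueError naming the setting."""
    cleaned = value.strip()
    parsed = urlparse(cleaned)
    if parsed.scheme not in ("http", "https"):
        raise ValueError(
            f"{setting_name}={value!r}: expected an http:// or https:// URL"
        )
    if not parsed.hostname:
        raise ValueError(f"{setting_name}={value!r}: URL has no host")
    return cleaned
```

Calling it once at startup turns a later, generic Invalid URL into an immediate error that points at the setting to fix.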

“Empty assistant turns dropped to zero across the next release after we added a scheme-and-host check before forwarding the user value into ANTHROPIC_BASE_URL.”

Fazm internal regression on the Custom API Endpoint code path, 2026-04

When to mock vs when to use the real API on a cheap model

Mocks are right for everything that exercises your code. Mocks are wrong for anything that exercises the model. If a test asserts that your prompt produces a particular shape of output, a mock that returns canned text will pass even when the prompt is broken. For that class of test, point the real SDK at the real API with a small model like claude-haiku-4-5-20251001, set a tight token cap, and run the test sparingly. The boundary is whether the test is about your code or the model. If it is about your code, mock. If it is about the model, do not.

The other case where the real endpoint earns its keep is streaming. The SSE protocol the SDK consumes is non-trivial, and a hand-rolled stub that misses one event type will hang the .text_stream iterator with no obvious error. Prism is good at this because it generates the stream from the spec. If you do roll your own, write a 5-line script that calls the real API once, captures the raw SSE, and replays it from a file. Replay is more honest than fabrication.
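The replay half of that advice can be sketched with the standard library alone. The helper below assumes the raw SSE was already saved to a text file (the capture step is not shown) and yields one frame at a time, each terminated by a blank line. iter_captured_frames and the file layout are assumptions for illustration.

```python
from pathlib import Path
from typing import Iterator


def iter_captured_frames(path: str) -> Iterator[bytes]:
    """Yield each SSE frame from a capture file, ending with a blank line."""
    raw = Path(path).read_text(encoding="utf-8").replace("\r\n", "\n")
    for frame in raw.split("\n\n"):
        if frame.strip():
            # Re-add the blank-line terminator that split() removed.
            yield (frame.strip("\n") + "\n\n").encode("utf-8")
```

A stub server can write these byte chunks to the response in order with a text/event-stream content type, which keeps the replayed stream byte-for-byte close to what the real API sent.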

One env var, every SDK

ANTHROPIC_BASE_URL is read by the Python SDK, the TypeScript SDK, Claude Code, and any well-behaved wrapper that follows the same convention. Setting it in your shell before launching anything reroutes the whole ecosystem to the same mock host. That is convenient and dangerous. Convenient because one env in a .envrc covers everything you run during local dev. Dangerous because a forgotten export keeps biting until you notice that a real script is hitting a stale port.

The pattern that survives a year of use: scope the env var to the directory with direnv or set it explicitly per test run via monkeypatch, never put it in your login shell. The same rule applies to Claude Code when you use it from the same machine: a global export will quietly redirect it too. Read more on that resolution lifecycle in Claude Code and ANTHROPIC_BASE_URL.
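Outside pytest, the same scoping can be done with unittest.mock.patch.dict from the standard library, which restores the environment when the block exits. A minimal sketch, where run_with_mock_endpoint is a hypothetical wrapper name:

```python
import os
from unittest import mock


def run_with_mock_endpoint(url, fn):
    """Call fn() with ANTHROPIC_BASE_URL set to url, then restore the env."""
    with mock.patch.dict(os.environ, {"ANTHROPIC_BASE_URL": url}):
        return fn()
```

Any client constructed inside fn sees the mock URL, and nothing leaks into the rest of the process once the call returns.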

Frequently asked questions

Does the official anthropic Python SDK read ANTHROPIC_BASE_URL by default?

Yes. The Anthropic client constructor accepts a base_url keyword argument. If you do not pass it, the SDK reads the ANTHROPIC_BASE_URL environment variable. If that is also unset, it falls back to the built-in api.anthropic.com endpoint. The same resolution order applies to AsyncAnthropic. You can verify by reading src/anthropic/_client.py in the anthropic-sdk-python repo, which calls os.environ.get("ANTHROPIC_BASE_URL") when the explicit argument is None.

What is the simplest way to mock the Anthropic SDK for local development?

Construct the client with base_url pointing at a local server and a stub key, like Anthropic(api_key="test-key", base_url="http://127.0.0.1:8000"). Run any HTTP server on that port that returns a valid Messages API response shape. The request format, the x-api-key header, the streaming SSE protocol all stay the same, so your application code does not change between real and mocked runs.

Why does my client work in code but the SDK throws Invalid URL on the first call?

Because the constructor accepts base_url as a string and nothing rejects it until the first request. A value like localhost:8000 looks like a URL but has no scheme, so httpx (the underlying client) rejects it when it tries to build the request. The error message does not name the setting: it says Invalid URL, not invalid ANTHROPIC_BASE_URL. The fix is to always include the scheme: http:// or https://. Validate the value before you hand it to the SDK if it comes from an env var or a settings file.

Can I use the same env var for both the Python SDK and Claude Code at the same time?

Yes. ANTHROPIC_BASE_URL is read by every official Anthropic SDK and by Claude Code. Setting it in your shell points all of them at the same endpoint until you unset it. The catch is that Claude Code reads it once at process start, so a running Claude Code session keeps the old value even if you export a new one. The Python SDK reads it on every client construction, so a new client picks up the change, but a long-lived client created earlier still holds the original.

Should I use Prism, AI-Mocks, fakellm, or a custom stub server?

Pick based on what you are testing. Prism (from Stoplight) serves the OpenAPI spec on localhost:4010, the same setup the anthropic-sdk-python test suite uses, which is the closest you can get to the real schema without hitting the API. AI-Mocks (mokksy.dev) gives you Kotlin-style fluent assertions and is great for integration tests. fakellm is a single-process Python mock with deterministic responses, the lightest option. A 30-line FastAPI stub is the right choice when you want to script error cases or stream specific token sequences that the off-the-shelf mocks do not support.

How do I write a pytest fixture that mocks the Anthropic client?

Two clean patterns. First, monkeypatch the env var and start a real local server in a fixture: monkeypatch.setenv("ANTHROPIC_BASE_URL", f"http://127.0.0.1:{port}"), then your client construction in the test reads the patched value. Second, override the base_url directly: pass base_url=mock_url into the Anthropic constructor inside the test. The first style isolates the SDK code from the test code; the second style is explicit and survives if a wrapper around the SDK swallows the env var.

Does mocking work for streaming responses?

Yes, but the mock has to emit the Anthropic streaming protocol, not just JSON. The real API streams server-sent events with message_start, content_block_start, content_block_delta, content_block_stop, message_delta, and message_stop event types, framed as event: <name> followed by data: <json>. Prism handles this from the OpenAPI spec; AI-Mocks and fakellm have helpers for it; with a custom FastAPI stub you need to use StreamingResponse and yield the SSE frames yourself. If your assertion is that the SDK returns streamed text, your mock must stream; otherwise the .text_stream iterator hangs.
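The framing described above can be sketched as a small generator. It emits the six event types in order for a single text block. sse_frame and text_message_stream are hypothetical helpers, the ids and token counts are placeholders, and the payload fields are a best-effort approximation of the real events rather than a full copy of the spec.

```python
import json
from typing import Iterator, List


def sse_frame(event: str, data: dict) -> str:
    """Frame one event as `event: <name>`, `data: <json>`, then a blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def text_message_stream(chunks: List[str]) -> Iterator[str]:
    """Yield SSE frames for one assistant message made of text chunks."""
    yield sse_frame("message_start", {
        "type": "message_start",
        "message": {
            "id": "msg_test", "type": "message", "role": "assistant",
            "model": "claude-test", "content": [],
            "stop_reason": None, "stop_sequence": None,
            "usage": {"input_tokens": 1, "output_tokens": 1},
        },
    })
    yield sse_frame("content_block_start", {
        "type": "content_block_start", "index": 0,
        "content_block": {"type": "text", "text": ""},
    })
    for chunk in chunks:
        yield sse_frame("content_block_delta", {
            "type": "content_block_delta", "index": 0,
            "delta": {"type": "text_delta", "text": chunk},
        })
    yield sse_frame("content_block_stop",
                    {"type": "content_block_stop", "index": 0})
    yield sse_frame("message_delta", {
        "type": "message_delta",
        "delta": {"stop_reason": "end_turn", "stop_sequence": None},
        "usage": {"output_tokens": len(chunks)},  # placeholder count
    })
    yield sse_frame("message_stop", {"type": "message_stop"})
```

In a FastAPI stub, the generator can be passed to StreamingResponse with media_type="text/event-stream" when the request asks for a stream.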

What is the safest way to expose a custom endpoint as a user setting?

Validate the URL before you forward it as ANTHROPIC_BASE_URL. The minimum guard is: parse it with the standard URL parser, require an absolute URL, require an http or https scheme, and require a non-empty host. We ship that exact guard in Fazm because a single user-provided value of localhost:8766 (no scheme) silently bricked every chat turn in a release. The SDK accepted the string, then threw Invalid URL on every send, and the agent retry loop swallowed the error into an empty assistant turn. The user saw nothing. The guard is in Desktop/Sources/Chat/ACPBridge.swift and falls back to the default Anthropic endpoint with a log line when the user value is malformed.