ChatGPT vs Claude API cost, May 2026
The headline rates are a wash. The meter that actually runs your bill is somewhere else.
At the comparable tiers, OpenAI and Anthropic charge within cents of each other per million tokens. The bill that actually shows up at month end is dominated by workload shape, not vendor choice: cached-input ratio, output verbosity, screenshot tool outputs, and auto-compact churn on long sessions. Here is the verified May 2026 table, the session-level math behind it, and the wrapper pattern that bypasses the per-token meter entirely for coding workflows.
Direct answer (verified 2026-05-16)
Per 1M tokens, headline rates are: GPT-5.5 $5 in / $30 out, Claude Opus 4.7 $5 in / $25 out. GPT-5.4 $2.50 in / $15 out, Claude Sonnet 4.6 $3 in / $15 out. GPT-5.4-mini $0.75 in / $4.50 out, Claude Haiku 4.5 $1 in / $5 out. OpenAI also ships GPT-5.4-nano at $0.20 / $1.25 with no Anthropic equivalent. Both vendors discount cached input to 10% of the base rate and offer 50% off via the Batch API. Sources: Anthropic and OpenAI.
The price table, side by side
All numbers are per 1M tokens, US dollars, standard (non-batch) endpoints. Cached input is the per-token rate the vendor charges when the same prefix has already been processed within the cache window. Both vendors document this at the 10% multiplier, which is why the cached column is roughly one tenth of the input column on each row.
| Tier | OpenAI model | Input | Cached | Output | Anthropic model | Input | Cached | Output |
|---|---|---|---|---|---|---|---|---|
| Top tier (reasoning, hard tasks) | GPT-5.5 | $5.00 | $0.50 | $30.00 | Claude Opus 4.7 | $5.00 | $0.50 | $25.00 |
| Workhorse (most production traffic) | GPT-5.4 | $2.50 | $0.25 | $15.00 | Claude Sonnet 4.6 | $3.00 | $0.30 | $15.00 |
| Cheap workhorse | GPT-5.4-mini | $0.75 | $0.075 | $4.50 | Claude Haiku 4.5 | $1.00 | $0.10 | $5.00 |
| Ultra-cheap small | GPT-5.4-nano | $0.20 | $0.02 | $1.25 | (no Anthropic equivalent at this rate) | n/a | n/a | n/a |
| Batch (50% off, async) | GPT-5.4 batch | $1.25 | $0.13 | $7.50 | Sonnet 4.6 batch | $1.50 | $0.15 | $7.50 |
Verified against Anthropic's public pricing page (platform.claude.com/docs/en/about-claude/pricing) and OpenAI's developer pricing page (developers.openai.com/api/docs/pricing) on 2026-05-16.
The headline difference, in dollars
A workhorse session (50k input + 15k output tokens, no caching) on Claude Sonnet 4.6 costs $0.375. The same workload on GPT-5.4 costs $0.350. The difference is $0.025 per session, two and a half cents. Run a thousand of those sessions and you have saved $25.
At the top tier, Opus 4.7 is actually cheaper than GPT-5.5 on output ($25 vs $30/MTok). For a long-form generation workload where output dominates, Anthropic wins by 17% on the headline rate. For a heavy-tool-call workload where input dominates, OpenAI wins by 17% on input ($2.50 vs $3 at the workhorse tier). Neither matters if your cache hit rate is wrong.
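The arithmetic is simple enough to script against the table above. A minimal sketch in TypeScript; the `sessionCost` helper and its rate map are our own construction, not part of either vendor's SDK:

```typescript
// Per-1M-token rates from the table above (USD, verified 2026-05-16).
type Rates = { input: number; cached: number; output: number };

const RATES: Record<string, Rates> = {
  "gpt-5.5":    { input: 5.0,  cached: 0.5,  output: 30.0 },
  "opus-4.7":   { input: 5.0,  cached: 0.5,  output: 25.0 },
  "gpt-5.4":    { input: 2.5,  cached: 0.25, output: 15.0 },
  "sonnet-4.6": { input: 3.0,  cached: 0.3,  output: 15.0 },
};

function sessionCost(
  model: string,
  inputTokens: number,
  outputTokens: number,
  cacheHitRatio = 0, // fraction of input tokens served from cache
): number {
  const r = RATES[model];
  const cached = inputTokens * cacheHitRatio;
  const fresh = inputTokens - cached;
  return (fresh * r.input + cached * r.cached + outputTokens * r.output) / 1e6;
}

// The headline comparison: 50k in + 15k out, no caching.
console.log(sessionCost("sonnet-4.6", 50_000, 15_000)); // 0.375
console.log(sessionCost("gpt-5.4",    50_000, 15_000)); // 0.350
```

Feed a realistic cache hit ratio into the last argument and the gap shrinks further, which is the point of the next section.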
“Both APIs price a cache hit at 10% of the base input rate. A workload that lands 80% of its input on the cache pays roughly 28% of the headline input rate. The vendor's per-MTok number on the marketing page is the worst case, not the typical case. If your bill is at the worst case, your client is the bug.”
Verified at platform.claude.com/docs/en/about-claude/pricing and developers.openai.com/api/docs/pricing
Where the money actually leaks
The headline tier is a 5% number on a real bill. The remaining 95% is workload shape. Six places to look before you blame the vendor.
Cached input ratio
Both vendors charge 10% of the input rate on a cache hit. On a coding agent that reads the same files turn after turn, cache hit rate is usually 60-80% of input tokens. If your client misses caches, you pay 10x more for the same context. The headline difference between $2.50 and $3 is rounding error against a missed cache.
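The quote above compresses to one line of arithmetic. A sketch (the helper name is ours):

```typescript
// Effective fraction of the headline input rate you actually pay, given
// the share of input tokens that hit the cache (both vendors price hits at 10%).
const effectiveInputMultiplier = (hitRatio: number): number =>
  hitRatio * 0.10 + (1 - hitRatio) * 1.0;

console.log(effectiveInputMultiplier(0.8)); // 0.28 -> 28% of the headline rate
console.log(effectiveInputMultiplier(0.0)); // 1.0  -> the marketing-page worst case
```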
Output verbosity
Output is 5-6x more expensive than input on both APIs at headline rates. A model that writes a four-paragraph preamble before every code block costs more per session than a model that just emits the diff. This is a prompt engineering bill, not a vendor bill.
Screenshot tool outputs
Any agent that drives a computer through screenshots pays for them. A 1920x1200 PNG is on the order of 350,000 image tokens once tokenized. Ten turns of screenshots is 3.5M tokens of input bloat before the model does any thinking. The accessibility tree of the same window is ~1,500 text tokens. The OS API choice dwarfs the OpenAI-vs-Anthropic choice.
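Plugging the document's own token estimates into Sonnet's input rate makes the gap concrete (both token counts are the estimates above, not measured figures):

```typescript
// Rough input-cost comparison for ten agent turns: ~350k image tokens per
// 1920x1200 screenshot vs ~1,500 text tokens for the same window's AX tree.
const INPUT_RATE = 3.0 / 1e6; // Sonnet 4.6, USD per input token
const turns = 10;

const screenshotCost = turns * 350_000 * INPUT_RATE; // ~$10.50
const axTreeCost     = turns *   1_500 * INPUT_RATE; // ~$0.045

console.log({ screenshotCost, axTreeCost }); // roughly 230x apart, before caching
```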
Auto-compact churn
Claude Code's default auto-compact behavior summarizes context once you hit a threshold, then re-uploads the summary on the next turn. The summary writes new cache entries, evicting the old ones. Long sessions can double their input bill from compact-induced cache churn alone. (Fazm's wrapper disables this by default.)
Tool definition tokens
Every API call that registers tools pays a fixed system prompt overhead. Claude adds 346 tokens per request for tool_choice=auto; OpenAI bills the function schema as input. A 20-tool agent burns 5-10k input tokens per turn just to remind the model what it can call.
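A back-of-envelope check on that 5-10k figure, using an assumed 250-500 tokens per JSON schema (our estimate) plus the documented 346-token tool-use system prompt:

```typescript
// Fixed tool overhead per turn: 20 tools at 250-500 tokens of schema each,
// plus Claude's 346-token system prompt for tool_choice=auto.
const tools = 20;
const tokensPerSchema = [250, 500]; // low/high estimate, ours
const overhead = tokensPerSchema.map((t) => tools * t + 346);
console.log(overhead); // [5346, 10346] input tokens per turn, before any context
```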
Reasoning tokens
OpenAI's reasoning models (o-series, GPT-5.5-Pro) generate hidden chain-of-thought billed as output. Anthropic's Opus does similar work but the thinking tokens are visible and charged the same. Comparing the two at face-value output rate underestimates OpenAI reasoning bills by 2-5x depending on problem hardness.
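In effective-rate terms (the 2-5x multiplier is the estimate from the paragraph above, not a measured figure):

```typescript
// Hidden chain-of-thought is billed as output, so the effective rate per
// *visible* output token is the headline rate times the reasoning multiplier.
const headlineOutput = 30.0; // GPT-5.5, USD per 1M output tokens
const effectivePerVisibleMTok = [2, 5].map((m) => headlineOutput * m);
console.log(effectivePerVisibleMTok); // [60, 150] USD per 1M visible output tokens
```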
A single agent session, billed turn by turn
Track the same coding session against the meter. The numbers below assume Sonnet 4.6, but the GPT-5.4 column is within a few cents and follows the same curve.
0 turns. The system prompt and tool definitions land
Roughly 3,000-6,000 input tokens before the user has typed anything. On Claude Sonnet 4.6 that is $0.009-$0.018; on GPT-5.4 it is $0.0075-$0.015. Less than two cents either way.
5 turns. The agent has read 12 files and run 3 tool calls
Input has grown to ~50,000 tokens, mostly file contents and tool results. If your client is caching correctly, ~40,000 of those are cache hits at 10% of base rate. Sonnet bill: $0.042 input (10k fresh at $3/MTok plus 40k cached at $0.30/MTok) + a few cents output. GPT-5.4 bill: $0.035 input + a few cents output. Spread: under a cent.
30 turns. The session is past the 200k context threshold
Claude Code triggers auto-compact: summarize history, drop it, re-upload summary. The summary write evicts the cached prefix. Next turn's 'cache hit' becomes a cache miss, and you pay full input rate for 100k+ tokens. This is the line item that turns a $0.10 session into a $5 session, and it is the same whether the model is OpenAI or Anthropic.
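What that eviction costs per turn, at Sonnet rates and an assumed 120k-token context (the context size is our illustration):

```typescript
// Cost of one compact-induced cache eviction: a context that was hitting
// the cache now misses and bills at the full input rate.
const CTX = 120_000;           // tokens of re-uploaded context, assumed
const INPUT = 3.0 / 1e6;       // Sonnet 4.6 full input rate, USD per token
const CACHED = 0.3 / 1e6;      // what the same tokens cost on a cache hit

const missTurn = CTX * INPUT;  // $0.36
const hitTurn  = CTX * CACHED; // $0.036
console.log(missTurn - hitTurn); // ~$0.32 extra per post-compact turn
// A dozen such turns in a long session is a few dollars of pure churn.
```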
Mac restart. The session window closes
Vanilla Claude Code or Codex CLI: the cache is process-local, and a restart loses it. You re-pay full input rate on the first turn after restart. Fazm: chats survive a Mac restart because acp-bridge persists transcripts and resumes the SDK with the same session UUID. Cache state is rebuilt without re-billing.
The wrapper pattern that bypasses the meter
For AI coding specifically, there is a third option neither pricing page mentions. The official Claude Code CLI from Anthropic ships under the @agentclientprotocol/claude-agent-acp package. That package supports two billing modes: an API key (which hits the per-MTok meter) and an OAuth login to a Claude.ai Pro or Max subscription (which counts usage against your flat subscription session window and does not bill per token).
Fazm depends on @agentclientprotocol/claude-agent-acp 0.29.2, pinned at line 16 of acp-bridge/package.json in the open source repo. The same wrapper also bundles codex-acp and a standalone OAuth flow at acp-bridge/src/codex-oauth-flow.ts that lets a ChatGPT subscription back the agent loop the same way. The relevant comment in that file calls it out directly: “Standalone OAuth flow for Codex (ChatGPT subscription) authentication.” For a developer who already pays Anthropic $20/month for Claude Pro or OpenAI $20/month for ChatGPT Plus, that is the cheapest correct answer to this entire post: zero per-token cost on top of a flat sub.
The same wrapper still accepts a custom Anthropic-compatible base URL, so you can point it at a per-token API key, a corporate proxy, GitHub Copilot, or a local model behind a bridge. The agent loop does not care; the wiring is at the model boundary. If your workload genuinely benefits from per-token billing (high TPM, short bursty calls), that path is open. The point is that the binary choice between “OpenAI API” and “Anthropic API” is not the only choice you have.
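The underlying pattern is just the official SDK's configurable base URL; how Fazm surfaces the setting is its own config, but the wiring looks like this (a sketch using Anthropic's TypeScript SDK):

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Anything Anthropic-compatible can sit behind the same client: the real
// API, a corporate proxy, Copilot's endpoint, or a local bridge.
const client = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
  baseURL: process.env.ANTHROPIC_BASE_URL ?? "https://api.anthropic.com",
});
```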
“I used to watch my Anthropic console like a hawk during a long coding session. Switched to running through the Pro subscription via the ACP wrapper and stopped checking. Same Claude Code, no per-token bleeding.”
When the per-token API is still the right call
Subscription-backed coding is not the universal answer. A few cases where per-token billing wins decisively. One, anything non-interactive: bulk classification, embeddings prep, long batch jobs. The Batch API at 50% off makes per-token cheaper than any subscription seat could be. Two, high-volume production traffic that needs a custom rate-limit tier. Subscription session windows are not designed for thousands of RPM. Three, applications that need a non-Claude, non-GPT model in the same loop. The raw API is the lowest common denominator across the stack.
For an individual developer or a small team doing interactive AI coding, the math almost always tips toward subscription. For a platform serving inference to thousands of users, the math almost always tips toward per-token. The honest comparison is not OpenAI vs Anthropic. It is the right billing mode for your workload, then the right model for your problem inside that mode.
Want a 20-minute teardown of your actual AI coding bill?
Bring last month’s OpenAI or Anthropic console screenshot. We will pull apart where the line items go and whether the wrapper pattern would zero it out for your workflow.
Frequently asked questions
What does ChatGPT API actually cost vs Claude API per million tokens?
Verified on the OpenAI and Anthropic pricing pages on 2026-05-16: GPT-5.5 is $5 input / $30 output / $0.50 cached. Claude Opus 4.7 is $5 input / $25 output / $0.50 cached. GPT-5.4 is $2.50 / $15 / $0.25. Claude Sonnet 4.6 is $3 / $15 / $0.30. GPT-5.4-mini is $0.75 / $4.50 / $0.075. Claude Haiku 4.5 is $1 / $5 / $0.10. GPT-5.4-nano sits below Anthropic's cheapest tier at $0.20 / $1.25. The two vendors are within cents on the comparable tiers; only the ultra-cheap nano model has no Anthropic equivalent.
Which is cheaper for an AI coding assistant: ChatGPT API or Claude API?
At face value, GPT-5.4 is $0.50/MTok cheaper on input and ties on output. On a typical coding session (50k input, 15k output), the difference comes out to about $0.025. Both APIs offer prompt caching at 10% of the input rate, and Claude has explicit 5-minute and 1-hour cache write tiers. In practice the cost is dominated by your cache hit ratio, your auto-compact behavior, and your tool-output discipline, not by which API logo is on the invoice.
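Getting that 10% rate is opt-in on Anthropic's API: you mark the stable prefix with cache_control. A sketch with the TypeScript SDK; the model id is an assumption for the May 2026 lineup:

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const LARGE_STABLE_CONTEXT = "..."; // repo files, style guide: the reused prefix

// cache_control marks a block as cacheable; ttl "1h" selects the 1-hour
// write tier mentioned above (the default cache window is 5 minutes).
const response = await client.messages.create({
  model: "claude-sonnet-4-6", // assumed id
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: LARGE_STABLE_CONTEXT,
      cache_control: { type: "ephemeral", ttl: "1h" },
    },
  ],
  messages: [{ role: "user", content: "Review this diff." }],
});
```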
Are GPT-5.5 reasoning costs comparable to Claude Opus thinking costs?
Roughly, once reasoning tokens are priced in. GPT-5.5-Pro is $30 / $180 vs Opus 4.7 at $5 / $25. The catch is reasoning tokens: GPT-5.x reasoning models bill hidden chain-of-thought as output, so a single hard problem can spend 5-50k reasoning tokens you never see in the response. Opus 4.7's extended thinking is visible to the caller but billed at the same $25/MTok output rate. Per-completion costs on a hard task often land surprisingly close.
Does the Batch API change the answer?
Both vendors offer a Batch API at 50% off input and output. GPT-5.4 batch is $1.25 / $7.50. Claude Sonnet 4.6 batch is $1.50 / $7.50. Batch is asynchronous (results within 24 hours), so it is only useful for bulk classification, embeddings prep, eval runs, or content generation jobs. Interactive agents cannot use it. If you are running a nightly job that processes a queue, batch is the right answer regardless of vendor.
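The batch shape on Anthropic's side, sketched with the TypeScript SDK (the model id and queue contents are placeholders):

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();
const queue = ["ticket text 1", "ticket text 2"]; // your nightly backlog

// Each request carries normal messages.create params plus a custom_id;
// the whole job bills at the 50%-off batch rate and resolves asynchronously.
const batch = await client.messages.batches.create({
  requests: queue.map((item, i) => ({
    custom_id: `classify-${i}`,
    params: {
      model: "claude-sonnet-4-6", // assumed id
      max_tokens: 10,
      messages: [{ role: "user", content: `Classify this ticket: ${item}` }],
    },
  })),
});
// Poll batch.id for completion; results are promised within 24 hours.
```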
What about the data-residency uplift?
Anthropic charges 1.1x on regional or US-only inference for Claude 4.5 and later. OpenAI charges 1.1x on regional processing endpoints. Identical multiplier on both sides, applied independently of vendor choice. The number is in the small print on both pricing pages.
If both APIs cost about the same, why does my Claude bill look bigger?
Three common reasons. One, prompt caching is misconfigured and you are paying full input rate on context that should hit the cache (look for cache_read_input_tokens in your usage telemetry; if it is near zero, caching is broken). Two, you are running long sessions with auto-compact on, and the summary writes evict the prefix cache on every threshold-cross. Three, your client is screenshotting Mac apps and feeding the PNGs back, which puts 300k+ image tokens through the input column every turn. Switch to an accessibility-tree-based tool and that line drops by an order of magnitude.
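Reason one is checkable in a few lines. A sketch; the field names are Anthropic's real usage fields, the threshold is our rule of thumb:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// On a healthy coding agent, cache reads should dwarf fresh input tokens
// after the first few turns. Near-zero reads means you pay the 10x rate.
function cacheLooksHealthy(usage: Anthropic.Messages.Usage): boolean {
  const reads = usage.cache_read_input_tokens ?? 0;
  return reads > usage.input_tokens;
}
```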
How does Fazm change the cost math?
Fazm wraps Claude Code (via @agentclientprotocol/claude-agent-acp 0.29.2, pinned in acp-bridge/package.json) and Codex (via OAuth-linked ChatGPT subscription, see acp-bridge/src/codex-oauth-flow.ts). For coding, that means the agent loop runs against your existing Claude Pro or Claude Max subscription, or your ChatGPT subscription. There is no per-token bill against your card; usage counts against your flat subscription session window. If the wrapper fits your use case, ChatGPT-vs-Claude API cost stops being your problem.
Can Fazm use the raw API instead of a subscription?
Yes. Fazm accepts a custom Anthropic-compatible base URL, so you can route to the Anthropic API directly, to GitHub Copilot's Anthropic-compatible endpoint, or to any corporate proxy that speaks the same protocol. The model boundary is pluggable. If you want to pay per-token instead of per-subscription, the wiring is the same.
Where do I verify everything in this page?
Anthropic's official pricing page at platform.claude.com/docs/en/about-claude/pricing has the full per-MTok table including cache write tiers. OpenAI's developer pricing page at developers.openai.com/api/docs/pricing has the GPT-5 family rates. Fazm's claim of subscription-based execution traces to acp-bridge/package.json line 16 (@agentclientprotocol/claude-agent-acp 0.29.2) and acp-bridge/src/codex-oauth-flow.ts in github.com/m13v/fazm.
What about rate limits, not just price?
Both APIs gate by tier (Tier 1 through Tier 4 plus Enterprise on Anthropic; usage tiers on OpenAI). At low tiers, Claude tends to hit RPM ceilings sooner on long-context workloads; OpenAI tends to hit TPM ceilings sooner on bursty calls. Real production traffic almost always negotiates custom rate limits, which is a separate conversation from the listed per-token rate.
Where AI coding cost actually comes from
Keep reading
AI coding tools: API access vs subscription plans compared
Why API access is more predictable than subscription tools that quietly tighten limits after lock-in.
Managing Claude Code API costs: caching, sessions, monitoring
Practical guide to the line items that actually drive AI coding cost: cache misses, session length, and tool-call bloat.
Accessibility tree vs screenshots for AI agents
Why named AX fields cost ~1,500 tokens against ~350,000 for a 1920x1200 screenshot. The token math nobody publishes.