DeepSeek costs $12.60/mo for the same token volume that would cost $400 on Opus 4.6. That's a 30x difference for about 8 percentage points on SWE-Bench.

I spent the last couple of days putting together a full comparison of AI model pricing for coding: API vs. subscriptions, open-source vs. proprietary. Some of the results genuinely surprised me.

This whole thing is a mess

There are two ways to use AI models for coding: through apps like ChatGPT, Claude, or Gemini with a monthly subscription, or through raw API calls where you pay per token.

Subscriptions are simple. You pay $20 to $250 per month and you get access to the model with usage limits. The app handles everything for you.

API pricing is where it gets complicated. You pay per token, but not all tokens cost the same. Input tokens have one price, output tokens have another. Then there are cached tokens, batch-processing discounts, and context-window surcharges for long conversations. Every provider structures this differently.

Anthropic charges $5 input / $25 output per million tokens for Opus 4.6, but over 200K context it jumps to $10 / $37.50. Google charges $2 / $12 for Gemini 3 Pro, but over 200K context it becomes $4 / $18. OpenAI charges $1.75 / $14 for GPT-5.2, and their reasoning tokens count as output so complex tasks cost more than you'd expect.

Just reading that paragraph back, you can see the problem. This stuff is genuinely hard to compare.

To make it fair I used three usage tiers (tokens per month):

  • Light - 5M input / 1.5M output
  • Average - 30M input / 10M output
  • Heavy - 150M input / 50M output
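
The per-token math itself is simple once the prices are pinned down. A quick sanity check in Python, using the Anthropic prices quoted above and the tier volumes just listed:

```python
def monthly_cost(input_price, output_price, input_mtok, output_mtok):
    """Monthly API bill in dollars, given prices per million tokens
    and monthly volume in millions of tokens."""
    return input_price * input_mtok + output_price * output_mtok

# Usage tiers: (million input tokens, million output tokens) per month
TIERS = {"light": (5, 1.5), "average": (30, 10), "heavy": (150, 50)}

# Opus 4.6 at $5 input / $25 output (under-200K-context pricing)
print(monthly_cost(5, 25, *TIERS["average"]))  # 400.0
```

That 400.0 is where the $400/mo Opus figure later in this post comes from.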

Quick glossary

AI models charge by tokens, not by words. A token is roughly 3/4 of a word, so 1,000 tokens is about 750 words.
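
In code, that rule of thumb is just a ratio (purely illustrative, real tokenizers vary by language and content):

```python
TOKENS_PER_WORD = 4 / 3  # rule of thumb: 1 token is roughly 3/4 of a word

def words_to_tokens(words):
    """Rough token estimate from a word count."""
    return round(words * TOKENS_PER_WORD)

print(words_to_tokens(750))  # 1000
```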

When you use an AI model through the API, you pay separately for what you send (input tokens) and what it sends back (output tokens). Output is typically 3x to 8x more expensive.

The context window is how much text the model can "see" at once. A 200K context window means it can process about 150,000 words in a single conversation. Larger windows cost more.

Cache is when the model reuses parts of previous requests instead of processing them from scratch. Providers offer discounts of up to 90% on cached tokens, which is why caching matters so much for cost optimization.
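
To see why caching matters so much, here's a rough sketch of the blended input price when some fraction of your tokens hit the cache. The 70% hit rate is an assumption for illustration, not a vendor figure:

```python
def blended_input_price(base_price, cache_hit_rate, cache_discount=0.90):
    """Effective input price per million tokens when cached tokens
    get a discount (90% in this sketch)."""
    cached_price = base_price * (1 - cache_discount)
    return cache_hit_rate * cached_price + (1 - cache_hit_rate) * base_price

# $5/M input with an assumed 70% cache hit rate:
print(round(blended_input_price(5.0, 0.7), 2))  # 1.85
```

A 70% hit rate already cuts the effective input price by nearly two thirds, which is why agentic coding tools lean on caching so heavily.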

Conversation history is every message you've sent and received in a chat. Apps like ChatGPT and Claude manage this for you, compressing and trimming it to save costs. On raw API, you send the full history yourself with every request, and you pay for all of it.

That's basically the entire pricing vocabulary. Now let's look at how much it actually costs.

The numbers

API costs at average usage (30M input, 10M output tokens/mo):

Model            Monthly cost
DeepSeek V3.2    $12.60
MiniMax M2.5     $21.00
GLM-5            $34.50
Kimi K2.5        $36.00
Gemini 3 Pro     $180.00
GPT-5.2          $192.50
Opus 4.6         $400.00
GPT-5.2 Pro      $2,310.00

There's a clear split here. The open-source models cluster between $12 and $36. The proprietary ones jump to $180 and above. There's basically nothing in between.

At heavy usage the spread gets even wider. DeepSeek stays at $63/mo. Opus jumps to $2,000/mo. GPT-5.2 Pro hits $11,550/mo. The difference between cheapest and most expensive is 183x.

Now the question is whether you're actually getting 183x better code.

SWE-Bench Verified scores, which measure actual coding performance:

Model            SWE-Bench score
Opus 4.6         80.9%
MiniMax M2.5     80.2%
GPT-5.2          80.0%
Gemini 3 Pro     76.2%
GLM-5            ~75%
DeepSeek V3.2    73.1%

You're not. MiniMax scores 80.2% at $1.20 per million output tokens. Opus scores 80.9% at $25.00. Almost the same coding ability, 20x the price.

If you divide output price per million tokens by SWE-Bench score, DeepSeek comes in at about $0.0057 per point while Opus sits at about $0.309. That's 54x more expensive per unit of coding performance.
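
That cost-per-point figure is just output price divided by score. A minimal version of the calculation, using DeepSeek's listed $0.42/M output price (consistent with the $12.60 average-tier figure above) and the Opus price from earlier:

```python
# (output price in $/M tokens, SWE-Bench Verified score)
MODELS = {
    "DeepSeek V3.2": (0.42, 73.1),
    "Opus 4.6": (25.00, 80.9),
}

for name, (output_price, score) in MODELS.items():
    print(f"{name}: ${output_price / score:.4f} per SWE-Bench point")
# DeepSeek V3.2: $0.0057 per SWE-Bench point
# Opus 4.6: $0.3090 per SWE-Bench point
```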

That doesn't mean the premium models are a bad deal though. Each one has specific strengths beyond benchmark scores. Gemini 3 Pro has a 1M token context window, 5x larger than Opus, which matters when you're working with large codebases. Kimi K2.5 can run 1,500 parallel tool calls for agentic workflows. MiniMax has a Lightning variant running at around 100 tokens per second. These things don't show up in SWE-Bench.

Where subscriptions flip the math

Claude Pro, ChatGPT Plus, and Google AI Pro all cost $20/mo. At light usage, Opus through API would cost $62.50/mo, GPT-5.2 would cost $29.75, and Gemini 3 Pro $28. So if you're a solo dev doing a few coding sessions a day, a $20 subscription is actually the cheapest way to access premium models.
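
The break-even check is easy to run yourself. Here's the light-tier comparison against a $20 subscription, using the API prices quoted earlier:

```python
def api_cost(input_price, output_price, input_mtok, output_mtok):
    """Monthly API bill in dollars at the given token volume."""
    return input_price * input_mtok + output_price * output_mtok

LIGHT = (5, 1.5)  # million input / output tokens per month
PRICES = {"Opus 4.6": (5, 25), "GPT-5.2": (1.75, 14), "Gemini 3 Pro": (2, 12)}

for name, prices in PRICES.items():
    print(f"{name}: API ${api_cost(*prices, *LIGHT):.2f}/mo vs $20 subscription")
# Opus 4.6: API $62.50/mo vs $20 subscription
# GPT-5.2: API $29.75/mo vs $20 subscription
# Gemini 3 Pro: API $28.00/mo vs $20 subscription
```

At this volume the subscription wins for all three premium models.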

Google also offers AI Ultra at $249.99/mo with Deep Think mode. OpenAI has Pro at $200/mo with full GPT-5.2 Pro compute. Anthropic has Max tiers at $100 and $200 for heavier workloads. On the other end, DeepSeek offers unlimited free chat through their web app.

At heavy usage the subscription math flips completely. DeepSeek API at $63/mo beats every subscription tier for raw volume. But if you specifically need Opus-level quality at high volume, Anthropic's $200/mo Max plan is cheaper than the $2,000/mo you'd pay through API. So the right answer depends on which model you actually need and how much you use it.

The apps are fooling you (for good and for bad)

This is the part that makes comparing subscription vs API tricky, and honestly the part I think is most underestimated.

When you use ChatGPT or Claude through their apps, they're not just forwarding your message to the model. They handle context management so the conversation stays coherent without sending the entire history every time. They cache prompts so repeated patterns don't cost full price. They compress conversation history to fit more into the context window. And they sometimes route simpler parts of your request to cheaper models.

All of this is invisible to you, but it's a big part of what you're paying for with a subscription.

On the API side these optimizations exist, but you have to build them yourself. Google's context caching gives a 90% discount on repeated reads. OpenAI and Anthropic both offer 50% batch discounts and 90% cache discounts. DeepSeek has automatic caching that brings cached input costs down to $0.028 per million tokens.

A well-optimized pipeline with caching and batching can cut proprietary API costs by 50-90%. But building and maintaining that pipeline takes real engineering time. If you don't do it well, you end up paying full price for every token and might spend more than the subscription would have cost you.
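
A back-of-the-envelope model of what that pipeline buys you. The traffic shares here are illustrative assumptions (and the model ignores overlap between cached and batched traffic), but it shows how the 50-90% range arises:

```python
def optimized_cost(raw_cost, cached_share=0.6, batched_share=0.3,
                   cache_discount=0.90, batch_discount=0.50):
    """Rough blended monthly cost after caching and batching.
    cached_share / batched_share are assumed fractions of traffic
    that benefit from each discount."""
    savings = cached_share * cache_discount + batched_share * batch_discount
    return raw_cost * (1 - savings)

# Opus 4.6 at the $400 average-tier bill from the table above:
print(round(optimized_cost(400.0), 2))  # 124.0
```

Under these assumptions the bill drops about 69%, squarely inside the 50-90% range, but only if the pipeline actually hits those cache and batch rates.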

This is basically the core tradeoff. Subscriptions trade money for convenience. API trades engineering time for cost control. And open-source models let you push that even further by self-hosting on your own infrastructure.

The verdict

The interesting thing about these numbers is that there's no single right answer. It depends on what you're optimizing for.

If you care about raw cost per token, DeepSeek and MiniMax are hard to beat. A solo developer or small team can run near-frontier coding models for $20-50/mo through API and get 73-80% SWE-Bench performance. That was unthinkable a year ago.

If you care about convenience and don't want to build infrastructure, a $20/mo subscription to Claude, ChatGPT, or Gemini gives you premium model access with all the optimization built in. For light to moderate use this is genuinely the best deal.

If you're running serious volume, the smart move is probably a mix. Use premium models for the hard problems where that extra 5-8% of coding performance matters, and route everything else to open-source APIs. With proper caching on the premium side, you can keep costs reasonable even at scale.

And if you care about data sovereignty, every open-source model on this list (DeepSeek, MiniMax, GLM-5, Kimi) can be self-hosted. That's a real option now, not a compromise.

The pricing gap between open-source and proprietary in 2026 isn't really about quality anymore. It's about what you're willing to build yourself, and what you'd rather pay someone else to handle.

Want to check the numbers yourself? I linked the full spreadsheet with all pricing and benchmark data HERE.

