I keep researching this for my own work, so I turned it into a reference. One big table, everything in it: prices, benchmarks, licenses, where to access each model. Scan it, pick what you need, go build.
Quick mental model before the table: Agent = Model + Harness. The model is the intelligence; the harness (Claude Code, Cursor, Cline, Aider...) turns it into something that navigates your repo and fixes its own mistakes. Pick both.
The table
| Model | Creator | License | Price in/out per 1M tokens (USD) | SWE-Bench | Direct access | Also on |
|---|---|---|---|---|---|---|
| Claude Fable 5 | Anthropic | Closed | $10 / $50 | 80.3% Pro | platform.claude.com | Claude Code, claude.ai |
| Claude Opus 4.8 | Anthropic | Closed | $5 / $25 | 69.2% Pro, 88.6% Verified | platform.claude.com | Claude Code, Cursor, OpenRouter, Bedrock, Vertex |
| Claude Sonnet 5 | Anthropic | Closed | ~$3 / $15 | new, near Opus 4.8 | platform.claude.com | Claude Code, Cursor |
| Claude Sonnet 4.6 | Anthropic | Closed | $3 / $15 | strong | platform.claude.com | Cursor, Windsurf, Cline, OpenRouter |
| Claude Haiku 4.5 | Anthropic | Closed | $1 / $5 | 73.3% Verified | platform.claude.com | OpenRouter |
| GPT-5.5 | OpenAI | Closed | $5 / $30 | 58.6% Pro, 88.7% Verified | platform.openai.com | Codex CLI, Copilot, Cursor, Azure |
| GPT-5.3-Codex | OpenAI | Closed | $1.75 / $14 | coding-tuned | platform.openai.com | Codex CLI |
| Gemini 3.1 Pro | Google | Closed | $2 / $12 ($4 / $18 above 200K) | 54.2% Pro, 80.6% Verified | aistudio.google.com | Gemini CLI, Vertex, Cursor, OpenRouter |
| Grok 4.x | xAI | Closed | ~$2.60 / $7.80 | Tier B | x.ai | Cursor, OpenRouter |
| DeepSeek V4-Pro | DeepSeek | MIT | $1.74 / $3.48 | 80.6% Verified | platform.deepseek.com | OpenRouter, Together, Morph, HuggingFace |
| DeepSeek V4-Flash | DeepSeek | MIT | $0.14 / $0.28 | 79% Verified | platform.deepseek.com | OpenRouter (:free tier), Morph, Ollama |
| DeepSeek V3.2 | DeepSeek | MIT | ~$0.23 / ~$1 | best value classic | platform.deepseek.com | OpenRouter, Ollama |
| GLM 5.2 | Z.ai | Open | $1.40 / $4.40 (~$0.45 / $3.31 avg on OR) | 62.1% Pro (vendor) | z.ai | OpenRouter, HuggingFace, GLM Coding Plan |
| GLM-4.7-Flash | Z.ai | Open | free | decent | z.ai | OpenRouter, Ollama |
| Kimi K2.6 | Moonshot | Open | $0.95 / $4.00 | 80.2% Verified, 58.6% Pro | platform.moonshot.ai | OpenRouter, Together, Groq, HuggingFace |
| Kimi K2.7 Code | Moonshot | Apache 2.0 | ~K2.6 rates, 30% fewer thinking tokens | coding-tuned | platform.moonshot.ai | OpenRouter, HuggingFace |
| MiniMax M3 | MiniMax | Open | $0.60 / $2.40 (~$0.10 / $1.21 avg on OR) | 80.5% Verified | minimax.io | OpenRouter, Atlas Cloud |
| MiniMax M2.7 | MiniMax | Open | $0.30 / $1.20 | 56.2% Pro | minimax.io | Together, Atlas Cloud |
| Qwen3.7-Max | Alibaba | Closed API | via Alibaba Cloud | 80%+ Verified | alibabacloud.com | OpenRouter |
| Qwen 3.6-27B | Alibaba | Apache 2.0 | free local (22GB VRAM) | 77.2% Verified | Ollama, HuggingFace | Together, OpenRouter |
| Qwen 3.6 Plus | Alibaba | Open | $0.50 / $3.00 | 61.6% Terminal-Bench | Alibaba Cloud | Together, OpenRouter |
| Qwen3 Coder | Alibaba | Apache 2.0 | free on OpenRouter | best free coding model | OpenRouter :free | Ollama |
| Qwen 2.5 Coder 32B | Alibaba | Apache 2.0 | free local (18GB+ VRAM) | best local classic | Ollama | LM Studio, Continue.dev |
| Nemotron 3 Ultra | NVIDIA | OpenMDW | ~$0.42 / $2.61 (OR avg) | #2 open on AA index | NVIDIA NIM | OpenRouter (:free route), HuggingFace |
| North Mini Code | Cohere | Apache 2.0 | free local | 33.4 AA Coding Index | HuggingFace | Ollama, vLLM |
| Codestral 22B | Mistral | Open | free local (12GB VRAM) | best autocomplete local | Ollama | Mistral API, OpenRouter |
| Devstral Small 24B | Mistral | Open | free local (16GB VRAM) | near-frontier local | Ollama | Mistral API |
| Llama 4 Scout | Meta | Open | free local, :free on OR | solid, 10M context | HuggingFace | Ollama, Groq, Together |
| Poolside Laguna M.1 | Poolside | Closed | free tier | #1 on Kilo usage | Kilo Code | Poolside platform |
| openPangu 2.0 | Huawei | Open | free local | competitive | HuggingFace | vLLM self-host |
How to read it:
- Prices are first-party rates as of late June 2026. OR = OpenRouter weighted average, noted where it differs a lot from direct.
- OpenRouter adds a ~5.5% credit fee ($0.80 minimum) and a 5% BYOK fee above 1M requests/month. The per-token rates themselves match provider list prices.
- Benchmarks mix vendor-reported and independent numbers. Compare like with like only (Pro against Pro, Verified against Verified), and test on your own code before committing.
- Fable 5 caveat: a US export-control directive suspended access on June 12, expected back for US users around July 1. If you are outside the US (like me), verify before building on it.
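To turn the per-1M rates into a per-request number, here is a minimal sketch. The function names are illustrative, the token counts are made up, and it assumes the OpenRouter credit fee is charged on the amount of credits you buy, which is my reading of the note above rather than a confirmed billing rule.

```python
def request_cost(tokens_in, tokens_out, price_in, price_out):
    """USD cost of one request. Prices are per 1M tokens, as in the table."""
    return tokens_in / 1_000_000 * price_in + tokens_out / 1_000_000 * price_out


def openrouter_credit_fee(purchase_usd, rate=0.055, minimum=0.80):
    """Approximate fee on a credit purchase: ~5.5% with a $0.80 floor.

    Assumption: the fee applies to the purchase amount, not to each request.
    """
    return max(purchase_usd * rate, minimum)


if __name__ == "__main__":
    # Hypothetical request: 200K tokens in, 20K tokens out.
    opus = request_cost(200_000, 20_000, 5.00, 25.00)   # Opus 4.8 rates
    flash = request_cost(200_000, 20_000, 0.14, 0.28)   # DeepSeek V4-Flash rates
    print(f"Opus 4.8: ${opus:.2f}")
    print(f"V4-Flash: ${flash:.4f}")
    print(f"Fee on $10 of credits: ${openrouter_credit_fee(10):.2f}")
```

On small top-ups the $0.80 floor dominates, so the effective overhead is higher than 5.5% until a purchase passes roughly $15.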
The agents (harnesses)
| Agent | Type | Models | Link |
|---|---|---|---|
| Claude Code | First-party CLI/app | Claude family | claude.com/claude-code |
| Codex CLI | First-party CLI | GPT family | openai.com/codex |
| Gemini CLI | First-party CLI | Gemini | github.com/google-gemini/gemini-cli |
| Copilot Agent Mode | IDE-native | GPT family | github.com/features/copilot |
| Cursor | AI IDE | BYOM (Claude, GPT, Gemini...) | cursor.com |
| Windsurf | AI IDE | Multiple | windsurf.com |
| Cline | VS Code ext, BYOM | Anything via OpenRouter/Ollama | cline.bot |
| RooCode | VS Code ext, BYOM | Anything, strong on large multi-file work | roocode.com |
| Aider | Terminal, BYOM | Anything, git-native | aider.chat |
| Kilo Code | BYOM, free tiers | Anything, live usage leaderboard | kilo.ai |
| Continue.dev | VS Code ext, local | Ollama models | continue.dev |
Providers cheat sheet
| Provider | What it is | When to use |
|---|---|---|
| First-party APIs | Direct from each creator | Cheapest per token on one provider |
| OpenRouter | 315+ models, one key | Multi-model access, budget +5-7% overhead |
| Together AI / Fireworks | Neutral open-model hosts | Open models with fine-tuning, dedicated deploys |
| Groq | Fast inference | Speed on open models |
| Morph | bf16, no quantization | Open-model fidelity for codegen (most hosts quantize to fp8 and lose quality) |
| Bedrock / Vertex / Azure | Cloud resellers | 10-20% more per token, but compliance and one cloud bill |
| Ollama | Local runner | Free, private, offline |
| Subscriptions | Claude Pro/Max, Codex Plus, GLM Coding Plan, OpenCode Go ($10/mo) | Can beat API billing depending on your usage profile |
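Whether a subscription beats API billing comes down to a break-even count. A rough sketch, assuming a flat monthly price and a constant average cost per request; it ignores rate limits and usage caps, which real plans usually have. The $0.25 per-request figure is hypothetical.

```python
def breakeven_requests(subscription_usd, cost_per_request_usd):
    """Requests per month above which a flat subscription is cheaper than API billing."""
    if cost_per_request_usd <= 0:
        raise ValueError("cost_per_request_usd must be positive")
    return subscription_usd / cost_per_request_usd


if __name__ == "__main__":
    # $10/mo plan (the OpenCode Go price from the table) vs a hypothetical $0.25 average request.
    print(breakeven_requests(10, 0.25))  # 40.0
```

Below that count, pay-as-you-go is cheaper; above it, the subscription wins as long as you stay inside its limits.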
Three numbers to remember
The 10x cliff. Five models score between 80.2% and 80.6% on SWE-Bench Verified (DeepSeek V4-Pro, Gemini 3.1 Pro, MiniMax M3, Qwen3.7-Max, Kimi K2.6), with output prices from $2.40 to $12 per million tokens. The next 8 points up (GPT-5.5, Opus 4.8) cost $25 to $30 per million output tokens. Those last 8 points cost up to 10x more. Know if your work lives in that gap.
The 21.7 point gap. Claude Fable 5 scores 80.3% on SWE-Bench Pro vs 58.6% for GPT-5.5. Frontier benchmarks usually move in single digits. This one did not.
1/36th. DeepSeek V4-Flash delivers about 82% of Opus 4.8's SWE-Bench Pro score at roughly 1/36th the input price ($0.14 vs $5 per million tokens). It does not replace Opus. It means Opus should stop doing DeepSeek's job.
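The arithmetic behind those three numbers, using only figures from the table. Variable names are illustrative; the last line shows that the price ratio ranges from 2.5x to about 10x depending on which 80%-tier model you compare against.

```python
# Input prices per 1M tokens (USD), from the table.
opus_input, flash_input = 5.00, 0.14
input_ratio = opus_input / flash_input

# SWE-Bench Pro scores: Claude Fable 5 vs GPT-5.5.
pro_gap = 80.3 - 58.6

# Output prices per 1M tokens (USD): the 80%-Verified tier vs the frontier.
tier_low, tier_high = 2.40, 12.00        # MiniMax M3, Gemini 3.1 Pro
frontier_low, frontier_high = 25.00, 30.00  # Opus 4.8, GPT-5.5
cliff_max = frontier_low / tier_low
cliff_min = frontier_high / tier_high

if __name__ == "__main__":
    print(f"Opus vs Flash input price: {input_ratio:.1f}x")       # 35.7x
    print(f"Fable 5 vs GPT-5.5 on Pro: {pro_gap:.1f} points")     # 21.7 points
    print(f"Frontier vs 80% tier: {cliff_min:.1f}x to {cliff_max:.1f}x")  # 2.5x to 10.4x
```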
My routing shortcut
- Cheap and low-risk work: DeepSeek V4 Flash
- Default coding workhorse: Kimi K2.6 or Sonnet-class
- Hard architecture, repeated failures: frontier (Opus 4.8, GPT-5.5, Fable 5 if you can get it)
- Private or offline: Qwen local via Ollama
- Always: measure cost per successful task, not cost per token. A cheap model that causes rework is expensive.
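The shortcut above, sketched as a function, plus the cost-per-successful-task metric. Everything here is illustrative: the flag names, the two-failure escalation threshold, and the sample spend and success counts are assumptions, not measurements.

```python
ROUTES = {
    "private": "Qwen local via Ollama",
    "hard": "Opus 4.8",
    "cheap": "DeepSeek V4 Flash",
    "default": "Kimi K2.6",
}


def pick_model(private=False, failed_attempts=0, low_risk=False):
    """Route a task to a model tier. Privacy wins, then escalation, then price."""
    if private:
        return ROUTES["private"]
    if failed_attempts >= 2:  # assumed threshold for "repeated failures"
        return ROUTES["hard"]
    if low_risk:
        return ROUTES["cheap"]
    return ROUTES["default"]


def cost_per_successful_task(total_cost_usd, successes):
    """Total spend divided by tasks that actually succeeded."""
    if successes == 0:
        return float("inf")
    return total_cost_usd / successes


if __name__ == "__main__":
    print(pick_model(low_risk=True))
    print(pick_model(failed_attempts=3))
    # Hypothetical: $3.00 spent across 100 attempts, 40 of which succeeded.
    print(f"${cost_per_successful_task(3.00, 40):.3f} per successful task")
```

Token spend is only part of the bill. If failed attempts cost review or rework time, add that to `total_cost_usd` before dividing.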
This space moves monthly. I will update this table when the next shakeup lands. Building with agents and want to compare notes? I am @itseduvieira pretty much everywhere.