I keep researching this for my own work, so I turned it into a reference. One big table, everything in it: prices, benchmarks, licenses, where to access each model. Scan it, pick what you need, go build.

Quick mental model before the table: **Agent = Model + Harness**. The model is the intelligence; the harness (Claude Code, Cursor, Cline, Aider...) turns it into something that navigates your repo and fixes its own mistakes. Pick both.

## The table

| Model | Creator | License | Price in / out (USD per 1M tokens) | Benchmark (SWE-Bench unless noted) | Direct access | Also on |
|---|---|---|---|---|---|---|
| Claude Fable 5 | Anthropic | Closed | $10 / $50 | 80.3% Pro | [platform.claude.com](https://platform.claude.com) | Claude Code, claude.ai |
| Claude Opus 4.8 | Anthropic | Closed | $5 / $25 | 69.2% Pro, 88.6% Verified | [platform.claude.com](https://platform.claude.com) | Claude Code, Cursor, [OpenRouter](https://openrouter.ai/models), Bedrock, Vertex |
| Claude Sonnet 5 | Anthropic | Closed | ~$3 / $15 | new, near Opus 4.8 | [platform.claude.com](https://platform.claude.com) | Claude Code, Cursor |
| Claude Sonnet 4.6 | Anthropic | Closed | $3 / $15 | strong | [platform.claude.com](https://platform.claude.com) | Cursor, Windsurf, Cline, OpenRouter |
| Claude Haiku 4.5 | Anthropic | Closed | $1 / $5 | 73.3% Verified | [platform.claude.com](https://platform.claude.com) | OpenRouter |
| GPT-5.5 | OpenAI | Closed | $5 / $30 | 58.6% Pro, 88.7% Verified | [platform.openai.com](https://platform.openai.com) | Codex CLI, Copilot, Cursor, Azure |
| GPT-5.3-Codex | OpenAI | Closed | $1.75 / $14 | coding-tuned | [platform.openai.com](https://platform.openai.com) | Codex CLI |
| Gemini 3.1 Pro | Google | Closed | $2 / $12 ($4 / $18 above 200K) | 54.2% Pro, 80.6% Verified | [aistudio.google.com](https://aistudio.google.com) | Gemini CLI, Vertex, Cursor, OpenRouter |
| Grok 4.x | xAI | Closed | ~$2.60 / $7.80 | Tier B | [x.ai](https://x.ai) | Cursor, OpenRouter |
| DeepSeek V4-Pro | DeepSeek | MIT | $1.74 / $3.48 | 80.6% Verified | [platform.deepseek.com](https://platform.deepseek.com) | OpenRouter, Together, Morph, [HuggingFace](https://huggingface.co/deepseek-ai) |
| DeepSeek V4-Flash | DeepSeek | MIT | $0.14 / $0.28 | 79% Verified | [platform.deepseek.com](https://platform.deepseek.com) | OpenRouter (:free tier), Morph, Ollama |
| DeepSeek V3.2 | DeepSeek | MIT | ~$0.23 / ~$1 | best value classic | [platform.deepseek.com](https://platform.deepseek.com) | OpenRouter, Ollama |
| GLM 5.2 | Z.ai | Open | $1.40 / $4.40 (~$0.45 / $3.31 avg on OR) | 62.1% Pro (vendor) | [z.ai](https://z.ai) | OpenRouter, [HuggingFace](https://huggingface.co/zai-org), GLM Coding Plan |
| GLM-4.7-Flash | Z.ai | Open | free | decent | [z.ai](https://z.ai) | OpenRouter, Ollama |
| Kimi K2.6 | Moonshot | Open | $0.95 / $4.00 | 80.2% Verified, 58.6% Pro | [platform.moonshot.ai](https://platform.moonshot.ai) | OpenRouter, Together, Groq, [HuggingFace](https://huggingface.co/moonshotai) |
| Kimi K2.7 Code | Moonshot | Apache 2.0 | ~K2.6 rates, 30% fewer thinking tokens | coding-tuned | [platform.moonshot.ai](https://platform.moonshot.ai) | OpenRouter, HuggingFace |
| MiniMax M3 | MiniMax | Open | $0.60 / $2.40 (~$0.10 / $1.21 avg on OR) | 80.5% Verified | [minimax.io](https://www.minimax.io) | OpenRouter, Atlas Cloud |
| MiniMax M2.7 | MiniMax | Open | $0.30 / $1.20 | 56.2% Pro | [minimax.io](https://www.minimax.io) | Together, Atlas Cloud |
| Qwen3.7-Max | Alibaba | Closed API | via Alibaba Cloud | 80%+ Verified | [alibabacloud.com](https://www.alibabacloud.com) | OpenRouter |
| Qwen 3.6-27B | Alibaba | Apache 2.0 | free local (22GB VRAM) | 77.2% Verified | [Ollama](https://ollama.com/library), [HuggingFace](https://huggingface.co/Qwen) | Together, OpenRouter |
| Qwen 3.6 Plus | Alibaba | Open | $0.50 / $3.00 | 61.6% Terminal-Bench | Alibaba Cloud | Together, OpenRouter |
| Qwen3 Coder | Alibaba | Apache 2.0 | free on OpenRouter | best free coding model | [OpenRouter :free](https://openrouter.ai/models) | Ollama |
| Qwen 2.5 Coder 32B | Alibaba | Apache 2.0 | free local (18GB+ VRAM) | best local classic | [Ollama](https://ollama.com/library/qwen2.5-coder) | LM Studio, Continue.dev |
| Nemotron 3 Ultra | NVIDIA | OpenMDW | ~$0.42 / $2.61 (OR avg) | #2 open on AA index | [NVIDIA NIM](https://build.nvidia.com) | OpenRouter (:free route), HuggingFace |
| North Mini Code | Cohere | Apache 2.0 | free local | 33.4 AA Coding Index | [HuggingFace](https://huggingface.co/CohereLabs) | Ollama, vLLM |
| Codestral 22B | Mistral | Open | free local (12GB VRAM) | best autocomplete local | [Ollama](https://ollama.com/library/codestral) | [Mistral API](https://console.mistral.ai), OpenRouter |
| Devstral Small 24B | Mistral | Open | free local (16GB VRAM) | near-frontier local | [Ollama](https://ollama.com/library/devstral) | Mistral API |
| Llama 4 Scout | Meta | Open | free local, :free on OR | solid, 10M context | [HuggingFace](https://huggingface.co/meta-llama) | Ollama, Groq, Together |
| Poolside Laguna M.1 | Poolside | Closed | free tier | #1 on Kilo usage | [Kilo Code](https://kilo.ai) | Poolside platform |
| openPangu 2.0 | Huawei | Open | free local | competitive | [HuggingFace](https://huggingface.co) | vLLM self-host |

**How to read it:**

- Prices are first-party rates as of late June 2026. OR = OpenRouter weighted average, noted where it differs a lot from direct.
- OpenRouter adds a ~5.5% credit fee ($0.80 minimum) and a 5% BYOK fee above 1M requests/month. The per-token rates themselves match provider list prices.
- Benchmarks mix vendor-reported and independent numbers. Compare like with like only (Pro against Pro, Verified against Verified), and test on your own code before committing.
- Fable 5 caveat: a US export-control directive suspended access on June 12, expected back for US users around July 1. If you are outside the US (like me), verify before building on it.
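
If you want to sanity-check a bill against the table, the arithmetic is short. This is a minimal sketch, not OpenRouter's actual billing logic: I am assuming the ~5.5% fee applies to each credit purchase and is floored at the $0.80 minimum, as the bullet above describes. The function names and token counts are mine, for illustration only.

```python
# Fee figures come from the bullet above; how they are applied is an assumption.
OPENROUTER_CREDIT_FEE = 0.055   # ~5.5% of each credit purchase
OPENROUTER_MIN_FEE_USD = 0.80   # minimum fee per purchase


def token_cost(input_tokens, output_tokens, price_in, price_out):
    """USD cost of one request. Prices are USD per 1M tokens, as in the table."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def credit_fee(purchase_usd):
    """Assumed fee model: a percentage of the purchase, never below the minimum."""
    return max(purchase_usd * OPENROUTER_CREDIT_FEE, OPENROUTER_MIN_FEE_USD)


# Hypothetical request: 200K input + 20K output tokens on Kimi K2.6 ($0.95 / $4.00)
cost = token_cost(200_000, 20_000, 0.95, 4.00)
print(f"request cost:         ${cost:.4f}")
print(f"fee on a $10 top-up:  ${credit_fee(10):.2f}")
print(f"fee on a $100 top-up: ${credit_fee(100):.2f}")
```

The minimum is what bites on small top-ups: under this model, anything below roughly $15 pays the flat $0.80 rather than 5.5%.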

## The agents (harnesses)

| Agent | Type | Models | Link |
|---|---|---|---|
| Claude Code | First-party CLI/app | Claude family | [claude.com/claude-code](https://claude.com/claude-code) |
| Codex CLI | First-party CLI | GPT family | [openai.com/codex](https://openai.com/codex) |
| Gemini CLI | First-party CLI | Gemini | [github.com/google-gemini/gemini-cli](https://github.com/google-gemini/gemini-cli) |
| Copilot Agent Mode | IDE-native | GPT family | [github.com/features/copilot](https://github.com/features/copilot) |
| Cursor | AI IDE | BYOM (Claude, GPT, Gemini...) | [cursor.com](https://cursor.com) |
| Windsurf | AI IDE | Multiple | [windsurf.com](https://windsurf.com) |
| Cline | VS Code ext, BYOM | Anything via OpenRouter/Ollama | [cline.bot](https://cline.bot) |
| RooCode | VS Code ext, BYOM | Anything, strong on large multi-file work | [roocode.com](https://roocode.com) |
| Aider | Terminal, BYOM | Anything, git-native | [aider.chat](https://aider.chat) |
| Kilo Code | BYOM, free tiers | Anything, live usage leaderboard | [kilo.ai](https://kilo.ai) |
| Continue.dev | VS Code ext, local | Ollama models | [continue.dev](https://continue.dev) |

## Providers cheat sheet

| Provider | What it is | When to use |
|---|---|---|
| First-party APIs | Direct from each creator | Lowest per-token price when you only need one creator's models |
| [OpenRouter](https://openrouter.ai) | 315+ models, one key | Multi-model access; budget 5-7% overhead for fees |
| [Together AI](https://together.ai) / [Fireworks](https://fireworks.ai) | Neutral open-model hosts | Open models with fine-tuning, dedicated deploys |
| [Groq](https://groq.com) | Fast inference | Speed on open models |
| [Morph](https://morphllm.com) | bf16, no quantization | Open-model fidelity for codegen (most hosts quantize to fp8 and lose quality) |
| Bedrock / Vertex / Azure | Cloud resellers | 10-20% more per token, but compliance and one cloud bill |
| [Ollama](https://ollama.com) | Local runner | Free, private, offline |
| Subscriptions | Claude Pro/Max, Codex Plus, GLM Coding Plan, OpenCode Go ($10/mo) | Can beat API billing depending on your usage profile |

## Three numbers to remember

**The 10x cliff.** Five models score between 80.2% and 80.6% on SWE-Bench Verified (DeepSeek V4-Pro, Gemini 3.1 Pro, MiniMax M3, Qwen3.7-Max, Kimi K2.6), with output prices from $2.40 to $12 per million tokens. The next 8 points up (Opus 4.8 at 88.6%, GPT-5.5 at 88.7%) cost $25 to $30 per million output tokens. Measured against the cheapest of the five, the last 8 points of quality cost about 10x. Know if your work lives in that gap.

**The 21.7 point gap.** Claude Fable 5 scores 80.3% on SWE-Bench Pro vs 58.6% for GPT-5.5. Frontier benchmarks usually move in single digits. This one did not.

**1/36th.** DeepSeek V4-Flash delivers about 89% of Opus 4.8's SWE-Bench Verified score (79% vs 88.6% in the table) at roughly 1/36th the input price ($0.14 vs $5 per million tokens). It does not replace Opus. It means Opus should stop doing DeepSeek's job.
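
These numbers are easy to re-derive, and worth re-deriving whenever prices change. The sketch below uses only rows copied from the table above; the dictionary layout and variable names are mine. One caveat: the table lists only a Verified score for DeepSeek V4-Flash, so the score comparison with Opus 4.8 here is computed on SWE-Bench Verified.

```python
# Rows copied from the table above: prices in USD per 1M tokens, scores in %.
models = {
    "Claude Fable 5":    {"in": 10.00, "out": 50.00, "pro": 80.3},
    "Claude Opus 4.8":   {"in": 5.00, "out": 25.00, "pro": 69.2, "verified": 88.6},
    "GPT-5.5":           {"in": 5.00, "out": 30.00, "pro": 58.6, "verified": 88.7},
    "MiniMax M3":        {"in": 0.60, "out": 2.40, "verified": 80.5},
    "DeepSeek V4-Flash": {"in": 0.14, "out": 0.28, "verified": 79.0},
}

opus = models["Claude Opus 4.8"]
flash = models["DeepSeek V4-Flash"]

# The cliff: Opus 4.8 output price vs the cheapest model in the ~80% Verified cluster.
cliff_ratio = opus["out"] / models["MiniMax M3"]["out"]

# The gap: Fable 5 vs GPT-5.5 on SWE-Bench Pro, in percentage points.
pro_gap = models["Claude Fable 5"]["pro"] - models["GPT-5.5"]["pro"]

# 1/36th: Opus 4.8 vs V4-Flash on input price, and Flash's share of Opus's Verified score.
input_ratio = opus["in"] / flash["in"]
verified_share = flash["verified"] / opus["verified"]

print(f"output price cliff:    {cliff_ratio:.1f}x")
print(f"SWE-Bench Pro gap:     {pro_gap:.1f} points")
print(f"input price ratio:     {input_ratio:.1f}x")
print(f"Flash / Opus Verified: {verified_share:.0%}")
```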

## My routing shortcut

- Cheap and low-risk work: DeepSeek V4 Flash
- Default coding workhorse: Kimi K2.6 or Sonnet-class
- Hard architecture, repeated failures: frontier (Opus 4.8, GPT-5.5, Fable 5 if you can get it)
- Private or offline: Qwen local via Ollama
- Always: measure cost per successful task, not cost per token. A cheap model that causes rework is expensive.
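
Here is one way to put a number on that last rule. It is a rough sketch under a simplifying assumption: every attempt succeeds independently with the same probability, so you expect 1/p attempts per success, and each failed attempt costs you some fixed rework (review time, reverted commits). Every figure in the example is hypothetical, not measured.

```python
def cost_per_success(cost_per_attempt, success_rate, rework_cost_per_failure=0.0):
    """Expected USD spent per successful task.

    Assumes independent attempts with a fixed success rate, retried until
    one passes: 1/p attempts and (1 - p)/p failures on average.
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    expected_attempts = 1 / success_rate
    expected_failures = expected_attempts - 1
    return cost_per_attempt * expected_attempts + rework_cost_per_failure * expected_failures


# Hypothetical numbers: a cheap model that fails half the time vs a pricier
# one that mostly succeeds, with $0.50 of rework per failed attempt.
cheap = cost_per_success(0.02, 0.50, rework_cost_per_failure=0.50)
strong = cost_per_success(0.40, 0.90, rework_cost_per_failure=0.50)

print(f"cheap model:  ${cheap:.2f} per successful task")
print(f"strong model: ${strong:.2f} per successful task")
```

With these made-up inputs, the model that is 20x more expensive per attempt comes out slightly cheaper per finished task. Plug in your own success rates before trusting either direction.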

This space moves monthly. I will update this table when the next shakeup lands. Building with agents and want to compare notes? I am @itseduvieira pretty much everywhere.