I keep researching this for my own work, so I turned it into a reference. One big table, everything in it: prices, benchmarks, licenses, where to access each model. Scan it, pick what you need, go build.

Quick mental model before the table: **Agent = Model + Harness**. The model is the intelligence; the harness (Claude Code, Cursor, Cline, Aider...) turns it into something that navigates your repo and fixes its own mistakes. Pick both.

## The table

| Model | Creator | License | Price in / out (USD per 1M tokens) | SWE-Bench | Direct access | Also on |
|---|---|---|---|---|---|---|
| Claude Fable 5 | Anthropic | Closed | $10 / $50 | 80.3% Pro | [platform.claude.com](https://platform.claude.com) | Claude Code, claude.ai |
| Claude Opus 4.8 | Anthropic | Closed | $5 / $25 | 69.2% Pro, 88.6% Verified | [platform.claude.com](https://platform.claude.com) | Claude Code, Cursor, [OpenRouter](https://openrouter.ai/models), Bedrock, Vertex |
| Claude Sonnet 5 | Anthropic | Closed | ~$3 / $15 | new, near Opus 4.8 | [platform.claude.com](https://platform.claude.com) | Claude Code, Cursor |
| Claude Sonnet 4.6 | Anthropic | Closed | $3 / $15 | strong | [platform.claude.com](https://platform.claude.com) | Cursor, Windsurf, Cline, OpenRouter |
| Claude Haiku 4.5 | Anthropic | Closed | $1 / $5 | 73.3% Verified | [platform.claude.com](https://platform.claude.com) | OpenRouter |
| GPT-5.5 | OpenAI | Closed | $5 / $30 | 58.6% Pro, 88.7% Verified | [platform.openai.com](https://platform.openai.com) | Codex CLI, Copilot, Cursor, Azure |
| GPT-5.3-Codex | OpenAI | Closed | $1.75 / $14 | coding-tuned | [platform.openai.com](https://platform.openai.com) | Codex CLI |
| Gemini 3.1 Pro | Google | Closed | $2 / $12 ($4 / $18 above 200K) | 54.2% Pro, 80.6% Verified | [aistudio.google.com](https://aistudio.google.com) | Gemini CLI, Vertex, Cursor, OpenRouter |
| Grok 4.x | xAI | Closed | ~$2.60 / $7.80 | Tier B | [x.ai](https://x.ai) | Cursor, OpenRouter |
| DeepSeek V4-Pro | DeepSeek | MIT | $1.74 / $3.48 | 80.6% Verified | [platform.deepseek.com](https://platform.deepseek.com) | OpenRouter, Together, Morph, [HuggingFace](https://huggingface.co/deepseek-ai) |
| DeepSeek V4-Flash | DeepSeek | MIT | $0.14 / $0.28 | 79% Verified | [platform.deepseek.com](https://platform.deepseek.com) | OpenRouter (:free tier), Morph, Ollama |
| DeepSeek V3.2 | DeepSeek | MIT | ~$0.23 / ~$1 | best value classic | [platform.deepseek.com](https://platform.deepseek.com) | OpenRouter, Ollama |
| GLM 5.2 | Z.ai | Open | $1.40 / $4.40 (~$0.45 / $3.31 avg on OR) | 62.1% Pro (vendor) | [z.ai](https://z.ai) | OpenRouter, [HuggingFace](https://huggingface.co/zai-org), GLM Coding Plan |
| GLM-4.7-Flash | Z.ai | Open | free | decent | [z.ai](https://z.ai) | OpenRouter, Ollama |
| Kimi K2.6 | Moonshot | Open | $0.95 / $4.00 | 80.2% Verified, 58.6% Pro | [platform.moonshot.ai](https://platform.moonshot.ai) | OpenRouter, Together, Groq, [HuggingFace](https://huggingface.co/moonshotai) |
| Kimi K2.7 Code | Moonshot | Apache 2.0 | ~K2.6 rates, 30% fewer thinking tokens | coding-tuned | [platform.moonshot.ai](https://platform.moonshot.ai) | OpenRouter, HuggingFace |
| MiniMax M3 | MiniMax | Open | $0.60 / $2.40 (~$0.10 / $1.21 avg on OR) | 80.5% Verified | [minimax.io](https://www.minimax.io) | OpenRouter, Atlas Cloud |
| MiniMax M2.7 | MiniMax | Open | $0.30 / $1.20 | 56.2% Pro | [minimax.io](https://www.minimax.io) | Together, Atlas Cloud |
| Qwen3.7-Max | Alibaba | Closed API | via Alibaba Cloud | 80%+ Verified | [alibabacloud.com](https://www.alibabacloud.com) | OpenRouter |
| Qwen 3.6-27B | Alibaba | Apache 2.0 | free local (22GB VRAM) | 77.2% Verified | [Ollama](https://ollama.com/library), [HuggingFace](https://huggingface.co/Qwen) | Together, OpenRouter |
| Qwen 3.6 Plus | Alibaba | Open | $0.50 / $3.00 | 61.6% Terminal-Bench | Alibaba Cloud | Together, OpenRouter |
| Qwen3 Coder | Alibaba | Apache 2.0 | free on OpenRouter | best free coding model | [OpenRouter :free](https://openrouter.ai/models) | Ollama |
| Qwen 2.5 Coder 32B | Alibaba | Apache 2.0 | free local (18GB+ VRAM) | best local classic | [Ollama](https://ollama.com/library/qwen2.5-coder) | LM Studio, Continue.dev |
| Nemotron 3 Ultra | NVIDIA | OpenMDW | ~$0.42 / $2.61 (OR avg) | #2 open on AA index | [NVIDIA NIM](https://build.nvidia.com) | OpenRouter (:free route), HuggingFace |
| North Mini Code | Cohere | Apache 2.0 | free local | 33.4 AA Coding Index | [HuggingFace](https://huggingface.co/CohereLabs) | Ollama, vLLM |
| Codestral 22B | Mistral | Open | free local (12GB VRAM) | best autocomplete local | [Ollama](https://ollama.com/library/codestral) | [Mistral API](https://console.mistral.ai), OpenRouter |
| Devstral Small 24B | Mistral | Open | free local (16GB VRAM) | near-frontier local | [Ollama](https://ollama.com/library/devstral) | Mistral API |
| Llama 4 Scout | Meta | Open | free local, :free on OR | solid, 10M context | [HuggingFace](https://huggingface.co/meta-llama) | Ollama, Groq, Together |
| Poolside Laguna M.1 | Poolside | Closed | free tier | #1 on Kilo usage | [Kilo Code](https://kilo.ai) | Poolside platform |
| openPangu 2.0 | Huawei | Open | free local | competitive | [HuggingFace](https://huggingface.co) | vLLM self-host |

**How to read it:**

- Prices are first-party rates as of late June 2026. OR = OpenRouter weighted average, noted where it differs a lot from direct.
- OpenRouter adds a ~5.5% credit fee ($0.80 minimum) and a 5% BYOK fee above 1M requests/month. The per-token rates themselves match provider list prices.
- Benchmarks mix vendor-reported and independent numbers. Compare only scores from the same benchmark (Pro with Pro, Verified with Verified), and test on your own code before committing.
- Fable 5 caveat: a US export-control directive suspended access on June 12, expected back for US users around July 1. If you are outside the US (like me), verify before building on it.

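
The OpenRouter fee note above is easier to judge with numbers. Here is a minimal sketch, assuming the ~5.5% rate and $0.80 minimum quoted above apply to each credit purchase. The function names are mine, not part of any OpenRouter API, and the BYOK fee is left out.

```python
def credit_purchase_fee(amount_usd: float,
                        rate: float = 0.055,        # ~5.5%, as quoted above
                        minimum_usd: float = 0.80   # flat floor, as quoted above
                        ) -> float:
    """Fee on one credit top-up: a percentage of the amount, with a flat minimum."""
    if amount_usd <= 0:
        raise ValueError("amount_usd must be positive")
    return max(amount_usd * rate, minimum_usd)


def effective_overhead(amount_usd: float) -> float:
    """Fee as a fraction of the top-up amount."""
    return credit_purchase_fee(amount_usd) / amount_usd


if __name__ == "__main__":
    # Top-up sizes are arbitrary examples.
    for amount in (5, 20, 100):
        fee = credit_purchase_fee(amount)
        print(f"${amount} top-up: fee ${fee:.2f}, overhead {effective_overhead(amount):.1%}")
```

Under these assumptions the minimum dominates on small top-ups (a $5 purchase pays $0.80, which is 16%), and the overhead settles at the percentage rate once a purchase is above roughly $14.55.
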
## The agents (harnesses)

| Agent | Type | Models | Link |
|---|---|---|---|
| Claude Code | First-party CLI/app | Claude family | [claude.com/claude-code](https://claude.com/claude-code) |
| Codex CLI | First-party CLI | GPT family | [openai.com/codex](https://openai.com/codex) |
| Gemini CLI | First-party CLI | Gemini | [github.com/google-gemini/gemini-cli](https://github.com/google-gemini/gemini-cli) |
| Copilot Agent Mode | IDE-native | GPT family | [github.com/features/copilot](https://github.com/features/copilot) |
| Cursor | AI IDE | BYOM (Claude, GPT, Gemini...) | [cursor.com](https://cursor.com) |
| Windsurf | AI IDE | Multiple | [windsurf.com](https://windsurf.com) |
| Cline | VS Code ext, BYOM | Anything via OpenRouter/Ollama | [cline.bot](https://cline.bot) |
| RooCode | VS Code ext, BYOM | Anything, strong on large multi-file work | [roocode.com](https://roocode.com) |
| Aider | Terminal, BYOM | Anything, git-native | [aider.chat](https://aider.chat) |
| Kilo Code | BYOM, free tiers | Anything, live usage leaderboard | [kilo.ai](https://kilo.ai) |
| Continue.dev | VS Code ext, local | Ollama models | [continue.dev](https://continue.dev) |

## Providers cheat sheet

| Provider | What it is | When to use |
|---|---|---|
| First-party APIs | Direct from each creator | Cheapest per token on one provider |
| [OpenRouter](https://openrouter.ai) | 315+ models, one key | Multi-model access, budget +5-7% overhead |
| [Together AI](https://together.ai) / [Fireworks](https://fireworks.ai) | Neutral open-model hosts | Open models with fine-tuning, dedicated deploys |
| [Groq](https://groq.com) | Fast inference | Speed on open models |
| [Morph](https://morphllm.com) | bf16, no quantization | Open-model fidelity for codegen (most hosts quantize to fp8 and lose quality) |
| Bedrock / Vertex / Azure | Cloud resellers | 10-20% more per token, but compliance and one cloud bill |
| [Ollama](https://ollama.com) | Local runner | Free, private, offline |
| Subscriptions | Claude Pro/Max, Codex Plus, GLM Coding Plan, OpenCode Go ($10/mo) | Can beat API billing depending on your usage profile |

## Three numbers to remember

**The 10x cliff.** Five models score between 80.2% and 80.6% on SWE-Bench Verified (DeepSeek V4-Pro, Gemini 3.1 Pro, MiniMax M3, Qwen3.7-Max, Kimi K2.6), with output prices from $2.40 to $12 per million tokens. The next 8 points up (GPT-5.5, Opus 4.8) cost $25 to $30 per million output tokens. Those last 8 points of quality cost roughly 10x the cheapest model in the cluster. Know if your work lives in that gap.

**The 21.7 point gap.** Claude Fable 5 scores 80.3% on SWE-Bench Pro vs 58.6% for GPT-5.5. Frontier benchmarks usually move in single digits. This one did not.

**1/36th.** DeepSeek V4-Flash delivers about 82% of Opus 4.8's SWE-Bench Pro score at roughly 1/36th the input price ($0.14 vs $5 per million tokens). It does not replace Opus. It means Opus should stop doing DeepSeek's job.

## My routing shortcut

- Cheap and low-risk work: DeepSeek V4-Flash
- Default coding workhorse: Kimi K2.6 or Sonnet-class
- Hard architecture, repeated failures: frontier (Opus 4.8, GPT-5.5, Fable 5 if you can get it)
- Private or offline: Qwen local via Ollama
- Always: measure cost per successful task, not cost per token. A cheap model that causes rework is expensive.

This space moves monthly. I will update this table when the next shakeup lands.

Building with agents and want to compare notes? I am @itseduvieira pretty much everywhere.
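
The last routing rule, cost per successful task rather than cost per token, can be sketched as a small calculation. The per-token prices below are from the table. Everything else is a loudly hypothetical placeholder: the token counts, the success rates, and the rework cost per failed attempt are invented for illustration and should be replaced with numbers measured on your own tasks. The sketch also assumes retries are independent, so the expected number of attempts is 1 / success rate.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    name: str
    input_usd_per_m: float    # list price per 1M input tokens (from the table)
    output_usd_per_m: float   # list price per 1M output tokens (from the table)
    success_rate: float       # HYPOTHETICAL: measure this on your own tasks


def cost_per_attempt(m: ModelProfile, input_tokens: int, output_tokens: int) -> float:
    """Token cost in USD of one attempt at a task."""
    return (input_tokens * m.input_usd_per_m
            + output_tokens * m.output_usd_per_m) / 1_000_000


def cost_per_success(m: ModelProfile, input_tokens: int, output_tokens: int,
                     rework_usd_per_failure: float = 0.0) -> float:
    """Expected USD per successful task, assuming independent retries.

    Expected attempts = 1 / p and expected failures = (1 - p) / p,
    where p is the success rate.
    """
    if not 0 < m.success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    p = m.success_rate
    attempts = 1 / p
    failures = (1 - p) / p
    return (attempts * cost_per_attempt(m, input_tokens, output_tokens)
            + failures * rework_usd_per_failure)


# Prices from the table; success rates are placeholders, not benchmark results.
flash = ModelProfile("DeepSeek V4-Flash", 0.14, 0.28, success_rate=0.5)
opus = ModelProfile("Claude Opus 4.8", 5.0, 25.0, success_rate=0.9)

if __name__ == "__main__":
    # HYPOTHETICAL task size and rework cost (human time per failed attempt).
    for model in (flash, opus):
        usd = cost_per_success(model, 50_000, 5_000, rework_usd_per_failure=2.0)
        print(f"{model.name}: ${usd:.4f} per successful task")
```

With these made-up inputs the cheaper model ends up more expensive per successful task, because the rework term swamps the token bill. With zero rework cost the ordering flips, which is why the measurement matters more than the price list.
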