I keep researching this for my own work, so I turned it into a reference. One big table, everything in it: prices, benchmarks, licenses, where to access each model. Scan it, pick what you need, go build.

Quick mental model before the table: Agent = Model + Harness. The model is the intelligence; the harness (Claude Code, Cursor, Cline, Aider...) turns it into something that navigates your repo and fixes its own mistakes. Pick both.
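To make the split concrete, here is a minimal sketch of the loop a harness runs around a model: ask for a patch, check it, feed the failure back, retry. The names `run_agent`, `stub_model`, and `stub_check` are hypothetical and the model is a stub. Real harnesses do much more (repo navigation, tool calls, context management), so treat this as an illustration of the idea, not how any of the listed tools is implemented.

```python
from typing import Callable, Optional

Model = Callable[[str], str]            # prompt -> proposed patch
Check = Callable[[str], Optional[str]]  # patch -> error message, or None if it passes


def run_agent(model: Model, check: Check, task: str, max_attempts: int = 3) -> Optional[str]:
    """Harness loop: call the model, verify the result, retry with feedback."""
    prompt = task
    for _ in range(max_attempts):
        patch = model(prompt)
        error = check(patch)
        if error is None:
            return patch  # the check passed, so we are done
        # Feed the failure back so the next attempt can correct it.
        prompt = f"{task}\nPrevious attempt failed: {error}"
    return None  # gave up after max_attempts


# Stub model (assumption): wrong on the first try, right once it sees feedback.
def stub_model(prompt: str) -> str:
    return "return a + b" if "failed" in prompt else "return a - b"


# Stub check (assumption): stands in for running a test suite.
def stub_check(patch: str) -> Optional[str]:
    return None if patch == "return a + b" else "expected add(2, 3) == 5"


result = run_agent(stub_model, stub_check, "Implement add(a, b)")
print(result)
```

Swapping `stub_model` for a different callable changes the intelligence. Changing `run_agent` or `stub_check` changes the harness. That is why the two get picked separately.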

The table

| Model | Creator | License | Price in/out per 1M tokens | SWE-Bench | Direct access | Also on |
|---|---|---|---|---|---|---|
| Claude Fable 5 | Anthropic | Closed | $10 / $50 | 80.3% Pro | platform.claude.com | Claude Code, claude.ai |
| Claude Opus 4.8 | Anthropic | Closed | $5 / $25 | 69.2% Pro, 88.6% Verified | platform.claude.com | Claude Code, Cursor, OpenRouter, Bedrock, Vertex |
| Claude Sonnet 5 | Anthropic | Closed | ~$3 / $15 | new, near Opus 4.8 | platform.claude.com | Claude Code, Cursor |
| Claude Sonnet 4.6 | Anthropic | Closed | $3 / $15 | strong | platform.claude.com | Cursor, Windsurf, Cline, OpenRouter |
| Claude Haiku 4.5 | Anthropic | Closed | $1 / $5 | 73.3% Verified | platform.claude.com | OpenRouter |
| GPT-5.5 | OpenAI | Closed | $5 / $30 | 58.6% Pro, 88.7% Verified | platform.openai.com | Codex CLI, Copilot, Cursor, Azure |
| GPT-5.3-Codex | OpenAI | Closed | $1.75 / $14 | coding-tuned | platform.openai.com | Codex CLI |
| Gemini 3.1 Pro | Google | Closed | $2 / $12 ($4 / $18 above 200K) | 54.2% Pro, 80.6% Verified | aistudio.google.com | Gemini CLI, Vertex, Cursor, OpenRouter |
| Grok 4.x | xAI | Closed | ~$2.60 / $7.80 | Tier B | x.ai | Cursor, OpenRouter |
| DeepSeek V4-Pro | DeepSeek | MIT | $1.74 / $3.48 | 80.6% Verified | platform.deepseek.com | OpenRouter, Together, Morph, HuggingFace |
| DeepSeek V4-Flash | DeepSeek | MIT | $0.14 / $0.28 | 79% Verified | platform.deepseek.com | OpenRouter (:free tier), Morph, Ollama |
| DeepSeek V3.2 | DeepSeek | MIT | ~$0.23 / ~$1 | best value classic | platform.deepseek.com | OpenRouter, Ollama |
| GLM 5.2 | Z.ai | Open | $1.40 / $4.40 (~$0.45 / $3.31 avg on OR) | 62.1% Pro (vendor) | z.ai | OpenRouter, HuggingFace, GLM Coding Plan |
| GLM-4.7-Flash | Z.ai | Open | free | decent | z.ai | OpenRouter, Ollama |
| Kimi K2.6 | Moonshot | Open | $0.95 / $4.00 | 80.2% Verified, 58.6% Pro | platform.moonshot.ai | OpenRouter, Together, Groq, HuggingFace |
| Kimi K2.7 Code | Moonshot | Apache 2.0 | ~K2.6 rates, 30% fewer thinking tokens | coding-tuned | platform.moonshot.ai | OpenRouter, HuggingFace |
| MiniMax M3 | MiniMax | Open | $0.60 / $2.40 (~$0.10 / $1.21 avg on OR) | 80.5% Verified | minimax.io | OpenRouter, Atlas Cloud |
| MiniMax M2.7 | MiniMax | Open | $0.30 / $1.20 | 56.2% Pro | minimax.io | Together, Atlas Cloud |
| Qwen3.7-Max | Alibaba | Closed | API via Alibaba Cloud | 80%+ Verified | alibabacloud.com | OpenRouter |
| Qwen 3.6-27B | Alibaba | Apache 2.0 | free local (22GB VRAM) | 77.2% Verified | Ollama, HuggingFace | Together, OpenRouter |
| Qwen 3.6 Plus | Alibaba | Open | $0.50 / $3.00 | 61.6% Terminal-Bench | Alibaba Cloud | Together, OpenRouter |
| Qwen3 Coder | Alibaba | Apache 2.0 | free on OpenRouter | best free coding model | OpenRouter :free | Ollama |
| Qwen 2.5 Coder 32B | Alibaba | Apache 2.0 | free local (18GB+ VRAM) | best local classic | Ollama | LM Studio, Continue.dev |
| Nemotron 3 Ultra | NVIDIA | OpenMDW | ~$0.42 / $2.61 (OR avg) | #2 open on AA index | NVIDIA NIM | OpenRouter (:free route), HuggingFace |
| North Mini Code | Cohere | Apache 2.0 | free local | 33.4 AA Coding Index | HuggingFace | Ollama, vLLM |
| Codestral 22B | Mistral | Open | free local (12GB VRAM) | best autocomplete local | Ollama | Mistral API, OpenRouter |
| Devstral Small 24B | Mistral | Open | free local (16GB VRAM) | near-frontier local | Ollama | Mistral API |
| Llama 4 Scout | Meta | Open | free local, :free on OR | solid, 10M context | HuggingFace | Ollama, Groq, Together |
| Poolside Laguna M.1 | Poolside | Closed | free tier | #1 on Kilo usage | Kilo Code | Poolside platform |
| openPangu 2.0 | Huawei | Open | free local | competitive | HuggingFace | vLLM self-host |

How to read it:

  • Prices are first-party rates as of late June 2026. OR = OpenRouter weighted average, noted where it differs a lot from direct.
  • OpenRouter adds a ~5.5% fee on credit purchases ($0.80 minimum) and a 5% BYOK fee above 1M requests/month. The per-token rates themselves match provider list prices.
  • Benchmarks mix vendor-reported and independent numbers. Compare within the same column only, and test on your own code before committing.
  • Fable 5 caveat: a US export-control directive suspended access on June 12, expected back for US users around July 1. If you are outside the US (like me), verify before building on it.
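The OpenRouter credit fee is easier to judge as an effective overhead on what you actually buy. A small sketch, assuming the fee is charged per credit purchase as a percentage with a floor, using the ~5.5% and $0.80 figures above. The function names are hypothetical, and the BYOK fee is not modeled.

```python
def credit_fee(purchase_usd: float, rate: float = 0.055, minimum_usd: float = 0.80) -> float:
    """Fee on one credit purchase: a percentage of the amount, with a floor."""
    return max(purchase_usd * rate, minimum_usd)


def effective_overhead(purchase_usd: float) -> float:
    """The fee expressed as a fraction of the credits bought."""
    return credit_fee(purchase_usd) / purchase_usd


for amount in (5, 10, 20, 100):
    fee = credit_fee(amount)
    print(f"${amount:>3} of credits -> fee ${fee:.2f} ({effective_overhead(amount):.1%} overhead)")
```

Under these assumptions the $0.80 floor dominates small top-ups: below roughly $14.55 per purchase (0.80 / 0.055) the effective overhead is higher than 5.5%.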

The agents (harnesses)

| Agent | Type | Models | Link |
|---|---|---|---|
| Claude Code | First-party CLI/app | Claude family | claude.com/claude-code |
| Codex CLI | First-party CLI | GPT family | openai.com/codex |
| Gemini CLI | First-party CLI | Gemini | github.com/google-gemini/gemini-cli |
| Copilot Agent Mode | IDE-native | GPT family | github.com/features/copilot |
| Cursor | AI IDE | BYOM (Claude, GPT, Gemini...) | cursor.com |
| Windsurf | AI IDE | Multiple | windsurf.com |
| Cline | VS Code ext, BYOM | Anything via OpenRouter/Ollama | cline.bot |
| RooCode | VS Code ext, BYOM | Anything, strong on large multi-file work | roocode.com |
| Aider | Terminal, BYOM | Anything, git-native | aider.chat |
| Kilo Code | BYOM, free tiers | Anything, live usage leaderboard | kilo.ai |
| Continue.dev | VS Code ext, local | Ollama models | continue.dev |

Providers cheat sheet

| Provider | What it is | When to use |
|---|---|---|
| First-party APIs | Direct from each creator | Cheapest per token on one provider |
| OpenRouter | 315+ models, one key | Multi-model access, budget +5-7% overhead |
| Together AI / Fireworks | Neutral open-model hosts | Open models with fine-tuning, dedicated deploys |
| Groq | Fast inference | Speed on open models |
| Morph | bf16, no quantization | Open-model fidelity for codegen (most hosts quantize to fp8 and lose quality) |
| Bedrock / Vertex / Azure | Cloud resellers | 10-20% more per token, but compliance and one cloud bill |
| Ollama | Local runner | Free, private, offline |
| Subscriptions | Claude Pro/Max, Codex Plus, GLM Coding Plan, OpenCode Go ($10/mo) | Can beat API billing depending on your usage profile |

Three numbers to remember

The 10x cliff. Five models cluster at the 80% mark on SWE-Bench Verified: four score between 80.2% and 80.6% (DeepSeek V4-Pro, Gemini 3.1 Pro, MiniMax M3, Kimi K2.6), and Qwen3.7-Max is listed at 80%+. Their listed output prices run from $2.40 to $12 per million tokens. The next 8 points up (GPT-5.5, Opus 4.8) cost $25 to $30. Those last 8 points cost about 10x the cheapest model in the cluster. Know if your work lives in that gap.
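The multiple depends on which end of the cluster you compare against, so it is worth computing both. A quick sketch using the output prices from the table. Qwen3.7-Max is left out because the table lists no per-token price for it, and `price_multiples` is a hypothetical helper name.

```python
# Output prices in USD per 1M tokens, copied from the table.
cluster = {
    "DeepSeek V4-Pro": 3.48,
    "Gemini 3.1 Pro": 12.00,
    "MiniMax M3": 2.40,
    "Kimi K2.6": 4.00,
}
frontier = {"Claude Opus 4.8": 25.00, "GPT-5.5": 30.00}


def price_multiples(cluster: dict, frontier: dict) -> dict:
    """For each frontier model, its price as a multiple of the cheapest
    and of the most expensive model in the ~80% cluster."""
    cheapest = min(cluster.values())
    priciest = max(cluster.values())
    return {
        name: (price / cheapest, price / priciest)
        for name, price in frontier.items()
    }


for name, (vs_cheapest, vs_priciest) in price_multiples(cluster, frontier).items():
    print(f"{name}: {vs_cheapest:.1f}x the cheapest cluster model, "
          f"{vs_priciest:.1f}x the priciest")
```

Against MiniMax M3 the frontier models come out at roughly 10x to 12.5x on output price. Against Gemini 3.1 Pro the gap is closer to 2x to 2.5x, so the size of the cliff depends on which cluster model you would otherwise use.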

The 21.7 point gap. Claude Fable 5 scores 80.3% on SWE-Bench Pro vs 58.6% for GPT-5.5. Frontier benchmarks usually move in single digits. This one did not.

1/36th. DeepSeek V4-Flash delivers about 82% of Opus 4.8's SWE-Bench Pro score at roughly 1/36th the input price ($0.14 vs $5 per million tokens). It does not replace Opus. It means Opus should stop doing DeepSeek's job.
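The input-price ratio is only part of the picture, since a real task pays for output tokens too. A small sketch using the table's list prices. The task size (200K tokens read, 20K written) is a made-up example, and `task_cost` is a hypothetical helper.

```python
# Prices in USD per 1M tokens, copied from the table.
OPUS_4_8 = {"in": 5.00, "out": 25.00}
V4_FLASH = {"in": 0.14, "out": 0.28}


def task_cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one task at list price."""
    return (input_tokens * prices["in"] + output_tokens * prices["out"]) / 1_000_000


input_ratio = OPUS_4_8["in"] / V4_FLASH["in"]
output_ratio = OPUS_4_8["out"] / V4_FLASH["out"]
print(f"input price ratio:  {input_ratio:.1f}x")
print(f"output price ratio: {output_ratio:.1f}x")

# Hypothetical task: 200K tokens read, 20K tokens written.
opus = task_cost(OPUS_4_8, 200_000, 20_000)
flash = task_cost(V4_FLASH, 200_000, 20_000)
print(f"Opus 4.8 ${opus:.2f} vs V4-Flash ${flash:.4f} per task ({opus / flash:.0f}x)")
```

The output-price ratio (about 89x) is larger than the input ratio (about 36x), so the blended gap widens as tasks get more output-heavy.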

My routing shortcut

  • Cheap and low-risk work: DeepSeek V4 Flash
  • Default coding workhorse: Kimi K2.6 or Sonnet-class
  • Hard architecture, repeated failures: frontier (Opus 4.8, GPT-5.5, Fable 5 if you can get it)
  • Private or offline: Qwen local via Ollama
  • Always: measure cost per successful task, not cost per token. A cheap model that causes rework is expensive.
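The shortcut above can be sketched as a small router plus a cost-per-success estimate. Everything here is illustrative: the `Task` fields, the difficulty labels, the two-failure escalation threshold, the model labels, and all dollar figures and success rates are assumptions, not measurements. The cost formula assumes independent retries until success, with a fixed rework cost for each failed attempt.

```python
from dataclasses import dataclass


@dataclass
class Task:
    difficulty: str = "normal"   # "low", "normal", or "hard" (hypothetical labels)
    private: bool = False        # must stay on local hardware
    failed_attempts: int = 0     # how many times a cheaper model already failed


def pick_model(task: Task) -> str:
    """Route a task following the shortcut list; the thresholds are assumptions."""
    if task.private:
        return "qwen-local-via-ollama"
    if task.difficulty == "hard" or task.failed_attempts >= 2:
        return "frontier"
    if task.difficulty == "low":
        return "deepseek-v4-flash"
    return "kimi-k2.6-or-sonnet-class"


def cost_per_success(attempt_cost: float, success_rate: float, rework_cost: float = 0.0) -> float:
    """Expected cost until the first success.

    Expected attempts are 1 / p, so expected failures are (1 - p) / p.
    Each failure also incurs rework_cost (for example, review time).
    """
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    expected_failures = (1 - success_rate) / success_rate
    return attempt_cost / success_rate + rework_cost * expected_failures


# Made-up numbers: a cheap model that fails half the time vs a pricey one that rarely does.
cheap = cost_per_success(attempt_cost=0.03, success_rate=0.5, rework_cost=2.00)
pricey = cost_per_success(attempt_cost=1.50, success_rate=0.9, rework_cost=2.00)
print(f"cheap model:  ${cheap:.2f} per successful task")
print(f"pricey model: ${pricey:.2f} per successful task")
```

With these invented inputs the model that is 50x cheaper per attempt ends up more expensive per successful task once rework is counted. Plug in your own measured success rates before drawing conclusions.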

This space moves monthly. I will update this table when the next shakeup lands. Building with agents and want to compare notes? I am @itseduvieira pretty much everywhere.