Compare AI Models, Make Better Decisions
Benchmarks, pricing, and capabilities for every major AI model — all in one place. Stop guessing which model fits your use case.
200+
AI Models Tracked
13+
Benchmarks Compared
49+
Providers Covered
156+
Open Source Models
Gemini 3.1 Pro
Google
Google's most capable model. 94.3% on GPQA Diamond, 80.6% on SWE-bench, 77.1% on ARC-AGI-2. #1 on 12 of 18 tracked benchmarks.
Input
$2.00/M
Output
$12.00/M
Context
1.0M
Claude Sonnet 4.6
Anthropic
Matches Opus 4.6 on most benchmarks at one-fifth the cost: 79.6% on SWE-bench, a 1M-token context window, computer use, and design capabilities.
Input
$3.00/M
Output
$15.00/M
Context
1.0M
DeepSeek V4
DeepSeek
DeepSeek's 1T-parameter, coding-focused model with 1M+ context. Three architectural innovations: Manifold-Constrained Hyper-Connections, Engram memory, and Sparse Attention.
Input
$0.10/M
Output
$0.40/M
Context
1.0M
Grok 4.20
xAI
xAI's four-agent parallel-collaboration system, with a rapid-learning architecture and medical-document analysis. Beta release.
Input
$3.00/M
Output
$15.00/M
Context
131K
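To make the rates above concrete, here is a minimal Python sketch that prices a single request against each featured model's listed input/output rates. The rates come from the cards above; the 2,000-token prompt and 500-token reply are illustrative assumptions, not measurements.

```python
# Price one request against each featured model's listed per-million-token rates.
# Rates come from the cards above; token counts are illustrative assumptions.
PRICES = {  # model: (input $/M tokens, output $/M tokens)
    "Gemini 3.1 Pro": (2.00, 12.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "DeepSeek V4": (0.10, 0.40),
    "Grok 4.20": (3.00, 15.00),
}

def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Dollar cost of one request at per-million-token prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 2,000-token prompt that draws a 500-token reply.
for model, (in_p, out_p) in PRICES.items():
    print(f"{model}: ${request_cost(2_000, 500, in_p, out_p):.4f}")
```

At those assumed sizes the spread is roughly 25x: about $0.0100 per request for Gemini 3.1 Pro versus $0.0004 for DeepSeek V4.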
Key Insights
Live rankings, pricing data, and performance metrics updated continuously
Tiny Aya
Cohere
Cohere's compact multilingual model supporting 70+ languages. Runs on consumer devices, including phones. Outperforms Gemma3-4B in 46 of 61 evaluated languages.
Input
$0.01/M
Output
$0.01/M
Context
32K
Qwen3.5 397B
Alibaba/Qwen
Alibaba's open-weight hybrid MoE model with 512 experts and 17B active parameters. Natively multimodal, with support for 201 languages. Top scores on GPQA and SWE-bench.
Input
$0.15/M
Output
$1.00/M
Context
256K
MiniMax M2.5
MiniMax
Achieves 80.2% on SWE-bench Verified, matching Opus 4.6 at 1/20th the cost. Ranks first on Multi-SWE-Bench at 51.3%.
Input
$0.25/M
Output
$0.75/M
Context
128K
Data Privacy
Keep data on-premise
Customizable
Fine-tune on your data
No Lock-in
Switch providers freely
Cost Control
Predictable self-host costs
Everything You Need to Choose
Comprehensive tools for evaluating, comparing, and selecting the right AI model
Benchmark Analysis
13+ benchmarks per model with interactive radar charts, category breakdowns, and head-to-head scoring.
Pricing & Cost Calculator
Real-time pricing comparison across every model. Built-in calculators for estimating daily and monthly costs (see the sketch after this section).
Full Model Profiles
Detailed pages for each model with specs, use cases, API examples, benchmark breakdowns, and related models.
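The cost calculators reduce to simple arithmetic over per-million-token rates. A minimal Python sketch of that estimate follows; the function name, the 10,000-requests-per-day workload, and the average token counts are assumptions for illustration, while the rates are taken from the Claude Sonnet 4.6 card above.

```python
def monthly_cost(requests_per_day: int, avg_input_tokens: int, avg_output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float,
                 days: int = 30) -> float:
    """Estimated monthly spend, given per-million-token prices."""
    daily = requests_per_day * (
        avg_input_tokens * input_price_per_m
        + avg_output_tokens * output_price_per_m
    ) / 1_000_000
    return daily * days

# Example: 10,000 requests/day, averaging 1,500 input + 400 output tokens each,
# at Claude Sonnet 4.6's listed $3.00/M input and $15.00/M output rates.
print(f"${monthly_cost(10_000, 1_500, 400, 3.00, 15.00):,.2f}/month")  # $3,150.00/month
```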
Comparing Token Costs: What Does AI Actually Cost to Use?
A practical breakdown of what tokens mean in real terms, from a single email to processing an entire codebase (a back-of-envelope version appears after these posts).
Gemini 3.1 Pro: Google Claims #1 on 12 of 18 Benchmarks
Google's Gemini 3.1 Pro achieves 94.3% on GPQA Diamond and 77.1% on ARC-AGI-2, more than doubling its predecessor's reasoning score.
SWE-bench Leaderboard: February 2026 Rankings
The latest SWE-bench Verified scores show Kimi K2.5 and Qwen3.5 tied near the top. Here is the full leaderboard breakdown.
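As a back-of-envelope taste of that token-cost breakdown: assuming a short email runs on the order of 500 tokens and a mid-sized codebase around 2M tokens (rough rules of thumb, not measured figures), DeepSeek V4's listed $0.10/M input rate prices them as follows.

```python
# Rough input-cost intuition at DeepSeek V4's listed $0.10/M input rate.
# Token counts are rule-of-thumb assumptions, not measurements.
INPUT_PRICE_PER_M = 0.10  # dollars per million input tokens

for label, tokens in [("one short email", 500), ("a mid-sized codebase", 2_000_000)]:
    print(f"{label}: ${tokens * INPUT_PRICE_PER_M / 1_000_000:.5f}")
```

Reading a single email costs a few thousandths of a cent; ingesting the whole codebase once costs about twenty cents at that rate.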
Stay Ahead of AI
Get weekly updates on new model releases, pricing changes, and benchmark results delivered to your inbox. Join thousands of developers and teams who rely on GPTCrunch.
No spam. Unsubscribe anytime.