GPTCrunch

Compare Models

Select up to 4 models to compare benchmarks, pricing, and capabilities side by side.

o3-mini (OpenAI)

Claude Sonnet 4 (Anthropic)

DeepSeek-R1 (DeepSeek)

Benchmark Scores

| Benchmark | o3-mini | Claude Sonnet 4 | DeepSeek-R1 |
| --- | --- | --- | --- |
| MMLU | 86.9 | 88.7 | 90.8 |
| HumanEval | 92.9 | 93.7 | 92.8 |
| GSM8K | 97.9 | 96.4 | 97.3 |
| GPQA | 77.0 | 68.2 | 71.5 |
| MGSM | 89.5 | 91.6 | 92.8 |
| ARC-Challenge | 96.0 | 96.7 | 97.2 |
| HellaSwag | 92.5 | 93.2 | 93.8 |
| MATH | 97.0 | 78.0 | 97.3 |
| SWE-bench | 49.3 | 53.6 | 49.2 |
| MMMLU | 83.5 | 86.0 | 87.5 |
Pricing (USD per 1M tokens)

| Model | Input | Output | Blended* |
| --- | --- | --- | --- |
| o3-mini | $1.10 | $4.40 | $2.75 |
| Claude Sonnet 4 | $3.00 | $15.00 | $9.00 |
| DeepSeek-R1 | $0.55 | $2.19 | $1.37 |

*Blended = average of input and output price
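
As a sanity check, the blended figures above can be reproduced with a short script. The prices are the per-1M-token rates from the table; the per-request cost helper is an illustrative extra, not part of the original page:

```python
# Input/output prices in USD per 1M tokens, from the pricing table above.
PRICES = {
    "o3-mini": (1.10, 4.40),
    "Claude Sonnet 4": (3.00, 15.00),
    "DeepSeek-R1": (0.55, 2.19),
}

def blended(input_price: float, output_price: float) -> float:
    """Blended price as defined above: simple average of input and output."""
    return (input_price + output_price) / 2

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD list-price cost of a single request (hypothetical helper)."""
    in_p, out_p = PRICES[model]
    return (input_tokens * in_p + output_tokens * out_p) / 1_000_000

for model, (in_p, out_p) in PRICES.items():
    print(f"{model}: blended ${blended(in_p, out_p):.2f}")
```

Note that this "blended" figure is a plain average; some providers instead quote a usage-weighted blend (e.g. 3:1 input-to-output), which would give different numbers.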

| Spec | o3-mini | Claude Sonnet 4 | DeepSeek-R1 |
| --- | --- | --- | --- |
| Context Window | 200K | 200K | 128K |
| Max Output | 100K | 16K | 8K |
| TTFT (time to first token) | 800ms | 280ms | 900ms |
| Speed | 75 tok/s | 100 tok/s | 60 tok/s |
| Parameters | N/A | N/A | 685B (37B active) |
| Architecture | Transformer + CoT | Transformer | Transformer (MoE) + CoT |
| Open Source | No | No | Yes |
| Tier | mid | mid | mid |
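
The TTFT and speed figures above can be combined into a rough end-to-end latency estimate: total time ≈ TTFT + output tokens ÷ decode speed. A minimal sketch, using the spec-table numbers (real latency varies with load, prompt length, and reasoning tokens):

```python
# Latency figures from the spec table above (TTFT in seconds, decode tok/s).
SPECS = {
    "o3-mini": {"ttft_s": 0.80, "tok_per_s": 75},
    "Claude Sonnet 4": {"ttft_s": 0.28, "tok_per_s": 100},
    "DeepSeek-R1": {"ttft_s": 0.90, "tok_per_s": 60},
}

def est_latency_s(model: str, output_tokens: int) -> float:
    """Rough wall-clock estimate: time to first token plus steady decoding."""
    s = SPECS[model]
    return s["ttft_s"] + output_tokens / s["tok_per_s"]
```

For a 500-token reply this model predicts Claude Sonnet 4 finishes first despite the others' scores, which is why the Quick Verdict lists it as fastest.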

Quick Verdict

Best Performance: DeepSeek-R1

Best Value: DeepSeek-R1

Fastest: Claude Sonnet 4