Compare Models
Select up to 4 models to compare benchmarks, pricing, and capabilities side by side.
OpenAI
DeepSeek
Alibaba/Qwen
| Benchmark | o3-mini | DeepSeek-R1 | Qwen2.5-Coder 32B |
|---|---|---|---|
| MMLU | 86.9 | 90.8 | 0.0 |
| HumanEval | 92.9 | 92.8 | 92.7 |
| GSM8K | 97.9 | 97.3 | 88.0 |
| GPQA | 77.0 | 71.5 | 0.0 |
| MGSM | 89.5 | 92.8 | 0.0 |
| ARC-Challenge | 96.0 | 97.2 | 0.0 |
| HellaSwag | 92.5 | 93.8 | 0.0 |
| MATH | 97.0 | 97.3 | 0.0 |
| SWE-bench | 49.3 | 49.2 | 35.0 |
| MMMLU | 83.5 | 87.5 | 0.0 |
| LiveCodeBench | 0.0 | 0.0 | 52.0 |
| Model | Input | Output | Blended* |
|---|---|---|---|
| o3-mini | $1.10 | $4.40 | $2.75 |
| DeepSeek-R1 | $0.55 | $2.19 | $1.37 |
| Qwen2.5-Coder 32B | $0.08 | $0.08 | $0.08 |
*Blended = average of input and output price
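
The blended figure can be reproduced with a short sketch. It assumes the footnote means a plain, unweighted mean of the two listed prices; the function name `blended_price` and the `prices` dictionary are illustrative, not part of the comparison tool.

```python
def blended_price(input_price: float, output_price: float) -> float:
    """Unweighted mean of input and output price, per the table footnote."""
    return (input_price + output_price) / 2


# (input, output) prices copied from the pricing table above.
prices = {
    "o3-mini": (1.10, 4.40),
    "DeepSeek-R1": (0.55, 2.19),
    "Qwen2.5-Coder 32B": (0.08, 0.08),
}

# Compute the blended price for every model in the table.
blended = {model: blended_price(i, o) for model, (i, o) in prices.items()}

for model, value in blended.items():
    print(f"{model}: ${value:.2f}")
```

An unweighted mean treats input and output tokens as equally common. A workload that reads far more than it writes, or the reverse, would see a different effective price.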
| Spec | o3-mini | DeepSeek-R1 | Qwen2.5-Coder 32B |
|---|---|---|---|
| Context Window | 200K | 128K | 128K |
| Max Output | 100K | 8K | 8K |
| TTFT (time to first token) | 800ms | 900ms | 180ms |
| Speed | 75 tok/s | 60 tok/s | 120 tok/s |
| Parameters | N/A | 685B (37B active) | 32B |
| Architecture | Transformer + CoT | Transformer (MoE) + CoT | Transformer |
| Open Source | No | Yes | Yes |
| Tier | mid | mid | mid |
Quick Verdict
| Category | Model |
|---|---|
| Best Performance | DeepSeek-R1 |
| Best Value | Qwen2.5-Coder 32B |
| Fastest | Qwen2.5-Coder 32B |