by Google · 11 months ago
Google's most capable thinking model with breakthrough performance on reasoning and coding.
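| Spec | Value |
|---|---|
| Context window | 1.0M tokens |
| Max output | 66K tokens |
| TTFT (time to first token) | 600 ms |
| Speed | 85 tok/s |
| Input price | $1.25/M tokens |
| Output price | $10.00/M tokens |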
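As a rough guide, the wall-clock time for a streamed answer is about the time to first token plus the output length divided by the generation speed. The sketch below uses the listed 600 ms and 85 tok/s as defaults. It is a back-of-the-envelope estimate under the assumption of a constant output rate; it ignores network overhead and variation in how long the model thinks before answering, and the function name is illustrative only.

```python
def estimate_latency_s(output_tokens: int,
                       ttft_ms: float = 600.0,
                       tokens_per_s: float = 85.0) -> float:
    """Approximate wall-clock seconds for a streamed response.

    Assumes a fixed time to first token followed by output at a constant
    rate, using the TTFT and speed figures listed on this page as defaults.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_ms / 1000.0 + output_tokens / tokens_per_s


if __name__ == "__main__":
    for n in (200, 500, 3_000):
        print(f"{n:>5} output tokens: ~{estimate_latency_s(n):.1f} s")
```

For example, a 500-token answer comes out to about 0.6 + 500 / 85 ≈ 6.5 s.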
Performance Profile
Frontier-tier performance at $1.25/M input tokens
Massive 1.0M token context window for entire codebases and long documents
Accepts text, image, audio, and video input, including code, for true multimodal capability
Consistently scores 80%+ across major benchmarks
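To check whether a codebase or document set is likely to fit in the 1.0M-token window before sending it, a quick estimate is often enough. This sketch assumes roughly 4 characters per token, a common rule of thumb for English text that can be well off for source code or other languages; an exact count needs the model's own tokenizer. It also flags prompts above 200K tokens, because Google's published pricing for Gemini 2.5 Pro charges a higher per-token rate above that size, so the $1.25/M figure applies only to shorter prompts. All names here are hypothetical.

```python
# Rough planning constants; CHARS_PER_TOKEN is an assumption, not a tokenizer.
CHARS_PER_TOKEN = 4
CONTEXT_WINDOW = 1_000_000        # "1.0M" as listed on this page
LONG_PROMPT_THRESHOLD = 200_000   # above this, Google bills a higher per-token rate


def estimate_tokens(text: str) -> int:
    """Estimate token count from character length (ceiling division)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def check_prompt(files: dict[str, str]) -> dict[str, object]:
    """Estimate total prompt tokens for a set of files and flag size limits."""
    total = sum(estimate_tokens(body) for body in files.values())
    return {
        "estimated_tokens": total,
        "fits_context": total <= CONTEXT_WINDOW,
        "long_prompt_pricing": total > LONG_PROMPT_THRESHOLD,
    }


if __name__ == "__main__":
    # Hypothetical in-memory "repo"; real use would read file contents from disk.
    repo = {"main.py": "x" * 40_000, "README.md": "y" * 8_000}
    print(check_prompt(repo))
```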
Analyze and reason about video content with native multimodal capabilities.
Solve multi-step problems with built-in thinking and chain-of-thought.
Top-tier coding with excellent understanding of complex systems.
vs similar-tier models
| Model | Provider | Input ($/M tokens) | Output ($/M tokens) | Context | Avg score |
|---|---|---|---|---|---|
| Gemini 2.5 Pro (current) | Google | 1.25 | 10.00 | 1.0M | 88.4 |
| GPT-4o | OpenAI | 2.50 | 10.00 | 128K | 81.1 |
| Kimi K2.5 | Moonshot AI | 0.45 | 2.20 | 256K | 92.3 |
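Per-token prices only become comparable once they are applied to a concrete workload. The sketch below prices the same token volume on each model using the input and output rates from the comparison table. The workload size is an arbitrary example, the calculation uses flat rates (ignoring any tiered or cached-input pricing), and it says nothing about quality, which the score column covers.

```python
# (input $/M tokens, output $/M tokens), copied from the comparison table
PRICES: dict[str, tuple[float, float]] = {
    "Gemini 2.5 Pro": (1.25, 10.00),
    "GPT-4o": (2.50, 10.00),
    "Kimi K2.5": (0.45, 2.20),
}


def workload_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of a workload at flat per-million-token rates."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens and 2M output tokens in total.
    for name in PRICES:
        print(f"{name:<15} ${workload_cost(name, 10_000_000, 2_000_000):,.2f}")
```

For that example workload the rates work out to $32.50, $45.00 and $8.90 respectively.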
| Task | What it does | Tokens (in / out) | Est. cost |
|---|---|---|---|
| Describe a single image | Photo → detailed description | 1,000 / 200 | $0.0033 |
| Analyze a chart or diagram | Visual data → structured insights | 2,000 / 500 | $0.0075 |
| OCR a 10-page document | Scanned pages → structured text | 15,000 / 3,000 | $0.049 |
| Batch process 100 images | Bulk image analysis pipeline | 100,000 / 20,000 | $0.325 |
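Each cost estimate is simply the token counts multiplied by the listed rates ($1.25 per million input tokens, $10.00 per million output tokens). The sketch reproduces the four examples so the same arithmetic can be reused for other token budgets. It takes the token counts as given and does not model how images are converted into tokens.

```python
PRICE_IN_PER_M = 1.25    # $ per 1M input tokens (listed on this page)
PRICE_OUT_PER_M = 10.00  # $ per 1M output tokens (listed on this page)

# Task -> (input tokens, output tokens), as given in the examples
EXAMPLES = {
    "Describe a single image": (1_000, 200),
    "Analyze a chart or diagram": (2_000, 500),
    "OCR a 10-page document": (15_000, 3_000),
    "Batch process 100 images": (100_000, 20_000),
}


def request_cost(tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one request at flat per-million-token rates."""
    return (tokens_in * PRICE_IN_PER_M + tokens_out * PRICE_OUT_PER_M) / 1_000_000


if __name__ == "__main__":
    for task, (tokens_in, tokens_out) in EXAMPLES.items():
        print(f"{task:<28} ${request_cost(tokens_in, tokens_out):.5f}")
```

The unrounded values are $0.00325, $0.0075, $0.04875 and $0.325; the first and third are rounded for display.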
| Workload | Per day | Per month |
|---|---|---|
| Image descriptions | $3 | $98 |
| Document OCR | $49 | $1,463 |
| Batch image analysis | $325 | $9,750 |
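The daily and monthly totals line up with running each task about 1,000 times a day over a 30-day month. For document OCR, for example, $0.04875 × 1,000 = $48.75 a day, and × 30 = $1,462.50 a month, shown as $1,463. That volume is inferred from the numbers rather than stated on the page, so the sketch keeps requests per day and days per month as parameters; the per-request token budgets are taken from the matching cost examples, and the function name is hypothetical.

```python
PRICE_IN_PER_M = 1.25    # $ per 1M input tokens (listed on this page)
PRICE_OUT_PER_M = 10.00  # $ per 1M output tokens (listed on this page)


def project_spend(tokens_in: int, tokens_out: int,
                  requests_per_day: int = 1_000,
                  days_per_month: int = 30) -> tuple[float, float]:
    """Return (daily, monthly) spend for a fixed per-request token budget.

    The 1,000 requests/day and 30-day defaults are assumptions inferred
    from the projections on this page, not documented figures.
    """
    per_request = (tokens_in * PRICE_IN_PER_M + tokens_out * PRICE_OUT_PER_M) / 1_000_000
    daily = per_request * requests_per_day
    return daily, daily * days_per_month


if __name__ == "__main__":
    workloads = {
        "Image descriptions": (1_000, 200),
        "Document OCR": (15_000, 3_000),
        "Batch image analysis": (100_000, 20_000),
    }
    for name, (tokens_in, tokens_out) in workloads.items():
        daily, monthly = project_spend(tokens_in, tokens_out)
        print(f"{name:<22} ${daily:,.2f}/day  ${monthly:,.2f}/mo")
```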
Related models
Google's fastest multimodal model with native tool use and advanced agentic capabilities.
Input: $0.10/M · Output: $0.40/M · Context: 1.0M
Google's fast and cost-efficient thinking model with strong reasoning capabilities.
Input: $0.15/M · Output: $0.60/M · Context: 1.0M
Google DeepMind's flagship video generation model that natively produces joint audio-visual output in a single pass. Veo 3 leverages a Latent Diffusion Transformer to generate high-fidelity clips with synchronized dialogue, sound effects, and ambient audio without requiring a separate audio model. It demonstrates strong physical understanding and prompt adherence across diverse cinematic styles.
Input: $5.00/M · Output: $150.00/M
GPT-4o (OpenAI)
OpenAI's most advanced multimodal model. Excels at text, vision, and audio tasks with fast response times.
Input: $2.50/M · Output: $10.00/M · Context: 128K
Kimi K2.5 (Moonshot AI)
Moonshot AI's frontier multimodal MoE model with 1T total parameters (32B active). Tops SWE-bench and AIME 2025 benchmarks.
Input: $0.45/M · Output: $2.20/M · Context: 256K
OpenAI
OpenAI's reasoning model with chain-of-thought capabilities for complex problem solving.
Input: $15.00/M · Output: $60.00/M · Context: 200K