by Allen AI · 9 months ago
Open multimodal model for visual understanding, image captioning, and visual question answering.
Context Window
128K
Input Price
$0.40/M tokens
Output Price
$1.20/M tokens
Performance Profile
Average score of 75.4 at $0.40/M input tokens
128K-token context window for long documents
Accepts text and image input
Open source: can be self-hosted, fine-tuned, and customized
vs similar-tier models
| Model | Input | Output | Context | Avg Score |
|---|---|---|---|---|
| Molmo 72B (current), Allen AI | $0.40 | $1.20 | 128K | 75.4 |
| GPT-4o, OpenAI | $2.50 | $10.00 | 128K | 81.1 |
| Kimi K2.5, Moonshot AI | $0.45 | $2.20 | 256K | 92.3 |
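To compare the listed prices on a concrete workload, the per-request cost can be computed from the input and output rates in the table. The following is a minimal sketch: `PRICES`, `request_cost`, and `cost_ratio` are illustrative names, not part of any provider's API, and the prices are simply copied from the table.

```python
# Prices in USD per million tokens (input, output), copied from the comparison table.
PRICES = {
    "Molmo 72B": (0.40, 1.20),
    "GPT-4o": (2.50, 10.00),
    "Kimi K2.5": (0.45, 2.20),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def cost_ratio(model: str, baseline: str, input_tokens: int, output_tokens: int) -> float:
    """Return how many times more expensive `model` is than `baseline` for one request."""
    return request_cost(model, input_tokens, output_tokens) / request_cost(
        baseline, input_tokens, output_tokens
    )


if __name__ == "__main__":
    # Hypothetical workload: 2,000 input tokens and 500 output tokens per request.
    for name in PRICES:
        print(f"{name}: ${request_cost(name, 2_000, 500):.4f}")
```

For a request with 2,000 input and 500 output tokens, this works out to roughly $0.0014 for Molmo 72B, $0.0100 for GPT-4o, and $0.0020 for Kimi K2.5. The ratio depends on the input/output mix, since the models differ more on output price than on input price.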
| Task | Est. cost | Description | Tokens |
|---|---|---|---|
| Describe a single image | <$0.001 | Photo → detailed description | 1,000 in · 200 out |
| Analyze a chart or diagram | $0.0014 | Visual data → structured insights | 2,000 in · 500 out |
| OCR a 10-page document | $0.0096 | Scanned pages → structured text | 15,000 in · 3,000 out |
| Batch process 100 images | $0.064 | Bulk image analysis pipeline | 100,000 in · 20,000 out |
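The per-task estimates above follow directly from the token counts and the listed prices ($0.40/M input, $1.20/M output). A short sketch that reproduces them is shown here; the `estimate` and `display` helpers are hypothetical names, and the "<$0.001" display threshold is an assumption inferred from how the first estimate is shown.

```python
from decimal import Decimal

INPUT_PRICE = Decimal("0.40")   # USD per million input tokens (from the listing)
OUTPUT_PRICE = Decimal("1.20")  # USD per million output tokens (from the listing)
MILLION = Decimal(1_000_000)

# (input tokens, output tokens) for each task, copied from the listing.
TASKS = {
    "Describe a single image": (1_000, 200),
    "Analyze a chart or diagram": (2_000, 500),
    "OCR a 10-page document": (15_000, 3_000),
    "Batch process 100 images": (100_000, 20_000),
}


def estimate(input_tokens: int, output_tokens: int) -> Decimal:
    """Exact USD cost of one task at the listed prices."""
    return (input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE) / MILLION


def display(cost: Decimal) -> str:
    """Format a cost; assumes anything under a tenth of a cent is shown as '<$0.001'."""
    if cost < Decimal("0.001"):
        return "<$0.001"
    return f"${cost.normalize():f}"


if __name__ == "__main__":
    for task, (tokens_in, tokens_out) in TASKS.items():
        print(f"{task}: {display(estimate(tokens_in, tokens_out))}")
```

`Decimal` is used instead of `float` so that the small per-request amounts are computed exactly. The single-image task comes to $0.00064, which is why it is displayed as under a tenth of a cent.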
| Workload | Monthly cost | Daily cost |
|---|---|---|
| Image descriptions | $19/mo | $0.64/day |
| Document OCR | $288/mo | $10/day |
| Batch image analysis | $1,920/mo | $64/day |
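The listing does not state the request volume behind these monthly figures. They are consistent with 1,000 requests per day over a 30-day month, so the sketch below uses those two values as assumptions; `project` is an illustrative helper name, and the per-request costs come from the token counts and prices given earlier on the page.

```python
def project(cost_per_request: float,
            requests_per_day: int = 1_000,   # assumption: not stated in the listing
            days_per_month: int = 30) -> tuple[float, float]:  # assumption: 30-day month
    """Return (daily cost, monthly cost) in USD for a steady request volume."""
    daily = cost_per_request * requests_per_day
    return daily, daily * days_per_month


# Per-request costs at $0.40/M input and $1.20/M output tokens.
WORKLOADS = {
    "Image descriptions": 0.00064,   # 1,000 in, 200 out
    "Document OCR": 0.0096,          # 15,000 in, 3,000 out
    "Batch image analysis": 0.064,   # 100,000 in, 20,000 out
}

if __name__ == "__main__":
    for name, per_request in WORKLOADS.items():
        daily, monthly = project(per_request)
        print(f"{name}: ${daily:,.2f}/day, ${monthly:,.2f}/mo")
```

Under these assumptions, document OCR comes to $9.60 per day, which the listing appears to round to $10, while the $288 monthly figure matches the unrounded daily value.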
Allen AI
Fully open-source model from Allen AI with open training data, code, and weights.
Input
$0.04/M
Output
$0.04/M
Context
4K
Allen AI
Fully open model with all components public: data, code, weights, and checkpoints. Instruct, Think, and RL Zero variants.
Input
$0.25/M
Output
$0.75/M
Context
128K
Allen AI
Outperforms Llama 3.1 8B. Everything released: training data, weights, code, recipes, and checkpoints.
Input
$0.07/M
Output
$0.14/M
Context
128K
OpenAI
OpenAI's most advanced multimodal model. Excels at text, vision, and audio tasks with fast response times.
Input
$2.50/M
Output
$10.00/M
Context
128K
Moonshot AI
Moonshot AI's frontier multimodal MoE model with 1T total parameters (32B active). Tops SWE-bench and AIME 2025 benchmarks.
Input
$0.45/M
Output
$2.20/M
Context
256K
Google
Google's most capable thinking model with breakthrough performance on reasoning and coding.
Input
$1.25/M
Output
$10.00/M
Context
1.0M