by Shanghai AI Lab · 1 year ago
Open-source vision-language model with strong image understanding capabilities.
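| Spec | Value |
|---|---|
| Context Window | 8K tokens |
| Max Output | 4K tokens |
| TTFT (time to first token) | 200 ms |
| Speed | 110 tok/s |
| Input Price | $0.08/M tokens |
| Output Price | $0.08/M tokens |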
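
The listed TTFT and generation speed can be combined into a rough response-time estimate. The sketch below assumes a simple model (wait for the first token, then stream at the listed rate) and ignores network overhead, queueing, and image preprocessing; the function name `estimate_latency_seconds` is illustrative, not part of any API.

```python
# Figures taken from the spec card; everything else here is an assumption.
TTFT_SECONDS = 0.200        # listed time to first token (200 ms)
TOKENS_PER_SECOND = 110.0   # listed generation speed


def estimate_latency_seconds(output_tokens: int) -> float:
    """Rough end-to-end time: TTFT plus streaming time at the listed speed."""
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    # 4,000 is the listed maximum output length ("4K").
    for n in (200, 500, 4000):
        print(f"{n:>5} output tokens: ~{estimate_latency_seconds(n):.1f} s")
```

Under this model a 200-token image description takes about 2 seconds, while a response at the full 4K output limit takes a little over half a minute.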
Performance Profile
- Strong mid-tier performance balancing capability and cost
- Supports text + image: true multimodal capability
- Fully open source: self-host, fine-tune, and customize without restrictions
- 26B parameter architecture for deep reasoning
vs similar-tier models
| Model | Provider | Input ($/M tokens) | Output ($/M tokens) | Context | Avg Score |
|---|---|---|---|---|---|
| InternVL2 26B (current) | Shanghai AI Lab | $0.08 | $0.08 | 8K | 66.7 |
| o3-mini | OpenAI | $1.10 | $4.40 | 200K | 86.3 |
| DeepSeek-R1 | DeepSeek | $0.55 | $2.19 | 128K | 87.0 |
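
To make the price gap in the comparison above concrete, here is a minimal sketch that prices one request at each model's listed rates. The workload (2,000 input tokens, 500 output tokens) is an assumed example, and the names `Pricing` and `request_cost` are hypothetical helpers. It compares price only, not capability or benchmark score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_million: float   # USD per 1M input tokens
    output_per_million: float  # USD per 1M output tokens


# Rates copied from the comparison table.
PRICING = {
    "InternVL2 26B": Pricing(0.08, 0.08),
    "o3-mini": Pricing(1.10, 4.40),
    "DeepSeek-R1": Pricing(0.55, 2.19),
}


def request_cost(p: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of a single request at the given per-million rates."""
    return (
        input_tokens * p.input_per_million
        + output_tokens * p.output_per_million
    ) / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 2,000 tokens in, 500 tokens out.
    baseline = request_cost(PRICING["InternVL2 26B"], 2_000, 500)
    for name, pricing in PRICING.items():
        cost = request_cost(pricing, 2_000, 500)
        print(f"{name:<14} ${cost:.6f}  ({cost / baseline:.1f}x InternVL2 26B)")
```

For this workload the listed rates work out to roughly 22 times the cost for o3-mini and 11 times for DeepSeek-R1, which has to be weighed against their higher average scores.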
| Task | What it does | Tokens (in · out) | Est. cost |
|---|---|---|---|
| Describe a single image | Photo → detailed description | 1,000 in · 200 out | <$0.001 |
| Analyze a chart or diagram | Visual data → structured insights | 2,000 in · 500 out | <$0.001 |
| OCR a 10-page document | Scanned pages → structured text | 15,000 in · 3,000 out | $0.0014 |
| Batch process 100 images | Bulk image analysis pipeline | 100,000 in · 20,000 out | $0.0096 |
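
The per-task estimates above follow directly from the token counts and the listed $0.08/M rate. The sketch below reproduces them and also gives a rough lower bound on how many requests each task needs. That second part rests on two assumptions not stated on the page: that "8K" means 8,000 tokens, and that input and output share the same context window. `task_cost` and `min_requests` are illustrative names.

```python
import math

PRICE_PER_MILLION = 0.08  # USD, listed rate for both input and output
CONTEXT_WINDOW = 8_000    # assumption: "8K" read as 8,000 tokens

# (input_tokens, output_tokens) as listed for each task.
TASKS = {
    "Describe a single image": (1_000, 200),
    "Analyze a chart or diagram": (2_000, 500),
    "OCR a 10-page document": (15_000, 3_000),
    "Batch process 100 images": (100_000, 20_000),
}


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD; a single rate applies because input and output cost the same."""
    return (input_tokens + output_tokens) * PRICE_PER_MILLION / 1_000_000


def min_requests(input_tokens: int, output_tokens: int,
                 window: int = CONTEXT_WINDOW) -> int:
    """Lower bound on requests, assuming input + output must fit in one window."""
    return math.ceil((input_tokens + output_tokens) / window)


if __name__ == "__main__":
    for name, (tokens_in, tokens_out) in TASKS.items():
        print(f"{name:<28} ${task_cost(tokens_in, tokens_out):.6f}  "
              f">= {min_requests(tokens_in, tokens_out)} request(s)")
```

Under these assumptions, the 10-page OCR task and the 100-image batch cannot fit in a single call, so they would need to be split per page or per image. Splitting does not change the total token cost at a flat per-token rate.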
| Workload | Monthly cost | Daily cost |
|---|---|---|
| Image descriptions | $3/mo | $0.10/day |
| Document OCR | $43/mo | $1/day |
| Batch image analysis | $288/mo | $10/day |
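
The page does not state the volume behind these monthly figures. They are consistent with about 1,000 runs per day over a 30-day month at the listed $0.08/M rate, so the sketch below uses that as an explicit assumption. Both constants and the helper names are hypothetical and should be replaced with your own workload.

```python
PRICE_PER_MILLION = 0.08  # USD, listed rate for both input and output
RUNS_PER_DAY = 1_000      # assumption inferred from the listed totals
DAYS_PER_MONTH = 30       # assumption

# (input_tokens, output_tokens) per run, taken from the listed task estimates.
WORKLOADS = {
    "Image descriptions": (1_000, 200),
    "Document OCR": (15_000, 3_000),
    "Batch image analysis": (100_000, 20_000),
}


def cost_per_run(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one run at the flat per-token rate."""
    return (input_tokens + output_tokens) * PRICE_PER_MILLION / 1_000_000


def monthly_cost(per_run: float, runs_per_day: int = RUNS_PER_DAY,
                 days: int = DAYS_PER_MONTH) -> float:
    """Projected monthly spend in USD for a steady daily volume."""
    return per_run * runs_per_day * days


if __name__ == "__main__":
    for name, (tokens_in, tokens_out) in WORKLOADS.items():
        per_run = cost_per_run(tokens_in, tokens_out)
        daily = per_run * RUNS_PER_DAY
        print(f"{name:<22} ${daily:.2f}/day  ${monthly_cost(per_run):.2f}/mo")
```

With these inputs the projection comes to about $2.88, $43.20, and $288.00 per month, which matches the listed $3, $43, and $288 after rounding.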
| Provider | Description | Input | Output | Context |
|---|---|---|---|---|
| Shanghai AI Lab | Open-source model with 1M context from Shanghai AI Lab. Strong coding and math skills. | $0.06/M | $0.06/M | 1M |
| Shanghai AI Lab | State-of-the-art open multimodal LLM scoring 72.2 on MMMU. New record among open MLLMs. | $0.40/M | $1.20/M | 128K |
| Shanghai AI Lab | Latest InternLM series model. Efficient for research and application development. | $0.07/M | $0.14/M | 128K |
| OpenAI | OpenAI's efficient reasoning model, optimized for speed while maintaining strong analytical capabilities. | $1.10/M | $4.40/M | 200K |
| DeepSeek | DeepSeek's reasoning model with transparent chain-of-thought. Open-source and highly competitive. | $0.55/M | $2.19/M | 128K |
| Anthropic | Anthropic's best balance of intelligence and speed. Excellent for production workloads. | $3.00/M | $15.00/M | 200K |