Name: Qwen2-VL 72B
Price: 0.40 USD per 1M tokens (input and output)
Author: Alibaba/Qwen

Why Choose Qwen2-VL 72B

Strong mid-tier performance balancing capability and cost

32K token context window for substantial input processing

Supports text + image + video — true multimodal capability

Open source with downloadable weights: self-host, fine-tune, and customize (the 72B weights ship under the Qwen license, so check its terms before commercial use)

Strengths & Limitations

Strengths

+ Solid benchmark scores across categories (MMLU 84.0, HumanEval 72.0)
+ Strong math performance (GSM8K 88.0)
+ Very affordable pricing ($0.40 per 1M tokens for both input and output)
+ Open source: can self-host and fine-tune

Limitations

Smaller context window (32K) and lower average score (81.3) than the similar-tier models in the Quick Comparison

Benchmark Results

MMLU: 84.0

HumanEval: 72.0

GSM8K: 88.0

Quick Comparison

vs similar-tier models

Model	Provider	Input ($/1M tokens)	Output ($/1M tokens)	Context	Avg Score
Qwen2-VL 72B (current)	Alibaba/Qwen	$0.40	$0.40	32K	81.3
o3-mini	OpenAI	$1.10	$4.40	200K	86.3
DeepSeek-R1	DeepSeek	$0.55	$2.19	128K	87.0
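
The Avg Score for Qwen2-VL 72B matches the unweighted mean of the three scores listed under Benchmark Results. The page does not state how the figure is computed, so the method below is an assumption inferred from the numbers, and `average_score` is a hypothetical helper name.

```python
from statistics import mean

# Scores copied from the Benchmark Results section of this page.
benchmarks = {"MMLU": 84.0, "HumanEval": 72.0, "GSM8K": 88.0}


def average_score(scores: dict[str, float]) -> float:
    """Unweighted mean rounded to one decimal (assumed method, not confirmed by the page)."""
    return round(mean(scores.values()), 1)


print(average_score(benchmarks))  # 81.3
```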

Pricing Calculator

How pricing works: a token is roughly ¾ of a word, so a 1,000-word article is about 1,333 tokens. You pay separately for input (what you send) and output (what the model replies); for this model both are billed at $0.40 per 1M tokens.
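
As a rough illustration of the pricing rule above, the sketch below estimates tokens from a word count and prices a single request. The rates come from this page; the function names `estimate_tokens` and `request_cost` are made up for the example, and real token counts depend on the tokenizer and on how images are encoded.

```python
PRICE_PER_M_INPUT = 0.40   # USD per 1M input tokens (from this page)
PRICE_PER_M_OUTPUT = 0.40  # USD per 1M output tokens (from this page)


def estimate_tokens(word_count: int) -> int:
    """Rough estimate using the 'one token is about 3/4 of a word' rule of thumb."""
    return round(word_count * 4 / 3)


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request, billing input and output separately."""
    total = input_tokens * PRICE_PER_M_INPUT + output_tokens * PRICE_PER_M_OUTPUT
    return total / 1_000_000


print(estimate_tokens(1_000))                    # 1333
print(f"${request_cost(15_000, 3_000):.4f}")     # $0.0072
```

With 15,000 input and 3,000 output tokens this gives the same $0.0072 as the page's 10-page OCR example.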

Task	What it does	Tokens (in / out)	Cost
Describe a single image	Photo → detailed description	1,000 / 200	<$0.001
Analyze a chart or diagram	Visual data → structured insights	2,000 / 500	$0.0010
OCR a 10-page document	Scanned pages → structured text	15,000 / 3,000	$0.0072
Batch process 100 images	Bulk image analysis pipeline	100,000 / 20,000	$0.048

At scale: 1,000 requests/day

Workload	Per day	Per month
Image descriptions	$0.48	$14
Document OCR	$7	$216
Batch image analysis	$48	$1,440
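
The at-scale figures can be reproduced with a short projection. This is a sketch under two assumptions the page does not state: a month is 30 days, and each workload uses the per-request token counts from the Pricing Calculator examples. `daily_cost` is a hypothetical helper name.

```python
PRICE_PER_M = 0.40        # USD per 1M tokens, same for input and output (from this page)
REQUESTS_PER_DAY = 1_000  # from the "At scale" heading
DAYS_PER_MONTH = 30       # assumption: the monthly figures match a 30-day month

# (input tokens, output tokens) per request, taken from the Pricing Calculator examples
workloads = {
    "Image descriptions": (1_000, 200),
    "Document OCR": (15_000, 3_000),
    "Batch image analysis": (100_000, 20_000),
}


def daily_cost(input_tokens: int, output_tokens: int,
               requests: int = REQUESTS_PER_DAY) -> float:
    """USD per day for `requests` identical requests."""
    return (input_tokens + output_tokens) * PRICE_PER_M / 1_000_000 * requests


for name, (tokens_in, tokens_out) in workloads.items():
    day = daily_cost(tokens_in, tokens_out)
    print(f"{name}: ${day:.2f}/day, ${day * DAYS_PER_MONTH:.2f}/mo")
```

The unrounded results are $0.48, $7.20, and $48.00 per day, which the page displays as $0.48, $7, and $48.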

Technical Specifications

Provider: Alibaba/Qwen

Architecture: Transformer + Vision

Parameters: 72B

Context Window: 32K tokens

Max Output: 8K tokens

Modalities: text, image, video

Open Source: Yes

Release Date: October 2, 2024

More from Alibaba/Qwen

Qwen3-Coder 480B

Alibaba/Qwen

frontier

Specialized code model trained on 7.5T tokens (70% code). Supports 100+ programming languages and agentic workflows.

Modalities: text, code

Input: $0.30/M

Output: $0.60/M

Context: 262K

Qwen3-VL 235B

Alibaba/Qwen

frontier

Most capable open VLM rivaling GPT-5 across multimodal benchmarks. Strong reasoning and agentic capabilities.

Modalities: text, image, video

Input: $0.30/M

Output: $0.60/M

Context: 128K

Qwen3.5 397B

Alibaba/Qwen

frontier

Alibaba's open-weight hybrid MoE model with 512 experts and 17B active parameters. Natively multimodal with 201 language support. Top scores on GPQA and SWE-bench.

Modalities: text, image, video, code

Input: $0.15/M

Output: $1.00/M

Context: 256K

Similar Mid Models

o3-mini

OpenAI

mid

OpenAI's efficient reasoning model, optimized for speed while maintaining strong analytical capabilities.

Modalities: text

Input: $1.10/M

Output: $4.40/M

Context: 200K

DeepSeek-R1

DeepSeek

mid

DeepSeek's reasoning model with transparent chain-of-thought. Open-source and highly competitive.

Modalities: text

Input: $0.55/M

Output: $2.19/M

Context: 128K

Claude Sonnet 4

Anthropic

mid

Anthropic's best balance of intelligence and speed. Excellent for production workloads.

Modalities: text, image

Input: $3.00/M

Output: $15.00/M

Context: 200K
