Model Directory

OpenAI's most advanced multimodal model. Excels at text, vision, and audio tasks with fast response times.

Input

$2.50/M

Output

$10.00/M

Context

128K
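The Input and Output figures on each card are prices per million tokens, so the cost of a single request is a weighted sum of its token counts. The sketch below illustrates that arithmetic with the prices listed on the card above. The function name `estimate_cost_usd` and the example token counts are hypothetical, and real billing may differ (cached-input or batch discounts, for example, are not modeled).

```python
from decimal import Decimal

TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost_usd(input_tokens, output_tokens, input_price_per_m, output_price_per_m):
    """Estimate one request's cost in USD from per-million-token prices."""
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * Decimal(input_price_per_m)
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * Decimal(output_price_per_m)
    return input_cost + output_cost


# Prices from the card above: $2.50/M input, $10.00/M output.
# The token counts are made-up example values, not measurements.
cost = estimate_cost_usd(12_000, 800, "2.50", "10.00")
print(f"${cost:.4f}")
```

Prices are passed as strings so that `Decimal` keeps them exact instead of inheriting binary floating-point rounding.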

o1

OpenAI

OpenAI's reasoning model with chain-of-thought capabilities for complex problem solving.

Input

$15.00/M

Output

$60.00/M

Context

200K

Claude Opus 4

Anthropic

Anthropic's most powerful model. Top-tier performance on coding, analysis, and complex reasoning tasks.

Input

$15.00/M

Output

$75.00/M

Context

200K

Claude Sonnet 4

Anthropic

Anthropic's best balance of intelligence and speed. Excellent for production workloads.

Input

$3.00/M

Output

$15.00/M

Context

200K

Gemini 2.0 Flash

Google

Google's fastest multimodal model with native tool use and advanced agentic capabilities.

Input

$0.10/M

Output

$0.40/M

Context

1.0M

Gemini 2.5 Pro

Google

Google's most capable thinking model with breakthrough performance on reasoning and coding.

Input

$1.25/M

Output

$10.00/M

Context

1.0M

GPT-4.1

OpenAI

OpenAI's latest GPT-4 series model with improved coding, instruction following, and long context.

Input

$2.00/M

Output

$8.00/M

Context

1.0M

Kimi K2.5

Moonshot AI

Moonshot AI's frontier multimodal MoE model with 1T total parameters (32B active). Tops SWE-bench and AIME 2025 benchmarks.

Input

$0.45/M

Output

$2.20/M

Context

256K

Qwen3.5 397B

Alibaba/Qwen

Alibaba's open-weight hybrid MoE model with 512 experts and 17B active parameters. Natively multimodal with support for 201 languages. Top scores on GPQA and SWE-bench.

text, image, video, code

Input

$0.15/M

Output

$1.00/M

Context

256K

o3

OpenAI

OpenAI's most powerful reasoning model with breakthrough performance on math and coding benchmarks.

Input

$10.00/M

Output

$40.00/M

Context

200K

Gemini 2.5 Flash

Google

Google's fast and cost-efficient thinking model with strong reasoning capabilities.

Input

$0.15/M

Output

$0.60/M

Context

1.0M

Llama 4 Scout

Meta

Meta's latest open-source MoE model with 17B active parameters and an industry-leading 10M-token context.

Input

$0.15/M

Output

$0.60/M

Context

10.5M
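The description cites a 10M-token context while the card shows 10.5M. One possible explanation, which is an assumption and not something the source states, is that the directory stores context sizes as binary multiples (such as 10 × 1024 × 1024 tokens) and rounds them for display in decimal units. The sketch below shows that such a rule would reproduce several values seen in this directory. `format_context` is a hypothetical helper, not the site's actual code.

```python
def format_context(tokens):
    """Format a token count using an ASSUMED display rule (decimal K/M units)."""
    if tokens >= 1_000_000:
        return f"{tokens / 1_000_000:.1f}M"
    return f"{round(tokens / 1_000)}K"


# Binary-multiple token counts (hypothetical stored values) and their display form.
for label, tokens in [
    ("10 Mi tokens", 10 * 1024 * 1024),
    ("2 Mi tokens", 2 * 1024 * 1024),
    ("1 Mi tokens", 1024 * 1024),
    ("128 Ki tokens", 128 * 1024),
]:
    print(label, "->", format_context(tokens))
```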

Llama 4 Maverick

Meta

Meta's powerful open-source MoE model with 400B total parameters and a 1M context window.

Input

$0.50/M

Output

$2.00/M

Context

1.0M

Grok-3

xAI

xAI's frontier model trained on the Colossus supercluster. Real-time data access and strong reasoning.

Input

$3.00/M

Output

$15.00/M

Context

131K

Seedance 2.0

ByteDance

ByteDance's unified multimodal generation model that handles video, audio, and image synthesis within a single architecture. Seedance 2.0 produces highly coherent audiovisual content with strong temporal consistency, supporting diverse creative workflows from music video generation to product advertisement creation with synchronized narration and effects.

video, audio, image

Input

$3.00/M

Output

$70.00/M

GPT Image 1

OpenAI

OpenAI's native image generation capability integrated directly into GPT-4o, enabling conversational image creation and iterative editing through natural language. GPT Image 1 excels at accurate text rendering within images, complex multi-element compositions, and faithful adherence to detailed prompts across photorealistic, illustrative, and artistic styles.

Input

$10.00/M

Output

$40.00/M

Imagen 4

Google

Google DeepMind's fourth-generation image synthesis model capable of producing images up to 2K resolution with exceptional photorealism and compositional accuracy. Imagen 4 includes SynthID watermarking by default for responsible AI deployment, supports advanced inpainting and outpainting, and demonstrates industry-leading performance on text rendering and spatial reasoning tasks.

Input

$4.00/M

Output

$20.00/M

FLUX.2 Pro

Black Forest Labs

Black Forest Labs' flagship commercial image generation model with 32 billion parameters, delivering up to 4-megapixel resolution output with exceptional detail and prompt fidelity. FLUX.2 Pro achieves state-of-the-art results in photorealism, typography rendering, and complex scene composition, making it a top choice for professional creative applications.

Input

$3.00/M

Output

$30.00/M

Midjourney V7

Midjourney

Midjourney's seventh major model release featuring 12 billion parameters and expanded multimodal capabilities including short video clip generation alongside its renowned image synthesis. V7 delivers dramatically improved coherence, photorealism, and artistic range, with enhanced understanding of spatial relationships, lighting, and material properties across diverse visual styles.

image, video

Input

$5.00/M

Output

$50.00/M

GPT-5

OpenAI

OpenAI's flagship model replacing GPT-4o and o3. Achieves 94.6% on AIME 2025 and 74.9% on SWE-bench. Multimodal with thinking capabilities.

Input

$5.00/M

Output

$15.00/M

Context

197K

GPT-5.2

OpenAI

Expanded GPT-5 with 400K context and 128K max output. Scores 100% on the AIME 2025 math benchmark.

Input

$8.00/M

Output

$24.00/M

Context

400K
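When a card states both a context window and a maximum output length, the room left for the prompt can be estimated by subtraction. This sketch assumes that output tokens count against the same context window and that "K" means 1,000 tokens. Neither assumption is confirmed by the card, and `max_input_tokens` is a hypothetical helper.

```python
def max_input_tokens(context_window, reserved_output):
    """Tokens left for the prompt after reserving room for the response."""
    if reserved_output > context_window:
        raise ValueError("reserved output exceeds the context window")
    return context_window - reserved_output


CONTEXT_WINDOW = 400_000  # "400K context" from the card above (K assumed to be 1,000)
MAX_OUTPUT = 128_000      # "128K max output" from the card above

budget = max_input_tokens(CONTEXT_WINDOW, MAX_OUTPUT)
print(budget)
```

Reserving less than the maximum output leaves more room for input, so in practice the reservation would be set to the longest response the application expects.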

Claude Opus 4.5

Anthropic

Flagship Opus release with major improvements in coding and workplace productivity tasks. Predecessor to Opus 4.6.

Input

$15.00/M

Output

$75.00/M

Context

1.0M

Gemini 3 Pro

Google

Most powerful Gemini model with native multimodal understanding. Supports adjustable reasoning depth via thinking_level parameter.

Input

$3.50/M

Output

$10.50/M

Context

1.0M

Qwen3-VL 235B

Alibaba/Qwen

Most capable open VLM rivaling GPT-5 across multimodal benchmarks. Strong reasoning and agentic capabilities.

Input

$0.30/M

Output

$0.60/M

Context

128K

Claude Opus 4.6

Anthropic

Anthropic's strongest reasoning and coding model. 80.8% on SWE-bench Verified, 1M context (beta), agent teams, and extended thinking.

Input

$5.00/M

Output

$25.00/M

Context

1.0M

Claude Sonnet 4.6

Anthropic

Matches Opus 4.6 on most benchmarks at three-fifths of its listed price ($3.00/$15.00 vs. $5.00/$25.00 per million tokens). 79.6% on SWE-bench, 1M context, computer use, and design capabilities.

Input

$3.00/M

Output

$15.00/M

Context

1.0M
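The relative cost of Sonnet 4.6 and Opus 4.6 can be checked directly from the prices listed on their cards. This is a minimal sketch, assuming both cards use the same per-million-token units. The dictionary names and the `price_ratio` helper are illustrative only.

```python
from fractions import Fraction


def price_ratio(cheaper, pricier):
    """Return cheaper/pricier as an exact fraction of two dollar prices."""
    return Fraction(cheaper) / Fraction(pricier)


# Listed per-million-token prices from the Opus 4.6 and Sonnet 4.6 cards.
opus_46 = {"input": "5.00", "output": "25.00"}
sonnet_46 = {"input": "3.00", "output": "15.00"}

ratios = {kind: price_ratio(sonnet_46[kind], opus_46[kind]) for kind in opus_46}
for kind, ratio in ratios.items():
    print(f"{kind}: {ratio} ({float(ratio):.0%})")
```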

Gemini 3 Flash

Google

Google's frontier-class model at Flash-level latency and cost. 90.4% on GPQA Diamond, 78% on SWE-bench, 1M context window.

Input

$0.50/M

Output

$3.00/M

Context

1.0M

Gemini 3.1 Pro

Google

Google's most capable model. 94.3% on GPQA Diamond, 80.6% on SWE-bench, 77.1% on ARC-AGI-2. #1 on 12 of 18 tracked benchmarks.

Input

$2.00/M

Output

$12.00/M

Context

1.0M

Amazon Nova 2 Pro

Amazon

Most intelligent Amazon model for complex multi-step reasoning and agentic workflows.

text, image, video, audio

Input

$4.00/M

Output

$12.00/M

Context

1.0M

Qwen2-VL 72B

Alibaba/Qwen

Alibaba's open-source vision-language model with video understanding capabilities.

Input

$0.40/M

Output

$0.40/M

Context

32K

Falcon 2 11B

TII

TII's efficient open-source model with multimodal capabilities.

Input

$0.04/M

Output

$0.04/M

Context

InternVL2 26B

Shanghai AI Lab

Open-source vision-language model with strong image understanding capabilities.

Input

$0.08/M

Output

$0.08/M

Context

GLM-4

Zhipu AI

Zhipu AI's flagship model with strong Chinese and English bilingual capabilities.

Input

$1.00/M

Output

$3.00/M

Context

128K

o4-mini

OpenAI

OpenAI's cost-efficient reasoning model with multimodal input, strong math and coding performance at a fraction of o3 pricing.

Input

$1.10/M

Output

$4.40/M

Context

200K

o3-pro

OpenAI

OpenAI's highest-quality reasoning model with extended compute for complex scientific and mathematical problems.

Input

$20.00/M

Output

$80.00/M

Context

200K

Gemini 2.0 Flash-Lite

Google

Google's ultra-efficient model offering better performance than Gemini 1.5 Flash at the same cost point.

Input

$0.07/M

Output

$0.30/M

Context

1.0M

Mistral Large 3

Mistral AI

Mistral's open-weight 675B MoE model with 41B active parameters, multimodal input, and 256K context.

Input

$0.50/M

Output

$1.50/M

Context

256K

Mistral Small 3.1

Mistral AI

Compact 24B model with image understanding, 128K context, and Apache 2.0 license.

Input

$0.10/M

Output

$0.30/M

Context

128K

Phi-4-multimodal

Microsoft

Microsoft's 5.6B compact model unifying text, vision, and speech in a single architecture.

Input

$0.02/M

Output

$0.02/M

Context

128K

Grok 4.20

xAI

xAI's 4-agent parallel collaboration system with rapid learning architecture and medical document analysis. Beta release.

Input

$3.00/M

Output

$15.00/M

Context

131K

Molmo 72B

Allen AI

Open multimodal model for visual understanding, image captioning, and visual question answering.

Input

$0.40/M

Output

$1.20/M

Context

128K

GLM-4.5V

Zhipu AI

Vision-language MoE model with superior performance at lower inference cost.

Input

$0.15/M

Output

$0.30/M

Context

128K

BGE-VL

BAAI

State-of-the-art multimodal embedding model for visual search applications.

Input

$0.02/M

Output

$0.02/M

Context

MiniCPM-V 2.6

OpenBMB

Efficient vision-language model rivaling GPT-4V quality at a fraction of the size.

Input

$0.10/M

Output

$0.20/M

Context

128K

HyperCLOVA X

Naver

Korean sovereign AI with omnimodal capabilities. Specialized for Korean language and culture.

Input

$1.00/M

Output

$3.00/M

Context

128K

GPT Image 1.5

OpenAI

An optimized successor to GPT Image 1 that delivers 20% lower cost and 4x faster generation while maintaining equivalent visual quality. GPT Image 1.5 introduces improved batch processing, enhanced style consistency for multi-image projects, and refined detail handling for professional design and marketing workflows.

Input

$8.00/M

Output

$32.00/M

GPT Image 1 Mini

OpenAI

A cost-efficient variant of OpenAI's image generation model offering 54-70% lower pricing while retaining strong prompt adherence and visual quality for standard use cases. GPT Image 1 Mini is optimized for high-volume applications such as e-commerce product imagery, social media content, and rapid prototyping where speed and cost matter more than maximum fidelity.

Input

$2.50/M

Output

$8.00/M

Phi-3.5 Vision

Microsoft

Lightweight multimodal model with vision capabilities for on-device and edge visual understanding.

Input

$0.05/M

Output

$0.10/M

Context

128K

Gemini 2.5 Flash Image

Google

A multimodal extension of Google's Gemini 2.5 Flash model that adds native image generation and editing capabilities alongside text understanding. This model enables conversational image creation, iterative visual refinement, and combined text-image output within a single unified interface, making it particularly effective for design iteration and creative brainstorming workflows.

image, text

Input

$0.15/M

Output

$30.00/M

Gemini 2 Flash Thinking

Google

Experimental Gemini model with extended chain-of-thought reasoning. Transparent thinking process with strong performance on math and science.

Input

$0.15/M

Output

$0.60/M

Context

1.0M

FLUX.2 Dev

Black Forest Labs

The open-weights development version of FLUX.2 with the same 32 billion parameter architecture as the Pro variant, released for non-commercial research and experimentation. FLUX.2 Dev provides researchers full access to model weights for fine-tuning, distillation, and architectural exploration while delivering near-Pro-level quality for academic and personal projects.

Input

Free

Output

Free

Jina Embeddings V4

Jina AI

Universal multimodal embedding handling text, images, and documents in 30+ languages.

Input

$0.02/M

Output

$0.02/M

Context

Adobe Firefly Image 5

Adobe

Adobe's latest commercially safe image generation model trained exclusively on licensed and public domain content, delivering photorealistic output at native 4-megapixel resolution. Firefly Image 5 integrates deeply with Adobe Creative Cloud, offering advanced composition controls, style references, and seamless editing workflows within Photoshop and Illustrator.

Input

$4.00/M

Output

$35.00/M

Adobe Firefly Image 4

Adobe

Adobe's fourth-generation Firefly image model offering improved quality, faster generation, and enhanced creative controls compared to its predecessors. Firefly Image 4 provides robust structure references, style transfer, and generative fill capabilities, all trained on Adobe's commercially licensed dataset to ensure IP safety for enterprise and professional use.

Input

$3.00/M

Output

$25.00/M

Stable Diffusion 3.5 Large

Stability AI

Stability AI's largest open-source image generation model built on the Multimodal Diffusion Transformer (MMDiT) architecture. SD 3.5 Large delivers high-quality results across photorealistic and artistic styles with strong prompt adherence, accurate text rendering, and diverse composition capabilities, available under an open license for both research and commercial use.

Input

$0.50/M

Output

$6.50/M

Ideogram 3.0

Ideogram

Ideogram's third-generation model combining exceptional photorealism with industry-leading text rendering accuracy within generated images. Ideogram 3.0 handles complex typography, logos, signs, and handwritten text with remarkable fidelity, making it the preferred choice for design professionals working on brand assets, marketing materials, and content requiring reliable in-image text.

Input

$2.00/M

Output

$20.00/M

Recraft V3

Recraft

Recraft's flagship image generation model that achieved the number one ranking on the HuggingFace text-to-image leaderboard, with native support for both raster and vector output formats. Recraft V3 excels at brand-consistent design, offering precise color palette control, style locking, and batch generation capabilities that make it uniquely suited for professional design systems.

Input

$2.00/M

Output

$20.00/M

MAI-Image-1

Microsoft

Microsoft's first in-house image generation model developed by Microsoft AI, designed for integration across Microsoft's product ecosystem including Designer, Copilot, and Bing Image Creator. MAI-Image-1 focuses on safety, controllability, and consistent quality, with built-in content filtering and provenance metadata for responsible enterprise deployment.

Input

$2.00/M

Output

$15.00/M

Hunyuan Image 3.0

Tencent

World's largest open-source text-to-image model using MoE architecture with 64 experts.

Input

$0.03/M

Output

$0.03/M

InternVL3 78B

Shanghai AI Lab

State-of-the-art open multimodal LLM scoring 72.2 on MMMU, a new record among open MLLMs.

Input

$0.40/M

Output

$1.20/M

Context

128K

InternVL2.5 78B

Shanghai AI Lab

Advanced vision-language model with improved document and chart understanding capabilities.

Input

$0.40/M

Output

$1.20/M

Context

128K

Yi-VL 34B

01.AI

Vision-language Yi model for image understanding and visual question answering.

Input

$0.30/M

Output

$0.60/M

Context

16K

SD 3.5 Medium

Stability AI

Mid-size Stable Diffusion optimized for consumer GPUs and edge devices.

Input

$0.02/M

Output

$0.02/M

FLUX.2 Klein 4B

Black Forest Labs

Fastest FLUX model generating and editing images in under one second. Fully open under Apache 2.0.

Input

$0.01/M

Output

$0.01/M

FLUX.1 Schnell

Black Forest Labs

Fast open-source text-to-image model with 4-step generation. Apache 2.0 licensed.

Input

$0.02/M

Output

$0.02/M

FLUX.1 Pro

Black Forest Labs

Premium text-to-image model with highest technical quality and 4.5-second generation.

Input

$0.05/M

Output

$0.05/M

Aya Vision 32B

Cohere

Cohere's open multimodal model for visual understanding across 23 languages. Strong image captioning and visual QA.

Input

$0.25/M

Output

$0.50/M

Context

128K

GPT-4o Mini

OpenAI

A smaller, faster, and more affordable version of GPT-4o. Great for lightweight tasks.

Input

$0.15/M

Output

$0.60/M

Context

128K

Amazon Nova 2 Lite

Amazon

Fast, cost-effective reasoning model with built-in code interpreter and web grounding.

Input

$0.80/M

Output

$2.40/M

Context

1.0M

DALL-E 3

OpenAI

OpenAI's image generation model excelling at precision, complex prompts, and readable text rendering within images.

Input

$0.04/M

Output

$0.04/M

Amazon Nova Canvas

Amazon

Image generation model with fine-grained control over composition, style, and content.

Input

$0.04/M

Output

$0.04/M

Claude Haiku 3.5

Anthropic

Anthropic's fastest and most affordable model. Great for high-volume, low-latency tasks.

Input

$0.80/M

Output

$4.00/M

Context

200K

Claude Sonnet 4.5

Anthropic

High-intelligence Sonnet model with 1M token context window. Strong balance of performance and cost.

Input

$3.00/M

Output

$15.00/M

Context

1.0M

Claude Haiku 4.5

Anthropic

Fastest and most cost-efficient Claude model designed for high-throughput, low-latency applications.

Input

$0.80/M

Output

$4.00/M

Context

200K

Gemini 1.5 Pro

Google

Google's previous-gen flagship model with the longest context window in production.

Input

$1.25/M

Output

$5.00/M

Context

2.1M

Grok-2

xAI

xAI's large language model with real-time X (Twitter) data access and strong reasoning.

Input

$2.00/M

Output

$10.00/M

Context

131K

Reka Core

Reka AI

Full multimodal model handling text, image, video, and audio inputs natively.

text, image, video, audio

Input

$3.00/M

Output

$9.00/M

Context

128K

GPT-4.1 Mini

OpenAI

A fast, affordable variant of GPT-4.1 for high-volume workloads.

Input

$0.40/M

Output

$1.60/M

Context

1.0M

Gemini 3 Deep Think

Google

Specialized reasoning model designed for science, research, and complex engineering challenges.

Input

$5.00/M

Output

$15.00/M

Context

1.0M

PaliGemma2 28B

Google

Open vision-language model for image captioning, visual QA, and OCR tasks. Built on the Gemma 2 backbone.

Input

$0.30/M

Output

$0.60/M

Context

GPT-4.1 Nano

OpenAI

OpenAI's fastest and cheapest model. Ideal for classification, autocomplete, and high-throughput tasks.

Input

$0.10/M

Output

$0.40/M

Context

1.0M

PaliGemma2 10B

Google

Mid-size PaliGemma for efficient vision-language tasks. Strong OCR and document understanding.

Input

$0.15/M

Output

$0.30/M

Context

GPT-4.5 Preview

OpenAI

OpenAI's research preview with improved emotional intelligence and reduced hallucinations.

Input

$75.00/M

Output

$150.00/M

Context

128K

Claude 3.5 Sonnet v2

Anthropic

Upgraded Claude 3.5 Sonnet with major coding and tool-use improvements, plus computer use capability.

Input

$3.00/M

Output

$15.00/M

Context

200K

Reka Flash

Reka AI

One of the few 21B models supporting full interleaved multimodal inputs. Videos up to 5 minutes.

text, image, video, audio

Input

$0.80/M

Output

$2.40/M

Context

128K

Gemma 3 27B

Google

Google's open-source multimodal model. Strong performance for its size with vision capabilities.

Input

$0.10/M

Output

$0.10/M

Context

128K

Gemma 3 12B

Google

Efficient open-source model from Google with multimodal capabilities at 12B parameters.

Input

$0.05/M

Output

$0.05/M

Context

128K

Gemma 3 4B

Google

Ultra-efficient open-source model from Google. Runs on mobile and edge devices.

Input

$0.02/M

Output

$0.02/M

Context

128K

Qwen2.5-VL 7B

Alibaba/Qwen

Compact vision-language model excelling at video and image analysis. Top small multimodal model on Hugging Face.

Input

$0.10/M

Output

$0.30/M

Context

128K

Qwen2.5-VL 3B

Alibaba/Qwen

Smallest Qwen VL model for lightweight vision-language tasks on constrained hardware.

Input

$0.04/M

Output

$0.12/M

Context

128K

Llama 3.2 90B Vision

Meta

Meta's largest multimodal Llama model with image understanding capabilities.

Input

$0.35/M

Output

$0.40/M

Context

128K

Llama 3.2 11B Vision

Meta

Efficient multimodal Llama model for image-and-text tasks at 11B parameters.

Input

$0.06/M

Output

$0.06/M

Context

128K

Pixtral 12B

Mistral AI

Mistral's open-source multimodal model. Processes images natively alongside text.

Input

$0.10/M

Output

$0.10/M

Context

128K

Pixtral Large

Mistral AI

Mistral's flagship multimodal model. Built on Mistral Large with vision capabilities.

Input

$2.00/M

Output

$6.00/M

Context

128K

DeepSeek-VL2

DeepSeek

Vision-language model for image understanding, OCR, and visual reasoning tasks.