Explore 95+ AI models from 49 providers. Filter by capability, tier, and pricing to find the right model.
OpenAI
OpenAI's most advanced multimodal model. Excels at text, vision, and audio tasks with fast response times.
Input
$2.50/M
Output
$10.00/M
Context
128K
OpenAI
OpenAI's reasoning model with chain-of-thought capabilities for complex problem solving.
Input
$15.00/M
Output
$60.00/M
Context
200K
Anthropic
Anthropic's most powerful model. Top-tier performance on coding, analysis, and complex reasoning tasks.
Input
$15.00/M
Output
$75.00/M
Context
200K
Anthropic
Anthropic's best balance of intelligence and speed. Excellent for production workloads.
Input
$3.00/M
Output
$15.00/M
Context
200K
Google
Google's fastest multimodal model with native tool use and advanced agentic capabilities.
Input
$0.10/M
Output
$0.40/M
Context
1.0M
Google
Google's most capable thinking model with breakthrough performance on reasoning and coding.
Input
$1.25/M
Output
$10.00/M
Context
1.0M
OpenAI
OpenAI's latest GPT-4 series model with improved coding, instruction following, and long context.
Input
$2.00/M
Output
$8.00/M
Context
1.0M
Moonshot AI
Moonshot AI's frontier multimodal MoE model with 1T total parameters (32B active). Tops SWE-bench and AIME 2025 benchmarks.
Input
$0.45/M
Output
$2.20/M
Context
256K
Alibaba/Qwen
Alibaba's open-weight hybrid MoE model with 512 experts and 17B active parameters. Natively multimodal, with support for 201 languages. Top scores on GPQA and SWE-bench.
Input
$0.15/M
Output
$1.00/M
Context
256K
OpenAI
OpenAI's most powerful reasoning model with breakthrough performance on math and coding benchmarks.
Input
$10.00/M
Output
$40.00/M
Context
200K
Google
Google's fast and cost-efficient thinking model with strong reasoning capabilities.
Input
$0.15/M
Output
$0.60/M
Context
1.0M
Meta
Meta's latest open-source MoE model with 17B active parameters and an industry-leading 10M-token context window.
Input
$0.15/M
Output
$0.60/M
Context
10.5M
Meta
Meta's powerful open-source MoE model with 400B total parameters and a 1M context window.
Input
$0.50/M
Output
$2.00/M
Context
1.0M
xAI
xAI's frontier model trained on the Colossus supercluster. Real-time data access and strong reasoning.
Input
$3.00/M
Output
$15.00/M
Context
131K
ByteDance
ByteDance's unified multimodal generation model that handles video, audio, and image synthesis within a single architecture. Seedance 2.0 produces highly coherent audiovisual content with strong temporal consistency, supporting diverse creative workflows from music video generation to product advertisement creation with synchronized narration and effects.
Input
$3.00/M
Output
$70.00/M
OpenAI
OpenAI's native image generation capability integrated directly into GPT-4o, enabling conversational image creation and iterative editing through natural language. GPT Image 1 excels at accurate text rendering within images, complex multi-element compositions, and faithful adherence to detailed prompts across photorealistic, illustrative, and artistic styles.
Input
$10.00/M
Output
$40.00/M
Google
Google DeepMind's fourth-generation image synthesis model capable of producing images up to 2K resolution with exceptional photorealism and compositional accuracy. Imagen 4 includes SynthID watermarking by default for responsible AI deployment, supports advanced inpainting and outpainting, and demonstrates industry-leading performance on text rendering and spatial reasoning tasks.
Input
$4.00/M
Output
$20.00/M
Black Forest Labs
Black Forest Labs' flagship commercial image generation model with 32 billion parameters, delivering up to 4-megapixel resolution output with exceptional detail and prompt fidelity. FLUX.2 Pro achieves state-of-the-art results in photorealism, typography rendering, and complex scene composition, making it a top choice for professional creative applications.
Input
$3.00/M
Output
$30.00/M
Midjourney
Midjourney's seventh major model release featuring 12 billion parameters and expanded multimodal capabilities including short video clip generation alongside its renowned image synthesis. V7 delivers dramatically improved coherence, photorealism, and artistic range, with enhanced understanding of spatial relationships, lighting, and material properties across diverse visual styles.
Input
$5.00/M
Output
$50.00/M
OpenAI
OpenAI's flagship model replacing GPT-4o and o3. Achieves 94.6% on AIME 2025 and 74.9% on SWE-bench. Multimodal with thinking capabilities.
Input
$5.00/M
Output
$15.00/M
Context
197K
OpenAI
Expanded GPT-5 with 400K context and 128K max output. Scores 100% on the AIME 2025 math benchmark.
Input
$8.00/M
Output
$24.00/M
Context
400K
Anthropic
Flagship Opus release with major improvements in coding and workplace productivity tasks. Predecessor to Opus 4.6.
Input
$15.00/M
Output
$75.00/M
Context
1.0M
Google
Most powerful Gemini model with native multimodal understanding. Supports adjustable reasoning depth via thinking_level parameter.
Input
$3.50/M
Output
$10.50/M
Context
1.0M
Alibaba/Qwen
Most capable open VLM rivaling GPT-5 across multimodal benchmarks. Strong reasoning and agentic capabilities.
Input
$0.30/M
Output
$0.60/M
Context
128K
Anthropic
Anthropic's strongest reasoning and coding model. 80.8% on SWE-bench Verified, 1M context (beta), agent teams, and extended thinking.
Input
$5.00/M
Output
$25.00/M
Context
1.0M
Anthropic
Matches Opus 4.6 on most benchmarks at 1/5 the cost. 79.6% on SWE-bench, 1M context, computer use, and design capabilities.
Input
$3.00/M
Output
$15.00/M
Context
1.0M
Google
Google's frontier-class model at Flash-level latency and cost. 90.4% on GPQA Diamond, 78% on SWE-bench, 1M context window.
Input
$0.50/M
Output
$3.00/M
Context
1.0M
Google
Google's most capable model. 94.3% on GPQA Diamond, 80.6% on SWE-bench, 77.1% on ARC-AGI-2. #1 on 12 of 18 tracked benchmarks.
Input
$2.00/M
Output
$12.00/M
Context
1.0M
Amazon
Most intelligent Amazon model for complex multi-step reasoning and agentic workflows.
Input
$4.00/M
Output
$12.00/M
Context
1.0M
Alibaba/Qwen
Alibaba's open-source vision-language model with video understanding capabilities.
Input
$0.40/M
Output
$0.40/M
Context
32K
TII
TII's efficient open-source model with multimodal capabilities.
Input
$0.04/M
Output
$0.04/M
Context
8K
Shanghai AI Lab
Open-source vision-language model with strong image understanding capabilities.
Input
$0.08/M
Output
$0.08/M
Context
8K
Zhipu AI
Zhipu AI's flagship model with strong Chinese and English bilingual capabilities.
Input
$1.00/M
Output
$3.00/M
Context
128K
OpenAI
OpenAI's cost-efficient reasoning model with multimodal input, strong math and coding performance at a fraction of o3 pricing.
Input
$1.10/M
Output
$4.40/M
Context
200K
OpenAI
OpenAI's highest-quality reasoning model with extended compute for complex scientific and mathematical problems.
Input
$20.00/M
Output
$80.00/M
Context
200K
Google
Google's ultra-efficient model offering better performance than Gemini 1.5 Flash at the same cost point.
Input
$0.07/M
Output
$0.30/M
Context
1.0M
Mistral AI
Mistral's open-weight 675B MoE model with 41B active parameters, multimodal input, and 256K context.
Input
$0.50/M
Output
$1.50/M
Context
256K
Mistral AI
Compact 24B model with image understanding, 128K context, and Apache 2.0 license.
Input
$0.10/M
Output
$0.30/M
Context
128K
Microsoft
Microsoft's 5.6B compact model unifying text, vision, and speech in a single architecture.
Input
$0.02/M
Output
$0.02/M
Context
128K
xAI
xAI's 4-agent parallel collaboration system with rapid learning architecture and medical document analysis. Beta release.
Input
$3.00/M
Output
$15.00/M
Context
131K
Allen AI
Open multimodal model for visual understanding, image captioning, and visual question answering.
Input
$0.40/M
Output
$1.20/M
Context
128K
Zhipu AI
Vision-language MoE model with superior performance at lower inference cost.
Input
$0.15/M
Output
$0.30/M
Context
128K
BAAI
State-of-the-art multimodal embedding model for visual search applications.
Input
$0.02/M
Output
$0.02/M
Context
8K
OpenBMB
Efficient vision-language model rivaling GPT-4V quality at a fraction of the size.
Input
$0.10/M
Output
$0.20/M
Context
128K
Naver
Korean sovereign AI with omnimodal capabilities. Specialized for Korean language and culture.
Input
$1.00/M
Output
$3.00/M
Context
128K
OpenAI
An optimized successor to GPT Image 1 that delivers 20% lower cost and 4x faster generation while maintaining equivalent visual quality. GPT Image 1.5 introduces improved batch processing, enhanced style consistency for multi-image projects, and refined detail handling for professional design and marketing workflows.
Input
$8.00/M
Output
$32.00/M
OpenAI
A cost-efficient variant of OpenAI's image generation model offering 54-70% lower pricing while retaining strong prompt adherence and visual quality for standard use cases. GPT Image 1 Mini is optimized for high-volume applications such as e-commerce product imagery, social media content, and rapid prototyping where speed and cost matter more than maximum fidelity.
Input
$2.50/M
Output
$8.00/M
Microsoft
Lightweight multimodal model with vision capabilities for on-device and edge visual understanding.
Input
$0.05/M
Output
$0.10/M
Context
128K
Google
A multimodal extension of Google's Gemini 2.5 Flash model that adds native image generation and editing capabilities alongside text understanding. This model enables conversational image creation, iterative visual refinement, and combined text-image output within a single unified interface, making it particularly effective for design iteration and creative brainstorming workflows.
Input
$0.15/M
Output
$30.00/M
Google
Experimental Gemini model with extended chain-of-thought reasoning. Transparent thinking process with strong performance on math and science.
Input
$0.15/M
Output
$0.60/M
Context
1.0M
Black Forest Labs
The open-weights development version of FLUX.2 with the same 32 billion parameter architecture as the Pro variant, released for non-commercial research and experimentation. FLUX.2 Dev provides researchers full access to model weights for fine-tuning, distillation, and architectural exploration while delivering near-Pro-level quality for academic and personal projects.
Input
Free
Output
Free
Jina AI
Universal multimodal embedding handling text, images, and documents in 30+ languages.
Input
$0.02/M
Output
$0.02/M
Context
8K
Adobe
Adobe's latest commercially safe image generation model trained exclusively on licensed and public domain content, delivering photorealistic output at native 4-megapixel resolution. Firefly Image 5 integrates deeply with Adobe Creative Cloud, offering advanced composition controls, style references, and seamless editing workflows within Photoshop and Illustrator.
Input
$4.00/M
Output
$35.00/M
Adobe
Adobe's fourth-generation Firefly image model offering improved quality, faster generation, and enhanced creative controls compared to its predecessors. Firefly Image 4 provides robust structure references, style transfer, and generative fill capabilities, all trained on Adobe's commercially licensed dataset to ensure IP safety for enterprise and professional use.
Input
$3.00/M
Output
$25.00/M
Stability AI
Stability AI's largest open-source image generation model built on the Multimodal Diffusion Transformer (MMDiT) architecture. SD 3.5 Large delivers high-quality results across photorealistic and artistic styles with strong prompt adherence, accurate text rendering, and diverse composition capabilities, available under an open license for both research and commercial use.
Input
$0.50/M
Output
$6.50/M
Ideogram
Ideogram's third-generation model combining exceptional photorealism with industry-leading text rendering accuracy within generated images. Ideogram 3.0 handles complex typography, logos, signs, and handwritten text with remarkable fidelity, making it the preferred choice for design professionals working on brand assets, marketing materials, and content requiring reliable in-image text.
Input
$2.00/M
Output
$20.00/M
Recraft
Recraft's flagship image generation model that achieved the number one ranking on the HuggingFace text-to-image leaderboard, with native support for both raster and vector output formats. Recraft V3 excels at brand-consistent design, offering precise color palette control, style locking, and batch generation capabilities that make it uniquely suited for professional design systems.
Input
$2.00/M
Output
$20.00/M
Microsoft
Microsoft's first in-house image generation model developed by Microsoft AI, designed for integration across Microsoft's product ecosystem including Designer, Copilot, and Bing Image Creator. MAI-Image-1 focuses on safety, controllability, and consistent quality, with built-in content filtering and provenance metadata for responsible enterprise deployment.
Input
$2.00/M
Output
$15.00/M
Tencent
World's largest open-source text-to-image model using MoE architecture with 64 experts.
Input
$0.03/M
Output
$0.03/M
Shanghai AI Lab
State-of-the-art open multimodal LLM scoring 72.2 on MMMU. New record among open MLLMs.
Input
$0.40/M
Output
$1.20/M
Context
128K
Shanghai AI Lab
Advanced vision-language model with improved document and chart understanding capabilities.
Input
$0.40/M
Output
$1.20/M
Context
128K
01.AI
Vision-language Yi model for image understanding and visual question answering.
Input
$0.30/M
Output
$0.60/M
Context
16K
Stability AI
Mid-size Stable Diffusion optimized for consumer GPUs and edge devices.
Input
$0.02/M
Output
$0.02/M
Black Forest Labs
Fastest FLUX model generating and editing images in under one second. Fully open under Apache 2.0.
Input
$0.01/M
Output
$0.01/M
Black Forest Labs
Fast open-source text-to-image model with 4-step generation. Apache 2.0 licensed.
Input
$0.02/M
Output
$0.02/M
Black Forest Labs
Premium text-to-image model with highest technical quality and 4.5-second generation.
Input
$0.05/M
Output
$0.05/M
Cohere
Cohere's open multimodal model for visual understanding across 23 languages. Strong image captioning and visual QA.
Input
$0.25/M
Output
$0.50/M
Context
128K
OpenAI
A smaller, faster, and more affordable version of GPT-4o. Great for lightweight tasks.
Input
$0.15/M
Output
$0.60/M
Context
128K
Amazon
Fast, cost-effective reasoning model with built-in code interpreter and web grounding.
Input
$0.80/M
Output
$2.40/M
Context
1.0M
OpenAI
OpenAI's image generation model excelling at precision, complex prompts, and readable text rendering within images.
Input
$0.04/M
Output
$0.04/M
Amazon
Image generation model with fine-grained control over composition, style, and content.
Input
$0.04/M
Output
$0.04/M
Anthropic
Anthropic's fastest and most affordable model. Great for high-volume, low-latency tasks.
Input
$0.80/M
Output
$4.00/M
Context
200K
Anthropic
High-intelligence Sonnet model with 1M token context window. Strong balance of performance and cost.
Input
$3.00/M
Output
$15.00/M
Context
1.0M
Anthropic
Fastest and most cost-efficient Claude model designed for high-throughput, low-latency applications.
Input
$0.80/M
Output
$4.00/M
Context
200K
Google
Google's previous-gen flagship model with the longest context window in production.
Input
$1.25/M
Output
$5.00/M
Context
2.1M
xAI
xAI's large language model with real-time X (Twitter) data access and strong reasoning.
Input
$2.00/M
Output
$10.00/M
Context
131K
Reka AI
Full multimodal model handling text, image, video, and audio inputs natively.
Input
$3.00/M
Output
$9.00/M
Context
128K
OpenAI
A fast, affordable variant of GPT-4.1 for high-volume workloads.
Input
$0.40/M
Output
$1.60/M
Context
1.0M
Specialized reasoning model designed for science, research, and complex engineering challenges.
Input
$5.00/M
Output
$15.00/M
Context
1.0M
Google
Open vision-language model for image captioning, visual QA, and OCR tasks. Built on Gemma 2 backbone.
Input
$0.30/M
Output
$0.60/M
Context
8K
OpenAI
OpenAI's fastest and cheapest model. Ideal for classification, autocomplete, and high-throughput tasks.
Input
$0.10/M
Output
$0.40/M
Context
1.0M
Google
Mid-size PaliGemma for efficient vision-language tasks. Strong OCR and document understanding.
Input
$0.15/M
Output
$0.30/M
Context
8K
OpenAI
OpenAI's research preview with improved emotional intelligence and reduced hallucinations.
Input
$75.00/M
Output
$150.00/M
Context
128K
Anthropic
Upgraded Claude 3.5 Sonnet with major coding and tool-use improvements, plus computer use capability.
Input
$3.00/M
Output
$15.00/M
Context
200K
Reka AI
One of the few 21B models supporting fully interleaved multimodal inputs. Handles videos up to 5 minutes long.
Input
$0.80/M
Output
$2.40/M
Context
128K
Google
Google's open-source multimodal model. Strong performance for its size with vision capabilities.
Input
$0.10/M
Output
$0.10/M
Context
128K
Google
Efficient open-source model from Google with multimodal capabilities at 12B parameters.
Input
$0.05/M
Output
$0.05/M
Context
128K
Google
Ultra-efficient open-source model from Google. Runs on mobile and edge devices.
Input
$0.02/M
Output
$0.02/M
Context
128K
Alibaba/Qwen
Compact vision-language model excelling at video and image analysis. Top small multimodal model on Hugging Face.
Input
$0.10/M
Output
$0.30/M
Context
128K
Alibaba/Qwen
Smallest Qwen VL model for lightweight vision-language tasks on constrained hardware.
Input
$0.04/M
Output
$0.12/M
Context
128K
Meta
Meta's largest multimodal Llama model with image understanding capabilities.
Input
$0.35/M
Output
$0.40/M
Context
128K
Meta
Efficient multimodal Llama model for image-and-text tasks at 11B parameters.
Input
$0.06/M
Output
$0.06/M
Context
128K
Mistral AI
Mistral's open-source multimodal model. Processes images natively alongside text.
Input
$0.10/M
Output
$0.10/M
Context
128K
Mistral AI
Mistral's flagship multimodal model. Built on Mistral Large with vision capabilities.
Input
$2.00/M
Output
$6.00/M
Context
128K
DeepSeek
Vision-language model for image understanding, OCR, and visual reasoning tasks.
Input
$0.14/M
Output
$0.28/M
Context
128K
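To compare entries by price and context, the Input, Output, and Context figures above can be put into a small script. The sketch below is illustrative only. It assumes "/M" means US dollars per million tokens, that "K" and "M" in the Context field are decimal thousands and millions of tokens, and that billing is a simple linear function of token counts. The `Card` type and both helper functions are hypothetical names, not part of any catalog API. Model names are not given in the listing, so entries are identified by provider only.

```python
from dataclasses import dataclass

TOKENS_PER_MILLION = 1_000_000


@dataclass(frozen=True)
class Card:
    """One catalog entry (hypothetical structure, values copied from the listing)."""
    provider: str
    input_per_m: float    # assumed: USD per million input tokens
    output_per_m: float   # assumed: USD per million output tokens
    context_tokens: int   # assumed: 128K -> 128_000, 10.5M -> 10_500_000


# A handful of entries taken from the listing above.
CARDS = [
    Card("OpenAI", 2.50, 10.00, 128_000),
    Card("Anthropic", 3.00, 15.00, 200_000),
    Card("Moonshot AI", 0.45, 2.20, 256_000),
    Card("Meta", 0.15, 0.60, 10_500_000),
    Card("Mistral AI", 0.10, 0.30, 128_000),
]


def filter_cards(cards, max_input_per_m, min_context_tokens):
    """Keep entries at or below an input price and at or above a context size."""
    return [
        c for c in cards
        if c.input_per_m <= max_input_per_m
        and c.context_tokens >= min_context_tokens
    ]


def estimate_cost(card, input_tokens, output_tokens):
    """Estimate the USD cost of one request under linear per-token pricing."""
    total = input_tokens * card.input_per_m + output_tokens * card.output_per_m
    return total / TOKENS_PER_MILLION


if __name__ == "__main__":
    # Entries with input price <= $0.50/M and at least 200K context.
    for card in filter_cards(CARDS, max_input_per_m=0.50, min_context_tokens=200_000):
        cost = estimate_cost(card, input_tokens=50_000, output_tokens=2_000)
        print(f"{card.provider}: ${cost:.4f} per request")
```

With these sample entries, only the Moonshot AI and Meta cards pass the filter. Actual provider billing may include caching discounts, tiered rates, or per-image pricing that this linear estimate does not model.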