Explore 66+ AI models from 49 providers. Filter by capability, tier, and pricing to find the right model.
66 results
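All prices on this page are quoted per million tokens, so "Input $0.15/M" means $0.15 per million input tokens. A minimal sketch of turning these rates into a per-request cost estimate (the function name and token counts are illustrative, not part of any listed API):

```python
def request_cost(input_tokens, output_tokens, input_rate, output_rate):
    """Estimate the dollar cost of one request given per-million-token rates."""
    return (input_tokens / 1_000_000) * input_rate \
         + (output_tokens / 1_000_000) * output_rate

# Example: 10K input + 2K output tokens at $0.15/M in, $0.60/M out
cost = request_cost(10_000, 2_000, 0.15, 0.60)
# → 0.0015 + 0.0012 = $0.0027
```

Output tokens are typically billed at a higher rate than input tokens, so generation-heavy workloads should weight the output column more when comparing models.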
OpenAI
A smaller, faster, and more affordable version of GPT-4o. Great for lightweight tasks.
Input $0.15/M · Output $0.60/M · Context 128K

Anthropic
Anthropic's fastest and most affordable model. Great for high-volume, low-latency tasks.
Input $0.80/M · Output $4.00/M · Context 200K

Mistral AI
Mistral's efficient model for everyday tasks. Fast and cost-effective.
Input $0.10/M · Output $0.30/M · Context 32K

OpenAI
A fast, affordable variant of GPT-4.1 for high-volume workloads.
Input $0.40/M · Output $1.60/M · Context 1.0M

OpenAI
OpenAI's fastest and cheapest model. Ideal for classification, autocomplete, and high-throughput tasks.
Input $0.10/M · Output $0.40/M · Context 1.0M
Google
Efficient open-source model from Google with multimodal capabilities at 12B parameters.
Input $0.05/M · Output $0.05/M · Context 128K

Google
Ultra-efficient open-source model from Google. Runs on mobile and edge devices.
Input $0.02/M · Output $0.02/M · Context 128K

Google
Efficient open-source model from Google. Great performance-to-size ratio.
Input $0.03/M · Output $0.03/M · Context 8K

Google
Google's open-source code-focused model based on the Gemma architecture.
Input $0.03/M · Output $0.03/M · Context 8K
Meta
Efficient multimodal Llama model for image + text tasks at 11B parameters.
Input $0.06/M · Output $0.06/M · Context 128K

Meta
Ultra-lightweight Llama model for edge deployment and mobile applications.
Input $0.01/M · Output $0.01/M · Context 128K

Meta
The smallest Llama model for on-device inference and constrained environments.
Input $0.01/M · Output $0.01/M · Context 128K

Meta
Meta's efficient open-source base model. Excellent for fine-tuning and custom deployments.
Input $0.05/M · Output $0.05/M · Context 128K
Mistral AI
Mistral's 12B open-source model co-developed with NVIDIA. Replaces Mistral 7B.
Input $0.04/M · Output $0.04/M · Context 128K

Mistral AI
Mistral's open-source multimodal model. Processes images natively alongside text.
Input $0.10/M · Output $0.10/M · Context 128K
Mistral AI
The open-source MoE model that popularized the sparse mixture-of-experts approach. Fast and efficient.
Input $0.24/M · Output $0.24/M · Context 32K
Mistral AI
Mistral's edge-optimized model with a knowledge-dense 8B parameter design.
Input $0.10/M · Output $0.10/M · Context 128K

Mistral AI
The model that launched Mistral. Open-source, fast, and surprisingly capable for 7B.
Input $0.06/M · Output $0.06/M · Context 32K
Alibaba/Qwen
Efficient open-source model balancing capability and speed.
Input $0.05/M · Output $0.05/M · Context 128K

Alibaba/Qwen
Compact open-source model for edge deployment and fine-tuning.
Input $0.03/M · Output $0.03/M · Context 32K

Alibaba/Qwen
Compact open-source coding model with impressive code generation capabilities.
Input $0.03/M · Output $0.03/M · Context 128K
Microsoft
Microsoft's 14B open-source model whose training innovations let it punch above its weight class.
Input $0.04/M · Output $0.04/M · Context 16K
Microsoft
Microsoft's compact open-source model with 128K context. Great for on-device inference.
Input $0.01/M · Output $0.01/M · Context 128K

Microsoft
Microsoft's 14B open-source model with 128K context and strong reasoning capabilities.
Input $0.04/M · Output $0.04/M · Context 128K
AI21 Labs
Compact version of Jamba with hybrid SSM-Transformer architecture.
Input $0.20/M · Output $0.40/M · Context 256K

TII
TII's efficient open-source model with multimodal capabilities.
Input $0.04/M · Output $0.04/M · Context 8K

Stability AI
Stability AI's open-source language model with multilingual support.
Input $0.04/M · Output $0.04/M · Context 4K

Allen AI
Fully open-source model from Allen AI with open training data, code, and weights.
Input $0.04/M · Output $0.04/M · Context 4K

BigCode
Open-source code model from BigCode/HuggingFace trained on The Stack v2.
Input $0.04/M · Output $0.04/M · Context 16K
Google
Google's ultra-efficient model offering better performance than Gemini 1.5 Flash at the same cost point.
Input $0.07/M · Output $0.30/M · Context 1.0M
Mistral AI
Compact 24B model with image understanding, 128K context, and Apache 2.0 license.
Input $0.10/M · Output $0.30/M · Context 128K

Microsoft
Microsoft's 3.8B parameter model with 128K context and strong reasoning capability for on-device deployment.
Input $0.01/M · Output $0.01/M · Context 128K

Microsoft
Microsoft's 5.6B compact model unifying text, vision, and speech in a single architecture.
Input $0.02/M · Output $0.02/M · Context 128K
Alibaba/Qwen
Compact 8B model from the Qwen3 family with thinking mode support and strong efficiency for on-device use.
Input $0.03/M · Output $0.06/M · Context 131K

Alibaba/Qwen
Ultra-efficient MoE model with 128 experts and only 3.3B active parameters, ideal for cost-sensitive deployments.
Input $0.02/M · Output $0.04/M · Context 131K

Cohere
Cohere's compact multilingual model supporting 70+ languages. Runs on consumer devices including phones. Outperforms Gemma3-4B in 46/61 languages.
Input $0.01/M · Output $0.01/M · Context 32K
TII
Compact Falcon model for resource-constrained deployments with strong reasoning.
Input $0.04/M · Output $0.08/M · Context 32K

TII
Smallest Falcon model for edge inference and mobile deployment.
Input $0.02/M · Output $0.04/M · Context 32K

Allen AI
Outperforms Llama 3.1 8B. Everything released: training data, weights, code, recipes, and checkpoints.
Input $0.07/M · Output $0.14/M · Context 128K
OpenAI
A cost-efficient variant of OpenAI's image generation model offering 54-70% lower pricing while retaining strong prompt adherence and visual quality for standard use cases. GPT Image 1 Mini is optimized for high-volume applications such as e-commerce product imagery, social media content, and rapid prototyping where speed and cost matter more than maximum fidelity.
Input $2.50/M · Output $8.00/M
Shanghai AI Lab
Latest InternLM series model. Efficient for research and application development.
Input $0.07/M · Output $0.14/M · Context 128K

01.AI
Compact Yi model offering strong reasoning at minimal resource requirements.
Input $0.06/M · Output $0.12/M · Context 128K

BigCode
Compact code model trained on 4T+ tokens and 600+ languages from The Stack v2.
Input $0.03/M · Output $0.06/M · Context 16K

Stability AI
Lightweight language model for on-device inference and resource-constrained environments.
Input $0.02/M · Output $0.04/M · Context 4K
Black Forest Labs
Fastest FLUX model generating and editing images in under one second. Fully open under Apache 2.0.
Input $0.01/M · Output $0.01/M

Google
Smallest Gemma 2 model for efficient text processing on consumer hardware.
Input $0.02/M · Output $0.04/M · Context 8K
LG AI Research
Ultra-compact Korean AI model for on-device and mobile deployment.
Input $0.02/M · Output $0.04/M · Context 128K

OpenAI
Speed-optimized Whisper variant with 6x faster inference at 809M parameters.
Input $0.0030/M · Output $0.0030/M
Anthropic
Fastest and most cost-efficient Claude model designed for high-throughput, low-latency applications.
Input $0.80/M · Output $4.00/M · Context 200K

Google
Smallest Gemma 3 model for edge and mobile deployment. Text-only with 128K context.
Input $0.02/M · Output $0.02/M · Context 128K
Alibaba/Qwen
Compact Qwen3 model with hybrid reasoning for edge deployment and resource-constrained environments.
Input $0.05/M · Output $0.15/M · Context 128K

Alibaba/Qwen
Lightweight Qwen3 model for on-device AI applications with reasoning capability.
Input $0.02/M · Output $0.06/M · Context 128K

Alibaba/Qwen
Smallest Qwen3 model designed for ultra-lightweight deployment and edge inference.
Input $0.01/M · Output $0.03/M · Context 32K

Alibaba/Qwen
Smallest Qwen VL model for lightweight vision-language tasks on constrained hardware.
Input $0.04/M · Output $0.12/M · Context 128K
DeepSeek
R1 reasoning distilled into a compact Qwen-based model. Exceptional at math and programming.
Input $0.07/M · Output $0.14/M · Context 128K

DeepSeek
R1 reasoning distilled into the Llama 3 architecture. Strong reasoning at minimal compute cost.
Input $0.07/M · Output $0.14/M · Context 128K
Mistral AI
Smallest Mistral model for edge computing and extremely resource-constrained deployments.
Input $0.04/M · Output $0.10/M · Context 128K

Microsoft
Enhanced reasoning model using 1.5x more tokens for higher accuracy on complex logical tasks.
Input $0.07/M · Output $0.14/M · Context 32K
NVIDIA
Hybrid Mamba-Transformer MoE with 4x higher throughput than its predecessor. Open weights and training data.
Input $0.04/M · Output $0.08/M · Context 1.0M

NVIDIA
Speed-optimized ASR model delivering 1000+ RTFx on the Open ASR Leaderboard. Exceptional accuracy.
Input $0.0040/M · Output $0.0040/M
Apple
On-device model optimized for Apple silicon with 2-bit quantization. Powers Siri and Apple Intelligence.
Input Free · Output Free · Context 4K
IBM
Compact enterprise model for edge deployment and lightweight business tasks.
Input $0.03/M · Output $0.06/M · Context 128K

IBM
Small enterprise model with coding support for lightweight automation workflows.
Input $0.03/M · Output $0.06/M · Context 128K

Hugging Face
Compact LLM designed for on-device AI. Surprisingly capable for its tiny size.
Input $0.01/M · Output $0.02/M · Context 8K

Hugging Face
Tiny but functional language model for extreme resource constraints and research.
Input $0.0050/M · Output $0.01/M · Context 8K

Cohere
Cohere's smallest Command model optimized for RAG, tool use, and multilingual enterprise applications.
Input $0.04/M · Output $0.08/M · Context 128K
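The catalog above can be filtered by price and context window, as the page's filters do. A minimal sketch of that kind of filtering over entries like the ones listed here (the field names and the three sample rows are illustrative, with rates in dollars per million tokens):

```python
# Sample rows mirroring the card format on this page: provider,
# input/output rates ($ per million tokens), and context window in K tokens.
models = [
    {"provider": "OpenAI",     "input": 0.15, "output": 0.60, "context_k": 128},
    {"provider": "Mistral AI", "input": 0.10, "output": 0.30, "context_k": 32},
    {"provider": "Meta",       "input": 0.01, "output": 0.01, "context_k": 128},
]

def cheap_long_context(models, max_output_rate, min_context_k):
    """Keep models at or under an output rate with at least the given context window."""
    return [m for m in models
            if m["output"] <= max_output_rate and m["context_k"] >= min_context_k]

picks = cheap_long_context(models, max_output_rate=0.60, min_context_k=128)
# → the OpenAI and Meta entries; Mistral AI is excluded by its 32K context
```

Filtering on output rate first tends to be the stricter cut for generation-heavy workloads, since output tokens are usually the more expensive side of the bill.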