News & Insights
Coverage of AI model releases, pricing shifts, and industry developments.
Comparing Token Costs: What Does AI Actually Cost to Use?
A practical breakdown of what tokens mean in real terms — from writing a single email to processing an entire codebase.
Gemini 3.1 Pro: Google Claims #1 on 12 of 18 Benchmarks
Google's Gemini 3.1 Pro achieves 94.3% on GPQA Diamond and 77.1% on ARC-AGI-2, more than doubling its predecessor's reasoning score.
SWE-bench Leaderboard: February 2026 Rankings
The latest SWE-bench Verified scores show Kimi K2.5 and Qwen3.5 tied near the top. Here is the full leaderboard breakdown.
DeepSeek V4 and the February 17th Mega-Launch
Five major model launches in a single day: DeepSeek V4, Claude Sonnet 4.6, Qwen3.5, Grok 4.20, and Cohere Tiny Aya all ship on February 17th.
Claude Opus 4.6 and Sonnet 4.6: Anthropic's February Blitz
Anthropic releases its strongest model pair yet — Opus 4.6 hits 80.8% on SWE-bench and Sonnet 4.6 matches it at 1/5 the cost.
Qwen3.5 397B Arrives: Alibaba's MoE Model Challenges the Frontier
Alibaba's Qwen team releases a 397B-parameter Mixture-of-Experts model with 256K context and open weights, scoring 88.4% on GPQA Diamond.
API Pricing in 2026: A Race to the Bottom or a New Equilibrium?
Input token prices have dropped 80% in 18 months. We analyze what this means for developers and the models competing on cost.
GPT-5.3-Codex: OpenAI Unifies Its Training Stacks
OpenAI's GPT-5.3-Codex is the first model combining Codex and GPT-5 training, scoring 77.3% on Terminal-Bench 2.0 and 81.4% on SWE-Lancer.
Open Source AI in 2026: The Gap Has Closed
Open-weight models now match proprietary alternatives on most benchmarks. We examine what changed and what it means for the industry.
Kimi K2.5: Moonshot AI Enters the Multimodal Frontier
Moonshot AI's Kimi K2.5 combines a 1T-parameter MoE architecture with native vision, scoring 76.8% on SWE-bench and 96.1% on AIME 2025.
Claude Opus 4 Sets New Benchmark Records
Anthropic's latest flagship model achieves a state-of-the-art 72% on SWE-bench and introduces extended thinking capabilities.
Llama 4: Meta's Multimodal MoE Models Launch with Scout and Maverick
Meta releases two Llama 4 variants: Scout with a 10M-token context window and Maverick with 400B parameters, both built on a MoE architecture.
Gemini 2.5 Pro: Google's Thinking Model
Google releases Gemini 2.5 Pro with built-in reasoning capabilities and a 1M-token context window.
DeepSeek R1: Open-Source Reasoning at Scale
DeepSeek releases R1, an open-source reasoning model that matches o1-level performance at less than $1 per million tokens.