GPTCrunch
New Release

Gemini 3.1 Pro: Google Claims #1 on 12 of 18 Benchmarks

Google's Gemini 3.1 Pro achieves 94.3% on GPQA Diamond and 77.1% on ARC-AGI-2, more than doubling its predecessor's reasoning score.

GPTUni Team

February 19, 2026 (2 min read)

Google DeepMind launched Gemini 3.1 Pro on February 19, 2026, claiming the #1 position on 12 of 18 tracked benchmarks. The model's 94.3% score on GPQA Diamond, a benchmark testing graduate-level reasoning, is reported as the highest recorded by any model to date.

The most striking result is ARC-AGI-2, a benchmark designed to test general reasoning and abstraction. Gemini 3.1 Pro scores 77.1%, compared to just 31.1% for its predecessor, Gemini 3 Pro. This roughly 2.5x improvement (a gain of 46 percentage points) suggests a qualitative leap in the model's ability to reason about novel problems rather than pattern-match on training data.
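As a quick check on the headline multiple, the arithmetic behind the ARC-AGI-2 comparison can be sketched as follows. The two scores are the ones quoted in this article; the variable names are illustrative only.

```python
# ARC-AGI-2 scores as quoted in the article (percent).
gemini_3_pro = 31.1
gemini_3_1_pro = 77.1

# Relative improvement (the "2.5x" figure) and absolute gain in percentage points.
ratio = gemini_3_1_pro / gemini_3_pro
point_gain = gemini_3_1_pro - gemini_3_pro

print(f"ratio: {ratio:.2f}x, gain: {point_gain:.1f} points")
# prints "ratio: 2.48x, gain: 46.0 points"
```

The ratio rounds to 2.5x, which is consistent with the "more than doubling" description in the summary.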

On coding, the model achieves 80.6% on SWE-bench Verified, placing it alongside Claude Opus 4.6 at the top of the leaderboard. The LiveCodeBench Pro Elo rating of 2887 is also a new high.

Pricing remains the same as Gemini 3 Pro: $2.00 per million input tokens for prompts under 200K tokens, rising to $4.00 per million for longer prompts. The 1M token context window and 64K max output are inherited from the Gemini 3 architecture.
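To make the tiered input pricing concrete, here is a minimal sketch of a cost estimate for a single prompt. It uses only the two input rates quoted above. It assumes the higher rate applies to the whole prompt once it reaches 200K tokens, and that a prompt of exactly 200K tokens falls in the higher tier; neither detail is confirmed here. Output-token pricing is not given in this article, so it is left out. The function name `input_cost_usd` is hypothetical.

```python
# Input rates quoted in the article, in USD per million input tokens.
SHORT_PROMPT_RATE = 2.00    # prompts under 200K tokens
LONG_PROMPT_RATE = 4.00     # longer prompts
TIER_THRESHOLD = 200_000    # tokens


def input_cost_usd(prompt_tokens: int) -> float:
    """Estimate the input cost of one prompt under the quoted tiers.

    Assumption: the higher rate is charged on the entire prompt once it
    reaches the threshold, rather than only on tokens above it.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    rate = SHORT_PROMPT_RATE if prompt_tokens < TIER_THRESHOLD else LONG_PROMPT_RATE
    return prompt_tokens / 1_000_000 * rate


for tokens in (50_000, 150_000, 500_000, 1_000_000):
    print(f"{tokens:>9,} tokens -> ${input_cost_usd(tokens):.2f}")
```

Under these assumptions, a prompt that fills the full 1M-token context costs $4.00 in input tokens, while a 150K-token prompt costs $0.30.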

The release caps a strong January-February period for Google, which began with Gemini 3 Flash becoming generally available in early January. That model delivered frontier-class performance at Flash-level pricing ($0.50 input / $3.00 output per million tokens). Together, the two models give Google a compelling offering at both the price-performance and raw-capability ends of the market.
