GPTCrunch
New Release

Qwen3.5 397B Arrives: Alibaba's MoE Model Challenges the Frontier

Alibaba's Qwen team releases a 397B-parameter Mixture-of-Experts model with 256K context and open weights, scoring 88.4 on GPQA Diamond.

GPTUni Team

February 16, 2026 · 1 min read

Alibaba's Qwen team has released Qwen3.5 397B, a Mixture-of-Experts model that uses 17 billion active parameters out of 397 billion total. The architecture splits computation across 512 experts, activating only a small subset per token, which keeps inference costs low despite the model's enormous capacity.
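The sparse-activation idea behind this can be sketched in a few lines. The snippet below is an illustrative top-k Mixture-of-Experts forward pass, not Qwen's actual implementation: a router scores all 512 experts per token but only a handful are ever computed, which is why the active parameter count stays far below the total.

```python
import numpy as np

def moe_forward(x, router_w, experts, top_k=8):
    """Illustrative top-k MoE routing (toy sketch, not Qwen's code).

    x        : (d,) token hidden state
    router_w : (num_experts, d) router projection
    experts  : list of callables, one per expert
    """
    logits = router_w @ x                     # score every expert for this token
    top = np.argsort(logits)[-top_k:]         # pick the top_k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the selected experts only
    # Only the selected experts run; the other ~500 are never computed.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy setup mirroring the shape of the idea: 512 tiny experts, 8 active per token.
rng = np.random.default_rng(0)
d, num_experts = 16, 512
router_w = rng.normal(size=(num_experts, d))
mats = [rng.normal(size=(d, d)) for _ in range(num_experts)]
experts = [lambda v, m=m: m @ v for m in mats]
out = moe_forward(rng.normal(size=d), router_w, experts)
print(out.shape)
```

Compute per token scales with the handful of active experts, while model capacity scales with the full expert count; that asymmetry is the entire appeal of the architecture.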

On benchmarks, Qwen3.5 397B posts an 88.4 on GPQA Diamond, a 76.4 on SWE-bench Verified, and an 83.6 on LiveCodeBench. These numbers place it alongside the best proprietary models available today. The model also supports a 256K-token context window, making it suitable for processing long documents, codebases, and multi-turn conversations.

Notably, the weights are available under an open license. Developers can run the model locally or deploy it through providers like OpenRouter, Together AI, and Fireworks. At $0.15 per million input tokens through OpenRouter, it offers frontier-class performance at mid-tier pricing.
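For a sense of what that pricing means in practice, here is a back-of-envelope cost calculation using the quoted input rate (output-token pricing is not given in the article, so only the input side is estimated):

```python
# Back-of-envelope cost estimate from the article's quoted input price.
price_per_million_input = 0.15   # USD per 1M input tokens (via OpenRouter)
context_tokens = 256_000         # the model's full context window

cost = context_tokens / 1_000_000 * price_per_million_input
print(f"${cost:.4f}")  # under four cents to fill the entire 256K context
```

Filling the whole 256K window costs roughly $0.04 on the input side, which is the kind of arithmetic that makes long-document workloads viable at this tier.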

The release signals a broader shift in the industry: large MoE architectures are becoming the default way to scale model capability without proportionally scaling inference costs. Meta, Mistral, and now Alibaba have all adopted the approach, and the benchmark results increasingly bear it out.
