Alibaba's Qwen Team Unveils FP8 Builds of Qwen3-Next-80B-A3B for Enhanced AI Performance
#AI #machine learning #technology #innovation #Alibaba #Qwen3 #MoE

Published Sep 22, 2025 383 words • 2 min read

Alibaba’s Qwen team has taken a notable step by releasing FP8-quantized checkpoints for its latest Qwen3-Next-80B-A3B models. The checkpoints cover both post-training variants, Instruct and Thinking, which are designed for high-throughput inference over ultra-long contexts on an efficient Mixture-of-Experts (MoE) architecture.

Overview of the New Release

The FP8 repositories mirror the structure of the earlier BF16 releases while providing "fine-grained FP8" weights with a block size of 128, along with deployment notes for SGLang and vLLM nightly builds. The reported benchmark numbers are carried over from the original BF16 models; the FP8 quantization is offered "for convenience and performance" rather than as a separate evaluation run.
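
For readers who want to try the checkpoints, a minimal offline-inference sketch using vLLM's Python API is shown below. The repo id Qwen/Qwen3-Next-80B-A3B-Instruct-FP8 and the parallelism settings are assumptions based on the release naming; per the deployment notes, a recent vLLM nightly (or SGLang) build is required, so consult the model cards for the exact id, versions, and flags.

```python
# A minimal serving sketch, assuming vLLM's offline LLM API and an FP8 repo id
# taken from the release naming (verify against the actual model card).
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-Next-80B-A3B-Instruct-FP8",  # assumed repo id
    tensor_parallel_size=4,                        # adjust to your GPU count
    max_model_len=262144,                          # native context window
)

params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(
    ["Explain fine-grained FP8 quantization in one paragraph."], params
)
print(outputs[0].outputs[0].text)
```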

Architecture Details

The Qwen3-Next-80B-A3B model uses a hybrid architecture that interleaves Gated DeltaNet, a linear/convolution-style attention surrogate, with standard Gated Attention, layered over an ultra-sparse Mixture-of-Experts (MoE) design. Out of an 80-billion-parameter budget, roughly 3 billion parameters are activated per token: each MoE layer holds 512 experts, of which 10 routed experts plus 1 shared expert fire for any given token (a rough code sketch of this routing follows the specification list below).

  • Total Layers: 48, arranged into 12 blocks
  • Configuration: 3×(Gated DeltaNet → MoE) followed by 1×(Gated Attention → MoE)
  • Native Context: 262,144 tokens, validated up to approximately 1,010,000 tokens using RoPE scaling (YaRN)
  • Hidden Size: 2048
  • Attention Mechanism: 16 Q heads and 2 KV heads, each with a head dimension of 256
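
As a rough illustration of the configuration bullet above, the sketch below reproduces the 12-block, 48-layer interleaving pattern in plain Python. The layer labels are illustrative only, not the actual Qwen3-Next module names.

```python
# Hypothetical sketch of the Qwen3-Next-80B-A3B layer layout described above.
def build_layer_pattern(num_blocks: int = 12) -> list[str]:
    """Each block = 3x (Gated DeltaNet -> MoE) followed by 1x (Gated Attention -> MoE)."""
    layers: list[str] = []
    for _ in range(num_blocks):
        layers += ["gated_deltanet+moe"] * 3
        layers += ["gated_attention+moe"] * 1
    return layers

pattern = build_layer_pattern()
assert len(pattern) == 48  # 12 blocks x 4 layers = 48 total layers
print(pattern[:8])
```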

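Similarly, here is a minimal sketch of the ultra-sparse routing described earlier, selecting the top 10 of 512 routed experts per token, with 1 always-on shared expert added on top. The function names, router shapes, and weighting are assumptions for illustration, not the actual Qwen implementation.

```python
import torch

NUM_EXPERTS = 512  # routed experts per MoE layer
TOP_K = 10         # routed experts active per token
HIDDEN = 2048      # model hidden size

def route_tokens(hidden_states: torch.Tensor, router_weight: torch.Tensor):
    """Pick the top-k routed experts per token from the router logits."""
    logits = hidden_states @ router_weight              # [tokens, 512]
    probs = torch.softmax(logits, dim=-1)
    topk_probs, topk_idx = probs.topk(TOP_K, dim=-1)    # 10 routed experts/token
    # The shared expert (always active) is applied to every token in addition
    # to the routed experts selected here.
    return topk_idx, topk_probs

tokens = torch.randn(4, HIDDEN)
router = torch.randn(HIDDEN, NUM_EXPERTS)
idx, weights = route_tokens(tokens, router)
print(idx.shape)  # torch.Size([4, 10])
```
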
By keeping only about 3 billion of the 80 billion parameters active per token, this architecture aims to reduce inference cost and latency while preserving long-context capability, broadening the range of applications it can serve.

Rocket Commentary

Alibaba's release of FP8-quantized checkpoints for its Qwen3-Next-80B-A3B models marks an important step in optimizing AI models for efficiency and performance. This advancement, particularly in high-throughput inference and ultra-long context capabilities, holds promise for sectors that require rapid decision-making based on vast datasets. However, as the industry leans into such innovations, it is crucial to ensure that accessibility remains a core focus. While the performance gains are significant, the implications for ethical use in AI applications cannot be overlooked. The technology should empower a wider range of businesses and developers, fostering an ecosystem where ethical considerations guide the deployment of advanced AI. As we move forward, the challenge will be to balance performance enhancements with responsible AI practices, ensuring that these powerful tools benefit society as a whole rather than a select few.
