AMD has announced Day-0 support for Alibaba's Qwen 3.5, the latest open-weight large language model from the Qwen team, across AMD Instinct MI300X, MI325X, and MI35X GPUs. The enablement arrives in close collaboration with the Qwen team and ships fully optimized through the ROCm software stack, allowing developers to deploy the model immediately at production scale.

Designed for Long-Context AI and Enterprise-Scale Workloads
Qwen 3.5 targets long-context reasoning and multimodal workflows, supporting context windows up to 256K tokens. To avoid the quadratic scaling limits of traditional Transformers, the model introduces a Hybrid Attention design that alternates full multi-head attention with linear attention layers. This approach preserves recall while reducing compute overhead as sequences grow.
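To see why alternating layer types helps, consider a simple cost model: full attention does work proportional to the square of the sequence length, while linear attention scales linearly. The sketch below is illustrative only; the layer pattern, 3:1 linear-to-full ratio, and layer count are assumptions, not Qwen 3.5's actual configuration.

```python
# Hypothetical cost model for a hybrid attention stack that alternates
# full (quadratic) and linear attention layers. Constants are illustrative.

def attention_cost(seq_len: int, layer_types: list[str]) -> int:
    """Total token-pair work across the stack (arbitrary units)."""
    cost = 0
    for kind in layer_types:
        if kind == "full":       # multi-head attention: O(n^2)
            cost += seq_len ** 2
        elif kind == "linear":   # linear attention: O(n)
            cost += seq_len
    return cost

# Assumed pattern: three linear layers per full-attention layer, 48 layers total.
stack = ["linear", "linear", "linear", "full"] * 12
dense_stack = ["full"] * len(stack)

n = 256_000  # a 256K-token context
print(attention_cost(n, stack) / attention_cost(n, dense_stack))  # ~0.25 of dense cost
```

At long contexts the linear layers contribute almost nothing to total cost, so attention compute falls roughly in proportion to the fraction of layers that remain full attention.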
At the core of linear scaling are Gated Delta Networks, which keep complexity proportional to sequence length. Inference throughput improves notably beyond 32K tokens, a range where many dense models slow down.
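The intuition behind a gated delta-rule recurrence is that a fixed-size state matrix is updated once per token, so per-token cost stays constant and total cost grows linearly with sequence length. The toy below sketches that update; dimensions, gate values, and the pure-Python matrix math are illustrative assumptions, not Qwen 3.5's implementation.

```python
# Toy gated delta-rule update: S <- alpha * S (I - beta * k k^T) + beta * v k^T
# S is a fixed d x d state; one such update runs per token, giving O(n) total cost.

def gated_delta_step(S, k, v, alpha, beta):
    """Decay the state with gate alpha, erase along key k, write value v."""
    d = len(k)
    Sk = [sum(S[i][m] * k[m] for m in range(d)) for i in range(d)]  # S @ k
    return [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
         for j in range(d)]
        for i in range(d)
    ]

def read(S, q):
    """Query the state: output = S @ q."""
    return [sum(S[i][j] * q[j] for j in range(len(q))) for i in range(len(q))]

# Store v under a unit-norm key k, then retrieve it with the same key.
d = 4
S = [[0.0] * d for _ in range(d)]
k = [1.0, 0.0, 0.0, 0.0]
v = [0.5, -0.25, 0.0, 1.0]
S = gated_delta_step(S, k, v, alpha=1.0, beta=1.0)
print(read(S, k))  # recovers v
```

Because the state never grows with sequence length, throughput stays flat where a dense attention cache would balloon, which is consistent with the gains the model shows beyond 32K tokens.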
Ultra-Sparse MoE Design Reduces Compute Overhead
Qwen 3.5 advances Mixture-of-Experts with a Shared Expert path that processes every token for stability, alongside Top-K routed experts that activate only a subset of specialists during inference. This ultra-sparse design delivers dense-model-level quality while using far less compute—an efficient match for Instinct GPUs in cost-sensitive enterprise deployments.
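The routing described above can be sketched in a few lines: every token passes through the shared expert, while only its top-k routed experts (out of a much larger pool) actually run. The expert functions, router scores, and k value below are toy stand-ins, not Qwen 3.5's real architecture.

```python
# Hypothetical sketch of ultra-sparse MoE dispatch with a shared expert path.

def moe_forward(token, router_scores, experts, shared_expert, k=2):
    """Combine the always-on shared expert with the top-k routed experts."""
    # Pick the k highest-scoring experts for this token.
    top = sorted(range(len(router_scores)),
                 key=lambda i: router_scores[i], reverse=True)[:k]
    total = sum(router_scores[i] for i in top)
    out = shared_expert(token)                    # processes every token
    for i in top:                                 # sparse: only k of many run
        out += (router_scores[i] / total) * experts[i](token)
    return out

# Toy example: 8 experts as simple scalings; only 2 activate per token.
experts = [lambda x, s=s: s * x for s in range(1, 9)]
shared = lambda x: 0.1 * x
scores = [0.0, 0.0, 0.3, 0.0, 0.0, 0.7, 0.0, 0.0]
print(moe_forward(1.0, scores, experts, shared, k=2))
```

The compute saving follows directly: with k routed experts active out of N, the routed FLOPs per token scale with k/N rather than with the full parameter count.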
Native Multimodal AI Capabilities for Visual Workflows
The model is multimodal by design, integrating a DeepStack Vision Transformer and 3D convolutions that model time as a third dimension when processing video. By merging features from multiple visual encoder layers, Qwen 3.5 captures both fine detail and high-level context. These capabilities enable “visual agent” use cases such as object identification in complex environments.
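The multi-layer merging idea can be illustrated simply: instead of forwarding only the vision encoder's final layer, per-patch features are tapped from several depths and combined, so early-layer detail and late-layer semantics both reach the language model. The tap-layer choice and merge-by-concatenation below are assumptions for the sketch, not DeepStack's exact mechanism.

```python
# Illustrative DeepStack-style feature merging across encoder depths.

def merge_deepstack_features(layer_outputs, tap_layers):
    """Concatenate per-patch features taken from selected encoder layers.

    layer_outputs: list of layers, each a list of per-patch feature vectors.
    tap_layers: indices of the layers whose features are merged.
    """
    num_patches = len(layer_outputs[0])
    merged = []
    for p in range(num_patches):
        feat = []
        for layer in tap_layers:
            feat.extend(layer_outputs[layer][p])  # stack features per patch
        merged.append(feat)
    return merged

# Toy encoder: 4 layers, 2 patches, 3-dim features per patch.
layers = [[[float(l)] * 3 for _ in range(2)] for l in range(4)]
merged = merge_deepstack_features(layers, tap_layers=[0, 2, 3])
print(len(merged), len(merged[0]))  # 2 patches, 9-dim merged features
```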
Optimized Out of the Box with SGLang and vLLM
AMD delivers Day-0 performance through SGLang and vLLM. Linear attention runs via Triton-based kernels on ROCm, Shared Expert paths leverage optimized hipBLASLt GEMMs, and vision components rely on standard MIOpen and PyTorch kernels. Large HBM capacity on MI300X/MI325X/MI35X lets teams host full-scale models and massive contexts on a single GPU or node, simplifying deployment.
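As a rough deployment sketch, vLLM's standard serving entry point applies on ROCm; the model identifier below is a placeholder (check the Qwen organization on Hugging Face for the actual Qwen 3.5 repository name), and the parallelism and context-length flags are illustrative, not recommended settings.

```shell
# Hypothetical single-node launch on an Instinct GPU with ROCm-enabled vLLM.
# Replace <qwen3.5-model-id> with the actual Hugging Face model repository.
vllm serve <qwen3.5-model-id> \
  --tensor-parallel-size 1 \
  --max-model-len 262144
```

On MI300X-class parts, the large HBM pool is what makes a single-GPU or single-node `--tensor-parallel-size` viable at such context lengths.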
With Day-0 support, AMD positions Qwen 3.5 as a production-ready, open-weight alternative optimized for its data-center accelerators. The move strengthens AMD’s AI stack for developers building long-context reasoning systems, multimodal agents, and enterprise platforms—without forcing trade-offs between scale, speed, and cost.
