Alibaba Challenges OpenAI and Google with Qwen 3.5

The Chinese company unveiled a family of open-source AI models ranging from 0.8 to 9 billion parameters, capable of running on laptops and smartphones. These models boast reasoning and multimodal capabilities that, according to Alibaba's own benchmarks, rival those of much larger systems from OpenAI and Google. The move marks a strategic shift towards efficient, private AI that is independent of the cloud.

While giants like OpenAI, Anthropic, and Google focus their efforts on increasingly large and powerful foundational models, Alibaba has decided to take a different path. And it has done so with a proposal that could be a game-changer: small, open artificial intelligence models optimized to run on local devices like laptops and mobile phones.

The Chinese company has just unveiled its new Qwen 3.5 Small Models family, a series of compact language models with ambitious goals. In a context where the most advanced models easily exceed hundreds of billions of parameters, Alibaba is betting on something radically different: efficiency, accessibility, and local execution.

The allure of tiny models

The new family comprises four variants: Qwen3.5-0.8B (800 million parameters), 2B, 4B, and 9B. To put this in perspective, the most recent frontier models from companies like OpenAI and Google are believed, according to industry estimates, to exceed 500 billion parameters.

The difference in scale is enormous. However, instead of seeing this as a disadvantage, Alibaba turns it into its main selling point: these models are designed to run on modest hardware, with low power consumption and without requiring a constant cloud connection.

The two smallest models, 0.8B and 2B, are geared towards prototyping and devices with very limited resources. This makes them ideal candidates for developers who want to integrate AI into edge solutions, IoT devices, or mobile applications with strict battery requirements.

Multimodality and efficiency in less than 3 GB

One of the most striking releases is Qwen3.5-4B, a multimodal model capable of working with text and images. It supports a context window of up to 262,144 tokens (256K), a surprising figure for a model of this size.

In its 4-bit quantized version, the model occupies less than 3 GB. This means it can run directly on a modern smartphone or a conventional laptop without the need for high-end GPUs. The ability to integrate a multimodal model into a local environment, even within a browser, opens the door to new use cases where privacy and autonomy are key.
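The "less than 3 GB" figure follows from simple back-of-envelope arithmetic: at 4-bit quantization each weight takes half a byte, so roughly 4 billion weights fit in about 2 GB, with quantization metadata and runtime buffers accounting for the rest. A minimal sketch of that calculation (the function name and the overhead commentary are illustrative, not from Alibaba):

```python
def quantized_weight_size_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# A 4B-parameter model quantized to 4 bits per weight:
size = quantized_weight_size_gb(4e9, 4)
print(f"{size:.1f} GB of raw weights")  # ~2.0 GB
# Quantization scales/zero-points, activations, and the KV cache add
# overhead on top of this, which is consistent with a shipped artifact
# landing just under 3 GB.
```

The same arithmetic explains why the 0.8B and 2B variants are candidates for phones and IoT devices: their 4-bit weights occupy roughly 0.4 GB and 1 GB respectively.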

But the true star of the family is the largest model: Qwen3.5-9B.

A small model that competes with giants

The Qwen3.5-9B is focused on advanced reasoning tasks. According to benchmarks published by Alibaba, this model not only competes with much larger alternatives but in some cases surpasses them.

In particular, the company claims that the 9B model achieves better results than OpenAI’s gpt-oss-120B, an open-source model that is approximately 13.5 times larger. In tests like GPQA (focused on complex reasoning), the performance of Alibaba’s model is especially competitive.

In the MMMU-Pro visual reasoning test, Qwen3.5-9B also reportedly outperformed Gemini 2.5 Flash Lite, reinforcing the idea that size is no longer the only indicator of performance.

Furthermore, all models in the Qwen 3.5 family are open weights and are available on platforms such as Hugging Face and ModelScope, facilitating their adoption by developers and researchers.

A new architecture to overcome the “memory wall”

Part of the leap in efficiency is due to changes in architecture. Alibaba has implemented what it calls an Efficient Hybrid Architecture, combining Gated Delta Networks—a new approach to attention mechanisms—with the well-known Mixture-of-Experts (MoE).

This design seeks to solve one of the major challenges of small models: the so-called “memory wall.” In conventional attention-based models, the memory needed to hold the context (the KV cache) grows with sequence length, which limits how far context and performance can scale on modest hardware. With this hybrid approach, Alibaba optimizes resource usage without a corresponding increase in memory consumption.

The result is a balance between reasoning ability, computational efficiency, and low hardware requirements.
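Alibaba has not published implementation details here, but the delta-rule family of linear-attention updates that Gated DeltaNet builds on has a well-known shape: instead of appending every token to a growing KV cache, the model maintains a fixed-size state matrix that is decayed by a gate and corrected toward each new key/value pair. The sketch below is an illustrative numpy toy of that recurrence, not Alibaba's actual architecture; the function names, scalar gates, and read-out choice are all assumptions made for clarity.

```python
import numpy as np

def gated_delta_step(S, k, v, alpha, beta):
    """One recurrent step of a (toy) gated delta rule.

    S:     (d_k, d_v) fixed-size state matrix acting as associative memory
    k, v:  key (d_k,) and value (d_v,) for the current token
    alpha: scalar forget gate in [0, 1] (decays old memory)
    beta:  scalar write strength in [0, 1]
    """
    S = alpha * S                         # gated decay of the old state
    pred = k @ S                          # what the memory currently returns for k
    S = S + beta * np.outer(k, v - pred)  # delta-rule correction toward v
    return S

def gated_delta_scan(keys, values, alphas, betas):
    """Process a sequence; memory stays (d_k, d_v) regardless of length."""
    d_k, d_v = keys.shape[1], values.shape[1]
    S = np.zeros((d_k, d_v))
    outs = []
    for k, v, a, b in zip(keys, values, alphas, betas):
        S = gated_delta_step(S, k, v, a, b)
        outs.append(k @ S)                # read-out, querying with the same key
    return np.stack(outs), S
```

The point of the design is visible in the state shape: unlike a transformer's KV cache, which grows linearly with the 262,144-token context, the state here stays a constant `d_k × d_v` matrix, which is exactly the kind of property that sidesteps the memory wall on laptops and phones.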

Source: www.itsitio.com