Google DeepMind has released Gemma 4, its most capable open-weight model family yet, built on the same technology that powers Gemini 3. The release spans four models designed for everything from smartphones to datacenter GPUs.

The lineup

The family includes two large models - a 31B dense model and a 26B Mixture-of-Experts (MoE) model that activates only 3.8B parameters per forward pass for faster inference - plus two edge-optimized models, E2B and E4B, targeting mobile devices and IoT hardware such as the Raspberry Pi 5.
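The Mixture-of-Experts design is why a 26B-parameter model can run only 3.8B parameters per pass: a small router scores each input against every expert sub-network, and only the top-scoring few actually execute. The toy sketch below illustrates that generic top-k routing pattern; it is not Gemma 4's actual routing code, and all names in it are hypothetical.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, router_weights, top_k=2):
    """Route input x to only the top_k highest-scoring experts.

    experts: list of callables standing in for expert sub-networks.
    router_weights: one weight vector per expert (toy scoring: dot product).
    Compute cost scales with top_k, not with the total expert count -
    the core idea behind activating 3.8B of 26B parameters.
    """
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for w in router_weights]
    probs = softmax(scores)
    # Select the top_k experts by gate probability.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    # Combine only the selected experts' outputs, weighted by renormalized gates.
    out = [0.0] * len(x)
    for i in top:
        y = experts[i](x)
        for j, yj in enumerate(y):
            out[j] += (probs[i] / norm) * yj
    return out, top

# Toy demo: 8 experts, each a simple scaling function; only 2 run per input.
random.seed(0)
dim = 4
experts = [lambda v, k=k: [vi * (k + 1) for vi in v] for k in range(8)]
router = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(8)]
out, chosen = moe_forward([0.5, -0.2, 0.1, 0.9], experts, router, top_k=2)
print(chosen)  # indices of the 2 experts that actually executed
```

Real MoE layers do this per token inside each transformer block with learned routers, but the sparsity mechanism is the same: most experts stay idle on any given pass.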

All models ship with native function calling, structured JSON output, and agentic workflow support. Context windows reach 256K tokens for the larger models and 128K for edge variants, with multimodal capabilities covering text, images, and audio across 140+ languages.
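In practice, native function calling means the model emits a structured JSON object naming a tool and its arguments, which the host application parses and dispatches. A minimal, framework-free sketch of that loop follows; the JSON shape, the `get_weather` tool, and the `dispatch` helper are illustrative assumptions, not Gemma 4's actual API.

```python
import json

# A tool the application exposes to the model (hypothetical example).
def get_weather(city: str) -> dict:
    # Stub: a real application would call a weather service here.
    return {"city": city, "temp_c": 21}

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> dict:
    """Parse a model's JSON tool call and invoke the matching function."""
    call = json.loads(model_output)   # structured output -> dict
    fn = TOOLS[call["name"]]          # look up the named tool
    return fn(**call["arguments"])    # run it with the model-chosen arguments

# Simulated model response in a typical tool-call shape (assumed format).
model_output = '{"name": "get_weather", "arguments": {"city": "Lisbon"}}'
result = dispatch(model_output)
print(result)  # {'city': 'Lisbon', 'temp_c': 21}
```

The value of *native* support is that the model is trained to emit valid JSON in a known schema, so the application side stays this simple instead of regex-scraping free text.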

Apache 2.0 changes the game

The biggest shift may be licensing. Google has dropped its restrictive custom Gemma license in favor of Apache 2.0, addressing long-standing developer complaints about unilateral rule changes and downstream enforcement requirements that made many hesitant to build on Gemma.

Performance claims

Google says the 31B Dense model debuts at number three on the Arena open-model leaderboard, behind GLM-5 and Kimi 2.5, while being a fraction of their size. The 26B MoE variant prioritizes speed, and the E2B model achieves 133 tokens per second prefill on a Raspberry Pi 5.

Gemma 4 weights are available now on Hugging Face, Kaggle, and Ollama, with interactive access through Google AI Studio and AI Edge Gallery.