Mistral AI released Mistral Small 4 on March 16, 2026: a 119B-parameter Mixture-of-Experts (MoE) model under the Apache 2.0 license that consolidates four previously separate models into one.

What It Is

Small 4 combines the roles of Mistral Small (instruction following), Magistral (reasoning), Pixtral (multimodal), and Devstral (agentic coding) into a single model. Despite the 119B total parameter count, only 6B parameters are active per token: the router sends each token through 4 of the 128 available expert modules, so compute per token scales with the active experts rather than the full model.
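
To make the routing concrete, here is a minimal NumPy sketch of a generic top-k MoE forward pass. This illustrates the technique in general, not Mistral's actual router; the function names, dimensions, and linear experts are all illustrative assumptions.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=4):
    """Route one token through the top-k of n experts (here 4 of 128).

    x       : (d,) token hidden state
    gate_w  : (d, n_experts) router weights
    experts : list of callables, each mapping (d,) -> (d,)
    """
    logits = x @ gate_w                    # one router score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()               # softmax over only the selected experts
    # Only the chosen experts run, so per-token compute scales with k, not n_experts.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 128 linear "experts", 4 active per token (all shapes are made up).
rng = np.random.default_rng(0)
d, n = 64, 128
experts = [lambda v, W=rng.normal(size=(d, d)) / np.sqrt(d): W @ v for _ in range(n)]
gate_w = rng.normal(size=(d, n))
out = moe_forward(rng.normal(size=d), gate_w, experts)
print(out.shape)  # (64,)
```

This is how a 119B-total model can run with 6B active parameters: the router evaluates cheaply, and only the four selected expert blocks do real work for a given token.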

Key Specs

  • 128 experts, 4 active per token
  • 256K context window
  • Text and image inputs, text output
  • Configurable reasoning via a reasoning_effort parameter set at inference time
  • 40% faster end-to-end and 3x higher throughput than Mistral Small 3

Why It Matters

The reasoning_effort toggle is the headline feature. Developers can set it to none for fast chat-style responses, or to high for step-by-step reasoning comparable to the Magistral series, eliminating the need to maintain separate fast and reasoning model deployments.
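
As an illustration of how such a toggle is typically exposed, the sketch below sends a request to Mistral's standard chat completions endpoint with the parameter set. The model id "mistral-small-4" and the exact placement of reasoning_effort in the request body are assumptions, not documented values; check Mistral's API reference for the real names.

```python
import os
import requests

# Hypothetical request: endpoint is Mistral's public chat completions API;
# the model id and the "reasoning_effort" field placement are assumptions.
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-small-4",      # assumed model id
        "reasoning_effort": "high",      # per the article: "none" for fast chat replies
        "messages": [
            {"role": "user", "content": "Prove that sqrt(2) is irrational."}
        ],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```

The appeal is operational: one deployed model serves both latency-sensitive chat traffic and slower reasoning workloads, switched per request rather than per deployment.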

At reasoning_effort=high, Mistral claims Small 4 matches or beats the specialized Magistral models on internal benchmarks.

Availability

The model is available on Hugging Face, the Mistral API, and NVIDIA NIM containers (day-0 support). The minimum self-hosting target is 4x NVIDIA H100 GPUs or 2x H200. vLLM is the recommended inference stack; llama.cpp and SGLang support are listed as work in progress.
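
A minimal self-hosting sketch with vLLM's Python API, sharding across the 4x H100 minimum the article cites, might look like the following. The Hugging Face repo id "mistralai/Mistral-Small-4" is a guess and should be replaced with the actual checkpoint name.

```python
from vllm import LLM, SamplingParams

# Sketch of a self-hosted deployment on 4 GPUs, per the article's minimum
# target. The repo id below is an assumption, not a confirmed checkpoint name.
llm = LLM(
    model="mistralai/Mistral-Small-4",  # assumed Hugging Face repo id
    tensor_parallel_size=4,             # shard the weights across the 4 GPUs
)
params = SamplingParams(max_tokens=256, temperature=0.7)
outputs = llm.generate(["Summarize the MoE architecture in one paragraph."], params)
print(outputs[0].outputs[0].text)
```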

Mistral is also joining the NVIDIA Nemotron Coalition, a collaboration to advance open AI model development.