NVIDIA has released Nemotron 3 Ultra, the largest tier in its Nemotron 3 model family, with model cards now live on Build.NVIDIA.com and Hugging Face. The release follows the earlier Nemotron 3 Super and Nano Omni launches but targets a different part of the stack: maximum-capability reasoning for demanding AI agent systems.

The model card lists 550 billion total parameters, of which 55 billion are active during inference. NVIDIA describes the architecture as a hybrid Latent Mixture-of-Experts system that interleaves Mamba-2, MoE, and attention layers and adds Multi-Token Prediction layers intended to improve generation speed and quality.

What Shipped

Two public variants are listed: a BF16 model on Hugging Face and an NVFP4 version on both Hugging Face and Build.NVIDIA.com. NVIDIA says the Ultra model is optimized for complex multi-step agents; long-context analysis; high-accuracy reasoning over code, math, and science; multilingual reasoning; tool use; and high-stakes retrieval-augmented generation.
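For a rough sense of scale, the parameter counts and the two listed precisions can be turned into a back-of-the-envelope estimate of weight storage. The sketch below is illustrative only: it assumes 2 bytes per weight for BF16 and 0.5 bytes per weight for NVFP4, and it ignores NVFP4 scaling factors, any layers kept at higher precision, activations, and KV cache, so real footprints will be higher. The helper name `weight_gigabytes` is made up for this example.

```python
# Back-of-the-envelope weight-storage estimate from the model card figures.
TOTAL_PARAMS = 550e9   # total parameters, per the model card
ACTIVE_PARAMS = 55e9   # active parameters during inference, per the model card

# Assumed bytes per stored weight. BF16 is a 16-bit format; NVFP4 stores
# 4-bit values. Assumption: scaling-factor overhead and any tensors kept
# at higher precision are ignored here.
BYTES_PER_PARAM = {"BF16": 2.0, "NVFP4": 0.5}


def weight_gigabytes(num_params: float, fmt: str) -> float:
    """Return approximate weight storage in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


# Share of the parameters that is active during inference.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

if __name__ == "__main__":
    print(f"Active fraction: {active_fraction:.0%}")
    for fmt in BYTES_PER_PARAM:
        gb = weight_gigabytes(TOTAL_PARAMS, fmt)
        print(f"{fmt}: about {gb:,.0f} GB of weights")
```

Under these assumptions the BF16 weights alone come to roughly 1.1 TB and the NVFP4 weights to roughly 275 GB. Although only a tenth of the parameters are active during inference, all of the weights still have to be stored.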

NVIDIA's Nemotron product page frames the broader family as open models for long-running, self-evolving agents. In that lineup, Nano is positioned for specialized sub-agents, Super for high-throughput reasoning and tool calling in multi-agent systems, and Ultra for workflows that need the strongest reasoning over multiple steps.
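One way to picture that lineup is as a routing rule inside an agent system. The sketch below is purely hypothetical: the `Task` fields, the `pick_tier` function, and the tier labels are invented for illustration and do not correspond to any NVIDIA API or model identifier. It only encodes the positioning described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of a unit of agent work."""
    needs_multistep_reasoning: bool = False
    needs_tool_calling: bool = False


def pick_tier(task: Task) -> str:
    """Map a task to a tier label, following the positioning in the text.

    The labels are placeholders, not real model IDs.
    """
    if task.needs_multistep_reasoning:
        return "ultra"   # strongest reasoning over multiple steps
    if task.needs_tool_calling:
        return "super"   # high-throughput reasoning and tool calling
    return "nano"        # specialized sub-agent work


if __name__ == "__main__":
    print(pick_tier(Task(needs_multistep_reasoning=True)))
    print(pick_tier(Task(needs_tool_calling=True)))
    print(pick_tier(Task()))
```

A real system would likely weigh cost, latency, and context length as well. The point here is only that the three tiers are meant to be combined rather than used as drop-in replacements for one another.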

Why It Matters

The practical signal is that NVIDIA is filling out a tiered agent model stack rather than shipping isolated checkpoints. Developers can now choose between smaller perception or sub-agent models, a throughput-focused Super model, and an Ultra model intended for heavier planning and analytical work.

Two caveats apply: NVIDIA's benchmark claims come from its own evaluations, and a model with 550 billion total parameters will demand substantial GPU infrastructure to serve. Even so, the June 4 availability makes it a concrete option for teams testing large open models as the reasoning layer behind autonomous software agents.