Luma AI shipped Uni-1 on March 23, 2026, a new image model that works nothing like the diffusion systems that have dominated the field for the past few years.

A different architecture

Diffusion models (Midjourney, Stable Diffusion, Google's Imagen) start with random noise and iteratively clean it into a picture. Uni-1 takes a completely different approach: it is a decoder-only autoregressive transformer that generates an image token-by-token, exactly the way a large language model writes a paragraph of text. According to Luma, this lets the model "think and generate pixels simultaneously," reasoning about spatial relationships and creative intent before each token lands.
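
Luma has not published implementation details, so the sketch below is only a generic illustration of what decoder-only, token-by-token image generation looks like in principle. The model interface, vocabulary, and the image decoder it would feed are all assumptions, not Uni-1's actual code.

```python
import torch

def sample_image_tokens(model, prompt_tokens, num_image_tokens=1024, temperature=1.0):
    # Generic next-token sampling loop, the same procedure an LLM uses for text.
    # 'model' stands in for any decoder-only transformer that maps a token
    # sequence to next-token logits; nothing here is specific to Uni-1.
    tokens = prompt_tokens
    for _ in range(num_image_tokens):
        with torch.no_grad():
            logits = model(tokens)[:, -1, :]                  # logits for the next position only
        probs = torch.softmax(logits / temperature, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)  # sample one image token
        tokens = torch.cat([tokens, next_token], dim=-1)      # append and continue left to right
    # The resulting token grid would then be mapped back to pixels by an image
    # tokenizer/decoder (e.g. a VQ-style decoder), which is omitted here.
    return tokens[:, prompt_tokens.shape[-1]:]
```

A diffusion model would instead start from a full-size noise tensor and denoise it over many steps; in the autoregressive setup, every token is conditioned on everything generated so far, which is what makes "reasoning before each token lands" possible.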

Multi-reference input

Users can feed Uni-1 up to nine reference images at once, each assigned a specific role: character, lighting, composition, style, or environment. The model uses those references to compose a new scene while respecting all constraints simultaneously. Early community tests show strong character consistency across generated frames, which has historically been a weak point for diffusion-based systems.
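
Luma has not documented how references and their roles are actually passed, so the snippet below is just a hypothetical request shape to make the idea concrete; the field names, file names, and structure are invented for illustration and do not reflect any published Luma schema.

```python
# Hypothetical request shape for a multi-reference generation. Uni-1 has no
# public API yet, so every field name here is illustrative, not Luma's schema.
request = {
    "model": "uni-1",
    "prompt": "the character from ref 1, in the environment from ref 2, lit like ref 3",
    "references": [
        {"image": "hero.png",      "role": "character"},    # who appears in the scene
        {"image": "alley.png",     "role": "environment"},  # where the scene takes place
        {"image": "neon_dusk.png", "role": "lighting"},     # how the scene is lit
        # ...up to nine references in total, each tagged with one role:
        # character, lighting, composition, style, or environment
    ],
}
```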

How to use it now

Uni-1 is live inside Luma Agents at lumalabs.ai. To make sure requests route to it, select Create Image → Uni-1 or explicitly ask the agent by name. Luma says API access is coming soon for developers who want to integrate the model directly.

Why it matters

Most image models are evaluated on single-image quality. Uni-1's architecture is a bet that structured reasoning before generation produces more controllable results, closer to art direction than a lucky diffusion lottery. Whether the benchmark numbers hold up under broader testing remains to be seen, but the architecture shift itself is real and worth tracking.