Nvidia Releases Nemotron 3 Nano Omni: An Open Multimodal Model with 30B Params, 3B Active for Edge AI Agents
Nvidia is no longer just selling the shovels. Nemotron 3 Nano Omni is the company’s most aggressive move yet into building AI models of its own.
Summary
On April 28, 2026, Nvidia released Nemotron 3 Nano Omni, an open-weight multimodal AI model that unifies vision, audio, and language understanding in a single architecture designed for edge devices. The model boasts:
- 30 billion parameters
- 3 billion active parameters per inference through a mixture-of-experts design
- 9x higher throughput than comparable open models
- 2.9x faster single-stream reasoning on multimodal tasks
- 9x greater effective system capacity for video reasoning
The model tops six benchmarks across document intelligence, video understanding, and audio comprehension. It accepts text, images, audio, video, documents, charts, and graphical interfaces as inputs and produces text as output.
Available on Hugging Face under Nvidia’s Open Model Agreement with full commercial-use rights, the release marks a significant step in Nvidia’s shift from supplying infrastructure to developing AI models.
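The relationship between the 30 billion total and 3 billion active parameters is standard mixture-of-experts bookkeeping: only the routed experts run for each token, while a smaller always-on core runs every step. A back-of-envelope sketch using the expert counts from the architecture description below (the breakdown beyond the published headline numbers is an assumption, not Nvidia’s disclosure):

```python
# Back-of-envelope MoE bookkeeping from the figures in the article.
total_params  = 30e9   # total parameters
active_params = 3e9    # active parameters per inference
n_experts, top_k = 128, 6

expert_fraction  = top_k / n_experts             # share of expert weights run per token
overall_fraction = active_params / total_params  # share of all weights active per token

print(f"{expert_fraction:.1%} of expert weights fire per token")    # 4.7%
print(f"{overall_fraction:.0%} of all weights are active overall")  # 10%
# The gap between the two fractions would be carried by always-on
# components (attention, state-space, and embedding layers plus the
# shared expert) -- a hedged reading, since Nvidia has not published
# the exact per-component breakdown.
```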
The Architecture
Nemotron 3 Nano Omni employs a hybrid Mamba-Transformer architecture:
- 23 Mamba-2 selective state-space layers
- 23 mixture-of-experts layers, each with 128 experts, six of which are routed per token alongside a shared expert
- Six grouped-query attention layers
The design prioritizes capability per active parameter, the key constraint for edge deployment, where compute per inference step is limited.
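The per-token expert routing described above can be sketched in a few lines. This is a minimal illustration of top-k routing with a shared expert, not Nvidia’s implementation; the expert and router weights here are random toy matrices:

```python
import numpy as np

def moe_forward(token, experts, shared_expert, router_weights, top_k=6):
    """Route one token through the top_k scoring experts plus a shared expert.

    Only top_k + 1 expert networks run per token, which is why active
    parameters stay far below the total parameter count.
    """
    logits = router_weights @ token                          # one score per expert
    top_idx = np.argsort(logits)[-top_k:]                    # indices of the top_k experts
    gates = np.exp(logits[top_idx] - logits[top_idx].max())
    gates /= gates.sum()                                     # softmax over selected experts
    routed = sum(g * experts[i](token) for g, i in zip(gates, top_idx))
    return routed + shared_expert(token)                     # shared expert always runs

# Toy setup: 128 tiny linear "experts" acting on an 8-dim token.
rng = np.random.default_rng(0)
d, n_experts = 8, 128

def make_expert(weight):
    return lambda x: weight @ x

experts = [make_expert(rng.standard_normal((d, d))) for _ in range(n_experts)]
shared_expert = make_expert(rng.standard_normal((d, d)))
router_weights = rng.standard_normal((n_experts, d))

out = moe_forward(rng.standard_normal(d), experts, shared_expert, router_weights)
print(out.shape)  # (8,)
```

The design trade-off this illustrates: the router’s gating keeps the per-token compute close to that of a small dense model, while the full expert pool preserves the representational capacity of a much larger one.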