NVIDIA has released its new Nemotron 3 series, which combines the Mamba and Transformer architectures to handle long context windows efficiently while reducing resource consumption. The series is designed for agentic AI systems that autonomously perform complex tasks and sustain long-running interactions.
The series includes three models: Nano, Super, and Ultra. The Nano model is officially available now, while Super and Ultra are expected to launch in the first half of 2026. With this release, NVIDIA breaks away from the traditional pure Transformer design, adopting a hybrid architecture that combines efficient Mamba layers with Transformer attention and Mixture of Experts (MoE) technology. Compared to pure Transformer models, Nemotron 3 handles long input sequences more effectively while keeping memory usage stable.
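To make the hybrid design concrete, here is a minimal, illustrative sketch of the general pattern: a stack that is mostly linear-time sequence layers (stand-ins for Mamba's state-space blocks, which carry a fixed-size state instead of a growing attention cache), with an attention layer interleaved every few blocks. The layer ratio, dimensions, and module internals are assumptions for illustration, not NVIDIA's actual implementation.

```python
# Illustrative sketch of a hybrid Mamba/Transformer stack; all sizes are made up.
import torch
import torch.nn as nn

class SimpleSSMBlock(nn.Module):
    """Stand-in for a Mamba layer: a per-step recurrent state, O(n) in sequence
    length, with memory that stays constant as the context grows."""
    def __init__(self, d_model: int):
        super().__init__()
        self.in_proj = nn.Linear(d_model, d_model)
        self.decay = nn.Parameter(torch.full((d_model,), 0.9))
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):  # x: (batch, seq, d_model)
        u = self.in_proj(x)
        state = torch.zeros(x.size(0), x.size(2), device=x.device)
        outs = []
        for t in range(x.size(1)):  # fixed-size state, unlike a growing KV cache
            state = self.decay * state + u[:, t]
            outs.append(state)
        return x + self.out_proj(torch.stack(outs, dim=1))

class AttentionBlock(nn.Module):
    """Ordinary self-attention, kept for the layers that still need global mixing."""
    def __init__(self, d_model: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x, need_weights=False)
        return x + out

class HybridStack(nn.Module):
    """Mostly SSM layers, with attention every `attn_every` layers (ratio assumed)."""
    def __init__(self, d_model=64, n_layers=8, attn_every=4):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionBlock(d_model) if (i + 1) % attn_every == 0 else SimpleSSMBlock(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

x = torch.randn(2, 16, 64)
print(HybridStack()(x).shape)  # torch.Size([2, 16, 64])
```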
Nemotron 3 supports a context window of up to one million tokens, matching cutting-edge models from OpenAI and Google. It can hold large amounts of information in context, such as an entire codebase or a long conversation history, without placing excessive pressure on hardware. The Nano model has 31.6 billion parameters in total, but only about 3 billion are active at each processing step. According to the Artificial Analysis Intelligence Index benchmarks, Nemotron 3 matches the accuracy of gpt-oss-20B and Qwen3-30B while delivering higher token throughput.
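The gap between total and active parameters is what makes the sparse MoE design cheap to run per token; a quick back-of-the-envelope calculation using the figures above:

```python
# Rough arithmetic only, based on the parameter counts quoted above.
total_params = 31.6e9   # total parameters in the Nano model
active_params = 3.0e9   # parameters active per processing step

# Only about a tenth of the weights are touched for each token:
print(f"active fraction: {active_params / total_params:.1%}")          # ~9.5%

# At 2 bytes/param (bf16 weights, an assumption), per-token weight traffic
# is driven by the active set, not the full 31.6B parameters:
print(f"active weight bytes per token: {active_params * 2 / 1e9:.1f} GB")  # 6.0 GB
```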
NVIDIA also introduced two key architectural improvements for the more powerful Super and Ultra models. The first is LatentMoE, which reduces the memory-bandwidth overhead of standard MoE models by projecting tokens into compressed latent representations before expert processing. The second is Multi-Token Prediction (MTP), which predicts several tokens at once during training, improving both text generation speed and logical reasoning.
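The following sketch illustrates the idea behind the first improvement as described above: tokens are down-projected into a smaller latent space, routed to a few experts there, and projected back up, so the experts and the memory traffic they generate operate on compressed vectors. The class name, dimensions, and routing details are hypothetical, not NVIDIA's implementation.

```python
# Hypothetical sketch of a latent-space MoE layer; nothing here is NVIDIA's code.
import torch
import torch.nn as nn

class LatentMoESketch(nn.Module):
    def __init__(self, d_model=64, d_latent=16, n_experts=4, top_k=2):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent)    # compress before routing
        self.up = nn.Linear(d_latent, d_model)      # expand after the experts
        self.router = nn.Linear(d_latent, n_experts)
        self.experts = nn.ModuleList(nn.Linear(d_latent, d_latent) for _ in range(n_experts))
        self.top_k = top_k

    def forward(self, x):                           # x: (tokens, d_model)
        z = self.down(x)                            # experts see compressed vectors
        scores = self.router(z).softmax(dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(z)
        for k in range(self.top_k):                 # only top-k experts run per token
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(z[mask])
        return self.up(out)

tokens = torch.randn(8, 64)
print(LatentMoESketch()(tokens).shape)  # torch.Size([8, 64])
```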
In addition, NVIDIA has released the weights, training recipes, and several datasets for the Nano model, including Nemotron-CC-v2.1, built from Common Crawl, giving developers a complete starting point. The release fits NVIDIA's broader strategy of building smaller language models that prioritize speed over raw capability.
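For developers who want to try the open weights, a minimal loading sketch with Hugging Face transformers might look like the following; the repository id is a placeholder, since the exact model card is not given here.

```python
# Minimal usage sketch, assuming the Nano weights are on Hugging Face.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/nemotron-3-nano"  # hypothetical id: check NVIDIA's model card
tok = AutoTokenizer.from_pretrained(model_id)
# Hybrid architectures often ship custom modeling code, hence trust_remote_code.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tok("Summarize this repository:", return_tensors="pt")
print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```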
Key Points:
🌟 The Nemotron 3 series combines Mamba and Transformer architectures to enhance AI agent efficiency.
🚀 The Nano model is now available, while the Super and Ultra are expected to launch in the first half of 2026.
📊 NVIDIA released model weights and training datasets to help developers innovate.
