SenseTime officially released and open-sourced the SenseNova U1 series of natively unified understanding-and-generation models on the 28th. The series builds on the NEO-unify architecture that SenseTime unveiled this March. It achieves deep integration of multi-modal understanding, reasoning, and generation within a single model framework, marking a significant shift in the multi-modal AI paradigm from "integrated" to "natively unified."

The NEO-unify architecture underlying SenseNova U1 discards the modular design common in mainstream models: by removing the visual encoder (VE) and variational autoencoder (VAE), it builds a single unified representation space. Multi-modal processing is woven into every layer of computation, so language and visual information are modeled jointly, preserving pixel-level visual fidelity alongside semantic richness. With this design, the model shows strong logical reasoning and spatial intelligence, accurately understanding the complex layouts and intricate relationships of the physical world.
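The article does not disclose NEO-unify's internals, but the idea of dropping a separate visual encoder and VAE in favor of one shared representation space can be illustrated with a minimal sketch. Everything below is hypothetical (the sizes, the projection, the function names are illustrative assumptions, not SenseTime's implementation): text tokens and raw pixel patches are each mapped into the same hidden space and interleaved into one sequence that a single model stack could process end to end.

```python
import numpy as np

# Illustrative sketch only: how a "natively unified" model might place text
# tokens and raw image patches into ONE shared space, with no separate visual
# encoder or VAE. All names and dimensions are hypothetical assumptions.

rng = np.random.default_rng(0)

D = 64                # shared hidden size of the unified space (assumed)
VOCAB = 1000          # text vocabulary size (assumed)
PATCH = 16 * 16 * 3   # one flattened 16x16 RGB patch

# Text path: a standard embedding lookup table.
text_embed = rng.normal(0.0, 0.02, (VOCAB, D))

# Visual path: a single learned linear projection of raw pixels into the SAME
# D-dimensional space, standing in for a pretrained visual encoder / VAE.
pixel_proj = rng.normal(0.0, 0.02, (PATCH, D))

def unified_sequence(token_ids, patches):
    """Interleave text and pixel-patch embeddings into one sequence that a
    single transformer stack could then process layer by layer."""
    text_part = text_embed[token_ids]                      # (T, D)
    pixel_part = patches.reshape(-1, PATCH) @ pixel_proj   # (P, D)
    return np.concatenate([text_part, pixel_part], axis=0)

tokens = np.array([5, 42, 7])                # e.g. "describe this image"
image_patches = rng.random((4, 16, 16, 3))   # 4 raw patches, no VAE latents

seq = unified_sequence(tokens, image_patches)
print(seq.shape)  # (7, 64): 3 text positions + 4 pixel positions, one space
```

The point of the sketch is that nothing downstream needs to know which positions came from pixels and which from text: both modalities live in the same space from the first layer on, which is one plausible reading of the claim that multi-modal processing is integrated "into every layer of computation."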