ZhiXiang Future Launches 200B-Parameter Native Multimodal Image Large Model, Embarking on a New Journey from Generating Content to Understanding the World

At the inaugural Open Day held in Beijing, ZhiXiang Future officially launched the image large model HiDream-O1-Image-Pro, built on its next-generation native multimodal architecture, the Unified Transformer (UiT). The model, which has more than 200 billion parameters, set new state-of-the-art (SOTA) results on multiple authoritative benchmarks. On the same day, ZhiXiang Future announced its second funding round within half a month, backed by leading institutions including Shenzhen Capital and Jinpu Investment, a further sign of the capital market's strong endorsement of the "native multimodal" technology path.

Key Technological Breakthroughs: From "Modality Stitching" to "Native Unity"

Most current visual generation systems follow a fragmented stitching paradigm of "VAE + independent language model encoding", which makes breakthroughs in complex semantic understanding and detail reproduction difficult. With the UiT architecture, ZhiXiang Future brings raw image pixels, text tokens, and task conditions into a single continuous shared token space, achieving true "low-level representation fusion".
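The article does not disclose how UiT is implemented, so the following is only a toy sketch of the general idea of a shared token space: a task condition, text tokens, and raw pixel patches are each mapped to vectors of the same width and concatenated into one sequence that a single transformer could consume. Every name here (`patchify`, `build_sequence`, `D_MODEL`, the random weight matrices) is hypothetical, and the random weights merely stand in for learned parameters.

```python
import random

# All sizes are illustrative assumptions, not values from the article.
D_MODEL = 6      # width of every token in the shared space
PATCH = 2        # side length of a square pixel patch
VOCAB = 16       # toy text vocabulary size
NUM_TASKS = 3    # e.g. text-to-image, editing, personalization

rng = random.Random(0)

def rand_matrix(rows, cols):
    """Random weights standing in for learned parameters."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def project(vec, weights):
    """Linear map: a vector of length `rows` times a rows x cols matrix."""
    return [sum(v * w for v, w in zip(vec, col)) for col in zip(*weights)]

def patchify(image):
    """Split an H x W grayscale image into flattened PATCH x PATCH patches."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, PATCH):
        for left in range(0, w, PATCH):
            patches.append([image[top + dy][left + dx]
                            for dy in range(PATCH) for dx in range(PATCH)])
    return patches

W_PIXEL = rand_matrix(PATCH * PATCH, D_MODEL)  # raw pixels -> shared space
E_TEXT = rand_matrix(VOCAB, D_MODEL)           # text token embeddings
E_TASK = rand_matrix(NUM_TASKS, D_MODEL)       # task condition embeddings

def build_sequence(task_id, text_ids, image):
    """Concatenate task, text, and pixel tokens into one shared sequence."""
    task_tokens = [E_TASK[task_id]]
    text_tokens = [E_TEXT[i] for i in text_ids]
    # Pixels are projected directly, with no separate VAE encoder in between.
    pixel_tokens = [project(p, W_PIXEL) for p in patchify(image)]
    return task_tokens + text_tokens + pixel_tokens

image = [[(r * 4 + c) / 15 for c in range(4)] for r in range(4)]
sequence = build_sequence(task_id=0, text_ids=[3, 7, 1], image=image)
print(len(sequence), len(sequence[0]))  # 8 tokens, each of width 6
```

In this sketch the 4 x 4 image yields four patches, so the sequence holds one task token, three text tokens, and four pixel tokens. The contrast with the "stitching" paradigm is that no modality passes through its own pretrained encoder before entering the sequence.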

HiDream-O1-Image-Pro: This is the closed-source version of the model, with over 200B parameters. Beyond top-tier text-to-image capabilities, it sets new industry benchmarks in complex text rendering, instruction-based editing, and multi-subject personalized generation.
Open Source Benchmark: The 8B-parameter version built on the same architecture previously topped the open-source text-to-image ranking on the global evaluation platform Artificial Analysis. It is also the smallest model by parameter count in the top 20, demonstrating the strong scalability of the UiT architecture.

Strategic Focus: Building a World Model with "Native Multimodal"

Mei Tao, founder and CEO of ZhiXiang Future, pointed out that what the industry calls "multimodal" is mostly "single-modal stitching," whereas ZhiXiang Future pursues "native multimodal." He believes that by embedding "the rules of the world" (spatial relationships, physical laws, causal logic) into the model architecture from the outset, a model can truly evolve from "generating content" to "understanding the world, reasoning about the world, and reconstructing the world," which he sees as an essential path to AGI (Artificial General Intelligence).

Business Implementation: Dual-Powered by Model and Intelligent Agent

While continuing to develop the underlying architecture, ZhiXiang Future has built a "1+1+3" business framework, driving commercialization through three core intelligent agent applications:

Commercial Marketing Intelligent Agent (HiBurst): It has become one of TikTok's top 5 official service providers, producing over a million e-commerce marketing videos annually, with associated GMV of over 100 million yuan.
AI Film and Television Creation Intelligent Agent ("FrameZan"): It integrates the end-to-end process from concept to final production, has produced over 5,000 minutes of short web series, and has attracted over 1,000 professional teams.
Social Media Creation Intelligent Agent (vivago): It supports end-to-end long thinking and the generation of minute-long story videos, reaching 40 million users in over 100 countries and regions worldwide.

Ecosystem Construction: The Industrial Path Toward AGI

At the Open Day event, ZhiXiang Future announced strategic cooperation with Shanghai Film New Vision Fund, Blue Sky Brand, Jiecheng Century, and Beier Health, accelerating the transformation of model capabilities into industry scenarios by deeply participating in sectors such as film and television creation, cross-border e-commerce, and healthcare.

From visual generation to building a world model, ZhiXiang Future is pursuing its "Imaging the World" vision: using a unified modeling framework to enable AI to understand environmental states across different modalities and predict how they change. With a continuing influx of diverse capital and a rapidly expanding commercial ecosystem, ZhiXiang Future is accelerating its transformation from a visual technology provider into a builder of general world models.
