At the 2026 Mobile Cloud Conference, China Mobile officially launched the Mobile Model Management Platform (MoMA). The launch signals that large model applications are accelerating from the "laboratory" into "every industry", with the aim of making AI as accessible as water and electricity.

One-stop Integration: Over 300 Popular Models Ready to Use

The core advantage of the MoMA platform is its integration capability. Through a unified API gateway, users can call over 300 industry-leading models from a single access point, including China Mobile's self-developed "Jiu Tian" base large model. Popular domestic models such as DeepSeek, Tongyi Qianwen, and Kimi, as well as Doubao and GLM, are all fully integrated.

Together, these models cover a broad range of capabilities, including text generation, speech processing, and multimodal understanding. They can be matched precisely to complex business scenarios in finance, education, healthcare, and other industries.

Intelligent Scheduling: Say Goodbye to "Choice Paralysis" in Model Calls

Switching between different models is a common pain point for enterprises, and MoMA has pioneered an Intelligent Routing Engine to address it. The system automatically identifies user needs and switches flexibly among three strategies: "cost-first," "effect-first" (prioritizing output quality), and "balance-first."
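The announcement does not say how the Intelligent Routing Engine chooses a model. The sketch below is illustrative only. It assumes each model has a known price per 1,000 tokens and a quality score, and under that assumption the three strategies could map to simple selection rules. The model names, prices, scores, and the `route` function are all hypothetical and are not part of MoMA.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str           # hypothetical model identifier
    cost_per_1k: float  # assumed price per 1,000 tokens (arbitrary units)
    quality: float      # assumed quality score in [0, 1]


def route(models: list[ModelProfile], strategy: str) -> ModelProfile:
    """Pick one model under a named strategy (illustrative, not MoMA's logic)."""
    if not models:
        raise ValueError("no models available")
    if strategy == "cost-first":
        # Cheapest model wins, regardless of quality.
        return min(models, key=lambda m: m.cost_per_1k)
    if strategy == "effect-first":
        # Highest quality score wins, regardless of price.
        return max(models, key=lambda m: m.quality)
    if strategy == "balance-first":
        # Rescale cost to [0, 1] so it is comparable with quality,
        # then weight "cheapness" and quality equally.
        lo = min(m.cost_per_1k for m in models)
        hi = max(m.cost_per_1k for m in models)
        span = (hi - lo) or 1.0  # avoid dividing by zero when all prices match

        def score(m: ModelProfile) -> float:
            cheapness = 1.0 - (m.cost_per_1k - lo) / span
            return 0.5 * m.quality + 0.5 * cheapness

        return max(models, key=score)
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical catalog used only to exercise the three rules.
catalog = [
    ModelProfile("model-a", cost_per_1k=0.2, quality=0.50),
    ModelProfile("model-b", cost_per_1k=2.0, quality=0.95),
    ModelProfile("model-c", cost_per_1k=0.8, quality=0.88),
]
print(route(catalog, "cost-first").name)     # model-a
print(route(catalog, "effect-first").name)   # model-b
print(route(catalog, "balance-first").name)  # model-c
```

Here "balance-first" gives equal weight to normalized cheapness and to quality. A real router would presumably also weigh latency, context length, and task type, which this sketch ignores.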

The platform also offers strong business-continuity guarantees: when a model fails or is throttled, MoMA can switch to another model automatically in under a second. In addition, MoMA uses a self-developed inference engine built on domestic computing power, together with techniques such as intelligent caching and context reuse. This combination reduces the cost per token by more than 30% and improves resource utilization by over 50%.
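Automatic switching on failure or throttling is commonly implemented as an ordered fallback chain. The sketch below assumes that approach, which is not MoMA's published design. The exception classes, backend names, and `call_with_failover` are hypothetical. Each backend is tried in priority order, and a failure moves on to the next one immediately.

```python
import time
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Hypothetical error raised when a model backend fails."""


class RateLimited(Exception):
    """Hypothetical error raised when a model backend is throttled."""


def call_with_failover(
    backends: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
) -> tuple[str, str, float]:
    """Try backends in priority order.

    Returns (name of the backend that answered, its reply, seconds elapsed).
    """
    start = time.monotonic()
    errors: list[str] = []
    for name, call in backends:
        try:
            reply = call(prompt)
            return name, reply, time.monotonic() - start
        except (ModelUnavailable, RateLimited) as exc:
            # Record the failure and move on to the next backend at once.
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


# Hypothetical backends: the primary is throttled, the secondary answers.
def primary(prompt: str) -> str:
    raise RateLimited("429 Too Many Requests")


def secondary(prompt: str) -> str:
    return f"echo: {prompt}"


name, reply, elapsed = call_with_failover(
    [("primary", primary), ("secondary", secondary)], "hello"
)
print(name, reply)  # secondary echo: hello
```

The switch happens in-process as soon as an error is raised, so the added latency is roughly the duration of the failed call. A production system would also need per-call timeouts and health checks so that a hung backend cannot delay the switch. This sketch omits both.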

Secure Foundation: First to Offer "Confidential Model" Service

For industries such as government and finance, where data privacy is extremely sensitive, MoMA has introduced a "Confidential Model" service. Models are deployed in confidential containers protected by hardware isolation, so data remains "usable but not visible" during computation. This ensures end-to-end data security from chip to application.

Closed-loop Operations: Transparent Computing Power Consumption

On the operations side, MoMA adopts a centralized management model that precisely tracks tokens across their entire lifecycle. The platform supports real-time streaming billing: usage appears on the bill within 1 minute, delivering true "pay as you go."

Full-chain observability also lets developers monitor key metrics such as latency, throughput, and GPU resource usage in real time. This clear consumption record and risk-control mechanism prevents resource contention. It also gives enterprises an intuitive basis for judging the return on their AI investments.