Kunlun Tech: Multi-Modal Large Model Has Entered Experimental Training Phase


On June 29, 2025, the Alibaba International AI Team officially released **Ovis-U1**, a new multi-modal large model and the latest entry in the Ovis series, marking another major step forward in multi-modal artificial intelligence. Ovis-U1 unifies multi-modal understanding, image generation, and image editing in a single model, demonstrating strong cross-modal processing capabilities and opening new possibilities for developers, researchers, and industry applications. This is AIbase's detailed report on Ovis-U1.
The latest release from the Alibaba team, mPLUG-Owl3, is a general-purpose multi-modal large model whose core capability is understanding long image sequences. By introducing a hyper attention module, mPLUG-Owl3 can efficiently fuse visual and language information, achieving in-depth understanding of multi-modal data such as images and videos. The model delivers significant gains in inference efficiency, image processing, and the application of multi-modal knowledge; in video understanding in particular, it can 'watch' a 2-hour movie in 4 seconds and accurately answer related questions.
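
The hyper attention idea can be pictured as a transformer block that runs ordinary self-attention over the text stream and, in parallel, cross-attends from the text to the visual tokens, then mixes the two streams with a learned gate. The PyTorch sketch below is purely illustrative of that pattern and is not the actual mPLUG-Owl3 implementation; all module and parameter names here are hypothetical.

```python
import torch
import torch.nn as nn

class HyperAttentionBlock(nn.Module):
    """Illustrative hyper-attention-style block: self-attention over text tokens
    plus parallel cross-attention to visual tokens, fused through an adaptive gate."""

    def __init__(self, dim: int = 1024, num_heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Linear(dim, 1)   # per-token gate deciding how much visual context to mix in
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens: torch.Tensor, visual_tokens: torch.Tensor) -> torch.Tensor:
        # Ordinary self-attention over the language stream.
        t, _ = self.self_attn(text_tokens, text_tokens, text_tokens)
        # Parallel cross-attention: text queries attend to image/video tokens.
        v, _ = self.cross_attn(text_tokens, visual_tokens, visual_tokens)
        # Adaptive gating blends the visual signal into the text stream token by token.
        g = torch.sigmoid(self.gate(text_tokens))
        return self.norm(text_tokens + t + g * v)

# Toy usage: batch of 2, 16 text tokens, 256 visual tokens, hidden size 1024.
block = HyperAttentionBlock()
out = block(torch.randn(2, 16, 1024), torch.randn(2, 256, 1024))
print(out.shape)  # torch.Size([2, 16, 1024])
```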
ByteDance's large model team has achieved another success: its Depth Anything V2 model has been incorporated into Apple's Core ML model library. The achievement is notable not only as a technical breakthrough but also because the project lead is an intern. Depth Anything V2 is a monocular depth estimation model that estimates scene depth from a single image. The model family has expanded from 25M parameters in the V1 release at the beginning of 2024 to 1.3B in V2.
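
Monocular depth estimation of this kind can be tried through the Hugging Face `transformers` depth-estimation pipeline. The checkpoint name below is an assumption based on the publicly released Depth Anything V2 weights on the Hub and may differ from the Core ML package Apple ships; the input image path is just an example.

```python
from transformers import pipeline
from PIL import Image

# Assumed Hub checkpoint for the small Depth Anything V2 variant.
depth = pipeline("depth-estimation", model="depth-anything/Depth-Anything-V2-Small-hf")

image = Image.open("scene.jpg")   # any single RGB photograph
result = depth(image)

# result["predicted_depth"] is the raw tensor of per-pixel depth values;
# result["depth"] is a visualizable PIL image of the same depth map.
result["depth"].save("scene_depth.png")
```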
In the AI video lip-syncing field, Ant Group and its affiliated research teams have developed a new technology, similar to Alibaba's EMO, that can generate vivid lip-synced videos from audio content and a character photo. Product Entry: https://top.aibase.com/tool/echomimic EchoMimic's innovative approach overcomes the limitations of traditional audio-driven or facial-landmark-driven methods, achieving more realistic and dynamic human image generation. Traditional methods often rely on audio or facial landmarks alone, which limits the stability and naturalness of the generated video.
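
At a high level, an audio-driven talking-head pipeline of this kind takes a reference photo plus an audio track, extracts per-frame audio features and facial landmarks, and conditions a frame generator on both signals rather than on one alone. The sketch below only outlines that structure with runnable placeholder stubs; every function here is hypothetical and does not correspond to EchoMimic's actual code or API.

```python
import numpy as np

# Illustrative stubs only -- not EchoMimic's implementation.

def extract_audio_features(waveform: np.ndarray, num_frames: int) -> np.ndarray:
    """Split the waveform so each output video frame gets one audio feature vector."""
    chunks = np.array_split(waveform, num_frames)
    return np.stack([np.array([c.mean(), c.std()]) for c in chunks])

def estimate_landmarks(reference_photo: np.ndarray) -> np.ndarray:
    """Placeholder for a facial-landmark detector run on the reference photo."""
    return np.zeros((68, 2))  # 68 2-D landmark points, a common convention

def generate_frame(photo, landmarks, audio_feat):
    """Placeholder for the generative model that animates the photo."""
    return photo  # a real system would synthesize a new, lip-synced frame here

def lip_sync_video(photo: np.ndarray, waveform: np.ndarray, num_frames: int = 25):
    audio_feats = extract_audio_features(waveform, num_frames)
    landmarks = estimate_landmarks(photo)
    # Condition every frame on BOTH audio features and landmarks,
    # instead of driving generation with a single signal.
    return [generate_frame(photo, landmarks, f) for f in audio_feats]

frames = lip_sync_video(np.zeros((256, 256, 3)), np.random.randn(16000), num_frames=25)
print(len(frames))  # 25
```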
OpenAI has added the Text-to-Speech API to its Developer Playground, making developers' work easier than ever. With just a simple text input, developers can choose from six preset voices to generate audio. Better yet, the API automatically identifies the language of the text and matches it with the corresponding voice, eliminating the hassle of selecting language and country versions. The service not only simplifies the development process but also provides developers with high-quality voice synthesis.
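
As a minimal sketch, this is how the Text-to-Speech endpoint is typically called from the official `openai` Python SDK, assuming an `OPENAI_API_KEY` is set in the environment; the voice, input text, and output filename are just example choices.

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.audio.speech.create(
    model="tts-1",    # standard text-to-speech model
    voice="alloy",    # one of the six preset voices
    input="Hello! Text goes in, spoken audio comes out.",
)

# Write the returned MP3 audio to disk.
response.write_to_file(Path("speech.mp3"))
```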