Google Gemini New Feature: Users Can Guide AI Video Generation with Multiple Reference Images

Google has updated the Gemini app with a new way to control AI video generation: users can now attach multiple reference images to a single video prompt. The system generates video and audio from these images together with the text, giving users more direct control over how the final video looks and sounds.

Google had previously tested this feature in Flow, its more extensive AI video platform. Flow also supports extending existing video clips and stitching multiple scenes together, and it offers higher video quotas than the Gemini app. According to Google, Veo 3.1, released in mid-October, significantly improves texture realism, input fidelity, and audio quality over Veo 3.0.

With this update, users can tailor AI-generated content more closely to their needs. Because several reference images can be uploaded at once, creators can bring more of their own visual elements into a video and give audiences a richer visual and audio experience.

As AI technology develops rapidly, the update shows Google's continued investment in video generation. As user demands diversify, the flexibility and customizability of AI tools matter more, and Gemini's new features are likely to draw more creators to the app.

Key Points:
🌟 Users can upload multiple reference images to guide the AI in generating video and audio.
🎥 The new feature gives users more control over how the final video looks and sounds.
🔊 Veo 3.1 noticeably improves video quality and audio over the previous version.

Kling AI Secures $3 Billion in Funding, Valuation Reaches $18 Billion, Setting a Record for Video AI Model Fundraising

On July 2, Kuaishou's Kling AI raised nearly $3 billion at an $18 billion post-money valuation, setting a global funding record for a video generation model. The round was co-led by CPE, Guofang Venture Capital, BlueFive, Tencent, Zhongguancun Science City Fund, and CITIC Securities, and marks the official start of Kling AI's independent commercialization...

Rumors of Kling AI's Financing Resurface: The Capital Struggle Behind an $18 Billion Valuation

Kling AI, Kuaishou's AI video generation platform, is reportedly close to completing its first independent financing round, raising $3 billion at a post-money valuation of around $18 billion. Kuaishou has not commented, but the news has drawn close attention from the market. Since the split was first reported in May 2026, valuation expectations for the platform have kept shifting as negotiations continue.

xAI Launches Grok Imagine Video 1.5: Turn a Single Image into a Video in Seconds, Facing Competition from Google Veo

xAI has released a preview version of Grok Imagine Video 1.5, its entry into the AI video generation market. The model converts a single still image into a short video and supports 720p output. After uploading an image, users describe the camera movement, visual rhythm, and atmosphere in a text prompt. The model preserves the original image's details, lighting, and style while generating natural, smooth motion.

2.5 Billion Yuan Raised in 6 Months at a $1 Billion Valuation! The Growth Secrets of Wang Changhu, the Most Promising AI Entrepreneur from ByteDance

Wang Changhu, former head of visual technology at ByteDance, founded Ai Shi Technology. With an efficient team and rapid technical iteration, the company raised 2.5 billion yuan within six months at a $1 billion valuation, making it a unicorn. The team broke through in the crowded AI video generation sector with a "Fast and Furious" style of entrepreneurship, and Wang has become a leading figure among founders who came out of ByteDance.

Volcano Engine Seedance 2.0 Fully Opens API Services

Volcano Engine has launched API services for the Seedance 2.0 series, a video generation model family that accepts four input types (text, images, audio, and video) and supports multimodal content creation and editing. It is suited to complex interactions and dynamic scenes. The service aims to help enterprises and individual users streamline workflows, explore new applications, and keep AI video creation compliant and secure.