More Than Just Generating Videos: Google Veo3 Makes a Stunning Debut, Automatically Playing Sudoku and Solving Mazes

According to a recent disclosure from Google's research division, its video generation model Veo3 has made a breakthrough in visual AI that the researchers describe as the field's "GPT-3 moment." After a series of tests, they found that the model is not limited to video generation: it can complete a range of complex visual tasks without any additional training.

Across tests spanning 18,384 generated videos, each driven by a simple prompt, Veo3 demonstrated remarkable versatility, handling tasks such as object search, photo restoration, maze solving, and Sudoku solving. Specifically, Veo3 can:

Understand images: Automatically identify basic visual elements in an image, such as edges, contours, object positions, colors, and shapes.
Understand physical principles: Show a basic grasp of physics. For example, it can distinguish objects that float from those that sink and model how light reflects.
Edit images: Like an "automatic Photoshop," Veo3 can carry out complex image edits, such as removing backgrounds, adding text, or converting photos into an oil painting style.
Reason visually: When given a maze image, it can plan and draw a path through the maze on its own.
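To make the maze task concrete, here is a minimal sketch of how a classical program solves the same problem with breadth-first search. It is an illustration only and says nothing about how Veo3 works internally: Veo3 receives a maze image and draws the path as video frames, whereas this sketch assumes the maze is already encoded as a text grid. The function name solve_maze, the grid encoding ("#" for walls), and the sample maze are all hypothetical.

```python
from collections import deque

def solve_maze(grid, start, goal):
    """Return the shortest path from start to goal as a list of
    (row, col) cells, or None if the goal cannot be reached."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None

# Hypothetical 3x3 maze: "#" is a wall, every other character is open.
MAZE = [
    "S.#",
    ".##",
    "..G",
]
path = solve_maze(MAZE, (0, 0), (2, 2))
print(path)
```

A classical solver like this needs the maze handed to it in a structured form. What the researchers highlight is that Veo3 works from raw pixels and was never trained specifically for the task.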

Google's researchers believe this breakthrough marks a new stage for visual AI, with Veo3's generality and its ability to solve tasks autonomously comparable to what GPT-3 represented for natural language processing.

A Long Read Three Years Overdue: Former OpenAI Safety VP Lilian Weng Analyzes Scaling Laws: Your Model May Have Been Trained on the Wrong Data

Lilian Weng returns with a deep dive into scaling laws, arguing that the industry consensus may have it backwards: from Kaplan to Chinchilla, the mainstream approach to data allocation might not be optimal. The article examines the trade-offs among compute, model size, and data quantity, implying that a path backed by billions in investment needs reconsideration and prompting a re-evaluation of pretraining recipes...

Cost per Second Drops by Half, ByteDance Releases Seedance 2.0 Mini Video Generation Model

ByteDance's Volcano Engine launched Seedance 2.0 Mini, a cost-effective video generation model on the Volcano Ark Experience Center, with API services coming soon. This lightweight version offers faster generation than the standard model while maintaining high quality, targeting broader video creation and scalable production markets.....

Related Recommendations

Qwen App Launches Wan2.7 Video Model: Edit Videos and Continue Actions with Just a Few Words

ByteDance Halts Global Launch of Seedance 2.0, Legal and Regulatory Challenges Force AI Video Model Delay

Google NotebookLM Launches New Cinematic Video Overview Feature