According to the latest disclosure from Google's research department, its video generation model Veo3 has achieved a breakthrough in visual AI, one hailed as the field's "GPT-3 moment." After a series of tests, researchers found that Veo3 is not limited to video generation: it can complete a range of complex visual tasks without any additional training.

When tested on 18,384 simple video generation tasks, Veo3 demonstrated remarkable versatility, handling tasks such as object search, photo restoration, maze solving, and Sudoku solving. Specifically, Veo3 can:

  • Understand images: Automatically identify basic visual elements in images, such as edges, contours, object positions, colors, and shapes.

  • Understand physical principles: Show a basic grasp of physics; for example, it can distinguish objects that float from those that sink and understand how light reflects.

  • Perform image editing: Like an "automatic Photoshop," Veo3 can carry out complex image editing tasks, such as removing backgrounds, adding text, or converting photos into an oil painting style.

  • Show "reasoning" abilities: When given an image of a maze, it can plan and draw a path through the maze on its own.

Google's research department believes that Veo3's breakthrough marks a new stage in the development of visual AI, with a generality and autonomous task-solving capability comparable to what GPT-3 brought to natural language processing.