Recently, StoryMem, an open-source framework jointly developed by ByteDance and Nanyang Technological University, has attracted widespread attention in the field of AI video generation. The framework uses an innovative "visual memory" mechanism to turn existing single-shot video diffusion models into multi-shot long-video storytellers: it can automatically generate videos of more than one minute that contain multiple shot transitions while keeping characters and scenes highly coherent. This marks a key step for open-source AI video technology toward cinematic storytelling.
Core Innovation of StoryMem: Memory-Driven Shot-by-Shot Generation
The core of StoryMem is a "Memory-to-Video (M2V)" design inspired by human memory. The framework maintains a compact, dynamically updated memory bank that stores key frames from previously generated shots. The first shot is generated by the base text-to-video (T2V) model and seeds the memory; for each subsequent shot, an M2V LoRA injects the memorized key frames into the diffusion model, keeping character appearance, scene style, and narrative logic consistent across shots.
After each shot is generated, the framework automatically extracts semantic key frames and applies aesthetic filtering to update the memory bank. This iterative approach avoids common failure modes of long-video models, such as characters' faces changing between shots and abrupt scene jumps, while requiring only lightweight LoRA fine-tuning rather than training on large-scale long-video data.
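To make the loop concrete, the sketch below shows how such a memory-driven, shot-by-shot cycle could be wired together in Python. It is only an illustration of the idea described above; all function names, types, and the memory-size cap are hypothetical placeholders and do not reflect StoryMem's actual code or API.

```python
# Minimal sketch of memory-driven, shot-by-shot generation.
# All names here (t2v_generate, m2v_generate_with_lora, extract_keyframes,
# aesthetic_filter) are hypothetical placeholders, not StoryMem's real API.

from typing import Any, List

Frame = Any   # placeholder for a decoded key frame (e.g. an image tensor)
Video = Any   # placeholder for a generated video clip

MAX_MEMORY_FRAMES = 16  # assumed cap that keeps the memory bank compact


def generate_story(shot_prompts: List[str]) -> List[Video]:
    """Generate a multi-shot story, carrying a key-frame memory across shots."""
    shots: List[Video] = []
    memory: List[Frame] = []

    for i, prompt in enumerate(shot_prompts):
        if i == 0:
            # First shot: plain text-to-video with the base model.
            shot = t2v_generate(prompt)
        else:
            # Later shots: the M2V LoRA conditions the diffusion model on the
            # memorized key frames to keep characters and scenes consistent.
            shot = m2v_generate_with_lora(prompt, memory_frames=memory)
        shots.append(shot)

        # Memory update: extract semantically representative key frames,
        # keep only the aesthetically best ones, and trim to a compact size.
        candidates = extract_keyframes(shot)
        memory.extend(aesthetic_filter(candidates))
        memory = memory[-MAX_MEMORY_FRAMES:]

    return shots
```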

Outstanding Consistency and Cinematic Quality
Experiments show that StoryMem significantly outperforms existing methods in cross-shot consistency, with an improvement of up to 29%, and is preferred more often in human evaluations. At the same time, it retains the base model's (e.g. Wan2.2) high visual quality, prompt adherence, and shot-control capabilities, supporting natural transitions and custom story generation.
The project also releases the ST-Bench benchmark, a dataset of 300 diverse multi-shot story prompts for standardized evaluation of long-video narrative quality.
Broad Application Scenarios: A Tool for Rapid Previews and A/B Testing
StoryMem is particularly suitable for fields that require rapid iteration of visual content:
- Marketing and Advertising: Quickly generate dynamic storyboards from scripts for various A/B testing versions
- Film Pre-production: Help crews visualize storyboards and reduce early concept-development costs
- Short Videos and Independent Creation: Easily produce coherent narrative short films, raising the production quality of content
Rapid Community Response: ComfyUI Integration Emerging
Soon after the project's release, the community began exploring local deployment. Some developers have already built a preliminary ComfyUI workflow that supports local long-video generation, further lowering the barrier to entry.
AIbase's View: Long video consistency has always been a pain point in AI generation. StoryMem solves this problem in a lightweight and efficient way, greatly advancing the evolution of open-source video models into practical narrative tools. With the integration of more multimodal capabilities, its potential in advertising, film, and content creation will be further unleashed.
Project Address: https://github.com/Kevin-thu/StoryMem
