AnimeAgent: Is the Multi-Agent via Image-to-Video models a Good Disney Storytelling Artist?
- URL: http://arxiv.org/abs/2602.20664v1
- Date: Tue, 24 Feb 2026 08:14:24 GMT
- Title: AnimeAgent: Is the Multi-Agent via Image-to-Video models a Good Disney Storytelling Artist?
- Authors: Hailong Yan, Shice Liu, Tao Wang, Xiangtao Zhang, Yijie Zhong, Jinwei Chen, Le Zhang, Bo Li,
- Abstract summary: AnimeAgent is an Image-to-Video (I2V)-based multi-agent framework for Custom Storyboard Generation.<n>Inspired by Disney's "Combination of Straight Ahead and Pose to Pose" workflow, AnimeAgent leverages I2V's implicit motion prior to enhance consistency and expressiveness.<n> Experiments show AnimeAgent achieves SOTA performance in consistency, prompt fidelity, and stylization.
- Score: 17.734200530216977
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Custom Storyboard Generation (CSG) aims to produce high-quality, multi-character consistent storytelling. Current approaches based on static diffusion models, whether used in a one-shot manner or within multi-agent frameworks, face three key limitations: (1) Static models lack dynamic expressiveness and often resort to "copy-paste" pattern. (2) One-shot inference cannot iteratively correct missing attributes or poor prompt adherence. (3) Multi-agents rely on non-robust evaluators, ill-suited for assessing stylized, non-realistic animation. To address these, we propose AnimeAgent, the first Image-to-Video (I2V)-based multi-agent framework for CSG. Inspired by Disney's "Combination of Straight Ahead and Pose to Pose" workflow, AnimeAgent leverages I2V's implicit motion prior to enhance consistency and expressiveness, while a mixed subjective-objective reviewer enables reliable iterative refinement. We also collect a human-annotated CSG benchmark with ground-truth. Experiments show AnimeAgent achieves SOTA performance in consistency, prompt fidelity, and stylization.
Related papers
- DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning [24.808926786222376]
We present DreamActor-M2, a universal animation framework that reimagines motion conditioning as an in-context learning problem.<n>Our approach follows a two-stage paradigm. First, we bridge the input modality gap by fusing reference appearance and motion cues into a unified latent space.<n>Second, we introduce a self-bootstrapped data synthesis pipeline that curates pseudo cross-identity training pairs.
arXiv Detail & Related papers (2026-01-29T13:43:17Z) - AniMaker: Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation [50.63646953706144]
We introduce AniMaker, a framework enabling efficient multi-candidate clip generation and storytelling-aware clip selection.<n>AniMaker achieves superior quality as measured by popular metrics including VBench and our proposed AniEval framework.
arXiv Detail & Related papers (2025-06-12T10:06:21Z) - AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation [52.655400705690155]
AnimeShooter is a reference-guided multi-shot animation dataset.<n>Story-level annotations provide an overview of the narrative, including the storyline, key scenes, and main character profiles with reference images.<n>Shot-level annotations decompose the story into consecutive shots, each annotated with scene, characters, and both narrative and descriptive visual captions.<n>A separate subset, AnimeShooter-audio, offers synchronized audio tracks for each shot, along with audio descriptions and sound sources.
arXiv Detail & Related papers (2025-06-03T17:55:18Z) - A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models [2.919625687404969]
This paper introduces a novel multi-Agent framework that automates the end to end production of Qinqiang opera by integrating Large Language Models, visual generation, and Text to Speech synthesis.<n>In a case study on Dou E Yuan, the system achieved expert ratings of 3.8 for script fidelity, 3.5 for visual coherence, and 3.8 for speech accuracy-culminating in an overall score of 3.6, a 0.3 point improvement over a Single Agent baseline.
arXiv Detail & Related papers (2025-04-22T03:14:29Z) - VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention [70.61101071902596]
Current video generation models excel at short clips but fail to produce cohesive multi-shot narratives due to disjointed visual dynamics and fractured storylines.<n>We introduce VideoGen-of-Thought (VGoT), a step-by-step framework that automates multi-shot video synthesis from a single sentence.<n>VGoT generates multi-shot videos that outperform state-of-the-art baselines by 20.4% in within-shot face consistency and 17.4% in style consistency.
arXiv Detail & Related papers (2025-03-19T11:59:14Z) - AniDoc: Animation Creation Made Easier [54.97341104616779]
Our research focuses on reducing the labor costs in the production of 2D animation by harnessing the potential of increasingly powerful AI.<n>AniDoc emerges as a video line art colorization tool, which automatically converts sketch sequences into colored animations.<n>Our model exploits correspondence matching as an explicit guidance, yielding strong robustness to the variations between the reference character and each line art frame.
arXiv Detail & Related papers (2024-12-18T18:59:59Z) - StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration [88.94832383850533]
We propose a multi-agent framework designed for Customized Storytelling Video Generation (CSVG)
StoryAgent decomposes CSVG into distinct subtasks assigned to specialized agents, mirroring the professional production process.
Specifically, we introduce a customized Image-to-Video (I2V) method, LoRA-BE, to enhance intra-shot temporal consistency.
Our contributions include the introduction of StoryAgent, a versatile framework for video generation tasks, and novel techniques for preserving protagonist consistency.
arXiv Detail & Related papers (2024-11-07T18:00:33Z) - Zero-shot High-fidelity and Pose-controllable Character Animation [89.74818983864832]
Image-to-video (I2V) generation aims to create a video sequence from a single image.
Existing approaches suffer from inconsistency of character appearances and poor preservation of fine details.
We propose PoseAnimate, a novel zero-shot I2V framework for character animation.
arXiv Detail & Related papers (2024-04-21T14:43:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.