Related papers: AnimeAgent: Is the Multi-Agent via Image-to-Video models a Good Disney Storytelling Artist?

AnimeAgent: Is the Multi-Agent via Image-to-Video models a Good Disney Storytelling Artist?

URL: http://arxiv.org/abs/2602.20664v1
Date: Tue, 24 Feb 2026 08:14:24 GMT
Title: AnimeAgent: Is the Multi-Agent via Image-to-Video models a Good Disney Storytelling Artist?
Authors: Hailong Yan, Shice Liu, Tao Wang, Xiangtao Zhang, Yijie Zhong, Jinwei Chen, Le Zhang, Bo Li,
Abstract summary: AnimeAgent is an Image-to-Video (I2V)-based multi-agent framework for Custom Storyboard Generation.<n>Inspired by Disney's "Combination of Straight Ahead and Pose to Pose" workflow, AnimeAgent leverages I2V's implicit motion prior to enhance consistency and expressiveness.<n> Experiments show AnimeAgent achieves SOTA performance in consistency, prompt fidelity, and stylization.
Score: 17.734200530216977
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Custom Storyboard Generation (CSG) aims to produce high-quality, multi-character consistent storytelling. Current approaches based on static diffusion models, whether used in a one-shot manner or within multi-agent frameworks, face three key limitations: (1) Static models lack dynamic expressiveness and often resort to "copy-paste" pattern. (2) One-shot inference cannot iteratively correct missing attributes or poor prompt adherence. (3) Multi-agents rely on non-robust evaluators, ill-suited for assessing stylized, non-realistic animation. To address these, we propose AnimeAgent, the first Image-to-Video (I2V)-based multi-agent framework for CSG. Inspired by Disney's "Combination of Straight Ahead and Pose to Pose" workflow, AnimeAgent leverages I2V's implicit motion prior to enhance consistency and expressiveness, while a mixed subjective-objective reviewer enables reliable iterative refinement. We also collect a human-annotated CSG benchmark with ground-truth. Experiments show AnimeAgent achieves SOTA performance in consistency, prompt fidelity, and stylization.

Related papers

DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning [24.808926786222376]
We present DreamActor-M2, a universal animation framework that reimagines motion conditioning as an in-context learning problem.<n>Our approach follows a two-stage paradigm. First, we bridge the input modality gap by fusing reference appearance and motion cues into a unified latent space.<n>Second, we introduce a self-bootstrapped data synthesis pipeline that curates pseudo cross-identity training pairs.
arXiv Detail & Related papers (2026-01-29T13:43:17Z)
AniMaker: Multi-Agent Animated Storytelling with MCTS-Driven Clip Generation [50.63646953706144]
We introduce AniMaker, a framework enabling efficient multi-candidate clip generation and storytelling-aware clip selection.<n>AniMaker achieves superior quality as measured by popular metrics including VBench and our proposed AniEval framework.
arXiv Detail & Related papers (2025-06-12T10:06:21Z)
AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation [52.655400705690155]
AnimeShooter is a reference-guided multi-shot animation dataset.<n>Story-level annotations provide an overview of the narrative, including the storyline, key scenes, and main character profiles with reference images.<n>Shot-level annotations decompose the story into consecutive shots, each annotated with scene, characters, and both narrative and descriptive visual captions.<n>A separate subset, AnimeShooter-audio, offers synchronized audio tracks for each shot, along with audio descriptions and sound sources.
arXiv Detail & Related papers (2025-06-03T17:55:18Z)
A Multi-Agent Framework for Automated Qinqiang Opera Script Generation Using Large Language Models [2.919625687404969]
This paper introduces a novel multi-Agent framework that automates the end to end production of Qinqiang opera by integrating Large Language Models, visual generation, and Text to Speech synthesis.<n>In a case study on Dou E Yuan, the system achieved expert ratings of 3.8 for script fidelity, 3.5 for visual coherence, and 3.8 for speech accuracy-culminating in an overall score of 3.6, a 0.3 point improvement over a Single Agent baseline.
arXiv Detail & Related papers (2025-04-22T03:14:29Z)
VideoGen-of-Thought: Step-by-step generating multi-shot video with minimal manual intervention [70.61101071902596]
Current video generation models excel at short clips but fail to produce cohesive multi-shot narratives due to disjointed visual dynamics and fractured storylines.<n>We introduce VideoGen-of-Thought (VGoT), a step-by-step framework that automates multi-shot video synthesis from a single sentence.<n>VGoT generates multi-shot videos that outperform state-of-the-art baselines by 20.4% in within-shot face consistency and 17.4% in style consistency.
arXiv Detail & Related papers (2025-03-19T11:59:14Z)
AniDoc: Animation Creation Made Easier [54.97341104616779]
Our research focuses on reducing the labor costs in the production of 2D animation by harnessing the potential of increasingly powerful AI.<n>AniDoc emerges as a video line art colorization tool, which automatically converts sketch sequences into colored animations.<n>Our model exploits correspondence matching as an explicit guidance, yielding strong robustness to the variations between the reference character and each line art frame.
arXiv Detail & Related papers (2024-12-18T18:59:59Z)
StoryAgent: Customized Storytelling Video Generation via Multi-Agent Collaboration [88.94832383850533]
We propose a multi-agent framework designed for Customized Storytelling Video Generation (CSVG) StoryAgent decomposes CSVG into distinct subtasks assigned to specialized agents, mirroring the professional production process. Specifically, we introduce a customized Image-to-Video (I2V) method, LoRA-BE, to enhance intra-shot temporal consistency. Our contributions include the introduction of StoryAgent, a versatile framework for video generation tasks, and novel techniques for preserving protagonist consistency.
arXiv Detail & Related papers (2024-11-07T18:00:33Z)
Zero-shot High-fidelity and Pose-controllable Character Animation [89.74818983864832]
Image-to-video (I2V) generation aims to create a video sequence from a single image. Existing approaches suffer from inconsistency of character appearances and poor preservation of fine details. We propose PoseAnimate, a novel zero-shot I2V framework for character animation.
arXiv Detail & Related papers (2024-04-21T14:43:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.