FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
- URL: http://arxiv.org/abs/2506.21272v2
- Date: Fri, 27 Jun 2025 01:04:39 GMT
- Title: FairyGen: Storied Cartoon Video from a Single Child-Drawn Character
- Authors: Jiayi Zheng, Xiaodong Cun
- Abstract summary: We propose FairyGen, an automatic system for generating story-driven cartoon videos from a single child's drawing. Unlike previous storytelling methods, FairyGen explicitly disentangles character modeling from stylized background generation. Our system produces animations that are stylistically faithful and narratively structured, with natural motion.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We propose FairyGen, an automatic system for generating story-driven cartoon videos from a single child's drawing, while faithfully preserving its unique artistic style. Unlike previous storytelling methods that primarily focus on character consistency and basic motion, FairyGen explicitly disentangles character modeling from stylized background generation and incorporates cinematic shot design to support expressive and coherent storytelling. Given a single character sketch, we first employ an MLLM to generate a structured storyboard with shot-level descriptions that specify environment settings, character actions, and camera perspectives. To ensure visual consistency, we introduce a style propagation adapter that captures the character's visual style and applies it to the background, faithfully retaining the character's full visual identity while synthesizing style-consistent scenes. A shot design module further enhances visual diversity and cinematic quality through frame cropping and multi-view synthesis based on the storyboard. To animate the story, we reconstruct a 3D proxy of the character to derive physically plausible motion sequences, which are then used to fine-tune an MMDiT-based image-to-video diffusion model. We further propose a two-stage motion customization adapter: the first stage learns appearance features from temporally unordered frames, disentangling identity from motion; the second stage models temporal dynamics using a timestep-shift strategy with frozen identity weights. Once trained, FairyGen directly renders diverse and coherent video scenes aligned with the storyboard. Extensive experiments demonstrate that our system produces animations that are stylistically faithful and narratively structured, with natural motion, highlighting its potential for personalized and engaging story animation. The code will be available at https://github.com/GVCLab/FairyGen
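To make the abstract's two-stage motion customization more concrete, the sketch below outlines one plausible training loop. It is an illustrative approximation, not the authors' code: the model interface (identity_parameters, motion_parameters, denoising_loss) is assumed, and the timestep-shift function follows the form commonly used in flow-matching/MMDiT-style diffusion models, which the abstract does not confirm; see the linked repository for the actual implementation.

```python
import torch


def shift_timesteps(t: torch.Tensor, shift: float = 3.0) -> torch.Tensor:
    """Bias uniform timesteps t in [0, 1] toward the high-noise end.

    This is the shifting function commonly used by flow-matching / MMDiT-style
    diffusion models; FairyGen's exact schedule is not given in the abstract,
    so treat this as an assumption.
    """
    return shift * t / (1.0 + (shift - 1.0) * t)


def two_stage_motion_customization(model, identity_frames, motion_clips, steps=1000):
    """Hypothetical outline of the two-stage motion customization adapter.

    `model` is assumed to expose identity_parameters(), motion_parameters(),
    and denoising_loss(batch, t); `identity_frames` is a tensor of single
    frames and `motion_clips` a list of ordered frame tensors.
    """
    # Stage 1: learn appearance from temporally unordered frames so the
    # adapter captures the character's identity without absorbing motion.
    opt = torch.optim.AdamW(model.identity_parameters(), lr=1e-4)
    for _ in range(steps):
        batch = identity_frames[torch.randperm(len(identity_frames))[:4]]
        loss = model.denoising_loss(batch, t=torch.rand(batch.shape[0]))
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 2: freeze the identity weights and train only the motion
    # parameters on ordered clips, sampling shifted timesteps.
    for p in model.identity_parameters():
        p.requires_grad_(False)
    opt = torch.optim.AdamW(model.motion_parameters(), lr=1e-4)
    for _ in range(steps):
        clip = motion_clips[int(torch.randint(len(motion_clips), (1,)))]
        t = shift_timesteps(torch.rand(1))
        loss = model.denoising_loss(clip, t=t)
        opt.zero_grad()
        loss.backward()
        opt.step()

    return model
```

The intent behind this split, as described in the abstract, is that shuffling frames in the first stage removes temporal ordering so identity cannot encode motion, while freezing identity and shifting timesteps toward the high-noise end in the second stage concentrates training signal on coarse temporal dynamics rather than appearance detail.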
Related papers
- AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation [52.655400705690155]
AnimeShooter is a reference-guided multi-shot animation dataset.
Story-level annotations provide an overview of the narrative, including the storyline, key scenes, and main character profiles with reference images.
Shot-level annotations decompose the story into consecutive shots, each annotated with scene, characters, and both narrative and descriptive visual captions.
A separate subset, AnimeShooter-audio, offers synchronized audio tracks for each shot, along with audio descriptions and sound sources.
arXiv Detail & Related papers (2025-06-03T17:55:18Z)
- DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds [64.53681498600065]
DreamDance is an animation framework capable of producing stable, consistent character and scene motion conditioned on precise camera trajectories.
We train a pose-aware video inpainting model that injects the dynamic character into the scene video while enhancing background quality.
arXiv Detail & Related papers (2025-05-30T15:54:34Z)
- AniDoc: Animation Creation Made Easier [54.97341104616779]
Our research focuses on reducing the labor costs in the production of 2D animation by harnessing the potential of increasingly powerful AI.
AniDoc emerges as a video line art colorization tool, which automatically converts sketch sequences into colored animations.
Our model exploits correspondence matching as explicit guidance, yielding strong robustness to variations between the reference character and each line art frame.
arXiv Detail & Related papers (2024-12-18T18:59:59Z)
- FlipSketch: Flipping Static Drawings to Text-Guided Sketch Animations [65.64014682930164]
Sketch animations offer a powerful medium for visual storytelling, from simple flip-book doodles to professional studio productions.
We present FlipSketch, a system that brings back the magic of flip-book animation -- just draw your idea and describe how you want it to move!
arXiv Detail & Related papers (2024-11-16T14:53:03Z)
- Zero-shot High-fidelity and Pose-controllable Character Animation [89.74818983864832]
Image-to-video (I2V) generation aims to create a video sequence from a single image.
Existing approaches suffer from inconsistency of character appearances and poor preservation of fine details.
We propose PoseAnimate, a novel zero-shot I2V framework for character animation.
arXiv Detail & Related papers (2024-04-21T14:43:31Z)
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities.
In this paper, we propose a novel framework tailored for character animation.
By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z)
- Regenerating Arbitrary Video Sequences with Distillation Path-Finding [6.687073794084539]
This paper presents an interactive framework to generate new sequences according to the users' preference on the starting frame.
To achieve this effectively, we first learn the feature correlation on the frameset of the given video through a proposed network called RSFNet.
Then, we develop a novel path-finding algorithm, SDPF, which formulates the knowledge of motion directions of the source video to estimate the smooth and plausible sequences.
arXiv Detail & Related papers (2023-11-13T09:05:30Z)
- Self-Supervised Equivariant Scene Synthesis from Video [84.15595573718925]
We propose a framework to learn scene representations from video that are automatically delineated into background, characters, and animations.
After training, we can manipulate image encodings in real time to create unseen combinations of the delineated components.
We demonstrate results on three datasets: Moving MNIST with backgrounds, 2D video game sprites, and Fashion Modeling.
arXiv Detail & Related papers (2021-02-01T14:17:31Z)