AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era
- URL: http://arxiv.org/abs/2412.10255v3
- Date: Thu, 19 Dec 2024 04:31:44 GMT
- Title: AniSora: Exploring the Frontiers of Animation Video Generation in the Sora Era
- Authors: Yudong Jiang, Baohan Xu, Siqian Yang, Mingyu Yin, Jing Liu, Chao Xu, Siqi Wang, Yidi Wu, Bingwen Zhu, Xinwen Zhang, Xingyu Zheng, Jixuan Xu, Yue Zhang, Jinlong Hou, Huyang Sun
- Abstract summary: We present a comprehensive system, AniSora, designed for animation video generation.
It is supported by a data processing pipeline with over 10M high-quality samples.
We also collect an evaluation benchmark of diverse animation videos.
- Score: 20.670217061810614
- License:
- Abstract: Animation has gained significant interest in the film and TV industry in recent years. Despite the success of advanced video generation models like Sora, Kling, and CogVideoX in generating natural videos, they are far less effective on animation videos. Evaluating animation video generation is also a great challenge due to its unique artistic styles, violations of the laws of physics, and exaggerated motions. In this paper, we present a comprehensive system, AniSora, designed for animation video generation, which includes a data processing pipeline, a controllable generation model, and an evaluation dataset. Supported by the data processing pipeline with over 10M high-quality samples, the generation model incorporates a spatiotemporal mask module to facilitate key animation production functions such as image-to-video generation, frame interpolation, and localized image-guided animation. We also collect an evaluation benchmark of 948 diverse animation videos; evaluations on VBench and a human double-blind test demonstrate consistency in character and motion, achieving state-of-the-art results in animation video generation. Our evaluation benchmark will be publicly available at https://github.com/bilibili/Index-anisora.
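The abstract names a spatiotemporal mask module but gives no implementation details. As a minimal illustrative sketch (Python/PyTorch; all function names, shapes, and region conventions here are hypothetical, not taken from the paper), a single mask over frames and pixels can express image-to-video generation, frame interpolation, and localized image-guided animation as different conditioning patterns:

```python
import torch

def build_spatiotemporal_mask(num_frames, height, width,
                              guided_frames, guided_region=None):
    """Hypothetical sketch of a spatiotemporal conditioning mask.

    mask[t, h, w] == 1 marks pixels whose content is given as guidance;
    0 marks pixels the generation model must synthesize.
    """
    mask = torch.zeros(num_frames, height, width)
    for t in guided_frames:
        if guided_region is None:
            mask[t] = 1.0                      # the whole frame is given
        else:
            y0, y1, x0, x1 = guided_region
            mask[t, y0:y1, x0:x1] = 1.0        # only a local region is given
    return mask

# Image-to-video: only the first frame is conditioned.
i2v_mask = build_spatiotemporal_mask(16, 64, 64, guided_frames=[0])

# Frame interpolation: the first and last frames are conditioned.
interp_mask = build_spatiotemporal_mask(16, 64, 64, guided_frames=[0, 15])

# Localized image-guided animation: a region of the first frame is conditioned.
local_mask = build_spatiotemporal_mask(16, 64, 64, guided_frames=[0],
                                       guided_region=(16, 48, 16, 48))
```

In such a setup, positions marked 1 would be filled with encoded guidance content while the remaining positions are generated freely; how AniSora actually injects the mask into its model is not specified in the abstract.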
Related papers
- HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation [64.37874983401221]
We present HumanVid, the first large-scale high-quality dataset tailored for human image animation.
For the real-world data, we compile a vast collection of real-world videos from the internet.
For the synthetic data, we collect 10K 3D avatar assets and leverage existing assets of body shapes, skin textures, and clothing.
arXiv Detail & Related papers (2024-07-24T17:15:58Z) - AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video diffusion model, i.e., AnimateDiff.
For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation.
For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
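AnimateZero's positional-corrected window attention is only named in this summary. As a rough illustration (hypothetical PyTorch; the learned projections and the positional correction itself are omitted), the sketch below shows plain windowed temporal attention, i.e., restricting temporal self-attention to short frame windows instead of the whole clip:

```python
import torch
import torch.nn.functional as F

def window_temporal_attention(x, window_size=4):
    """Hypothetical sketch: temporal self-attention restricted to local
    windows of frames rather than the full clip.

    x: (batch, frames, tokens, dim) latent features.
    """
    b, t, n, d = x.shape
    assert t % window_size == 0, "frames must divide evenly into windows"
    # Group frames into non-overlapping temporal windows.
    xw = x.view(b, t // window_size, window_size, n, d)
    # Flatten each window so attention mixes only that window's tokens.
    xw = xw.reshape(b * (t // window_size), window_size * n, d)
    q, k, v = xw, xw, xw                          # shared projections omitted for brevity
    attn = F.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    out = attn @ v
    return out.view(b, t // window_size, window_size, n, d).reshape(b, t, n, d)

# Toy usage: 8-frame clip, 16 tokens per frame, 32-dim features.
features = torch.randn(2, 8, 16, 32)
print(window_temporal_attention(features, window_size=4).shape)  # (2, 8, 16, 32)
```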
arXiv Detail & Related papers (2023-12-06T13:39:35Z) - Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities.
In this paper, we propose a novel framework tailored for character animation.
By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z) - MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z) - AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance [13.416296247896042]
We introduce an open domain image animation method that leverages the motion prior of video diffusion model.
Our approach introduces targeted motion area guidance and motion strength guidance, enabling precise control of the movable area and its motion speed.
We validate the effectiveness of our method through rigorous experiments on an open-domain dataset.
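The summary above mentions motion area guidance and motion strength guidance without describing the conditioning mechanism. As a rough, hypothetical sketch (names and tensor layout are assumptions, not from AnimateAnything), such guidance can be thought of as masking a predicted motion field to a movable region and scaling it by a strength factor:

```python
import torch

def apply_motion_guidance(latent_motion, motion_area_mask, motion_strength):
    """Hypothetical sketch of motion-area and motion-strength guidance.

    latent_motion:    (frames, channels, H, W) predicted frame-to-frame change.
    motion_area_mask: (H, W) in {0, 1}; 1 marks pixels allowed to move.
    motion_strength:  scalar in [0, 1] scaling how strongly the area moves.
    """
    mask = motion_area_mask[None, None]            # broadcast over frames/channels
    return latent_motion * mask * motion_strength  # zero motion outside the area

# Toy usage: let only a central region move, at half strength.
motion = torch.randn(16, 4, 32, 32)
area = torch.zeros(32, 32)
area[8:24, 8:24] = 1.0
guided = apply_motion_guidance(motion, area, motion_strength=0.5)
```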
arXiv Detail & Related papers (2023-11-21T03:47:54Z) - Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
arXiv Detail & Related papers (2021-11-01T15:32:51Z) - Learning Fine-Grained Motion Embedding for Landscape Animation [140.57889994591494]
We propose a model named FGLA to generate high-quality and realistic videos by learning Fine-Grained motion embedding.
To train and evaluate on diverse time-lapse videos, we build the largest high-resolution Time-lapse video dataset with Diverse scenes.
Our method achieves relative improvements of 19% on LPIPS and 5.6% on FVD compared with state-of-the-art methods on our dataset.
arXiv Detail & Related papers (2021-09-06T02:47:11Z) - Deep Animation Video Interpolation in the Wild [115.24454577119432]
In this work, we formally define and study the animation video interpolation problem for the first time.
We propose an effective framework, AnimeInterp, with two dedicated modules in a coarse-to-fine manner.
Notably, AnimeInterp shows favorable perceptual quality and robustness for animation scenarios in the wild.
arXiv Detail & Related papers (2021-04-06T13:26:49Z) - Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances [7.7824496657259665]
We present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances.
Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data.
For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose.
arXiv Detail & Related papers (2020-09-02T09:46:12Z)