Deep Animation Video Interpolation in the Wild
- URL: http://arxiv.org/abs/2104.02495v1
- Date: Tue, 6 Apr 2021 13:26:49 GMT
- Title: Deep Animation Video Interpolation in the Wild
- Authors: Li Siyao, Shiyu Zhao, Weijiang Yu, Wenxiu Sun, Dimitris N. Metaxas,
Chen Change Loy, Ziwei Liu
- Abstract summary: In this work, we formally define and study the animation video interpolation problem for the first time.
We propose an effective framework, AnimeInterp, with two dedicated modules in a coarse-to-fine manner.
Notably, AnimeInterp shows favorable perceptual quality and robustness for animation scenarios in the wild.
- Score: 115.24454577119432
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In the animation industry, cartoon videos are usually produced at a low frame
rate, since hand-drawing such frames is costly and time-consuming. Therefore,
it is desirable to develop computational models that can automatically
interpolate the in-between animation frames. However, existing video
interpolation methods fail to produce satisfying results on animation data.
Compared to natural videos, animation videos possess two unique characteristics
that make frame interpolation difficult: 1) cartoons comprise lines and smooth
color pieces; these smooth areas lack texture, which makes it difficult to estimate
accurate motion in animation videos. 2) cartoons express stories via
exaggeration, and some of the motions are non-linear and extremely large. In this
work, we formally define and study the animation video interpolation problem
for the first time. To address the aforementioned challenges, we propose an
effective framework, AnimeInterp, with two dedicated modules in a
coarse-to-fine manner. Specifically, 1) Segment-Guided Matching resolves the
"lack of textures" challenge by exploiting global matching among color pieces
that are piece-wise coherent. 2) Recurrent Flow Refinement resolves the
"non-linear and extremely large motion" challenge by recurrent predictions
using a transformer-like architecture. To facilitate comprehensive training and
evaluations, we build a large-scale animation triplet dataset, ATD-12K, which
comprises 12,000 triplets with rich annotations. Extensive experiments
demonstrate that our approach outperforms existing state-of-the-art
interpolation methods for animation videos. Notably, AnimeInterp shows
favorable perceptual quality and robustness for animation scenarios in the
wild. The proposed dataset and code are available at
https://github.com/lisiyao21/AnimeInterp/.
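The abstract describes a two-stage, coarse-to-fine pipeline: Segment-Guided Matching first produces coarse bidirectional flows by globally matching piece-wise coherent color segments, and Recurrent Flow Refinement then refines those flows before the in-between frame is warped and synthesized. The sketch below (PyTorch) only illustrates that control flow under a linear-motion assumption for the t = 0.5 frame; `matcher`, `refiner`, `synthesizer`, and all internals are hypothetical placeholders rather than the authors' implementation, so refer to the linked repository for the official code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def backward_warp(frame: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Backward-warp `frame` (N, C, H, W) with a dense flow field (N, 2, H, W)."""
    _, _, h, w = frame.shape
    # Build a pixel-coordinate grid, shift it by the flow, and normalize to [-1, 1].
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=frame.dtype, device=frame.device),
        torch.arange(w, dtype=frame.dtype, device=frame.device),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0) + flow  # (N, 2, H, W)
    grid_x = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
    return F.grid_sample(frame, torch.stack((grid_x, grid_y), dim=-1),
                         align_corners=True)


class CoarseToFineInterp(nn.Module):
    """Illustrative coarse-to-fine interpolator: coarse piece-wise matching,
    recurrent flow refinement, then warping and synthesis of the middle frame."""

    def __init__(self, matcher: nn.Module, refiner: nn.Module, synthesizer: nn.Module):
        super().__init__()
        self.matcher = matcher          # placeholder for Segment-Guided Matching
        self.refiner = refiner          # placeholder for Recurrent Flow Refinement
        self.synthesizer = synthesizer  # placeholder frame-synthesis network

    def forward(self, frame0: torch.Tensor, frame1: torch.Tensor) -> torch.Tensor:
        # 1) Coarse bidirectional flows from global matching of color segments.
        coarse_01, coarse_10 = self.matcher(frame0, frame1)
        # 2) Recurrent refinement of the coarse flows.
        flow_01, flow_10 = self.refiner(frame0, frame1, coarse_01, coarse_10)
        # 3) Warp both endpoints to t = 0.5 (crude linear-motion approximation)
        #    and fuse them into the in-between frame.
        warped_from_0 = backward_warp(frame0, 0.5 * flow_10)
        warped_from_1 = backward_warp(frame1, 0.5 * flow_01)
        return self.synthesizer(torch.cat((warped_from_0, warped_from_1), dim=1))
```

Splitting flow estimation from frame synthesis in this way mirrors the abstract's motivation: the coarse matching stage handles the textureless color pieces, while the recurrent refinement stage accounts for large, non-linear motion.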
Related papers
- AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment [64.02822911038848]
We present AnimateZoo, a zero-shot diffusion-based video generator to produce animal animations.
The key technique used in AnimateZoo is subject alignment, which includes two steps.
Our model is capable of generating videos characterized by accurate movements, consistent appearance, and high-fidelity frames.
arXiv Detail & Related papers (2024-04-07T12:57:41Z)
- AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video diffusion model, i.e., AnimateDiff.
For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation.
For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
arXiv Detail & Related papers (2023-12-06T13:39:35Z)
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z)
- AnimeRun: 2D Animation Visual Correspondence from Open Source 3D Movies [98.65469430034246]
Existing datasets for two-dimensional (2D) cartoons suffer from simple frame composition and monotonic movements.
We present a new 2D animation visual correspondence dataset, AnimeRun, by converting open source 3D movies to full scenes in 2D style.
Our analyses show that the proposed dataset not only resembles real anime more in image composition, but also possesses richer and more complex motion patterns compared to existing datasets.
arXiv Detail & Related papers (2022-11-10T17:26:21Z)
- SketchBetween: Video-to-Video Synthesis for Sprite Animation via Sketches [0.9645196221785693]
2D animation is a common factor in game development, used for characters, effects and background art.
Automated animation approaches exist, but are designed without animators in mind.
We propose a problem formulation that adheres more closely to the standard workflow of animation.
arXiv Detail & Related papers (2022-09-01T02:43:19Z)
- Enhanced Deep Animation Video Interpolation [47.7046169124373]
Existing learning-based frame interpolation algorithms extract consecutive frames from high-speed natural videos to train the model.
Compared to natural videos, cartoon videos usually have a low frame rate.
We present AutoFI, a method to automatically render training data for deep animation video interpolation.
arXiv Detail & Related papers (2022-06-25T14:00:48Z)
- Render In-between: Motion Guided Video Synthesis for Action Interpolation [53.43607872972194]
We propose a motion-guided frame-upsampling framework that is capable of producing realistic human motion and appearance.
A novel motion model is trained to infer the non-linear skeletal motion between frames by leveraging a large-scale motion-capture dataset.
Our pipeline only requires low-frame-rate videos and unpaired human motion data but does not require high-frame-rate videos for training.
arXiv Detail & Related papers (2021-11-01T15:32:51Z)
- Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances [7.7824496657259665]
We present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances.
Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data.
For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose.
arXiv Detail & Related papers (2020-09-02T09:46:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.