MagicAnimate: Temporally Consistent Human Image Animation using
Diffusion Model
- URL: http://arxiv.org/abs/2311.16498v1
- Date: Mon, 27 Nov 2023 18:32:31 GMT
- Title: MagicAnimate: Temporally Consistent Human Image Animation using
Diffusion Model
- Authors: Zhongcong Xu, Jianfeng Zhang, Jun Hao Liew, Hanshu Yan, Jia-Wei Liu,
Chenxu Zhang, Jiashi Feng, Mike Zheng Shou
- Abstract summary: This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving the reference image faithfully, and improving animation fidelity.
- Score: 74.84435399451573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper studies the human image animation task, which aims to generate a
video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to
animate the reference image towards the target motion. Despite achieving
reasonable results, these approaches face challenges in maintaining temporal
consistency throughout the animation due to the lack of temporal modeling and
poor preservation of reference identity. In this work, we introduce
MagicAnimate, a diffusion-based framework that aims at enhancing temporal
consistency, preserving the reference image faithfully, and improving animation
fidelity. To achieve this, we first develop a video diffusion model to encode
temporal information. Second, to maintain the appearance coherence across
frames, we introduce a novel appearance encoder to retain the intricate details
of the reference image. Leveraging these two innovations, we further employ a
simple video fusion technique to encourage smooth transitions for long video
animation. Empirical results demonstrate the superiority of our method over
baseline approaches on two benchmarks. Notably, our approach outperforms the
strongest baseline by over 38% in terms of video fidelity on the challenging
TikTok dancing dataset. Code and model will be made available.
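
The abstract mentions a simple video fusion step for producing long animations with smooth transitions, but gives no implementation details here. Below is a minimal, hypothetical sketch of one common way such fusion can be realized: denoising overlapping temporal segments and averaging the predictions on overlapping frames. The function name fuse_segments, the segment layout, and the plain-averaging scheme are illustrative assumptions, not the paper's stated method.

```python
import numpy as np

# Illustrative sketch only: assumes overlapping temporal segments are denoised
# independently and blended by averaging the frames where segments overlap,
# which smooths transitions at segment boundaries.
def fuse_segments(segment_latents, segment_starts, total_frames):
    """Blend per-segment latents of shape (seg_len, C, H, W) into one video."""
    _, c, h, w = segment_latents[0].shape
    fused = np.zeros((total_frames, c, h, w), dtype=np.float32)
    counts = np.zeros((total_frames, 1, 1, 1), dtype=np.float32)

    for latents, start in zip(segment_latents, segment_starts):
        end = start + latents.shape[0]
        fused[start:end] += latents   # accumulate segment predictions
        counts[start:end] += 1.0      # track how many segments cover each frame

    # Frames covered by more than one segment are averaged.
    return fused / np.maximum(counts, 1.0)


if __name__ == "__main__":
    # Two hypothetical 16-frame segments with an 8-frame overlap spanning 24 frames.
    seg_a = np.random.randn(16, 4, 8, 8).astype(np.float32)
    seg_b = np.random.randn(16, 4, 8, 8).astype(np.float32)
    video = fuse_segments([seg_a, seg_b], [0, 8], 24)
    print(video.shape)  # -> (24, 4, 8, 8)
```

Plain averaging in the overlap is the simplest blend; a weighted (e.g. linear ramp) blend across the overlap is another common choice.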
Related papers
- Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation [19.408715783816167]
We introduce a training-free framework that ensures the generated video sequence preserves the reference image's subtleties.
We decouple skeletal and motion priors from pose information, enabling precise control over animation generation.
Our method significantly enhances the quality of video generation without the need for large datasets or expensive computational resources.
arXiv Detail & Related papers (2024-08-29T13:08:12Z)
- UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation [53.16986875759286]
We present a UniAnimate framework to enable efficient and long-term human video generation.
We map the reference image along with the posture guidance and noise video into a common feature space.
We also propose a unified noise input that supports random noised input as well as first frame conditioned input.
arXiv Detail & Related papers (2024-06-03T10:51:10Z)
- Zero-shot High-fidelity and Pose-controllable Character Animation [89.74818983864832]
Image-to-video (I2V) generation aims to create a video sequence from a single image.
Existing approaches suffer from inconsistency of character appearances and poor preservation of fine details.
We propose PoseAnimate, a novel zero-shot I2V framework for character animation.
arXiv Detail & Related papers (2024-04-21T14:43:31Z)
- AniClipart: Clipart Animation with Text-to-Video Priors [28.76809141136148]
We introduce AniClipart, a system that transforms static images into high-quality motion sequences guided by text-to-video priors.
Experimental results show that the proposed AniClipart consistently outperforms existing image-to-video generation models.
arXiv Detail & Related papers (2024-04-18T17:24:28Z)
- LoopAnimate: Loopable Salient Object Animation [19.761865029125524]
LoopAnimate is a novel method for generating videos with consistent start and end frames.
It achieves state-of-the-art performance in both objective metrics, such as fidelity and temporal consistency, and subjective evaluation results.
arXiv Detail & Related papers (2024-04-14T07:36:18Z)
- AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment [64.02822911038848]
We present AnimateZoo, a zero-shot diffusion-based video generator to produce animal animations.
Key technique used in our AnimateZoo is subject alignment, which includes two steps.
Our model is capable of generating videos characterized by accurate movements, consistent appearance, and high-fidelity frames.
arXiv Detail & Related papers (2024-04-07T12:57:41Z)
- AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video diffusion model, i.e., AnimateDiff.
For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation.
For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
arXiv Detail & Related papers (2023-12-06T13:39:35Z)
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities.
In this paper, we propose a novel framework tailored for character animation.
By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z)
- First Order Motion Model for Image Animation [90.712718329677]
Image animation consists of generating a video sequence so that an object in a source image is animated according to the motion of a driving video.
Our framework addresses this problem without using any annotation or prior information about the specific object to animate.
arXiv Detail & Related papers (2020-02-29T07:08:56Z)