Zero-shot High-fidelity and Pose-controllable Character Animation
- URL: http://arxiv.org/abs/2404.13680v3
- Date: Wed, 5 Jun 2024 07:32:32 GMT
- Title: Zero-shot High-fidelity and Pose-controllable Character Animation
- Authors: Bingwen Zhu, Fanyi Wang, Tianyi Lu, Peng Liu, Jingwen Su, Jinxiu Liu, Yanhao Zhang, Zuxuan Wu, Guo-Jun Qi, Yu-Gang Jiang
- Abstract summary: Image-to-video (I2V) generation aims to create a video sequence from a single image.
Existing approaches suffer from inconsistent character appearance and poor preservation of fine details.
We propose PoseAnimate, a novel zero-shot I2V framework for character animation.
- Score: 89.74818983864832
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-to-video (I2V) generation aims to create a video sequence from a single image, which requires high temporal coherence and visual fidelity. However, existing approaches suffer from inconsistent character appearance and poor preservation of fine details. Moreover, they require a large amount of video data for training, which is computationally demanding. To address these limitations, we propose PoseAnimate, a novel zero-shot I2V framework for character animation. PoseAnimate contains three key components: 1) a Pose-Aware Control Module (PACM) that incorporates diverse pose signals into text embeddings to preserve character-independent content and maintain precise alignment of actions; 2) a Dual Consistency Attention Module (DCAM) that enhances temporal consistency and retains character identity and intricate background details; 3) a Mask-Guided Decoupling Module (MGDM) that refines distinct feature perception abilities, improving animation fidelity by decoupling the character from the background. We also propose a Pose Alignment Transition Algorithm (PATA) to ensure smooth action transitions. Extensive experimental results demonstrate that our approach outperforms state-of-the-art training-based methods in character consistency and detail fidelity, and it maintains a high level of temporal coherence throughout the generated animations.
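The abstract names the three modules but not their internals, so the following PyTorch sketch is purely illustrative: every class, signature, and tensor shape below is an assumption (the pose is taken as a flattened keypoint vector, and DCAM is approximated by the common zero-shot trick of letting every frame attend to an anchor frame's keys and values).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PoseAwareControl(nn.Module):
    """Hypothetical PACM: fold a per-frame pose signal into the text embedding."""
    def __init__(self, embed_dim=768, pose_dim=36):  # pose_dim: 18 keypoints x 2 (assumed)
        super().__init__()
        self.pose_proj = nn.Linear(pose_dim, embed_dim)

    def forward(self, text_emb, pose):
        # text_emb: (B, T, D) prompt tokens; pose: (B, pose_dim) flattened 2D keypoints
        return text_emb + self.pose_proj(pose).unsqueeze(1)

def dual_consistency_attention(q, k, v, k_anchor, v_anchor):
    """Hypothetical DCAM reading: each frame attends to its own tokens plus the
    anchor (first) frame's keys/values, so identity and background stay stable."""
    k = torch.cat([k_anchor, k], dim=1)  # (B, 2N, D)
    v = torch.cat([v_anchor, v], dim=1)
    attn = F.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
    return attn @ v  # (B, N, D)

def mask_guided_decouple(feat, mask, fg_branch, bg_branch):
    """Hypothetical MGDM reading: process character and background features
    separately, then recompose them with the character mask."""
    return mask * fg_branch(feat) + (1 - mask) * bg_branch(feat)
```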
Related papers
- Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation [19.408715783816167]
We introduce a training-free framework that ensures the generated video sequence preserves the reference image's subtleties.
We decouple skeletal and motion priors from pose information, enabling precise control over animation generation.
Our method significantly enhances the quality of video generation without the need for large datasets or expensive computational resources.
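The summary does not say how the decoupling works; one generic way to split pose information into a skeletal prior and a motion prior (an assumption, not necessarily this paper's construction) is to separate bone lengths from bone directions, so a reference skeleton can be re-driven by another sequence's motion:

```python
import torch

def decouple_pose(keypoints, parents):
    """Split a 2D pose into a skeletal prior (bone lengths) and a motion
    prior (bone directions). keypoints: (J, 2); parents[j] is joint j's parent."""
    bones = keypoints - keypoints[parents]       # child-minus-parent vectors
    lengths = bones.norm(dim=-1, keepdim=True)   # skeleton: how long each bone is
    dirs = bones / lengths.clamp_min(1e-6)       # motion: which way each bone points
    return lengths, dirs

def retarget(lengths_ref, dirs_drive, parents, root):
    """Recombine a reference skeleton with a driving motion. Assumes joints are
    topologically ordered (every parent index precedes its children)."""
    out = torch.zeros(len(parents), 2)
    out[0] = root
    for j in range(1, len(parents)):
        out[j] = out[parents[j]] + lengths_ref[j] * dirs_drive[j]
    return out
```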
arXiv Detail & Related papers (2024-08-29T13:08:12Z)
- Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control [77.08568533331206]
Follow-Your-Pose v2 can be trained on noisy open-source videos readily available on the internet.
Our approach outperforms state-of-the-art methods by a margin of over 35% across two datasets and seven metrics.
arXiv Detail & Related papers (2024-06-05T08:03:18Z)
- AnimateZoo: Zero-shot Video Generation of Cross-Species Animation via Subject Alignment [64.02822911038848]
We present AnimateZoo, a zero-shot diffusion-based video generator to produce animal animations.
The key technique in AnimateZoo is subject alignment, which comprises two steps.
Our model is capable of generating videos characterized by accurate movements, consistent appearance, and high-fidelity frames.
arXiv Detail & Related papers (2024-04-07T12:57:41Z)
- AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video (T2V) diffusion model AnimateDiff.
For appearance control, we borrow intermediate latents and their features from text-to-image (T2I) generation.
For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
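The positional correction itself is not described in this summary, so the sketch below shows only the structural change it implies: temporal attention restricted to a sliding window of frames instead of all frames (shapes assumed):

```python
import torch
import torch.nn.functional as F

def window_temporal_attention(x, window=3):
    """Windowed temporal attention over video latents x of shape (B, F, N, D):
    frame t attends only to frames [t - window + 1, t] rather than all F frames."""
    B, num_frames, N, D = x.shape
    out = torch.empty_like(x)
    for t in range(num_frames):
        lo = max(0, t - window + 1)
        q = x[:, t]                              # (B, N, D)
        kv = x[:, lo:t + 1].reshape(B, -1, D)    # tokens from the window frames
        attn = F.softmax(q @ kv.transpose(1, 2) / D ** 0.5, dim=-1)
        out[:, t] = attn @ kv
    return out
```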
arXiv Detail & Related papers (2023-12-06T13:39:35Z)
- Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become mainstream in visual generation research owing to their robust generative capabilities.
In this paper, we propose a novel framework tailored for character animation.
By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z)
- MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence.
Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion.
We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
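Frame warping is mentioned here only as the technique prior work relies on, not as MagicAnimate's own method; for readers unfamiliar with it, this is a minimal generic sketch of backward warping a reference frame with an optical-flow field:

```python
import torch
import torch.nn.functional as F

def warp_frame(ref, flow):
    """Backward-warp a reference frame with a per-pixel flow field.
    ref: (B, C, H, W) image; flow: (B, 2, H, W) pixel displacements (x, y)."""
    B, _, H, W = ref.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=ref.dtype),
        torch.arange(W, dtype=ref.dtype),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2) + flow.permute(0, 2, 3, 1)
    # grid_sample expects coordinates normalized to [-1, 1]
    grid = 2.0 * grid / torch.tensor([W - 1, H - 1], dtype=ref.dtype) - 1.0
    return F.grid_sample(ref, grid, align_corners=True)
```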
arXiv Detail & Related papers (2023-11-27T18:32:31Z)
- Self-Supervised Equivariant Scene Synthesis from Video [84.15595573718925]
We propose a framework to learn scene representations from video that are automatically delineated into background, characters, and animations.
After training, we can manipulate image encodings in real time to create unseen combinations of the delineated components, as sketched after this entry.
We demonstrate results on three datasets: Moving MNIST with backgrounds, 2D video game sprites, and Fashion Modeling.
arXiv Detail & Related papers (2021-02-01T14:17:31Z)
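As a toy illustration of the recombination idea above (the encoder and all shapes are invented here; the paper's actual encoder, decoder, and training objective are omitted), component codes from two clips can simply be swapped before decoding:

```python
import torch
import torch.nn as nn

class FactoredEncoder(nn.Module):
    """Toy encoder that splits a frame code into background / character /
    animation components, mirroring the paper's delineation."""
    def __init__(self, in_dim=3 * 64 * 64, code_dim=32):
        super().__init__()
        self.net = nn.Linear(in_dim, 3 * code_dim)

    def forward(self, frame):
        return self.net(frame.flatten(1)).chunk(3, dim=1)  # (z_bg, z_char, z_anim)

enc = FactoredEncoder()
frame_a, frame_b = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
z_bg_a, _, _ = enc(frame_a)
_, z_char_b, z_anim_b = enc(frame_b)
# Unseen combination: clip A's background with clip B's character and animation,
# ready to be passed to an (omitted) decoder.
z_mix = torch.cat([z_bg_a, z_char_b, z_anim_b], dim=1)
```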
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.