Related papers: Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation

URL: http://arxiv.org/abs/2408.16506v1
Date: Thu, 29 Aug 2024 13:08:12 GMT
Title: Alignment is All You Need: A Training-free Augmentation Strategy for Pose-guided Video Generation
Authors: Xiaoyu Jin, Zunnan Xu, Mingwen Ou, Wenming Yang
Abstract summary: We introduce a training-free framework that ensures the generated video sequence preserves the reference image's subtleties. We decouple skeletal and motion priors from pose information, enabling precise control over animation generation. Our method significantly enhances the quality of video generation without the need for large datasets or expensive computational resources.
Score: 19.408715783816167
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Character animation is a transformative field in computer graphics and vision, enabling dynamic and realistic video animations from static images. Despite advancements, maintaining appearance consistency in animations remains a challenge. Our approach addresses this by introducing a training-free framework that ensures the generated video sequence preserves the reference image's subtleties, such as physique and proportions, through a dual alignment strategy. We decouple skeletal and motion priors from pose information, enabling precise control over animation generation. Our method also improves pixel-level alignment for conditional control from the reference character, enhancing the temporal consistency and visual cohesion of animations. Our method significantly enhances the quality of video generation without the need for large datasets or expensive computational resources.

Related papers

MVAnimate: Enhancing Character Animation with Multi-View Optimization [55.4217617472079]
We introduce MVAnimate, a novel framework that synthesizes both 2D and 3D information of dynamic figures based on multi-view prior information. Our approach leverages multi-view prior information to produce temporally consistent and spatially coherent animation outputs.
arXiv Detail & Related papers (2026-02-09T14:55:21Z)
DreamDance: Animating Character Art via Inpainting Stable Gaussian Worlds [64.53681498600065]
DreamDance is an animation framework capable of producing stable, consistent character and scene motion conditioned on precise camera trajectories. We train a pose-aware video inpainting model that injects the dynamic character into the scene video while enhancing background quality.
arXiv Detail & Related papers (2025-05-30T15:54:34Z)
Follow-Your-Pose v2: Multiple-Condition Guided Character Image Animation for Stable Pose Control [77.08568533331206]
Follow-Your-Pose v2 can be trained on noisy open-sourced videos readily available on the internet. Our approach outperforms state-of-the-art methods by a margin of over 35% across 2 datasets and on 7 metrics.
arXiv Detail & Related papers (2024-06-05T08:03:18Z)
Zero-shot High-fidelity and Pose-controllable Character Animation [89.74818983864832]
Image-to-video (I2V) generation aims to create a video sequence from a single image. Existing approaches suffer from inconsistency of character appearances and poor preservation of fine details. We propose PoseAnimate, a novel zero-shot I2V framework for character animation.
arXiv Detail & Related papers (2024-04-21T14:43:31Z)
AniClipart: Clipart Animation with Text-to-Video Priors [28.76809141136148]
We introduce AniClipart, a system that transforms static images into high-quality motion sequences guided by text-to-video priors. Experimental results show that the proposed AniClipart consistently outperforms existing image-to-video generation models.
arXiv Detail & Related papers (2024-04-18T17:24:28Z)
AnimateZero: Video Diffusion Models are Zero-Shot Image Animators [63.938509879469024]
We propose AnimateZero to unveil the pre-trained text-to-video diffusion model, i.e., AnimateDiff. For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation. For temporal control, we replace the global temporal attention of the original T2V model with our proposed positional-corrected window attention.
arXiv Detail & Related papers (2023-12-06T13:39:35Z)
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation [27.700371215886683]
Diffusion models have become the mainstream in visual generation research, owing to their robust generative capabilities. In this paper, we propose a novel framework tailored for character animation. By expanding the training data, our approach can animate arbitrary characters, yielding superior results in character animation compared to other image-to-video methods.
arXiv Detail & Related papers (2023-11-28T12:27:15Z)
MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model [74.84435399451573]
This paper studies the human image animation task, which aims to generate a video of a certain reference identity following a particular motion sequence. Existing animation works typically employ the frame-warping technique to animate the reference image towards the target motion. We introduce MagicAnimate, a diffusion-based framework that aims at enhancing temporal consistency, preserving reference image faithfully, and improving animation fidelity.
arXiv Detail & Related papers (2023-11-27T18:32:31Z)
AnimateAnything: Fine-Grained Open Domain Image Animation with Motion Guidance [13.416296247896042]
We introduce an open domain image animation method that leverages the motion prior of video diffusion model. Our approach introduces targeted motion area guidance and motion strength guidance, enabling precise control of the movable area and its motion speed. We validate the effectiveness of our method through rigorous experiments on an open-domain dataset.
arXiv Detail & Related papers (2023-11-21T03:47:54Z)
DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors [63.43133768897087]
We propose a method to convert open-domain images into animated videos. The key idea is to utilize the motion prior of text-to-video diffusion models by incorporating the image into the generative process as guidance. Our proposed method can produce visually convincing and more logical & natural motions, as well as higher conformity to the input image.
arXiv Detail & Related papers (2023-10-18T14:42:16Z)
Unsupervised Coherent Video Cartoonization with Perceptual Motion Consistency [89.75731026852338]
We propose a spatially-adaptive alignment framework with perceptual motion consistency for coherent video cartoonization. We devise the semantic correlative map as a style-independent, global-aware regularization on the perceptual motion consistency. Our method is able to generate highly stylistic and temporally consistent cartoon videos.
arXiv Detail & Related papers (2022-04-02T07:59:02Z)
Self-Supervised Equivariant Scene Synthesis from Video [84.15595573718925]
We propose a framework to learn scene representations from video that are automatically delineated into background, characters, and animations. After training, we can manipulate image encodings in real time to create unseen combinations of the delineated components. We demonstrate results on three datasets: Moving MNIST with backgrounds, 2D video game sprites, and Fashion Modeling.
arXiv Detail & Related papers (2021-02-01T14:17:31Z)
Going beyond Free Viewpoint: Creating Animatable Volumetric Video of Human Performances [7.7824496657259665]
We present an end-to-end pipeline for the creation of high-quality animatable volumetric video content of human performances. Semantic enrichment and geometric animation ability are achieved by establishing temporal consistency in the 3D data. For pose editing, we exploit the captured data as much as possible and kinematically deform the captured frames to fit a desired pose.
arXiv Detail & Related papers (2020-09-02T09:46:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.