Differential Motion Evolution for Fine-Grained Motion Deformation in
Unsupervised Image Animation
- URL: http://arxiv.org/abs/2110.04658v2
- Date: Sun, 19 Nov 2023 21:56:31 GMT
- Title: Differential Motion Evolution for Fine-Grained Motion Deformation in
Unsupervised Image Animation
- Authors: Peirong Liu, Rui Wang, Xuefei Cao, Yipin Zhou, Ashish Shah, Ser-Nam
Lim
- Abstract summary: We introduce DiME, an end-to-end unsupervised motion transfer framework.
Capturing the motion transfer with an ordinary differential equation (ODE) helps regularize the motion field.
We also propose a natural extension of the ODE idea: DiME can easily leverage multiple views of the source object whenever they are available.
- Score: 41.85199775016731
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Image animation is the task of transferring the motion of a driving video to
a given object in a source image. While great progress has recently been made
in unsupervised motion transfer, requiring no labeled data or domain priors,
many current unsupervised approaches still struggle to capture the motion
deformations when large motion/view discrepancies occur between the source and
driving domains. Under such conditions, there is simply not enough information
to capture the motion field properly. We introduce DiME (Differential Motion
Evolution), an end-to-end unsupervised motion transfer framework integrating
differential refinement for motion estimation. Key findings are twofold: (1)
capturing the motion transfer with an ordinary differential equation (ODE)
regularizes the motion field, and (2) utilizing the source image itself allows
us to inpaint occluded/missing regions arising from large motion changes. We
also propose a natural extension of the ODE idea: whenever multiple views of
the source object are available, DiME can easily leverage them by modeling an
ODE per view.
Extensive experiments across 9 benchmarks show that DiME outperforms the
state of the art by a significant margin and generalizes much better to unseen
objects.
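The abstract's core idea of treating motion estimation as an ODE can be pictured as evolving a coarse displacement field through many small refinement steps rather than predicting it in one shot. Below is a minimal numpy sketch of that view using plain Euler integration; `evolve_motion_field` and `velocity_fn` are illustrative names, and the toy velocity stands in for the learned update network, so this is not the authors' implementation.

```python
import numpy as np

def evolve_motion_field(flow_init, velocity_fn, n_steps=8):
    """Euler integration of a dense displacement field.

    flow_init:   (H, W, 2) coarse displacement field (e.g. a first-order
                 keypoint-based estimate).
    velocity_fn: callable(flow, t) -> (H, W, 2) giving the update direction.
                 In DiME this role would be played by a learned network;
                 here it is just a placeholder argument.
    """
    flow = flow_init.copy()
    dt = 1.0 / n_steps
    for step in range(n_steps):
        t = step * dt
        flow = flow + dt * velocity_fn(flow, t)  # small, regular refinements
    return flow

# Toy usage: pull every displacement toward the field's mean ("rigid" motion).
H, W = 64, 64
coarse = np.random.randn(H, W, 2).astype(np.float32)
refined = evolve_motion_field(coarse, lambda f, t: f.mean(axis=(0, 1)) - f)
```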
Related papers
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
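As a rough illustration of the frequency-domain regularization the SMA summary mentions, the toy penalty below measures how much energy a motion-vector trajectory carries in its high temporal frequencies; the function name, cutoff scheme, and normalization are assumptions made for this sketch, not SMA's actual loss.

```python
import numpy as np

def high_frequency_penalty(motion_vectors, keep_ratio=0.25):
    """Toy frequency-domain regularizer for a per-frame motion-vector sequence.

    motion_vectors: (T, 2) array of global motion vectors over T frames.
    Energy outside the lowest `keep_ratio` of temporal frequencies is
    penalized, favouring smooth, globally coherent motion.
    """
    spectrum = np.fft.rfft(motion_vectors, axis=0)     # (T//2 + 1, 2) complex
    cutoff = max(1, int(keep_ratio * spectrum.shape[0]))
    high = spectrum[cutoff:]                           # high-frequency bins only
    return float(np.sum(np.abs(high) ** 2) / motion_vectors.shape[0])

# Example: a smooth circular camera motion versus the same motion with jitter.
t = np.linspace(0, 2 * np.pi, 32, endpoint=False)
smooth = np.stack([np.sin(t), np.cos(t)], axis=1)
print(high_frequency_penalty(smooth))                               # near zero
print(high_frequency_penalty(smooth + 0.3 * np.random.randn(32, 2)))  # larger
```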
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
- Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning [16.094271750354835]
Motion information is critical to a robust and generalized video representation.
Recent works have adopted frame difference as the source of motion information in video contrastive learning.
We present a framework capable of introducing well-aligned and significant motion information.
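For the frame-difference idea mentioned above, a minimal sketch of how a cheap motion cue can be extracted from raw clips is shown below; the function name, threshold, and mask format are assumptions for illustration rather than the paper's alignment pipeline.

```python
import numpy as np

def frame_difference_motion(clip, threshold=0.1):
    """Frame differencing as a cheap motion cue for contrastive pretraining.

    clip: (T, H, W, 3) float frames in [0, 1].
    Returns (T-1, H, W) binary masks marking pixels with large temporal change,
    which downstream code could use to pick motion-rich crops.
    """
    gray = clip.mean(axis=-1)                 # (T, H, W) rough luminance
    diff = np.abs(np.diff(gray, axis=0))      # per-pixel change between frames
    return (diff > threshold).astype(np.float32)

clip = np.random.rand(8, 64, 64, 3).astype(np.float32)
masks = frame_difference_motion(clip)         # (7, 64, 64)
```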
- Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
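To make the idea of a noise schedule "determined by the significance of each motion token" concrete, here is a toy schedule in which more important tokens are corrupted later (and therefore denoised first); the linear form, the 0.5 delay factor, and the function name are assumptions for this sketch, not M2DM's actual schedule.

```python
import numpy as np

def token_noise_schedule(token_importance, t, t_max=100):
    """Toy significance-aware corruption schedule for discrete diffusion.

    token_importance: (N,) scores in [0, 1]; higher = more important token.
    Returns (N,) probabilities of masking each token at diffusion step t.
    More important tokens are masked later, so the reverse process restores
    the dominant motion structure before fine details.
    """
    base = t / t_max                         # global corruption level at step t
    delay = 0.5 * token_importance           # important tokens lag behind
    return np.clip(base - delay, 0.0, 1.0)

importance = np.array([0.9, 0.5, 0.1])       # e.g. root joint > limb > finger
print(token_noise_schedule(importance, t=60))  # -> [0.15 0.35 0.55]
```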
- Human MotionFormer: Transferring Human Motions with Vision Transformers [73.48118882676276]
Human motion transfer aims to transfer the motions of a dynamic target person to a static source person for motion synthesis.
We propose Human MotionFormer, a hierarchical ViT framework that leverages global and local perceptions to capture large and subtle motion matching.
Experiments show that our Human MotionFormer sets the new state-of-the-art performance both qualitatively and quantitatively.
- Learning Variational Motion Prior for Video-based Motion Capture [31.79649766268877]
We present a novel variational motion prior (VMP) learning approach for video-based motion capture.
Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation.
Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
- Motion and Appearance Adaptation for Cross-Domain Motion Transfer [36.98500700394921]
Motion transfer aims to transfer the motion of a driving video to a source image.
Traditional single domain motion transfer approaches often produce notable artifacts.
We propose a Motion and Appearance Adaptation (MAA) approach for cross-domain motion transfer.
- Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance [83.25826307000717]
We study the challenging problem of recovering detailed motion from a single motion-blurred image.
Existing solutions to this problem estimate a single image sequence without considering the motion ambiguity for each region.
In this paper, we explicitly account for such motion ambiguity, allowing us to generate multiple plausible solutions all in sharp detail.
- Self-supervised Motion Learning from Static Images [36.85209332144106]
Motion from Static Images (MoSI) learns to encode motion information.
We demonstrate that MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.