AnaMoDiff: 2D Analogical Motion Diffusion via Disentangled Denoising
- URL: http://arxiv.org/abs/2402.03549v1
- Date: Mon, 5 Feb 2024 22:10:54 GMT
- Title: AnaMoDiff: 2D Analogical Motion Diffusion via Disentangled Denoising
- Authors: Maham Tanveer, Yizhi Wang, Ruiqi Wang, Nanxuan Zhao, Ali Mahdavi-Amiri, Hao Zhang
- Abstract summary: AnaMoDiff is a novel diffusion-based method for 2D motion analogies.
Our goal is to accurately transfer motions from a 2D driving video onto a source character while preserving its identity, in terms of appearance and natural movement.
- Score: 25.839194626743126
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present AnaMoDiff, a novel diffusion-based method for 2D motion analogies
that is applied to raw, unannotated videos of articulated characters. Our goal
is to accurately transfer motions from a 2D driving video onto a source
character while preserving its identity, in terms of appearance and natural
movement, even when there are significant discrepancies between the source
and driving characters in their part proportions, movement speeds, and styles.
Our diffusion model transfers the input motion via a latent optical flow (LOF)
network operating in a noised latent space, which is spatially aware, efficient
to process compared to the original RGB videos, and artifact-resistant through
the diffusion denoising process even amid dense movements. To accomplish both
motion analogy and identity preservation, we train our denoising model in a
feature-disentangled manner, operating at two noise levels. While
identity-revealing features of the source are learned via conventional noise
injection, motion features are learned from LOF-warped videos by only injecting
noise with large values, with the stipulation that motion properties involving
pose and limbs are encoded by higher-level features. Experiments demonstrate
that our method achieves the best trade-off between motion analogy and identity
preservation.
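The two-noise-level training described in the abstract can be illustrated with a minimal sketch: identity-revealing features are learned with conventional noise injection over the full diffusion schedule, while motion features from LOF-warped videos are supervised only at large noise levels, where higher-level features encoding pose and limbs dominate. The timestep count and threshold below are hypothetical placeholders, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000                 # total diffusion timesteps (assumed, standard DDPM-style schedule)
HIGH_NOISE_START = 600   # hypothetical threshold marking "large" noise values

def sample_timesteps(batch_size, branch):
    """Sample denoising timesteps for one training branch.

    'identity': conventional noise injection over the full schedule,
    teaching the model identity-revealing (appearance) features of the source.
    'motion': only large-noise timesteps, so supervision from the LOF-warped
    video reaches only the higher-level features that encode pose and limbs.
    """
    if branch == "identity":
        return rng.integers(0, T, size=batch_size)
    if branch == "motion":
        return rng.integers(HIGH_NOISE_START, T, size=batch_size)
    raise ValueError(f"unknown branch: {branch}")

id_t = sample_timesteps(4, "identity")  # anywhere in [0, T)
mo_t = sample_timesteps(4, "motion")    # restricted to [HIGH_NOISE_START, T)
```

In a full implementation, each sampled timestep would index a noise schedule used to corrupt the latent before the denoiser predicts the noise; the restriction on the motion branch is what disentangles motion from appearance.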
Related papers
- REWIND: Real-Time Egocentric Whole-Body Motion Diffusion with Exemplar-Based Identity Conditioning [95.07708090428814]
We present REWIND, a one-step diffusion model for real-time, high-fidelity human motion estimation from egocentric image inputs.
We introduce cascaded body-hand denoising diffusion, which effectively models the correlation between egocentric body and hand motions.
We also propose a novel identity conditioning method based on a small set of pose exemplars of the target identity, which further enhances motion estimation quality.
arXiv Detail & Related papers (2025-04-07T11:44:11Z)
- Edit-Your-Motion: Space-Time Diffusion Decoupling Learning for Video Motion Editing [46.56615725175025]
We introduce Edit-Your-Motion, a video motion editing method that tackles unseen challenges through one-shot fine-tuning.
To effectively decouple motion and appearance of the source video, we design a temporal two-stage learning strategy.
With Edit-Your-Motion, users can edit the motion of humans in the source video, creating more engaging and diverse content.
arXiv Detail & Related papers (2024-05-07T17:06:59Z)
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
- MotionMix: Weakly-Supervised Diffusion for Controllable Motion Generation [19.999239668765885]
MotionMix is a weakly-supervised diffusion model that leverages both noisy and unannotated motion sequences.
Our framework consistently achieves state-of-the-art performances on text-to-motion, action-to-motion, and music-to-dance tasks.
arXiv Detail & Related papers (2024-01-20T04:58:06Z)
- EulerMormer: Robust Eulerian Motion Magnification via Dynamic Filtering within Transformer [30.470336098766765]
Video Motion Magnification (VMM) aims to break the resolution limit of human visual perception capability.
This paper proposes a novel dynamic filtering strategy to achieve static-dynamic field adaptive denoising.
We demonstrate extensive experiments that EulerMormer achieves more robust video motion magnification from the Eulerian perspective.
arXiv Detail & Related papers (2023-12-07T09:10:16Z)
- DiffusionPhase: Motion Diffusion in Frequency Domain [69.811762407278]
We introduce a learning-based method for generating high-quality human motion sequences from text descriptions.
Existing techniques struggle with motion diversity and smooth transitions in generating arbitrary-length motion sequences.
We develop a network encoder that converts the motion space into a compact yet expressive parameterized phase space.
arXiv Detail & Related papers (2023-12-07T04:39:22Z)
- Priority-Centric Human Motion Generation in Discrete Latent Space [59.401128190423535]
We introduce a Priority-Centric Motion Discrete Diffusion Model (M2DM) for text-to-motion generation.
M2DM incorporates a global self-attention mechanism and a regularization term to counteract code collapse.
We also present a motion discrete diffusion model that employs an innovative noise schedule, determined by the significance of each motion token.
arXiv Detail & Related papers (2023-08-28T10:40:16Z)
- MoFusion: A Framework for Denoising-Diffusion-based Motion Synthesis [73.52948992990191]
MoFusion is a new denoising-diffusion-based framework for high-quality conditional human motion synthesis.
We present ways to introduce well-known kinematic losses for motion plausibility within the motion diffusion framework.
We demonstrate the effectiveness of MoFusion compared to the state of the art on established benchmarks in the literature.
arXiv Detail & Related papers (2022-12-08T18:59:48Z)
- Executing your Commands via Motion Diffusion in Latent Space [51.64652463205012]
We propose a Motion Latent-based Diffusion model (MLD) to produce vivid motion sequences conforming to the given conditional inputs.
Our MLD achieves significant improvements over the state-of-the-art methods among extensive human motion generation tasks.
arXiv Detail & Related papers (2022-12-08T03:07:00Z)
- Learning Comprehensive Motion Representation for Action Recognition [124.65403098534266]
2D CNN-based methods are efficient but may yield redundant features due to applying the same 2D convolution kernel to each frame.
Recent efforts attempt to capture motion information by establishing inter-frame connections while still suffering the limited temporal receptive field or high latency.
We propose a Channel-wise Motion Enhancement (CME) module to adaptively emphasize the channels related to dynamic information with a channel-wise gate vector.
We also propose a Spatial-wise Motion Enhancement (SME) module to focus on the regions with the critical target in motion, according to the point-to-point similarity between adjacent feature maps.
arXiv Detail & Related papers (2021-03-23T03:06:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.