Spectral Motion Alignment for Video Motion Transfer using Diffusion Models
- URL: http://arxiv.org/abs/2403.15249v1
- Date: Fri, 22 Mar 2024 14:47:18 GMT
- Title: Spectral Motion Alignment for Video Motion Transfer using Diffusion Models
- Authors: Geon Yeong Park, Hyeonho Jeong, Sang Wan Lee, Jong Chul Ye,
- Abstract summary: Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
- Score: 54.32923808964701
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The evolution of diffusion models has greatly impacted video generation and understanding. Particularly, text-to-video diffusion models (VDMs) have significantly facilitated the customization of input video with target appearance, motion, etc. Despite these advances, challenges persist in accurately distilling motion information from video frames. While existing works leverage the consecutive frame residual as the target motion vector, they inherently lack global motion context and are vulnerable to frame-wise distortions. To address this, we present Spectral Motion Alignment (SMA), a novel framework that refines and aligns motion vectors using Fourier and wavelet transforms. SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics, and mitigating spatial artifacts. Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
Related papers
- EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models [73.96414072072048]
Existing motion transfer methods explored the motion representations of reference videos to guide generation.
We propose EfficientMT, a novel and efficient end-to-end framework for video motion transfer.
Our experiments demonstrate that our EfficientMT outperforms existing methods in efficiency while maintaining flexible motion controllability.
arXiv Detail & Related papers (2025-03-25T05:51:14Z) - MotionDiff: Training-free Zero-shot Interactive Motion Editing via Flow-assisted Multi-view Diffusion [20.142107033583027]
MotionDiff is a training-free zero-shot diffusion method that leverages optical flow for complex multi-view motion editing.
It outperforms other physics-based generative motion editing methods in achieving high-quality multi-view consistent motion results.
MotionDiff does not require retraining, enabling users to conveniently adapt it for various down-stream tasks.
arXiv Detail & Related papers (2025-03-22T08:32:56Z) - VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models [71.9811050853964]
VideoJAM is a novel framework that instills an effective motion prior to video generators.
VideoJAM achieves state-of-the-art performance in motion coherence.
These findings emphasize that appearance and motion can be complementary and, when effectively integrated, enhance both the visual quality and the coherence of video generation.
arXiv Detail & Related papers (2025-02-04T17:07:10Z) - Motion-Aware Generative Frame Interpolation [23.380470636851022]
Flow-based frame methods ensure motion stability through estimated intermediate flow but often introduce severe artifacts in complex motion regions.
Recent generative approaches, boosted by large-scale pre-trained video generation models, show promise in handling intricate scenes.
We propose Motion-aware Generative frame (MoG) that synergizes intermediate flow guidance with generative capacities to enhance fidelity.
arXiv Detail & Related papers (2025-01-07T11:03:43Z) - Mojito: Motion Trajectory and Intensity Control for Video Generation [79.85687620761186]
This paper introduces Mojito, a diffusion model that incorporates both motion trajectory and intensity control for text-to-video generation.
Experiments demonstrate Mojito's effectiveness in achieving precise trajectory and intensity control with high computational efficiency.
arXiv Detail & Related papers (2024-12-12T05:26:43Z) - MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models [3.2311303453753033]
We introduce MotionFlow, a novel framework designed for motion transfer in video diffusion models.
Our method utilizes cross-attention maps to accurately capture and manipulate spatial and temporal dynamics.
Our experiments demonstrate that MotionFlow significantly outperforms existing models in both fidelity and versatility even during drastic scene alterations.
arXiv Detail & Related papers (2024-12-06T18:59:12Z) - MotionClone: Training-Free Motion Cloning for Controllable Video Generation [41.621147782128396]
MotionClone is a training-free framework that enables motion cloning from reference videos to versatile motion-controlled video generation.
MotionClone exhibits proficiency in both global camera motion and local object motion, with notable superiority in terms of motion fidelity, textual alignment, and temporal consistency.
arXiv Detail & Related papers (2024-06-08T03:44:25Z) - Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z) - TrackDiffusion: Tracklet-Conditioned Video Generation via Diffusion Models [75.20168902300166]
We propose TrackDiffusion, a novel video generation framework affording fine-grained trajectory-conditioned motion control.
A pivotal component of TrackDiffusion is the instance enhancer, which explicitly ensures inter-frame consistency of multiple objects.
generated video sequences by our TrackDiffusion can be used as training data for visual perception models.
arXiv Detail & Related papers (2023-12-01T15:24:38Z) - VMC: Video Motion Customization using Temporal Attention Adaption for
Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z) - Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning [16.094271750354835]
Motion information is critical to a robust and generalized video representation.
Recent works have adopted frame difference as the source of motion information in video contrastive learning.
We present a framework capable of introducing well-aligned and significant motion information.
arXiv Detail & Related papers (2023-09-01T07:03:27Z) - LaMD: Latent Motion Diffusion for Video Generation [69.4111397077229]
latent motion diffusion (LaMD) framework consists of a motion-decomposed video autoencoder and a diffusion-based motion generator.
Results show that LaMD generates high-quality videos with a wide range of motions, from dynamics to highly controllable movements.
arXiv Detail & Related papers (2023-04-23T10:32:32Z) - Learning Variational Motion Prior for Video-based Motion Capture [31.79649766268877]
We present a novel variational motion prior (VMP) learning approach for video-based motion capture.
Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation.
Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
arXiv Detail & Related papers (2022-10-27T02:45:48Z) - Self-supervised Motion Learning from Static Images [36.85209332144106]
Motion from Static Images (MoSI) learns to encode motion information.
MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.
We demonstrate that MoSI can discover regions with large motion even without fine-tuning on the downstream datasets.
arXiv Detail & Related papers (2021-04-01T03:55:50Z) - MotionRNN: A Flexible Model for Video Prediction with Spacetime-Varying
Motions [70.30211294212603]
This paper tackles video prediction from a new dimension of predicting spacetime-varying motions that are incessantly across both space and time.
We propose the MotionRNN framework, which can capture the complex variations within motions and adapt to spacetime-varying scenarios.
arXiv Detail & Related papers (2021-03-03T08:11:50Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.