MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models
- URL: http://arxiv.org/abs/2412.05275v1
- Date: Fri, 06 Dec 2024 18:59:12 GMT
- Title: MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models
- Authors: Tuna Han Salih Meral, Hidir Yesiltepe, Connor Dunlop, Pinar Yanardag
- Abstract summary: We introduce MotionFlow, a novel framework designed for motion transfer in video diffusion models. Our method utilizes cross-attention maps to accurately capture and manipulate spatial and temporal dynamics. Our experiments demonstrate that MotionFlow significantly outperforms existing models in both fidelity and versatility, even during drastic scene alterations.
- Score: 3.2311303453753033
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-video models have demonstrated impressive capabilities in producing diverse and captivating video content, showcasing a notable advancement in generative AI. However, these models generally lack fine-grained control over motion patterns, limiting their practical applicability. We introduce MotionFlow, a novel framework designed for motion transfer in video diffusion models. Our method utilizes cross-attention maps to accurately capture and manipulate spatial and temporal dynamics, enabling seamless motion transfers across various contexts. Our approach requires no training and works at test time by leveraging the inherent capabilities of pre-trained video diffusion models. In contrast to traditional approaches, which struggle with comprehensive scene changes while maintaining consistent motion, MotionFlow successfully handles such complex transformations through its attention-based mechanism. Our qualitative and quantitative experiments demonstrate that MotionFlow significantly outperforms existing models in both fidelity and versatility, even during drastic scene alterations.
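Because the method is training-free and attention-based, its core mechanism lends itself to a short illustration: record cross-attention maps while denoising the reference video, then blend them back in while denoising with the new prompt. The following PyTorch sketch is illustrative only, not the authors' implementation; it assumes a diffusion UNet whose cross-attention probabilities are exposed as hookable module outputs, and the hook placement and `blend` weight are assumptions.

```python
# Hedged sketch of attention capture-and-injection for training-free
# motion transfer. The hookable attention-probability outputs and the
# fixed blend weight are assumptions, not the paper's actual design.
class AttentionTransfer:
    def __init__(self, blend: float = 0.8):
        self.maps = {}          # (layer_name, timestep) -> stored attention map
        self.mode = "capture"   # "capture" on the source pass, "inject" on the target pass
        self.timestep = None    # set by the sampling loop at each denoising step
        self.blend = blend

    def make_hook(self, name: str):
        def hook(module, inputs, attn_probs):
            key = (name, self.timestep)
            if self.mode == "capture":
                self.maps[key] = attn_probs.detach()
            elif self.mode == "inject" and key in self.maps:
                # Blend the source video's attention layout into the new
                # generation so the target scene inherits its motion.
                return self.blend * self.maps[key] + (1 - self.blend) * attn_probs
            return attn_probs
        return hook
```

In use, one would register `make_hook(name)` on each cross-attention module with `register_forward_hook` (a returned tensor replaces the module output), run the source pass in `"capture"` mode while updating `timestep` each step, then switch to `"inject"` and denoise with the target prompt.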
Related papers
- EfficientMT: Efficient Temporal Adaptation for Motion Transfer in Text-to-Video Diffusion Models [73.96414072072048]
Existing motion transfer methods explore the motion representations of reference videos to guide generation.
We propose EfficientMT, a novel and efficient end-to-end framework for video motion transfer.
Our experiments demonstrate that our EfficientMT outperforms existing methods in efficiency while maintaining flexible motion controllability.
arXiv Detail & Related papers (2025-03-25T05:51:14Z)
- MotionDiff: Training-free Zero-shot Interactive Motion Editing via Flow-assisted Multi-view Diffusion [20.142107033583027]
MotionDiff is a training-free zero-shot diffusion method that leverages optical flow for complex multi-view motion editing.
It outperforms other physics-based generative motion editing methods in achieving high-quality multi-view consistent motion results.
MotionDiff does not require retraining, enabling users to conveniently adapt it to various downstream tasks.
arXiv Detail & Related papers (2025-03-22T08:32:56Z)
- MotionMatcher: Motion Customization of Text-to-Video Diffusion Models via Motion Feature Matching [27.28898943916193]
Text-to-video (T2V) diffusion models have promising capabilities in synthesizing realistic videos from input text prompts.
In this work, we tackle the motion customization problem, where a reference video is provided as motion guidance.
We propose MotionMatcher, a motion customization framework that fine-tunes the pre-trained T2V diffusion model at the feature level (a minimal sketch of such feature matching follows this entry).
arXiv Detail & Related papers (2025-02-18T19:12:51Z)
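The summary describes feature-level fine-tuning against a reference video. One plausible reading, shown below as a minimal sketch, is a loss that matches frame-to-frame feature differences so motion is emphasized over appearance; the delta-based comparison and tensor layout are assumptions, not necessarily MotionMatcher's actual objective.

```python
import torch
import torch.nn.functional as F

def motion_feature_matching_loss(gen_feats: torch.Tensor,
                                 ref_feats: torch.Tensor) -> torch.Tensor:
    """gen_feats, ref_feats: (batch, frames, channels, h, w) intermediate
    diffusion features for the generated and reference videos (assumed layout)."""
    # Frame-to-frame deltas carry motion; raw features carry appearance.
    gen_motion = gen_feats[:, 1:] - gen_feats[:, :-1]
    ref_motion = ref_feats[:, 1:] - ref_feats[:, :-1]
    return F.mse_loss(gen_motion, ref_motion)
```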
- Mojito: Motion Trajectory and Intensity Control for Video Generation [79.85687620761186]
This paper introduces Mojito, a diffusion model that incorporates both motion trajectory and intensity control for text-to-video generation.
Experiments demonstrate Mojito's effectiveness in achieving precise trajectory and intensity control with high computational efficiency.
arXiv Detail & Related papers (2024-12-12T05:26:43Z)
- Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories. We translate high-level user requests into detailed, semi-dense motion prompts. We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
- MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models [59.10171699717122]
MoTrans is a customized motion transfer method that enables video generation with similar motion in new contexts. Multimodal representations from the recaptioned prompt and video frames promote appearance modeling. The method effectively learns specific motion patterns from single or multiple reference videos.
arXiv Detail & Related papers (2024-12-02T10:07:59Z)
- Video Diffusion Models are Training-free Motion Interpreter and Controller [20.361790608772157]
This paper introduces a novel perspective to understand, localize, and manipulate motion-aware features in video diffusion models.
We present a new MOtion FeaTure (MOFT), obtained by eliminating content-correlation information and filtering motion channels (see the sketch after this entry).
arXiv Detail & Related papers (2024-05-23T17:59:40Z)
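The MOFT summary names two operations: removing content correlation and filtering motion channels. A minimal sketch of one plausible instantiation follows, centering features over time and keeping the channels with the highest temporal variance; the variance criterion and `keep_ratio` value are illustrative assumptions, not the paper's exact procedure.

```python
import torch

def motion_feature(feats: torch.Tensor, keep_ratio: float = 0.25) -> torch.Tensor:
    """feats: (frames, channels, h, w) diffusion features for one video (assumed layout)."""
    # Remove content information by centering each channel along time.
    centered = feats - feats.mean(dim=0, keepdim=True)
    # Treat channels that vary strongly over time as motion-carrying.
    variance = centered.var(dim=0).mean(dim=(-2, -1))   # (channels,)
    k = max(1, int(keep_ratio * variance.numel()))
    mask = torch.zeros_like(variance, dtype=torch.bool)
    mask[torch.topk(variance, k).indices] = True
    return centered[:, mask]                            # (frames, k, h, w)
```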
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks (a minimal frequency-domain sketch follows this entry).
arXiv Detail & Related papers (2024-03-22T14:47:18Z)
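As a rough illustration of frequency-domain regularization over motion vectors, the sketch below compares two motion sequences via a 1-D FFT along the time axis; the uniform weighting of frequencies is an assumption, and the wavelet component mentioned in the summary is omitted.

```python
import torch

def spectral_alignment_loss(pred_motion: torch.Tensor,
                            ref_motion: torch.Tensor) -> torch.Tensor:
    """pred_motion, ref_motion: (frames, 2, h, w) per-frame motion vectors (assumed layout)."""
    pred_spec = torch.fft.rfft(pred_motion, dim=0)
    ref_spec = torch.fft.rfft(ref_motion, dim=0)
    # Penalize spectral differences (magnitude and phase) at every
    # temporal frequency, including the low-frequency global motion.
    return (pred_spec - ref_spec).abs().pow(2).mean()
```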
- Animate Your Motion: Turning Still Images into Dynamic Videos [58.63109848837741]
We introduce Scene and Motion Conditional Diffusion (SMCD), a novel methodology for managing multimodal inputs.
SMCD incorporates a recognized motion conditioning module and investigates various approaches to integrate scene conditions.
Our design significantly enhances video quality, motion precision, and semantic coherence.
arXiv Detail & Related papers (2024-03-15T10:36:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.