Video Motion Transfer with Diffusion Transformers
- URL: http://arxiv.org/abs/2412.07776v1
- Date: Tue, 10 Dec 2024 18:59:58 GMT
- Title: Video Motion Transfer with Diffusion Transformers
- Authors: Alexander Pondaven, Aliaksandr Siarohin, Sergey Tulyakov, Philip Torr, Fabio Pizzati
- Abstract summary: We propose DiTFlow, a method for transferring the motion of a reference video to a newly synthesized one.
We first process the reference video with a pre-trained DiT to analyze cross-frame attention maps and extract a patch-wise motion signal.
We apply our strategy to transformer positional embeddings, granting us a boost in zero-shot motion transfer capabilities.
- Score: 82.4796313201512
- Abstract: We propose DiTFlow, a method for transferring the motion of a reference video to a newly synthesized one, designed specifically for Diffusion Transformers (DiT). We first process the reference video with a pre-trained DiT to analyze cross-frame attention maps and extract a patch-wise motion signal called the Attention Motion Flow (AMF). We guide the latent denoising process in an optimization-based, training-free manner by optimizing latents with our AMF loss to generate videos reproducing the motion of the reference one. We also apply our optimization strategy to transformer positional embeddings, granting us a boost in zero-shot motion transfer capabilities. We evaluate DiTFlow against recently published methods, outperforming all across multiple metrics and human evaluation.
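As a rough, unofficial illustration of the AMF idea (the shapes, the attention extraction step, and the guidance loop below are assumptions, not the authors' code), cross-frame attention can be turned into an expected patch displacement field and matched against the reference:

```python
# Minimal sketch of an Attention Motion Flow (AMF)-style loss, assuming we
# can read cross-frame attention maps of shape (F-1, h*w, h*w) from the DiT.
# Illustrative only; this is not the DiTFlow implementation.
import torch
import torch.nn.functional as F

def attention_motion_flow(attn, h, w):
    """attn: (F-1, h*w, h*w), row-normalized attention from patches of frame
    f to patches of frame f+1. Returns (F-1, h*w, 2) patch displacements."""
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    coords = torch.stack([xs.flatten(), ys.flatten()], dim=-1)  # (h*w, 2)
    expected = attn @ coords       # attention-weighted target positions
    return expected - coords       # expected motion of each source patch

def amf_loss(attn_gen, amf_ref, h, w):
    return F.mse_loss(attention_motion_flow(attn_gen, h, w), amf_ref)

# One guidance step during denoising could then look like this (hypothetical
# `extract_cross_frame_attn` helper; the real optimization schedule differs):
#   latents = latents.detach().requires_grad_(True)
#   loss = amf_loss(extract_cross_frame_attn(dit, latents, t), amf_ref, h, w)
#   latents = latents - lr * torch.autograd.grad(loss, latents)[0]
```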
Related papers
- Improving Video Generation with Human Feedback [81.48120703718774]
Video generation has achieved significant advances, but issues like unsmooth motion and misalignment between videos and prompts persist.
We develop a systematic pipeline that harnesses human feedback to mitigate these problems and refine the video generation model.
We introduce VideoReward, a multi-dimensional video reward model, and examine how annotations and various design choices impact its rewarding efficacy.
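One loose way to picture a multi-dimensional reward model is a shared video encoder with one scalar head per quality axis; the axis names and architecture below are assumptions for illustration, not VideoReward's design:

```python
# Illustrative multi-head reward model: shared features, one score per axis.
import torch
import torch.nn as nn

class MultiDimVideoReward(nn.Module):
    def __init__(self, encoder: nn.Module, feat_dim: int,
                 dims=("motion_quality", "prompt_alignment")):
        super().__init__()
        self.encoder = encoder  # any module mapping video -> (B, feat_dim)
        self.heads = nn.ModuleDict({d: nn.Linear(feat_dim, 1) for d in dims})

    def forward(self, video: torch.Tensor) -> dict:
        feat = self.encoder(video)
        return {d: head(feat).squeeze(-1) for d, head in self.heads.items()}
```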
arXiv Detail & Related papers (2025-01-23T18:55:41Z)
- MotionFlow: Attention-Driven Motion Transfer in Video Diffusion Models [3.2311303453753033]
We introduce MotionFlow, a novel framework designed for motion transfer in video diffusion models.
Our method utilizes cross-attention maps to accurately capture and manipulate spatial and temporal dynamics.
Our experiments demonstrate that MotionFlow significantly outperforms existing models in both fidelity and versatility even during drastic scene alterations.
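Harvesting attention maps from a pretrained diffusion model is commonly done with forward hooks; the generic pattern below sketches only that mechanism, not MotionFlow's actual map selection or manipulation:

```python
# Generic attention capture via forward hooks; module-name matching is a
# heuristic, and the hook assumes the matched modules return plain tensors.
import torch

def register_attention_hooks(model: torch.nn.Module, store: dict, needle="attn"):
    handles = []
    for name, module in model.named_modules():
        if needle in name.lower():
            handles.append(module.register_forward_hook(
                lambda mod, inp, out, key=name: store.__setitem__(key, out.detach())
            ))
    return handles  # call h.remove() on each handle when finished
```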
arXiv Detail & Related papers (2024-12-06T18:59:12Z)
- Generalizable Implicit Motion Modeling for Video Frame Interpolation [51.966062283735596]
Motion modeling is critical in flow-based Video Frame Interpolation (VFI).
We introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI.
Our GIMM can be easily integrated with existing flow-based VFI works by supplying accurately modeled motion.
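In spirit, implicit motion modeling can be pictured as a coordinate network queried at any continuous timestep; the toy MLP below illustrates only that idea (GIMM itself conditions on learned motion latents, which this omits):

```python
# Toy implicit motion model: (x, y, t) -> (u, v) flow, queryable at any t.
import torch
import torch.nn as nn

class ImplicitMotionMLP(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),  # flow vector at the queried coordinate
        )

    def forward(self, coords):
        # coords: (N, 3) rows of (x, y, t), with t in [0, 1]
        return self.net(coords)
```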
arXiv Detail & Related papers (2024-07-11T17:13:15Z) - Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
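A frequency-domain motion regularizer can be pictured as comparing motion fields in Fourier space with extra weight on low frequencies (global motion); the weighting below is an assumption for illustration, not SMA's exact formulation:

```python
# Sketch of a low-frequency-weighted spectral loss between motion fields.
import torch

def spectral_motion_loss(flow_gen: torch.Tensor, flow_ref: torch.Tensor) -> torch.Tensor:
    """flow_*: (T, 2, H, W) per-frame motion fields."""
    H, W = flow_gen.shape[-2], flow_gen.shape[-1]
    S_gen = torch.fft.rfft2(flow_gen)             # (T, 2, H, W//2 + 1) complex
    S_ref = torch.fft.rfft2(flow_ref)
    fy = torch.fft.fftfreq(H).abs().view(-1, 1)   # cycles/pixel along H
    fx = torch.fft.rfftfreq(W).view(1, -1)        # non-negative freqs along W
    weight = 1.0 / (1.0 + 10.0 * torch.sqrt(fy**2 + fx**2))  # favor low freqs
    return (weight * (S_gen - S_ref).abs() ** 2).mean()
```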
arXiv Detail & Related papers (2024-03-22T14:47:18Z) - Motion Flow Matching for Human Motion Synthesis and Editing [75.13665467944314]
We propose Motion Flow Matching, a novel generative model for human motion generation featuring efficient sampling and effectiveness in motion editing applications.
Our method reduces the sampling complexity from a thousand steps in previous diffusion models to just ten steps, while achieving comparable performance on text-to-motion and action-to-motion generation benchmarks.
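Few-step sampling is natural under flow matching because generation reduces to integrating a learned velocity field; the generic ten-step Euler sampler below illustrates this, with the `velocity_net` signature assumed:

```python
# Generic flow-matching sampler: coarse Euler integration from noise (t=0)
# to data (t=1). `velocity_net(x, t, cond)` is a stand-in for the model.
import torch

@torch.no_grad()
def sample_flow_matching(velocity_net, cond, shape, steps=10, device="cpu"):
    x = torch.randn(shape, device=device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt, device=device)
        x = x + dt * velocity_net(x, t, cond)  # one Euler step along the flow
    return x
```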
arXiv Detail & Related papers (2023-12-14T12:57:35Z) - MVFlow: Deep Optical Flow Estimation of Compressed Videos with Motion
Vector Prior [16.633665275166706]
We propose an optical flow model, MVFlow, which uses motion vectors to improve the speed and accuracy of optical flow estimation for compressed videos.
The experimental results demonstrate the superiority of the proposed MVFlow, which reduces optical flow error by 1.09 compared to existing models, or saves inference time while achieving similar accuracy.
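The prior itself can be pictured simply: block-level codec motion vectors upsampled to a dense field that warm-starts or conditions the flow network. The sketch below shows only that construction, with units assumed; MVFlow's learned fusion is more involved:

```python
# Upsample block-level motion vectors into a dense initial flow field.
import torch
import torch.nn.functional as F

def mv_prior_to_flow(mv, out_hw):
    """mv: (B, 2, h, w) codec motion vectors, assumed already in output-pixel
    units. Returns (B, 2, H, W) dense flow to initialize estimation."""
    return F.interpolate(mv, size=out_hw, mode="bilinear", align_corners=False)

# e.g. init = mv_prior_to_flow(mv, (H, W)); flow = flow_net(f0, f1, init)
```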
arXiv Detail & Related papers (2023-08-03T07:16:18Z) - Boost Video Frame Interpolation via Motion Adaptation [73.42573856943923]
Video frame interpolation (VFI) is a challenging task that aims to generate intermediate frames between two consecutive frames in a video.
Existing learning-based VFI methods have achieved great success, but they still suffer from limited generalization ability.
We propose a novel optimization-based VFI method that can adapt to unseen motions at test time.
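Test-time adaptation for VFI can be sketched as briefly fine-tuning the interpolator on frame triplets drawn from the test video itself before synthesizing new frames; the objective and hyperparameters below are assumptions, not the paper's exact recipe:

```python
# Adapt a pretrained VFI model on an observed triplet (f_prev, f_mid, f_next)
# from the test video, then interpolate unseen in-between frames.
import torch
import torch.nn.functional as F

def test_time_adapt(vfi_model, f_prev, f_mid, f_next, steps=10, lr=1e-5):
    opt = torch.optim.Adam(vfi_model.parameters(), lr=lr)
    for _ in range(steps):
        pred = vfi_model(f_prev, f_next, t=0.5)  # should reconstruct f_mid
        loss = F.l1_loss(pred, f_mid)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return vfi_model
```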
arXiv Detail & Related papers (2023-06-24T10:44:02Z) - TENET: Transformer Encoding Network for Effective Temporal Flow on
Motion Prediction [11.698627151060467]
We develop a Transformer-based method for input encoding and trajectory prediction.
We won first place in the Argoverse 2 Motion Forecasting Challenge with a state-of-the-art brier-minFDE score of 1.90.
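Generically, transformer-based trajectory prediction encodes observed waypoints with self-attention and regresses future offsets; the minimal model below shows that pattern only and is not TENET's architecture:

```python
# Minimal transformer trajectory predictor: past (x, y) waypoints in,
# future waypoints out. Hyperparameters are illustrative.
import torch
import torch.nn as nn

class TrajectoryTransformer(nn.Module):
    def __init__(self, d_model=64, n_future=60):
        super().__init__()
        self.embed = nn.Linear(2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_future * 2)
        self.n_future = n_future

    def forward(self, past_xy):
        # past_xy: (B, T, 2) observed waypoints
        h = self.encoder(self.embed(past_xy))
        return self.head(h[:, -1]).view(-1, self.n_future, 2)
```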
arXiv Detail & Related papers (2022-06-30T08:39:52Z)