Related papers: MotionClone: Training-Free Motion Cloning for Controllable Video Generation

URL: http://arxiv.org/abs/2406.05338v3
Date: Fri, 28 Jun 2024 18:08:19 GMT
Title: MotionClone: Training-Free Motion Cloning for Controllable Video Generation
Authors: Pengyang Ling, Jiazi Bu, Pan Zhang, Xiaoyi Dong, Yuhang Zang, Tong Wu, Huaian Chen, Jiaqi Wang, Yi Jin
Abstract summary: MotionClone is a training-free framework that enables motion cloning from a reference video to control text-to-video generation. Experiments demonstrate that MotionClone exhibits proficiency in both global camera motion and local object motion.
Score: 41.621147782128396
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Motion-based controllable text-to-video generation involves motions to control the video generation. Previous methods typically require the training of models to encode motion cues or the fine-tuning of video diffusion models. However, these approaches often result in suboptimal motion generation when applied outside the trained domain. In this work, we propose MotionClone, a training-free framework that enables motion cloning from a reference video to control text-to-video generation. We employ temporal attention in video inversion to represent the motions in the reference video and introduce primary temporal-attention guidance to mitigate the influence of noisy or very subtle motions within the attention weights. Furthermore, to assist the generation model in synthesizing reasonable spatial relationships and enhance its prompt-following capability, we propose a location-aware semantic guidance mechanism that leverages the coarse location of the foreground from the reference video and original classifier-free guidance features to guide the video generation. Extensive experiments demonstrate that MotionClone exhibits proficiency in both global camera motion and local object motion, with notable superiority in terms of motion fidelity, textual alignment, and temporal consistency.
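The abstract says that primary temporal-attention guidance suppresses noisy or very subtle motions within the attention weights. One plausible reading is to compare generated and reference temporal attention only at each query frame's strongest entries. The sketch below illustrates that reading under stated assumptions. The per-location `[query_frame][key_frame]` attention layout, the top-k masking rule, the squared-error form, and the names `primary_mask` and `guidance_loss` are all hypothetical and not taken from the paper. In a real sampler this loss would be differentiated with respect to the noisy latent. Here it is only computed.

```python
# Hypothetical sketch: a masked temporal-attention guidance loss that keeps
# only the "primary" (top-k) reference attention entries. Layout, k, and the
# squared-error form are illustrative assumptions, not MotionClone's code.
from typing import List

Matrix = List[List[float]]  # assumed layout: [query_frame][key_frame] weights


def primary_mask(attn: Matrix, k: int) -> List[List[int]]:
    """Return a 0/1 mask keeping the k largest weights in each query row."""
    mask = []
    for row in attn:
        # Indices of the k largest entries in this row (ties keep lower index).
        top = sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k]
        keep = set(top)
        mask.append([1 if j in keep else 0 for j in range(len(row))])
    return mask


def guidance_loss(ref_attn: Matrix, gen_attn: Matrix, k: int) -> float:
    """Squared error between reference and generated attention, summed only
    over the reference's top-k entries per row; weaker entries are ignored."""
    mask = primary_mask(ref_attn, k)
    loss = 0.0
    for m_row, r_row, g_row in zip(mask, ref_attn, gen_attn):
        for m, r, g in zip(m_row, r_row, g_row):
            loss += m * (r - g) ** 2
    return loss


if __name__ == "__main__":
    # Toy 3-frame attention maps for a single spatial location.
    ref = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.05, 0.15, 0.8]]
    gen = [[0.4, 0.3, 0.3], [0.3, 0.4, 0.3], [0.3, 0.3, 0.4]]
    print(primary_mask(ref, 1))
    print(guidance_loss(ref, gen, 1))
```

With `k=1` only the diagonal entries of `ref` are compared, so small off-diagonal weights in the reference do not pull the generated attention toward them. Raising `k` trades that robustness for a denser motion constraint.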

Related papers

Follow-Your-Motion: Video Motion Transfer via Efficient Spatial-Temporal Decoupled Finetuning [50.4776422843776]
Follow-Your-Motion is an efficient two-stage video motion transfer framework that finetunes a powerful video diffusion transformer to synthesize complex motion. Extensive evaluations on MotionBench verify the superiority of Follow-Your-Motion.
arXiv Detail & Related papers (2025-06-05T16:18:32Z)
ATI: Any Trajectory Instruction for Controllable Video Generation [25.249489701215467]
We propose a unified framework for motion control in video generation that seamlessly integrates camera movement, object-level translation, and fine-grained local motion. Our approach offers a cohesive solution by projecting user-defined trajectories into the latent space of pre-trained image-to-video generation models.
arXiv Detail & Related papers (2025-05-28T23:49:18Z)
MotionPro: A Precise Motion Controller for Image-to-Video Generation [108.63100943070592]
We present MotionPro, a precise motion controller for image-to-video (I2V) generation. Region-wise trajectory and motion mask are used to regulate fine-grained motion synthesis. Experiments conducted on WebVid-10M and MC-Bench demonstrate the effectiveness of MotionPro.
arXiv Detail & Related papers (2025-05-26T17:59:03Z)
Towards Synthesized and Editable Motion In-Betweening Through Part-Wise Phase Representation [20.697417033585577]
Styled motion in-betweening is crucial for computer animation and gaming. We propose a novel framework that models motion styles at the body-part level. Our approach enables more nuanced and expressive animations.
arXiv Detail & Related papers (2025-03-11T08:44:27Z)
Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories. We translate high-level user requests into detailed, semi-dense motion prompts. We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models [59.10171699717122]
MoTrans is a customized motion transfer method that enables video generation of similar motion in new contexts. Multimodal representations from the recaptioned prompt and video frames promote the modeling of appearance. Our method effectively learns specific motion patterns from single or multiple reference videos.
arXiv Detail & Related papers (2024-12-02T10:07:59Z)
MotionMaster: Training-free Camera Motion Transfer For Video Generation [48.706578330771386]
We propose a novel training-free video motion transfer model, which disentangles camera motions and object motions in source videos. Our model can effectively decouple camera-object motion and apply the decoupled camera motion to a wide range of controllable video generation tasks.
arXiv Detail & Related papers (2024-04-24T10:28:54Z)
Motion Inversion for Video Customization [31.607669029754874]
We present a novel approach for motion customization in video generation, addressing the widespread gap in the exploration of motion representation within video models. We introduce Motion Embeddings, a set of explicit, temporally coherent embeddings derived from a given video. Our contributions include a tailored motion embedding for customization tasks and a demonstration of the practical advantages and effectiveness of our method.
arXiv Detail & Related papers (2024-03-29T14:14:22Z)
Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms. SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics. Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
arXiv Detail & Related papers (2024-03-22T14:47:18Z)
MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method. MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model. During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z)
MotionCtrl: A Unified and Flexible Motion Controller for Video Generation [77.09621778348733]
Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement. This paper presents MotionCtrl, a unified motion controller for video generation designed to effectively and independently control camera and object motion.
arXiv Detail & Related papers (2023-12-06T17:49:57Z)
MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation [131.1446077627191]
Zero-shot text-to-video synthesis generates videos from prompts without any video training data. We propose a prompt-adaptive and disentangled motion control strategy called MotionZero. Our strategy can correctly control the motion of different objects and supports versatile applications, including zero-shot video editing.
arXiv Detail & Related papers (2023-11-28T09:38:45Z)
Learning Variational Motion Prior for Video-based Motion Capture [31.79649766268877]
We present a novel variational motion prior (VMP) learning approach for video-based motion capture. Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation. Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
arXiv Detail & Related papers (2022-10-27T02:45:48Z)

This list is automatically generated from the titles and abstracts of the papers on this site.