ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer
- URL: http://arxiv.org/abs/2504.02451v1
- Date: Thu, 03 Apr 2025 10:15:52 GMT
- Title: ConMo: Controllable Motion Disentanglement and Recomposition for Zero-Shot Motion Transfer
- Authors: Jiayi Gao, Zijin Yin, Changcheng Hua, Yuxin Peng, Kongming Liang, Zhanyu Ma, Jun Guo, Yang Liu
- Abstract summary: ConMo is a framework that disentangles and recomposes the motions of subjects and camera movements. It enables more accurate motion control across diverse subjects and improves performance in multi-subject scenarios. ConMo unlocks a wide range of applications, including subject size and position editing, subject removal, semantic modifications, and camera motion simulation.
- Score: 44.33224798292861
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The development of Text-to-Video (T2V) generation has made motion transfer possible, enabling the control of video motion based on existing footage. However, current methods have two limitations: 1) they struggle to handle multi-subject videos, failing to transfer the motion of a specific subject; 2) they struggle to preserve the diversity and accuracy of motion when transferring to subjects with varying shapes. To overcome these, we introduce ConMo, a zero-shot framework that disentangles and recomposes the motions of subjects and camera movements. ConMo isolates individual subject and background motion cues from complex trajectories in source videos using only subject masks, and reassembles them for target video generation. This approach enables more accurate motion control across diverse subjects and improves performance in multi-subject scenarios. Additionally, we propose soft guidance in the recomposition stage, which controls how much of the original motion is retained and thereby adjusts shape constraints, aiding subject shape adaptation and semantic transformation. Unlike previous methods, ConMo unlocks a wide range of applications, including subject size and position editing, subject removal, semantic modifications, and camera motion simulation. Extensive experiments demonstrate that ConMo significantly outperforms state-of-the-art methods in motion fidelity and semantic consistency. The code is available at https://github.com/Andyplus1/ConMo.
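The pipeline the abstract describes, masking out each subject, separating its motion from the camera's, and reassembling the pieces under a soft-guidance weight, can be illustrated with a minimal sketch. This is not the authors' released code: it stands in dense optical flow for the paper's "complex trajectories", approximates camera motion as the mean background flow, and reduces soft guidance to a single scalar; all function and variable names are hypothetical.

```python
# Minimal sketch of mask-based motion disentanglement and recomposition.
# Assumptions (not from the paper): motion is a dense optical-flow field,
# camera motion is approximated by the mean background flow, and soft
# guidance is a single scalar weight in [0, 1].
import numpy as np

def disentangle(flow: np.ndarray, subject_mask: np.ndarray):
    """Split a per-frame flow field (H, W, 2) into camera and subject parts.

    subject_mask: boolean (H, W), True on subject pixels.
    """
    background = ~subject_mask
    camera = flow[background].mean(axis=0)  # (2,) global camera-motion cue
    subject = np.zeros_like(flow)
    # Subject motion is the residual flow inside the mask once the
    # camera component is removed.
    subject[subject_mask] = flow[subject_mask] - camera
    return camera, subject

def recompose(camera: np.ndarray, subject: np.ndarray,
              soft_guidance: float = 1.0) -> np.ndarray:
    """Reassemble a target flow field for generation.

    soft_guidance controls how strictly the original subject motion
    (and hence the source subject's shape) is retained; lower values
    relax shape constraints for differently shaped targets.
    """
    target = np.broadcast_to(camera, subject.shape).copy()
    return target + soft_guidance * subject

# Example: halve the shape constraint when transferring motion onto a
# subject with a very different silhouette.
flow = np.random.randn(64, 64, 2).astype(np.float32)
mask = np.zeros((64, 64), dtype=bool)
mask[16:48, 16:48] = True
cam, subj = disentangle(flow, mask)
target_flow = recompose(cam, subj, soft_guidance=0.5)
```

In a multi-subject video, this split would run once per subject mask, which is what lets the recomposition stage drop, move, or resize one subject's motion independently of the others and of the camera.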
Related papers
- Motion Prompting: Controlling Video Generation with Motion Trajectories [57.049252242807874]
We train a video generation model conditioned on sparse or dense video trajectories.
We translate high-level user requests into detailed, semi-dense motion prompts.
We demonstrate our approach through various applications, including camera and object motion control, "interacting" with an image, motion transfer, and image editing.
arXiv Detail & Related papers (2024-12-03T18:59:56Z)
- MotionBooth: Motion-Aware Customized Text-to-Video Generation [44.41894050494623]
MotionBooth is a framework designed for animating customized subjects with precise control over both object and camera movements.
We efficiently fine-tune a text-to-video model to capture the object's shape and attributes accurately.
Our approach introduces a subject region loss and a video preservation loss to enhance the subject's learning performance.
arXiv Detail & Related papers (2024-06-25T17:42:25Z)
- MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion [94.66090422753126]
MotionFollower is a lightweight score-guided diffusion model for video motion editing.
It delivers superior motion editing performance and uniquely supports large camera movements and actions.
Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory.
arXiv Detail & Related papers (2024-05-30T17:57:30Z)
- CoMo: Controllable Motion Generation through Language Guided Pose Code Editing [57.882299081820626]
We introduce CoMo, a Controllable Motion generation model, adept at accurately generating and editing motions.
CoMo decomposes motions into discrete and semantically meaningful pose codes.
It autoregressively generates sequences of pose codes, which are then decoded into 3D motions.
arXiv Detail & Related papers (2024-03-20T18:11:10Z)
- Direct-a-Video: Customized Video Generation with User-Directed Camera Movement and Object Motion [34.404342332033636]
We introduce Direct-a-Video, a system that allows users to independently specify motions for multiple objects as well as the camera's pan and zoom movements.
For camera movement, we introduce new temporal cross-attention layers to interpret quantitative camera movement parameters.
Both components operate independently, allowing individual or combined control, and can generalize to open-domain scenarios (a hypothetical sketch of the camera cross-attention idea appears after this list).
arXiv Detail & Related papers (2024-02-05T16:30:57Z)
- MotionCrafter: One-Shot Motion Customization of Diffusion Models [66.44642854791807]
We introduce MotionCrafter, a one-shot instance-guided motion customization method.
MotionCrafter employs a parallel spatial-temporal architecture that injects the reference motion into the temporal component of the base model.
During training, a frozen base model provides appearance normalization, effectively separating appearance from motion.
arXiv Detail & Related papers (2023-12-08T16:31:04Z)
- MotionCtrl: A Unified and Flexible Motion Controller for Video Generation [77.09621778348733]
Motions in a video primarily consist of camera motion, induced by camera movement, and object motion, resulting from object movement.
This paper presents MotionCtrl, a unified motion controller for video generation designed to effectively and independently control camera and object motion.
arXiv Detail & Related papers (2023-12-06T17:49:57Z)
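As a companion illustration, the Direct-a-Video entry above mentions temporal cross-attention layers that interpret quantitative camera movement parameters. The PyTorch sketch below shows one plausible shape of that idea; the parameterization (per-frame pan_x, pan_y, zoom), the dimensions, and all names are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch: temporal cross-attention in which video tokens
# attend to embedded quantitative camera parameters. Shapes and the
# (pan_x, pan_y, zoom) parameterization are illustrative assumptions.
import torch
import torch.nn as nn

class CameraCrossAttention(nn.Module):
    def __init__(self, dim: int = 320, num_heads: int = 8):
        super().__init__()
        # Embed the 3 scalar camera parameters into the token dimension.
        self.camera_embed = nn.Sequential(
            nn.Linear(3, dim), nn.SiLU(), nn.Linear(dim, dim)
        )
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, camera_params: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, dim) temporal tokens of the video latent.
        # camera_params: (batch, frames, 3) per-frame (pan_x, pan_y, zoom).
        cam = self.camera_embed(camera_params)            # (batch, frames, dim)
        out, _ = self.attn(query=x, key=cam, value=cam)   # frames attend to camera cues
        return x + out                                    # residual injection

# Usage: tokens = torch.randn(2, 16, 320); params = torch.randn(2, 16, 3)
# layer = CameraCrossAttention(); y = layer(tokens, params)
```

Keeping the camera pathway in its own cross-attention layer, separate from whatever conditions object motion, is what allows the two controls to be applied individually or combined, as the entry describes.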