Motion Marionette: Rethinking Rigid Motion Transfer via Prior Guidance
- URL: http://arxiv.org/abs/2511.19909v1
- Date: Tue, 25 Nov 2025 04:34:42 GMT
- Title: Motion Marionette: Rethinking Rigid Motion Transfer via Prior Guidance
- Authors: Haoxuan Wang, Jiachen Tao, Junyi Wu, Gaowen Liu, Ramana Rao Kompella, Yan Yan
- Abstract summary: Motion Marionette is a framework for rigid motion transfer from monocular source videos to single-view target images. Motion trajectories are extracted from the source video to construct a spatial-temporal (SpaT) prior. The resulting velocity field can be flexibly employed for efficient video production.
- Score: 26.642143303176997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Motion Marionette, a zero-shot framework for rigid motion transfer from monocular source videos to single-view target images. Previous works typically employ geometric, generative, or simulation priors to guide the transfer process, but these external priors introduce auxiliary constraints that lead to trade-offs between generalizability and temporal consistency. To address these limitations, we propose guiding the motion transfer process through an internal prior that exclusively captures the spatial-temporal transformations and is shared between the source video and any transferred target video. Specifically, we first lift both the source video and the target image into a unified 3D representation space. Motion trajectories are then extracted from the source video to construct a spatial-temporal (SpaT) prior that is independent of object geometry and semantics, encoding relative spatial variations over time. This prior is further integrated with the target object to synthesize a controllable velocity field, which is subsequently refined using Position-Based Dynamics to mitigate artifacts and enhance visual coherence. The resulting velocity field can be flexibly employed for efficient video production. Empirical results demonstrate that Motion Marionette generalizes across diverse objects, produces temporally consistent videos that align well with the source motion, and supports controllable video generation.
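To make the pipeline described in the abstract concrete, the following is a minimal, hedged sketch of its core steps: estimating relative spatial variation from source trajectories (a stand-in for the SpaT prior), transferring it to a target object as a velocity field, and applying a toy PBD-style rigidity projection. It assumes the source motion is already available as per-frame 3D point trajectories and the target object as a 3D point set; all function names and the shape-matching details are illustrative and are not the authors' implementation.

```python
# Illustrative sketch only: per-step rigid transforms from source trajectories,
# a transferred velocity field on the target points, and a toy PBD-style step.
import numpy as np

def kabsch(p, q):
    """Best rigid transform (R, t) mapping point set p onto q, both (N, 3)."""
    pc, qc = p - p.mean(0), q - q.mean(0)
    u, _, vt = np.linalg.svd(pc.T @ qc)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, q.mean(0) - r @ p.mean(0)

def spat_prior(src_traj):
    """src_traj: (T, N, 3) source points over T frames.
    Returns frame-to-frame relative transforms (independent of geometry)."""
    return [kabsch(src_traj[t], src_traj[t + 1]) for t in range(len(src_traj) - 1)]

def velocity_field(target_pts, transforms, dt=1.0):
    """Roll the relative transforms onto the target object's points and
    read off per-step velocities, shape (T-1, M, 3)."""
    pts, velocities = target_pts.copy(), []
    for r, t in transforms:
        nxt = pts @ r.T + t
        velocities.append((nxt - pts) / dt)
        pts = nxt
    return np.stack(velocities)

def pbd_rigidity_step(pts, rest_pts, stiffness=0.5):
    """Toy PBD-style constraint projection: pull points toward the best
    rigid fit of their rest shape to suppress drift and artifacts."""
    r, t = kabsch(rest_pts, pts)
    goal = rest_pts @ r.T + t
    return (1.0 - stiffness) * pts + stiffness * goal
```

In the full method, both the source video and the target image are lifted into a unified 3D representation before the prior is extracted, and the PBD refinement operates on the synthesized velocity field before the output video is produced.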
Related papers
- MotionAdapter: Video Motion Transfer via Content-Aware Attention Customization [73.07309070257162]
MotionAdapter is a content-aware motion transfer framework that enables robust and semantically aligned motion transfer. Our key insight is that effective motion transfer requires explicit disentanglement of motion from appearance. MotionAdapter naturally supports complex motion transfer and motion editing tasks such as zooming.
arXiv Detail & Related papers (2026-01-05T10:01:27Z) - Characterizing Motion Encoding in Video Diffusion Timesteps [50.13907856401258]
We study how motion is encoded across video diffusion timesteps through the trade-off between appearance editing and motion preservation. We identify an early, motion-dominant regime and a later, appearance-dominant regime, yielding an operational motion-appearance boundary in timestep space.
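As a toy illustration of how such a boundary might be used, one could restrict appearance edits to the appearance-dominant timesteps and leave the motion-dominant ones untouched; the boundary value below is a placeholder, not a number reported by the paper.

```python
# Placeholder boundary for illustration only; not a value from the paper.
def split_timesteps(num_steps=1000, boundary=600):
    """Partition denoising timesteps (num_steps-1 = noisiest) into a
    motion-dominant and an appearance-dominant regime."""
    steps = range(num_steps - 1, -1, -1)                 # noisy -> clean
    motion = [t for t in steps if t >= boundary]         # motion-dominant regime
    appearance = [t for t in steps if t < boundary]      # safe for appearance edits
    return motion, appearance
```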
arXiv Detail & Related papers (2025-12-18T21:20:54Z) - Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance [107.25252623824296]
Wan-Move is a framework that brings motion control to video generative models. Our core idea is to make the original condition features motion-aware for guiding video generation. Through scaled training, Wan-Move generates 5-second, 480p videos whose motion controllability rivals Kling 1.5's commercial Motion Brush.
arXiv Detail & Related papers (2025-12-09T16:13:55Z) - Gaussian See, Gaussian Do: Semantic 3D Motion Transfer from Multiview Video [15.994811723477973]
We present a novel approach for semantic 3D motion transfer from multiview video. We extract motion embeddings from source videos via condition inversion, apply them to rendered frames, and use the resulting videos to supervise dynamic 3D Gaussian Splatting reconstruction. We establish the first benchmark for semantic 3D motion transfer and demonstrate superior motion fidelity and structural consistency compared to adapted baselines.
arXiv Detail & Related papers (2025-11-18T19:02:50Z) - ATI: Any Trajectory Instruction for Controllable Video Generation [25.249489701215467]
We propose a unified framework for motion control in video generation that seamlessly integrates camera movement, object-level translation, and fine-grained local motion. Our approach offers a cohesive solution by projecting user-defined trajectories into the latent space of pre-trained image-to-video generation models.
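A minimal sketch of the trajectory-to-latent idea, assuming a single user trajectory given in pixel coordinates; the grid mapping and the copy-the-start-feature rule are illustrative assumptions, not ATI's actual conditioning scheme.

```python
# Hedged sketch: move a latent feature along a user-drawn trajectory.
import torch

def inject_trajectory(latents, traj_xy, image_size):
    """latents: (T, C, h, w) video latents; traj_xy: (T, 2) pixel coords of
    one user-drawn trajectory; image_size: (H, W) of the input image.
    Copies the feature at the first frame's trajectory location to the
    trajectory location in every later frame."""
    T, C, h, w = latents.shape
    H, W = image_size
    # map pixel coordinates to latent-grid coordinates
    xs = (traj_xy[:, 0] * w / W).long().clamp(0, w - 1)
    ys = (traj_xy[:, 1] * h / H).long().clamp(0, h - 1)
    carried = latents[0, :, ys[0], xs[0]].clone()   # feature carried along the path
    for t in range(1, T):
        latents[t, :, ys[t], xs[t]] = carried
    return latents
```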
arXiv Detail & Related papers (2025-05-28T23:49:18Z) - MotionAgent: Fine-grained Controllable Video Generation via Motion Field Agent [55.15697390165972]
We propose MotionAgent, enabling fine-grained motion control for text-guided image-to-video generation. The key technique is the motion field agent that converts motion information in text prompts into explicit motion fields. We construct a subset of VBench to evaluate the alignment between motion information in the text and in the generated video; MotionAgent outperforms other advanced models in motion generation accuracy.
arXiv Detail & Related papers (2025-02-05T14:26:07Z) - MotionClone: Training-Free Motion Cloning for Controllable Video Generation [41.621147782128396]
MotionClone is a training-free framework that enables motion cloning from reference videos for versatile motion-controlled video generation.
MotionClone exhibits proficiency in both global camera motion and local object motion, with notable superiority in terms of motion fidelity, textual alignment, and temporal consistency.
arXiv Detail & Related papers (2024-06-08T03:44:25Z) - Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms. SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics. Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
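As a rough illustration of frequency-domain motion regularization, the sketch below applies a plain temporal FFT low-pass to a motion-vector sequence; the cutoff and the filter are assumptions, and the actual SMA formulation also involves wavelet transforms and learned components.

```python
# Minimal sketch: keep only low temporal frequencies of a motion sequence,
# which emphasizes whole-frame, slowly varying (global) motion.
import numpy as np

def lowpass_motion(motion, keep_ratio=0.25):
    """motion: (T, H, W, 2) per-frame motion vectors."""
    spec = np.fft.rfft(motion, axis=0)              # temporal FFT
    cutoff = max(1, int(keep_ratio * spec.shape[0]))
    spec[cutoff:] = 0.0                             # drop high temporal frequencies
    return np.fft.irfft(spec, n=motion.shape[0], axis=0)
```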
arXiv Detail & Related papers (2024-03-22T14:47:18Z) - Space-Time Diffusion Features for Zero-Shot Text-Driven Motion Transfer [27.278989809466392]
We present a new method for text-driven motion transfer - synthesizing a video that complies with an input text prompt describing the target objects and scene.
We leverage a pre-trained and fixed text-to-video diffusion model, which provides us with generative and motion priors.
arXiv Detail & Related papers (2023-11-28T18:03:27Z) - LEO: Generative Latent Image Animator for Human Video Synthesis [38.99490968487773]
We propose a novel framework for human video synthesis, placing emphasis on spatio-temporal coherency.
Our key idea is to represent motion as a sequence of flow maps in the generation process, which inherently isolate motion from appearance.
We implement this idea via a flow-based image animator and a Latent Motion Diffusion Model (LMDM).
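A minimal sketch of the "single appearance frame plus a sequence of flow maps" idea using backward warping; this is a generic warping loop for illustration, not LEO's animator or its LMDM.

```python
# Hedged sketch: animate one appearance frame by warping it with flow maps.
import torch
import torch.nn.functional as F

def animate(image, flows):
    """image: (1, 3, H, W) float tensor; flows: (T, 2, H, W) backward flows
    in pixels. Returns a (T, 3, H, W) video of warped appearance frames."""
    _, _, H, W = image.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    frames = []
    for flow in flows:
        # convert pixel sampling positions to normalized [-1, 1] coordinates
        sample_x = (xs + flow[0]) / (W - 1) * 2 - 1
        sample_y = (ys + flow[1]) / (H - 1) * 2 - 1
        grid = torch.stack([sample_x, sample_y], dim=-1).unsqueeze(0)
        frames.append(F.grid_sample(image, grid, align_corners=True))
    return torch.cat(frames, dim=0)
```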
arXiv Detail & Related papers (2023-05-06T09:29:12Z) - Learning Variational Motion Prior for Video-based Motion Capture [31.79649766268877]
We present a novel variational motion prior (VMP) learning approach for video-based motion capture.
Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation.
Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
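A toy sequence-level variational prior in the spirit of the description above; the GRU encoder/decoder and latent size are assumptions, not VMP's architecture. Passing a jittery frame-wise pose sequence through such a trained model reconstructs it through a low-dimensional latent, which tends to smooth temporal jitter.

```python
# Toy illustration of a variational prior over pose sequences.
import torch
import torch.nn as nn

class TinyMotionVAE(nn.Module):
    def __init__(self, pose_dim=72, latent_dim=32, hidden=128):
        super().__init__()
        self.enc = nn.GRU(pose_dim, hidden, batch_first=True)
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)
        self.dec_in = nn.Linear(latent_dim, hidden)
        self.dec = nn.GRU(pose_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, pose_dim)

    def forward(self, poses):                       # poses: (B, T, pose_dim)
        _, h = self.enc(poses)                      # summarize the sequence
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        h0 = self.dec_in(z).unsqueeze(0)            # seed decoder state with z
        y, _ = self.dec(poses, h0)                  # teacher-forced reconstruction
        return self.out(y), mu, logvar
```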
arXiv Detail & Related papers (2022-10-27T02:45:48Z) - Motion-from-Blur: 3D Shape and Motion Estimation of Motion-blurred
Objects in Videos [115.71874459429381]
We propose a method for jointly estimating the 3D motion, 3D shape, and appearance of highly motion-blurred objects from a video.
Experiments on benchmark datasets demonstrate that our method outperforms previous methods for fast moving object deblurring and 3D reconstruction.
arXiv Detail & Related papers (2021-11-29T11:25:14Z) - Differential Motion Evolution for Fine-Grained Motion Deformation in
Unsupervised Image Animation [41.85199775016731]
We introduce DiME, an end-to-end unsupervised motion transfer framework.
Capturing the motion transfer with an ordinary differential equation (ODE) helps regularize the motion field.
As a natural extension of the ODE formulation, DiME can easily leverage multiple different views of the source object whenever they are available.
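A minimal illustration of evolving a deformation as an ODE rather than a single jump; the constant-velocity field and forward-Euler integrator below are simplifications for illustration, not DiME's learned motion field.

```python
# Hedged sketch: integrate keypoint motion in small steps instead of one warp.
import numpy as np

def evolve_keypoints(src_kp, dst_kp, num_steps=10):
    """src_kp, dst_kp: (K, 2) keypoints. Integrates dx/dt = v(x, t) with
    forward Euler, using a constant velocity field for simplicity, and
    returns the sequence of intermediate keypoint sets."""
    x = src_kp.astype(float).copy()
    velocity = (dst_kp - src_kp) / num_steps      # simple constant velocity
    states = [x.copy()]
    for _ in range(num_steps):
        x = x + velocity                          # Euler step
        states.append(x.copy())
    return np.stack(states)                       # (num_steps + 1, K, 2)
```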
arXiv Detail & Related papers (2021-10-09T22:44:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.