TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting
- URL: http://arxiv.org/abs/2003.14401v2
- Date: Wed, 1 Apr 2020 02:49:21 GMT
- Title: TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting
- Authors: Zhuoqian Yang, Wentao Zhu, Wayne Wu, Chen Qian, Qiang Zhou, Bolei
Zhou, Chen Change Loy
- Abstract summary: TransMoMo is capable of realistically transferring the motion of a person in a source video to another video of a target person.
We exploit invariance properties of three factors of variation including motion, structure, and view-angle.
We demonstrate the effectiveness of our method over the state-of-the-art methods.
- Score: 107.39743751292028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a lightweight video motion retargeting approach TransMoMo that is
capable of transferring motion of a person in a source video realistically to
another video of a target person. Without using any paired data for
supervision, the proposed method can be trained in an unsupervised manner by
exploiting invariance properties of three orthogonal factors of variation
including motion, structure, and view-angle. Specifically, with loss functions
carefully derived based on invariance, we train an auto-encoder to disentangle
the latent representations of such factors given the source and target video
clips. This allows us to selectively transfer motion extracted from the source
video seamlessly to the target video in spite of structural and view-angle
disparities between the source and the target. The relaxed assumption of paired
data allows our method to be trained on a vast amount of videos without manual
annotation of source-target pairing, leading to improved robustness
against large structural variations and extreme motion in videos. We
demonstrate the effectiveness of our method over the state-of-the-art methods.
Code, model and data are publicly available on our project page
(https://yzhq97.github.io/transmomo).
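Below is a minimal PyTorch sketch, not the authors' released implementation, of the idea the abstract describes: an auto-encoder over 2D keypoint sequences whose latent space is split into motion, structure, and view-angle codes, trained with a reconstruction loss plus invariance terms. The joint/frame counts, layer sizes, and the rotation and uniform-scaling augmentations used to express the invariances are illustrative assumptions; the actual TransMoMo architecture and losses are defined in the paper and project code.

```python
# Illustrative sketch only: disentangling auto-encoder with invariance losses.
import torch
import torch.nn as nn

J, T = 15, 64                      # assumed number of joints and frames

class Disentangler(nn.Module):
    def __init__(self, j=J, d=32):
        super().__init__()
        in_dim = 2 * j                                          # (x, y) per joint, per frame
        self.enc_motion = nn.GRU(in_dim, d, batch_first=True)   # time-varying code
        self.enc_struct = nn.GRU(in_dim, d, batch_first=True)   # pooled over time
        self.enc_view = nn.GRU(in_dim, d, batch_first=True)     # pooled over time
        self.dec = nn.GRU(3 * d, in_dim, batch_first=True)

    def encode(self, x):                     # x: (B, T, 2*J)
        m, _ = self.enc_motion(x)            # per-frame motion code (B, T, d)
        s = self.enc_struct(x)[0].mean(1)    # time-invariant structure code (B, d)
        v = self.enc_view(x)[0].mean(1)      # time-invariant view code (B, d)
        return m, s, v

    def decode(self, m, s, v):
        t = m.size(1)
        z = torch.cat([m,
                       s.unsqueeze(1).expand(-1, t, -1),
                       v.unsqueeze(1).expand(-1, t, -1)], dim=-1)
        return self.dec(z)[0]                # reconstructed keypoints (B, T, 2*J)

def rotate_2d(x, theta):
    """Rotate 2D keypoints about the origin (a stand-in for a view-angle change)."""
    b = x.size(0)
    pts = x.view(b, -1, 2)
    c, s = torch.cos(theta), torch.sin(theta)
    rot = torch.stack([torch.stack([c, -s], -1),
                       torch.stack([s, c], -1)], -2)             # (B, 2, 2)
    return torch.einsum('bij,bnj->bni', rot, pts).reshape_as(x)

def invariance_losses(model, x):
    """Reconstruction plus two illustrative invariance terms: the structure code
    should not change under a view rotation, and the motion code should not change
    when the skeleton is uniformly rescaled (a crude proxy for body-shape change)."""
    m, s, v = model.encode(x)
    recon = ((model.decode(m, s, v) - x) ** 2).mean()

    theta = torch.rand(x.size(0)) * 6.2832
    _, s_rot, _ = model.encode(rotate_2d(x, theta))
    struct_inv = ((s_rot - s) ** 2).mean()                 # structure vs. view-angle

    m_scaled, _, _ = model.encode(x * 1.2)                 # crude limb-scale change
    motion_inv = ((m_scaled - m) ** 2).mean()              # motion vs. structure

    return recon + struct_inv + motion_inv

model = Disentangler()
loss = invariance_losses(model, torch.randn(4, T, 2 * J))
loss.backward()
```

In the actual method the invariance constraints are derived more carefully across all three factors; the sketch only shows how such invariances can be expressed as simple penalty terms on the latent codes during unsupervised training.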
Related papers
- MotionMatcher: Motion Customization of Text-to-Video Diffusion Models via Motion Feature Matching [27.28898943916193]
Text-to-video (T2V) diffusion models have promising capabilities in synthesizing realistic videos from input text prompts.
In this work, we tackle the motion customization problem, where a reference video is provided as motion guidance.
We propose MotionMatcher, a motion customization framework that fine-tunes the pre-trained T2V diffusion model at the feature level.
arXiv Detail & Related papers (2025-02-18T19:12:51Z)
- VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control [66.66226299852559]
VideoAnydoor is a zero-shot video object insertion framework with high-fidelity detail preservation and precise motion control.
To preserve the detailed appearance and meanwhile support fine-grained motion control, we design a pixel warper.
arXiv Detail & Related papers (2025-01-02T18:59:54Z)
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model [78.11258752076046]
MOFA-Video is an advanced controllable image animation method that generates video from the given image using various additional controllable signals.
We design several domain-aware motion field adapters to control the generated motions in the video generation pipeline.
After training, the MOFA-Adapters in different domains can also work together for more controllable video generation.
arXiv Detail & Related papers (2024-05-30T16:22:22Z)
- Don't Judge by the Look: Towards Motion Coherent Video Representation [56.09346222721583]
Motion Coherent Augmentation (MCA) is a data augmentation method for video understanding.
MCA introduces appearance variation in videos and implicitly encourages the model to prioritize motion patterns, rather than static appearances.
arXiv Detail & Related papers (2024-03-14T15:53:04Z)
- Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation [10.5019872575418]
We propose a novel zero-shot moving object trajectory control framework, Motion-Zero, to enable a bounding-box-trajectories-controlled text-to-video diffusion model.
Our method can be flexibly applied to various state-of-the-art video diffusion models without any training process.
arXiv Detail & Related papers (2024-01-18T17:22:37Z)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
- Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation [26.292052071093945]
We propose an unsupervised method to generate videos from a single frame and a sparse motion input.
Our trained model can generate unseen realistic object-to-object interactions.
We show that YODA is on par with or better than state of the art video generation prior work in terms of both controllability and video quality.
arXiv Detail & Related papers (2023-06-06T19:50:02Z)
- JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.