TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting
- URL: http://arxiv.org/abs/2003.14401v2
- Date: Wed, 1 Apr 2020 02:49:21 GMT
- Title: TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting
- Authors: Zhuoqian Yang, Wentao Zhu, Wayne Wu, Chen Qian, Qiang Zhou, Bolei
Zhou, Chen Change Loy
- Abstract summary: TransMoMo is capable of transferring motion of a person in a source video realistically to another video of a target person.
We exploit invariance properties of three factors of variation including motion, structure, and view-angle.
We demonstrate the effectiveness of our method against state-of-the-art methods.
- Score: 107.39743751292028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a lightweight video motion retargeting approach TransMoMo that is
capable of transferring motion of a person in a source video realistically to
another video of a target person. Without using any paired data for
supervision, the proposed method can be trained in an unsupervised manner by
exploiting invariance properties of three orthogonal factors of variation
including motion, structure, and view-angle. Specifically, with loss functions
carefully derived based on invariance, we train an auto-encoder to disentangle
the latent representations of such factors given the source and target video
clips. This allows us to selectively transfer motion extracted from the source
video seamlessly to the target video in spite of structural and view-angle
disparities between the source and the target. The relaxed assumption of paired
data allows our method to be trained on a vast amount of videos without manual
annotation of source-target pairing, leading to improved robustness against
large structural variations and extreme motion in videos. We demonstrate the
effectiveness of our method against state-of-the-art methods.
Code, model and data are publicly available on our project page
(https://yzhq97.github.io/transmomo).
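As a rough illustration of the disentanglement described in the abstract, the sketch below shows a three-branch auto-encoder over 2-D keypoint sequences trained with invariance-style consistency losses. It is a minimal PyTorch sketch under assumed shapes and module sizes; the class names, channel widths, augmentations, and loss terms are illustrative and are not taken from the released TransMoMo code.

```python
# Minimal sketch of an invariance-driven disentangling auto-encoder, in the
# spirit of the abstract above. Shapes, module sizes and the exact invariance
# terms are illustrative assumptions, not the authors' released model.
import torch
import torch.nn as nn

class FactorEncoder(nn.Module):
    """1-D temporal conv encoder over keypoint sequences shaped (B, 2*J, T)."""
    def __init__(self, in_ch, code_ch, time_invariant=False):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, 64, 7, padding=3), nn.LeakyReLU(0.2),
            nn.Conv1d(64, code_ch, 7, padding=3),
        )
        self.time_invariant = time_invariant  # pool over T for static factors

    def forward(self, x):
        h = self.net(x)
        return h.mean(dim=-1, keepdim=True) if self.time_invariant else h

class Decoder(nn.Module):
    """Recombines the three factor codes and maps back to keypoint space."""
    def __init__(self, code_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(code_ch, 64, 7, padding=3), nn.LeakyReLU(0.2),
            nn.Conv1d(64, out_ch, 7, padding=3),
        )

    def forward(self, motion, structure, view):
        T = motion.shape[-1]
        code = torch.cat(
            [motion, structure.expand(-1, -1, T), view.expand(-1, -1, T)], dim=1)
        return self.net(code)

class DisentanglingAE(nn.Module):
    def __init__(self, joints=15):
        super().__init__()
        in_ch = joints * 2                                     # (x, y) per joint
        self.enc_motion = FactorEncoder(in_ch, 128)                      # time-varying
        self.enc_struct = FactorEncoder(in_ch, 64, time_invariant=True)  # body structure
        self.enc_view = FactorEncoder(in_ch, 8, time_invariant=True)     # view-angle
        self.dec = Decoder(128 + 64 + 8, in_ch)

    def encode(self, x):
        return self.enc_motion(x), self.enc_struct(x), self.enc_view(x)

def training_losses(model, x, x_limb_scaled, x_rotated):
    """x: (B, 2*J, T) keypoints; the other two are transformed copies of x
    (random limb re-scaling alters structure, random rotation alters view)."""
    m, s, v = model.encode(x)
    m_ls, _, _ = model.encode(x_limb_scaled)
    _, s_rot, _ = model.encode(x_rotated)
    rec = (model.dec(m, s, v) - x).abs().mean()   # reconstruct the input sequence
    inv_motion = (m - m_ls).abs().mean()          # motion code unchanged by limb scaling
    inv_struct = (s - s_rot).abs().mean()         # structure code unchanged by rotation
    return rec + inv_motion + inv_struct

# Motion transfer at test time (sketch): decode the source motion code with the
# target's structure and view codes, e.g. y = model.dec(m_src, s_tgt, v_tgt).
```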
Related papers
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model [78.11258752076046]
MOFA-Video is an advanced controllable image animation method that generates video from the given image using various additional controllable signals.
We design several domain-aware motion field adapters to control the generated motions in the video generation pipeline.
After training, the MOFA-Adapters in different domains can also work together for more controllable video generation.
arXiv Detail & Related papers (2024-05-30T16:22:22Z) - Video Diffusion Models are Training-free Motion Interpreter and Controller [20.361790608772157]
This paper introduces a novel perspective to understand, localize, and manipulate motion-aware features in video diffusion models.
We present a new MOtion FeaTure (MOFT) by eliminating content correlation information and filtering motion channels.
arXiv Detail & Related papers (2024-05-23T17:59:40Z) - Don't Judge by the Look: Towards Motion Coherent Video Representation [56.09346222721583]
Motion Coherent Augmentation (MCA) is a data augmentation method for video understanding.
MCA introduces appearance variation in videos and implicitly encourages the model to prioritize motion patterns, rather than static appearances.
arXiv Detail & Related papers (2024-03-14T15:53:04Z) - VMC: Video Motion Customization using Temporal Attention Adaption for
Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z) - SEINE: Short-to-Long Video Diffusion Model for Generative Transition and
Prediction [93.26613503521664]
This paper presents a short-to-long video diffusion model, SEINE, that focuses on generative transition and prediction.
We propose a random-mask video diffusion model to automatically generate transitions based on textual descriptions.
Our model generates transition videos that ensure coherence and visual quality.
arXiv Detail & Related papers (2023-10-31T17:58:17Z) - Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object
Video Generation [26.292052071093945]
We propose an unsupervised method to generate videos from a single frame and a sparse motion input.
Our trained model can generate unseen realistic object-to-object interactions.
We show that YODA is on par with or better than state of the art video generation prior work in terms of both controllability and video quality.
arXiv Detail & Related papers (2023-06-06T19:50:02Z) - JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion
Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z) - Motion-Augmented Self-Training for Video Recognition at Smaller Scale [32.73585552425734]
We propose the first motion-augmented self-training regime, which we call MotionFit.
We generate pseudo-labels for a large unlabeled video collection, which enables us to transfer knowledge by learning to predict these pseudo-labels with an appearance model.
We obtain a strong motion-augmented representation model suited for video downstream tasks like action recognition and clip retrieval.
arXiv Detail & Related papers (2021-05-04T17:43:19Z)