TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting
- URL: http://arxiv.org/abs/2003.14401v2
- Date: Wed, 1 Apr 2020 02:49:21 GMT
- Title: TransMoMo: Invariance-Driven Unsupervised Video Motion Retargeting
- Authors: Zhuoqian Yang, Wentao Zhu, Wayne Wu, Chen Qian, Qiang Zhou, Bolei
Zhou, Chen Change Loy
- Abstract summary: TransMoMo is capable of realistically transferring the motion of a person in a source video to another video of a target person.
We exploit invariance properties of three factors of variation including motion, structure, and view-angle.
We demonstrate the effectiveness of our method over the state-of-the-art methods.
- Score: 107.39743751292028
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a lightweight video motion retargeting approach TransMoMo that is
capable of transferring motion of a person in a source video realistically to
another video of a target person. Without using any paired data for
supervision, the proposed method can be trained in an unsupervised manner by
exploiting invariance properties of three orthogonal factors of variation
including motion, structure, and view-angle. Specifically, with loss functions
carefully derived based on invariance, we train an auto-encoder to disentangle
the latent representations of such factors given the source and target video
clips. This allows us to selectively transfer motion extracted from the source
video seamlessly to the target video in spite of structural and view-angle
disparities between the source and the target. The relaxed assumption of paired
data allows our method to be trained on a vast amount of videos without manual
annotation of source-target pairing, leading to improved robustness
against large structural variations and extreme motion in videos. We
demonstrate the effectiveness of our method over the state-of-the-art methods.
Code, model and data are publicly available on our project page
(https://yzhq97.github.io/transmomo).
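Below is a minimal PyTorch sketch, not the authors' released implementation, of the idea the abstract describes: an auto-encoder over 2D keypoint sequences whose latent space is split into motion, structure, and view-angle codes, trained with a reconstruction loss plus invariance terms. The joint/frame counts, layer sizes, and the rotation and uniform-scaling augmentations used to express the invariances are illustrative assumptions; the actual TransMoMo architecture and losses are defined in the paper and project code.

```python
# Illustrative sketch only: disentangling auto-encoder with invariance losses.
import torch
import torch.nn as nn

J, T = 15, 64                      # assumed number of joints and frames

class Disentangler(nn.Module):
    def __init__(self, j=J, d=32):
        super().__init__()
        in_dim = 2 * j                                          # (x, y) per joint, per frame
        self.enc_motion = nn.GRU(in_dim, d, batch_first=True)   # time-varying code
        self.enc_struct = nn.GRU(in_dim, d, batch_first=True)   # pooled over time
        self.enc_view = nn.GRU(in_dim, d, batch_first=True)     # pooled over time
        self.dec = nn.GRU(3 * d, in_dim, batch_first=True)

    def encode(self, x):                     # x: (B, T, 2*J)
        m, _ = self.enc_motion(x)            # per-frame motion code (B, T, d)
        s = self.enc_struct(x)[0].mean(1)    # time-invariant structure code (B, d)
        v = self.enc_view(x)[0].mean(1)      # time-invariant view code (B, d)
        return m, s, v

    def decode(self, m, s, v):
        t = m.size(1)
        z = torch.cat([m,
                       s.unsqueeze(1).expand(-1, t, -1),
                       v.unsqueeze(1).expand(-1, t, -1)], dim=-1)
        return self.dec(z)[0]                # reconstructed keypoints (B, T, 2*J)

def rotate_2d(x, theta):
    """Rotate 2D keypoints about the origin (a stand-in for a view-angle change)."""
    b = x.size(0)
    pts = x.view(b, -1, 2)
    c, s = torch.cos(theta), torch.sin(theta)
    rot = torch.stack([torch.stack([c, -s], -1),
                       torch.stack([s, c], -1)], -2)             # (B, 2, 2)
    return torch.einsum('bij,bnj->bni', rot, pts).reshape_as(x)

def invariance_losses(model, x):
    """Reconstruction plus two illustrative invariance terms: the structure code
    should not change under a view rotation, and the motion code should not change
    when the skeleton is uniformly rescaled (a crude proxy for body-shape change)."""
    m, s, v = model.encode(x)
    recon = ((model.decode(m, s, v) - x) ** 2).mean()

    theta = torch.rand(x.size(0)) * 6.2832
    _, s_rot, _ = model.encode(rotate_2d(x, theta))
    struct_inv = ((s_rot - s) ** 2).mean()                 # structure vs. view-angle

    m_scaled, _, _ = model.encode(x * 1.2)                 # crude limb-scale change
    motion_inv = ((m_scaled - m) ** 2).mean()              # motion vs. structure

    return recon + struct_inv + motion_inv

model = Disentangler()
loss = invariance_losses(model, torch.randn(4, T, 2 * J))
loss.backward()
```

In the actual method the invariance constraints are derived more carefully across all three factors; the sketch only shows how such invariances can be expressed as simple penalty terms on the latent codes during unsupervised training.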
Related papers
- MotionMatcher: Motion Customization of Text-to-Video Diffusion Models via Motion Feature Matching [27.28898943916193]
Text-to-video (T2V) diffusion models have promising capabilities in synthesizing realistic videos from input text prompts.
In this work, we tackle the motion customization problem, where a reference video is provided as motion guidance.
We propose MotionMatcher, a motion customization framework that fine-tunes the pre-trained T2V diffusion model at the feature level.
arXiv Detail & Related papers (2025-02-18T19:12:51Z)
- VideoAnydoor: High-fidelity Video Object Insertion with Precise Motion Control [66.66226299852559]
VideoAnydoor is a zero-shot video object insertion framework with high-fidelity detail preservation and precise motion control.
To preserve the detailed appearance and meanwhile support fine-grained motion control, we design a pixel warper.
arXiv Detail & Related papers (2025-01-02T18:59:54Z)
- MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model [78.11258752076046]
MOFA-Video is an advanced controllable image animation method that generates video from the given image using various additional controllable signals.
We design several domain-aware motion field adapters to control the generated motions in the video generation pipeline.
After training, the MOFA-Adapters in different domains can also work together for more controllable video generation.
arXiv Detail & Related papers (2024-05-30T16:22:22Z)
- Don't Judge by the Look: Towards Motion Coherent Video Representation [56.09346222721583]
Motion Coherent Augmentation (MCA) is a data augmentation method for video understanding.
MCA introduces appearance variation in videos and implicitly encourages the model to prioritize motion patterns, rather than static appearances.
arXiv Detail & Related papers (2024-03-14T15:53:04Z)
- Motion-Zero: Zero-Shot Moving Object Control Framework for Diffusion-Based Video Generation [10.5019872575418]
We propose a novel zero-shot moving object trajectory control framework, Motion-Zero, to enable a bounding-box-trajectories-controlled text-to-video diffusion model.
Our method can be flexibly applied to various state-of-the-art video diffusion models without any training process.
arXiv Detail & Related papers (2024-01-18T17:22:37Z)
- VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models [58.93124686141781]
Video Motion Customization (VMC) is a novel one-shot tuning approach crafted to adapt temporal attention layers within video diffusion models.
Our approach introduces a novel motion distillation objective using residual vectors between consecutive frames as a motion reference.
We validate our method against state-of-the-art video generative models across diverse real-world motions and contexts.
arXiv Detail & Related papers (2023-12-01T06:50:11Z)
- Learn the Force We Can: Enabling Sparse Motion Control in Multi-Object Video Generation [26.292052071093945]
We propose an unsupervised method to generate videos from a single frame and a sparse motion input.
Our trained model can generate unseen realistic object-to-object interactions.
We show that YODA is on par with or better than state of the art video generation prior work in terms of both controllability and video quality.
arXiv Detail & Related papers (2023-06-06T19:50:02Z)
- JOKR: Joint Keypoint Representation for Unsupervised Cross-Domain Motion Retargeting [53.28477676794658]
Unsupervised motion retargeting in videos has seen substantial advancements through the use of deep neural networks.
We introduce JOKR - a JOint Keypoint Representation that handles both the source and target videos, without requiring any object prior or data collection.
We evaluate our method both qualitatively and quantitatively, and demonstrate that our method handles various cross-domain scenarios, such as different animals, different flowers, and humans.
arXiv Detail & Related papers (2021-06-17T17:32:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.