Learning Variational Motion Prior for Video-based Motion Capture
- URL: http://arxiv.org/abs/2210.15134v2
- Date: Fri, 28 Oct 2022 02:32:05 GMT
- Title: Learning Variational Motion Prior for Video-based Motion Capture
- Authors: Xin Chen, Zhuo Su, Lingbo Yang, Pei Cheng, Lan Xu, Bin Fu, and Gang Yu
- Abstract summary: We present a novel variational motion prior (VMP) learning approach for video-based motion capture.
Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation.
Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
- Score: 31.79649766268877
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Motion capture from a monocular video is fundamental and crucial
for humans to naturally experience and interact with each other in Virtual Reality
(VR) and Augmented Reality (AR). However, existing methods still struggle with
challenging cases involving self-occlusion and complex poses due to the lack of
effective motion prior modeling. In this paper, we present a novel variational
motion prior (VMP) learning approach for video-based motion capture to resolve
the above issue. Instead of directly building the correspondence between the
video and motion domains, we propose to learn a generic latent space for
capturing the prior distribution of all natural motions, which serves as the
basis for subsequent video-based motion capture tasks. To improve the
generalization capacity of prior space, we propose a transformer-based
variational autoencoder pretrained over marker-based 3D mocap data, with a
novel style-mapping block to boost the generation quality. Afterward, a
separate video encoder is attached to the pretrained motion generator for
end-to-end fine-tuning over task-specific video datasets. Compared to existing
motion prior models, our VMP model serves as a motion rectifier that can
effectively reduce temporal jittering and failure modes in frame-wise pose
estimation, leading to temporally stable and visually realistic motion capture
results. Furthermore, our VMP-based framework models motion at the sequence level
and can directly generate motion clips in the forward pass, achieving real-time
motion capture during inference. Extensive experiments over both public
datasets and in-the-wild videos have demonstrated the efficacy and
generalization capability of our framework.
Related papers
- E-Motion: Future Motion Simulation via Event Sequence Diffusion [86.80533612211502]
Event-based sensors may offer a unique opportunity to predict future motion with a level of detail and precision previously unachievable.
We propose to integrate the strong learning capacity of the video diffusion model with the rich motion information of an event camera as a motion simulation framework.
Our findings suggest a promising direction for future research in enhancing the interpretative power and predictive accuracy of computer vision systems.
arXiv Detail & Related papers (2024-10-11T09:19:23Z) - Generalizable Implicit Motion Modeling for Video Frame Interpolation [51.966062283735596]
Motion is critical in flow-based Video Frame Interpolation (VFI).
We introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI.
Our GIMM can be easily integrated with existing flow-based VFI works by supplying accurately modeled motion.
arXiv Detail & Related papers (2024-07-11T17:13:15Z) - Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z) - MotionClone: Training-Free Motion Cloning for Controllable Video Generation [41.621147782128396]
MotionClone is a training-free framework that enables motion cloning from reference videos to versatile motion-controlled video generation.
MotionClone exhibits proficiency in both global camera motion and local object motion, with notable superiority in terms of motion fidelity, textual alignment, and temporal consistency.
arXiv Detail & Related papers (2024-06-08T03:44:25Z) - MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion [94.66090422753126]
MotionFollower is a lightweight score-guided diffusion model for video motion editing.
It delivers superior motion editing performance and exclusively supports large camera movements and actions.
Compared with MotionEditor, the most advanced motion editing model, MotionFollower achieves an approximately 80% reduction in GPU memory.
arXiv Detail & Related papers (2024-05-30T17:57:30Z) - Motion Inversion for Video Customization [31.607669029754874]
We present a novel approach for motion customization in video generation, addressing the widespread gap in the exploration of motion representation within video generative models.
We introduce Motion Embeddings, a set of explicit, temporally coherent embeddings derived from a given video.
Our contributions include a tailored motion embedding for customization tasks and a demonstration of the practical advantages and effectiveness of our method.
arXiv Detail & Related papers (2024-03-29T14:14:22Z) - Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
arXiv Detail & Related papers (2024-03-22T14:47:18Z) - Traffic Video Object Detection using Motion Prior [16.63738085066699]
We propose two innovative methods to exploit the motion prior and boost the performance of traffic video object detection.
Firstly, we introduce a new self-attention module that leverages the motion prior to guide temporal information integration.
Secondly, we utilise a pseudo-labelling mechanism to eliminate noisy pseudo labels for the semi-supervised setting.
arXiv Detail & Related papers (2023-11-16T18:59:46Z) - Fine-Grained Spatiotemporal Motion Alignment for Contrastive Video Representation Learning [16.094271750354835]
Motion information is critical to a robust and generalized video representation.
Recent works have adopted frame difference as the source of motion information in video contrastive learning.
We present a framework capable of introducing well-aligned and significant motion information.
arXiv Detail & Related papers (2023-09-01T07:03:27Z) - Motion-DVAE: Unsupervised learning for fast human motion denoising [18.432026846779372]
We introduce Motion-DVAE, a motion prior that captures the short-term dependencies of human motion.
Together with Motion-DVAE, we introduce an unsupervised learned denoising method unifying regression- and optimization-based approaches.
arXiv Detail & Related papers (2023-06-09T12:18:48Z) - AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control [145.61135774698002]
We propose a fully automated approach to selecting motion for a character to track in a given scenario.
High-level task objectives that the character should perform can be specified by relatively simple reward functions.
Low-level style of the character's behaviors can be specified by a dataset of unstructured motion clips.
Our system produces high-quality motions comparable to those achieved by state-of-the-art tracking-based techniques.
arXiv Detail & Related papers (2021-04-05T22:43:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.