Learning Variational Motion Prior for Video-based Motion Capture
- URL: http://arxiv.org/abs/2210.15134v2
- Date: Fri, 28 Oct 2022 02:32:05 GMT
- Title: Learning Variational Motion Prior for Video-based Motion Capture
- Authors: Xin Chen, Zhuo Su, Lingbo Yang, Pei Cheng, Lan Xu, Bin Fu, and Gang Yu
- Abstract summary: We present a novel variational motion prior (VMP) learning approach for video-based motion capture.
Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation.
Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
- Score: 31.79649766268877
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Motion capture from a monocular video is fundamental and crucial for
humans to naturally experience and interact with each other in Virtual Reality
(VR) and Augmented Reality (AR). However, existing methods still struggle with
challenging cases involving self-occlusion and complex poses due to the lack of
effective motion prior modeling. In this paper, we present a novel variational
motion prior (VMP) learning approach for video-based motion capture to resolve
this issue. Instead of directly building the correspondence between the video
and motion domains, we propose to learn a generic latent space that captures the
prior distribution of all natural motions and serves as the basis for subsequent
video-based motion capture tasks. To improve the generalization capacity of the
prior space, we propose a transformer-based
variational autoencoder pretrained over marker-based 3D mocap data, with a
novel style-mapping block to boost the generation quality. Afterward, a
separate video encoder is attached to the pretrained motion generator for
end-to-end fine-tuning over task-specific video datasets. Compared to existing
motion prior models, our VMP model serves as a motion rectifier that can
effectively reduce temporal jittering and failure modes in frame-wise pose
estimation, leading to temporally stable and visually realistic motion capture
results. Furthermore, our VMP-based framework models motion at the sequence level
and can directly generate motion clips in the forward pass, achieving real-time
motion capture during inference. Extensive experiments over both public
datasets and in-the-wild videos have demonstrated the efficacy and
generalization capability of our framework.
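
To make the two-stage design above concrete, here is a minimal, hypothetical PyTorch sketch: stage 1 pretrains a transformer VAE with a style-mapping block on mocap pose sequences, and stage 2 attaches a video encoder that predicts sequence-level latents for the pretrained motion decoder. All module names, dimensions, and the exact style-mapping formulation are assumptions for illustration, not the authors' released implementation.

```python
# Hypothetical sketch of the two-stage VMP design (not the authors' code).
# Stage 1: transformer VAE learns a latent prior over motion sequences.
# Stage 2: a video encoder maps frame features into that latent space.
import torch
import torch.nn as nn


class StyleMapping(nn.Module):
    """Assumed style-mapping block: an MLP that warps the sampled latent,
    loosely in the spirit of StyleGAN's mapping network."""

    def __init__(self, dim: int, depth: int = 4):
        super().__init__()
        layers = []
        for _ in range(depth):
            layers += [nn.Linear(dim, dim), nn.LeakyReLU(0.2)]
        self.net = nn.Sequential(*layers)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)


class MotionVAE(nn.Module):
    """Transformer VAE over pose sequences of shape (B, T, pose_dim)."""

    def __init__(self, pose_dim=72, dim=256, latent_dim=256,
                 n_layers=4, n_heads=8, max_len=512):
        super().__init__()
        self.embed = nn.Linear(pose_dim, dim)
        self.pos = nn.Parameter(torch.randn(1, max_len, dim) * 0.02)
        enc = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, n_layers)
        self.to_mu = nn.Linear(dim, latent_dim)
        self.to_logvar = nn.Linear(dim, latent_dim)
        self.style = StyleMapping(latent_dim)
        self.expand = nn.Linear(latent_dim, dim)
        dec = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.decoder = nn.TransformerEncoder(dec, n_layers)
        self.head = nn.Linear(dim, pose_dim)

    def decode(self, z: torch.Tensor, n_frames: int) -> torch.Tensor:
        w = self.style(z)                              # style-mapped latent
        tokens = self.expand(w).unsqueeze(1).repeat(1, n_frames, 1)
        tokens = tokens + self.pos[:, :n_frames]       # frame positions
        return self.head(self.decoder(tokens))         # (B, T, pose_dim)

    def forward(self, motion: torch.Tensor):
        h = self.encoder(self.embed(motion) + self.pos[:, :motion.shape[1]])
        pooled = h.mean(dim=1)                         # sequence-level code
        mu, logvar = self.to_mu(pooled), self.to_logvar(pooled)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        return self.decode(z, motion.shape[1]), mu, logvar


class VideoEncoder(nn.Module):
    """Stage-2 sketch: maps per-frame visual features (e.g. from a CNN
    backbone) to a sequence-level motion latent for the VAE decoder."""

    def __init__(self, feat_dim=2048, latent_dim=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(feat_dim, 512), nn.ReLU(),
                                  nn.Linear(512, latent_dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        return self.proj(feats.mean(dim=1))            # (B, latent_dim)


if __name__ == "__main__":
    vae, venc = MotionVAE(), VideoEncoder()
    clip = torch.randn(2, 64, 72)                      # 64-frame motions
    recon, mu, logvar = vae(clip)                      # stage-1 forward
    z = venc(torch.randn(2, 64, 2048))                 # stage-2: video -> z
    motion = vae.decode(z, n_frames=64)                # whole clip, one pass
    print(recon.shape, motion.shape)
```

Note how the decoder turns a single sequence-level latent into a whole motion clip in one forward pass; under this reading, that is what enables the sequence-level modeling and real-time inference the abstract describes.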
Related papers
- VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models [71.9811050853964]
VideoJAM is a novel framework that instills an effective motion prior into video generators.
VideoJAM achieves state-of-the-art performance in motion coherence.
These findings emphasize that appearance and motion can be complementary and, when effectively integrated, enhance both the visual quality and the coherence of video generation.
arXiv Detail & Related papers (2025-02-04T17:07:10Z)
- Separate Motion from Appearance: Customizing Motion via Customizing Text-to-Video Diffusion Models [18.41701130228042]
Motion customization aims to adapt the diffusion model (DM) to generate videos with the motion specified by a set of video clips with the same motion concept.
This paper proposes two novel strategies to enhance motion-appearance separation: temporal attention purification (TAP) and appearance highway (AH).
arXiv Detail & Related papers (2025-01-28T05:40:20Z)
- A Plug-and-Play Physical Motion Restoration Approach for In-the-Wild High-Difficulty Motions [56.709280823844374]
We introduce a mask-based motion correction module (MCM) that leverages motion context and a video mask to repair flawed motions.
We also propose a physics-based motion transfer module (PTM), which employs a pretrain-and-adapt approach for motion imitation.
Our approach is designed as a plug-and-play module to physically refine the video motion capture results, including high-difficulty in-the-wild motions.
arXiv Detail & Related papers (2024-12-23T08:26:00Z)
- MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models [59.10171699717122]
MoTrans is a customized motion transfer method that enables video generation with similar motion in new contexts.
Multimodal representations from the recaptioned prompt and video frames promote the modeling of appearance.
Our method effectively learns specific motion patterns from single or multiple reference videos.
arXiv Detail & Related papers (2024-12-02T10:07:59Z)
- Generalizable Implicit Motion Modeling for Video Frame Interpolation [51.966062283735596]
Motion modeling is critical in flow-based Video Frame Interpolation (VFI).
We introduce Generalizable Implicit Motion Modeling (GIMM), a novel and effective approach to motion modeling for VFI.
Our GIMM can be easily integrated with existing flow-based VFI works by supplying accurately modeled motion.
arXiv Detail & Related papers (2024-07-11T17:13:15Z)
- Image Conductor: Precision Control for Interactive Video Synthesis [90.2353794019393]
Filmmaking and animation production often require sophisticated techniques for coordinating camera transitions and object movements.
Image Conductor is a method for precise control of camera transitions and object movements to generate video assets from a single image.
arXiv Detail & Related papers (2024-06-21T17:55:05Z)
- Motion Inversion for Video Customization [31.607669029754874]
We present a novel approach for motion customization in video generation, addressing the widespread gap in the exploration of motion representation within video generative models.
We introduce Motion Embeddings, a set of explicit, temporally coherent embeddings derived from a given video.
Our contributions include a tailored motion embedding for customization tasks and a demonstration of the practical advantages and effectiveness of our method.
arXiv Detail & Related papers (2024-03-29T14:14:22Z)
- Spectral Motion Alignment for Video Motion Transfer using Diffusion Models [54.32923808964701]
Spectral Motion Alignment (SMA) is a framework that refines and aligns motion vectors using Fourier and wavelet transforms.
SMA learns motion patterns by incorporating frequency-domain regularization, facilitating the learning of whole-frame global motion dynamics.
Extensive experiments demonstrate SMA's efficacy in improving motion transfer while maintaining computational efficiency and compatibility across various video customization frameworks.
arXiv Detail & Related papers (2024-03-22T14:47:18Z)
- Traffic Video Object Detection using Motion Prior [16.63738085066699]
We propose two innovative methods to exploit the motion prior and boost the performance of traffic video object detection.
Firstly, we introduce a new self-attention module that leverages the motion prior to guide temporal information integration.
Secondly, we utilise a pseudo-labelling mechanism to eliminate noisy pseudo labels for the semi-supervised setting.
arXiv Detail & Related papers (2023-11-16T18:59:46Z)
- Motion-DVAE: Unsupervised learning for fast human motion denoising [18.432026846779372]
We introduce Motion-DVAE, a motion prior that captures the short-term dependencies of human motion.
Together with Motion-DVAE, we introduce an unsupervised learned denoising method unifying regression- and optimization-based approaches.
arXiv Detail & Related papers (2023-06-09T12:18:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.