ChallenCap: Monocular 3D Capture of Challenging Human Performances using
Multi-Modal References
- URL: http://arxiv.org/abs/2103.06747v1
- Date: Thu, 11 Mar 2021 15:49:22 GMT
- Authors: Yannan He, Anqi Pang, Xin Chen, Han Liang, Minye Wu, Yuexin Ma, Lan Xu
- Abstract summary: We propose ChallenCap -- a template-based approach to capture challenging 3D human motions using a single RGB camera.
We adopt a novel learning-and-optimization framework, with the aid of multi-modal references.
Experiments on our new challenging motion dataset demonstrate the effectiveness and robustness of our approach to capture challenging human motions.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Capturing challenging human motions is critical for numerous applications,
but it suffers from complex motion patterns and severe self-occlusion under the
monocular setting. In this paper, we propose ChallenCap -- a template-based
approach to capture challenging 3D human motions using a single RGB camera in a
novel learning-and-optimization framework, with the aid of multi-modal
references. We propose a hybrid motion inference stage with a generation
network, which utilizes a temporal encoder-decoder to extract the motion
details from the pair-wise sparse-view reference, as well as a motion
discriminator to utilize the unpaired marker-based references to extract
specific challenging motion characteristics in a data-driven manner. We further
adopt a robust motion optimization stage to increase the tracking accuracy, by
jointly utilizing the learned motion details from the supervised multi-modal
references as well as the reliable motion hints from the input image reference.
Extensive experiments on our new challenging motion dataset demonstrate the
effectiveness and robustness of our approach to capture challenging human
motions.
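The robust motion optimization stage described above balances the learned motion details against reliable hints from the input image. As a rough illustration (not the authors' code, and with a hypothetical quadratic simplification of both energy terms), the trade-off can be seen as a weighted least-squares blend with a closed-form solution:

```python
import numpy as np

def refine_pose(learned_pose, image_hint_pose, lam=0.5):
    """Blend a learned motion estimate with image-based hints.

    Minimizes ||p - learned||^2 + lam * ||p - hint||^2, which has the
    closed-form solution (learned + lam * hint) / (1 + lam).
    """
    learned_pose = np.asarray(learned_pose, dtype=float)
    image_hint_pose = np.asarray(image_hint_pose, dtype=float)
    return (learned_pose + lam * image_hint_pose) / (1.0 + lam)

# Toy 3-DoF pose: with equal weights the result is the midpoint.
refined = refine_pose([1.0, 0.0, 2.0], [0.0, 1.0, 2.0], lam=1.0)
print(refined)
```

In the paper the actual objective couples many terms over a template skeleton; this sketch only shows why jointly weighting two references increases tracking robustness when either one alone is unreliable.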
Related papers
- Exploring Vision Transformers for 3D Human Motion-Language Models with Motion Patches [12.221087476416056]
We introduce "motion patches", a new representation of motion sequences, and propose using Vision Transformers (ViT) as motion encoders via transfer learning.
These motion patches, created by dividing and sorting skeleton joints based on motion sequences, are robust to varying skeleton structures.
We find that transfer learning with pre-trained weights of ViT obtained through training with 2D image data can boost the performance of motion analysis.
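The "motion patches" idea above maps a skeleton sequence into fixed-size 2D patches that a ViT-style encoder can consume as tokens. A minimal sketch, assuming a (frames, joints, channels) layout and toy patch sizes (the paper's joint-sorting step is omitted):

```python
import numpy as np

def to_motion_patches(motion, patch_frames=4, patch_joints=4):
    """Cut a (T, J, C) motion array into flattened patch tokens."""
    T, J, C = motion.shape
    patches = []
    for t in range(0, T - patch_frames + 1, patch_frames):
        for j in range(0, J - patch_joints + 1, patch_joints):
            patch = motion[t:t + patch_frames, j:j + patch_joints]
            patches.append(patch.reshape(-1))   # flatten to one token
    return np.stack(patches)                     # (num_patches, token_dim)

motion = np.zeros((8, 8, 3))                     # 8 frames, 8 joints, xyz
tokens = to_motion_patches(motion)
print(tokens.shape)  # (4, 48)
```

Treating motion as an image in this way is what lets pre-trained 2D ViT weights transfer to motion analysis.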
arXiv Detail & Related papers (2024-05-08T02:42:27Z)
- MOVIN: Real-time Motion Capture using a Single LiDAR [7.3228874258537875]
We present MOVIN, a data-driven generative method for real-time motion capture with global tracking.
Our framework accurately predicts the performer's 3D global information and local joint details.
We implement a real-time application to showcase our method in real-world scenarios.
arXiv Detail & Related papers (2023-09-17T16:04:15Z)
- MotionTrack: Learning Motion Predictor for Multiple Object Tracking [68.68339102749358]
We introduce a novel motion-based tracker, MotionTrack, centered around a learnable motion predictor.
Our experimental results demonstrate that MotionTrack yields state-of-the-art performance on datasets such as DanceTrack and SportsMOT.
arXiv Detail & Related papers (2023-06-05T04:24:11Z)
- Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video [16.32910684198013]
We present a novel multi-frame human pose estimation framework, which employs temporal differences across frames to model dynamic contexts.
Specifically, we design a multi-stage entanglement learning sequence conditioned on multi-stage differences to derive informative motion representations.
Our method ranks No.1 in the Crowd Pose Estimation in Complex Events Challenge on the HiEve benchmark.
arXiv Detail & Related papers (2023-03-15T09:29:03Z)
- Modeling Continuous Motion for 3D Point Cloud Object Tracking [54.48716096286417]
This paper presents a novel approach that views each tracklet as a continuous stream.
At each timestamp, only the current frame is fed into the network to interact with multi-frame historical features stored in a memory bank.
To enhance the utilization of multi-frame features for robust tracking, a contrastive sequence enhancement strategy is proposed.
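The memory-bank scheme above processes only the current frame per step and lets it interact with stored historical features. A hedged sketch of that pattern (the attention and fusion here are hypothetical simplifications, not the paper's architecture):

```python
from collections import deque
import numpy as np

class MemoryBank:
    """Fixed-capacity store of past frame features for streaming tracking."""

    def __init__(self, capacity=4):
        self.bank = deque(maxlen=capacity)   # oldest features are evicted

    def step(self, frame_feat):
        """Fuse the current frame's feature with stored history."""
        frame_feat = np.asarray(frame_feat, dtype=float)
        if self.bank:
            history = np.stack(self.bank)            # (T, D)
            weights = history @ frame_feat           # dot-product similarity
            weights = np.exp(weights - weights.max())
            weights /= weights.sum()                 # softmax over history
            fused = frame_feat + weights @ history   # residual fusion
        else:
            fused = frame_feat                       # first frame: no history
        self.bank.append(frame_feat)
        return fused

bank = MemoryBank(capacity=2)
out = bank.step([1.0, 0.0])   # first frame passes through unchanged
```

The key property is cost: each timestamp touches one new frame plus a bounded bank, rather than re-encoding a whole multi-frame window.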
arXiv Detail & Related papers (2023-03-14T02:58:27Z)
- Multi-Scale Control Signal-Aware Transformer for Motion Synthesis without Phase [72.01862340497314]
We propose a task-agnostic deep learning method, the Multi-scale Control Signal-aware Transformer (MCS-T).
MCS-T successfully generates motions comparable to those produced by methods that rely on auxiliary information.
arXiv Detail & Related papers (2023-03-03T02:56:44Z)
- Learning Variational Motion Prior for Video-based Motion Capture [31.79649766268877]
We present a novel variational motion prior (VMP) learning approach for video-based motion capture.
Our framework can effectively reduce temporal jittering and failure modes in frame-wise pose estimation.
Experiments over both public datasets and in-the-wild videos have demonstrated the efficacy and generalization capability of our framework.
arXiv Detail & Related papers (2022-10-27T02:45:48Z)
- Animation from Blur: Multi-modal Blur Decomposition with Motion Guidance [83.25826307000717]
We study the challenging problem of recovering detailed motion from a single motion-blurred image.
Existing solutions to this problem estimate a single image sequence without considering the motion ambiguity for each region.
In this paper, we explicitly account for such motion ambiguity, allowing us to generate multiple plausible solutions all in sharp detail.
arXiv Detail & Related papers (2022-07-20T18:05:53Z)
- DeepMultiCap: Performance Capture of Multiple Characters Using Sparse Multiview Cameras [63.186486240525554]
DeepMultiCap is a novel method for multi-person performance capture using sparse multi-view cameras.
Our method can capture time-varying surface details without the need for pre-scanned template models.
arXiv Detail & Related papers (2021-05-01T14:32:13Z)
- SportsCap: Monocular 3D Human Motion Capture and Fine-grained Understanding in Challenging Sports Videos [40.19723456533343]
We propose SportsCap -- the first approach for simultaneously capturing 3D human motions and understanding fine-grained actions from monocular challenging sports video input.
Our approach utilizes the semantic and temporally structured sub-motion prior in the embedding space for motion capture and understanding.
Based on such hybrid motion information, we introduce a multi-stream spatial-temporal Graph Convolutional Network (ST-GCN) to predict fine-grained semantic action attributes.
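An ST-GCN mixes joint features along the skeleton's adjacency before a learned transform. A minimal sketch of a single spatial graph-convolution step (toy skeleton and identity weights are assumptions for illustration; the multi-stream and temporal parts are omitted):

```python
import numpy as np

def graph_conv(X, A, W):
    """One spatial GCN step: X (J, C) joint features, A (J, J) adjacency,
    W (C, C_out) learned weights. Uses symmetric normalization."""
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))  # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt  # normalized adjacency
    return A_norm @ X @ W

# Toy 3-joint chain skeleton (0-1-2), 2-dim features, identity weights.
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
X = np.eye(3, 2)
out = graph_conv(X, A, np.eye(2))
print(out.shape)  # (3, 2)
```

Stacking such layers with temporal convolutions over frames is what lets the network relate sub-motions to action attributes.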
arXiv Detail & Related papers (2021-04-23T07:52:03Z)
- AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control [145.61135774698002]
We propose a fully automated approach to selecting motion for a character to track in a given scenario.
High-level task objectives that the character should perform can be specified by relatively simple reward functions.
Low-level style of the character's behaviors can be specified by a dataset of unstructured motion clips.
Our system produces high-quality motions comparable to those achieved by state-of-the-art tracking-based techniques.
arXiv Detail & Related papers (2021-04-05T22:43:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.