SportsCap: Monocular 3D Human Motion Capture and Fine-grained
Understanding in Challenging Sports Videos
- URL: http://arxiv.org/abs/2104.11452v2
- Date: Mon, 26 Apr 2021 14:06:25 GMT
- Title: SportsCap: Monocular 3D Human Motion Capture and Fine-grained
Understanding in Challenging Sports Videos
- Authors: Xin Chen, Anqi Pang, Wei Yang, Yuexin Ma, Lan Xu, Jingyi Yu
- Abstract summary: We propose SportsCap -- the first approach for simultaneously capturing 3D human motions and understanding fine-grained actions from monocular challenging sports video input.
Our approach utilizes the semantic and temporally structured sub-motion prior in the embedding space for motion capture and understanding.
Based on such hybrid motion information, we introduce a multi-stream spatial-temporal Graph Convolutional Network (ST-GCN) to predict the fine-grained semantic action attributes.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Markerless motion capture and understanding of professional non-daily human
movements is an important yet unsolved task, which suffers from complex motion
patterns and severe self-occlusion, especially for the monocular setting. In
this paper, we propose SportsCap -- the first approach for simultaneously
capturing 3D human motions and understanding fine-grained actions from
challenging monocular sports video input. Our approach utilizes the semantic
and temporally structured sub-motion prior in the embedding space for motion
capture and understanding in a data-driven multi-task manner. To enable robust
capture under complex motion patterns, we propose an effective motion embedding
module to recover both the implicit motion embedding and explicit 3D motion
details via a corresponding mapping function as well as a sub-motion
classifier. Based on such hybrid motion information, we introduce a
multi-stream spatial-temporal Graph Convolutional Network (ST-GCN) to predict
the fine-grained semantic action attributes, and adopt a semantic attribute
mapping block that assembles the correlated action attributes into a
high-level action label for a detailed understanding of the whole sequence,
enabling applications such as action assessment and motion scoring.
Comprehensive experiments on both public and our proposed datasets
show that, given challenging monocular sports video input, our novel approach
not only significantly improves the accuracy of 3D human motion capture, but
also recovers accurate fine-grained semantic action attributes.
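The abstract's two-stage pipeline is concrete enough to sketch. Below is a minimal, hypothetical PyTorch illustration of the first stage, the motion embedding module: a clip of 2D joint detections is encoded into an implicit sub-motion embedding, from which a mapping function regresses explicit 3D motion while a sub-motion classifier predicts the sub-motion category. All names, dimensions, and layer choices are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of the motion embedding idea: encode a 2D-joint clip into a
# latent embedding z, decode z to explicit 3D motion, and classify z into a
# sub-motion category. Sizes and layers are assumptions for illustration.
import torch
import torch.nn as nn


class MotionEmbeddingModule(nn.Module):
    def __init__(self, num_joints=17, frames=32, embed_dim=128, num_submotions=8):
        super().__init__()
        in_dim = num_joints * 2 * frames  # flattened 2D joints over the clip
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(), nn.Linear(512, embed_dim)
        )
        # Mapping function: implicit embedding -> explicit 3D joint motion.
        self.pose_decoder = nn.Linear(embed_dim, num_joints * 3 * frames)
        # Sub-motion classifier on the same embedding.
        self.classifier = nn.Linear(embed_dim, num_submotions)

    def forward(self, joints_2d):  # joints_2d: (batch, frames, joints, 2)
        b, t, j, _ = joints_2d.shape
        z = self.encoder(joints_2d.reshape(b, -1))           # implicit embedding
        pose_3d = self.pose_decoder(z).reshape(b, t, j, 3)   # explicit 3D motion
        submotion_logits = self.classifier(z)                # sub-motion label
        return z, pose_3d, submotion_logits


# Toy usage: 2 clips, 32 frames, 17 joints with 2D coordinates.
module = MotionEmbeddingModule()
z, pose_3d, logits = module(torch.randn(2, 32, 17, 2))
print(z.shape, pose_3d.shape, logits.shape)
```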
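A similarly hedged sketch of the second stage, the multi-stream ST-GCN that predicts fine-grained action attributes from hybrid motion information. The chain-skeleton adjacency, the choice of two streams (joint positions and frame-to-frame differences), and all sizes are assumptions for illustration only.

```python
# Hedged sketch of a multi-stream ST-GCN: per-frame joint features are
# aggregated over a skeleton adjacency (spatial graph conv), then over time
# (temporal conv); a position stream and a motion stream are fused before
# predicting per-attribute logits. Not the authors' architecture.
import torch
import torch.nn as nn


class SpatialTemporalBlock(nn.Module):
    """One graph conv over joints followed by a temporal conv over frames."""

    def __init__(self, in_ch, out_ch, num_joints):
        super().__init__()
        # Learnable normalized adjacency, initialized to a chain skeleton.
        adj = torch.eye(num_joints)
        for j in range(num_joints - 1):
            adj[j, j + 1] = adj[j + 1, j] = 1.0
        self.adj = nn.Parameter(adj / adj.sum(dim=1, keepdim=True))
        self.spatial = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        self.temporal = nn.Conv2d(out_ch, out_ch, kernel_size=(9, 1), padding=(4, 0))
        self.relu = nn.ReLU()

    def forward(self, x):  # x: (batch, channels, frames, joints)
        x = torch.einsum("bctj,jk->bctk", x, self.adj)  # spatial aggregation
        x = self.relu(self.spatial(x))
        return self.relu(self.temporal(x))


class MultiStreamSTGCN(nn.Module):
    """Position stream + motion (frame-difference) stream, fused for attributes."""

    def __init__(self, num_joints=17, num_attributes=10):
        super().__init__()
        self.pos_stream = SpatialTemporalBlock(3, 64, num_joints)
        self.motion_stream = SpatialTemporalBlock(3, 64, num_joints)
        self.head = nn.Linear(128, num_attributes)

    def forward(self, joints):  # joints: (batch, 3, frames, num_joints)
        motion = joints[:, :, 1:] - joints[:, :, :-1]   # temporal differences
        motion = nn.functional.pad(motion, (0, 0, 1, 0))  # keep frame count
        feats = [
            stream(x).mean(dim=(2, 3))  # global average over frames and joints
            for stream, x in ((self.pos_stream, joints), (self.motion_stream, motion))
        ]
        return self.head(torch.cat(feats, dim=1))  # per-attribute logits


# Toy usage: 2 clips, 32 frames, 17 joints with 3D coordinates.
model = MultiStreamSTGCN()
print(model(torch.randn(2, 3, 32, 17)).shape)  # torch.Size([2, 10])
```

Fusing a position stream with a difference-based motion stream is a common multi-stream ST-GCN design; the paper's variant may combine different or additional feature streams.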
Related papers
- Sitcom-Crafter: A Plot-Driven Human Motion Generation System in 3D Scenes [83.55301458112672]
Sitcom-Crafter is a system for human motion generation in 3D space.
Central to the function generation modules is our novel 3D scene-aware human-human interaction module.
Augmentation modules encompass plot comprehension for command generation and motion synchronization for seamless integration of different motion types.
arXiv Detail & Related papers (2024-10-14T17:56:19Z)
- Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE3 motion bases.
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z)
- MotionLLM: Understanding Human Behaviors from Human Motions and Videos [40.132643319573205]
This study delves into multi-modal (i.e., video and motion) human behavior understanding.
We present MotionLLM, a framework for human motion understanding, captioning, and reasoning.
arXiv Detail & Related papers (2024-05-30T17:59:50Z)
- DiverseMotion: Towards Diverse Human Motion Generation via Discrete Diffusion [70.33381660741861]
We present DiverseMotion, a new approach for synthesizing high-quality human motions conditioned on textual descriptions.
We show that our DiverseMotion achieves the state-of-the-art motion quality and competitive motion diversity.
arXiv Detail & Related papers (2023-09-04T05:43:48Z)
- Action Recognition with Multi-stream Motion Modeling and Mutual Information Maximization [44.73161606369333]
Action recognition is a fundamental and intriguing problem in artificial intelligence.
We introduce a novel Stream-GCN network equipped with multi-stream components and channel attention.
Our approach sets the new state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2023-06-13T06:56:09Z)
- MotionBERT: A Unified Perspective on Learning Human Motion Representations [46.67364057245364]
We present a unified perspective on tackling various human-centric video tasks by learning human motion representations from large-scale and heterogeneous data resources.
We propose a pretraining stage in which a motion encoder is trained to recover the underlying 3D motion from noisy partial 2D observations.
We implement the motion encoder with a Dual-stream Spatio-temporal Transformer (DSTformer) neural network.
arXiv Detail & Related papers (2022-10-12T19:46:25Z)
- MoCaNet: Motion Retargeting in-the-wild via Canonicalization Networks [77.56526918859345]
We present a novel framework that brings the 3D motion task from controlled environments to in-the-wild scenarios.
It is capable of retargeting body motion from a character in a 2D monocular video to a 3D character without using any motion capture system or 3D reconstruction procedure.
arXiv Detail & Related papers (2021-12-19T07:52:05Z)
- AMP: Adversarial Motion Priors for Stylized Physics-Based Character Control [145.61135774698002]
We propose a fully automated approach to selecting motion for a character to track in a given scenario.
High-level task objectives that the character should perform can be specified by relatively simple reward functions.
Low-level style of the character's behaviors can be specified by a dataset of unstructured motion clips.
Our system produces high-quality motions comparable to those achieved by state-of-the-art tracking-based techniques.
arXiv Detail & Related papers (2021-04-05T22:43:14Z)
- ChallenCap: Monocular 3D Capture of Challenging Human Performances using Multi-Modal References [18.327101908143113]
We propose ChallenCap -- a template-based approach to capture challenging 3D human motions using a single RGB camera.
We adopt a novel learning-and-optimization framework, with the aid of multi-modal references.
Experiments on our new challenging motion dataset demonstrate the effectiveness and robustness of our approach to capture challenging human motions.
arXiv Detail & Related papers (2021-03-11T15:49:22Z)