Graph and Temporal Convolutional Networks for 3D Multi-person Pose
Estimation in Monocular Videos
- URL: http://arxiv.org/abs/2012.11806v3
- Date: Wed, 7 Apr 2021 06:26:48 GMT
- Title: Graph and Temporal Convolutional Networks for 3D Multi-person Pose
Estimation in Monocular Videos
- Authors: Yu Cheng, Bo Wang, Bo Yang, Robby T. Tan
- Abstract summary: We propose a novel framework integrating graph convolutional networks (GCNs) and temporal convolutional networks (TCNs) to robustly estimate camera-centric multi-person 3D poses.
In particular, we introduce a human-joint GCN, which employs the 2D pose estimator's confidence scores to improve the pose estimation results.
The two GCNs work together to estimate the spatial frame-wise 3D poses and can make use of both visible joint and bone information in the target frame to estimate the occluded or missing human-part information.
- Score: 33.974241749058585
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the recent progress, 3D multi-person pose estimation from monocular
videos is still challenging due to the commonly encountered problem of missing
information caused by occlusion, partially out-of-frame target persons, and
inaccurate person detection. To tackle this problem, we propose a novel
framework integrating graph convolutional networks (GCNs) and temporal
convolutional networks (TCNs) to robustly estimate camera-centric multi-person
3D poses that do not require camera parameters. In particular, we introduce a
human-joint GCN, which, unlike the existing GCN, is based on a directed graph
that employs the 2D pose estimator's confidence scores to improve the pose
estimation results. We also introduce a human-bone GCN, which models the bone
connections and provides more information beyond human joints. The two GCNs
work together to estimate the spatial frame-wise 3D poses and can make use of
both visible joint and bone information in the target frame to estimate the
occluded or missing human-part information. To further refine the 3D pose
estimation, we use our temporal convolutional networks (TCNs) to enforce the
temporal and human-dynamics constraints. We use a joint-TCN to estimate
person-centric 3D poses across frames, and propose a velocity-TCN to estimate
the speed of 3D joints to ensure the consistency of the 3D pose estimation in
consecutive frames. Finally, to estimate the 3D human poses for multiple
persons, we propose a root-TCN that estimates camera-centric 3D poses without
requiring camera parameters. Quantitative and qualitative evaluations
demonstrate the effectiveness of the proposed method.
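As a rough illustration of two ideas in the abstract, and not the authors' implementation, the snippet below sketches a confidence-weighted graph convolution over a directed joint graph (the human-joint GCN idea) and a velocity-consistency measure between consecutive frames (the velocity-TCN idea). The skeleton topology, feature sizes, and weighting scheme here are all illustrative assumptions.

```python
import numpy as np

# Hypothetical directed skeleton: edges point from parent joint to child joint.
# The topology and joint count are illustrative, not the paper's exact graph.
EDGES = [(0, 1), (1, 2), (2, 3),   # e.g. a hip -> knee -> ankle chain
         (0, 4), (4, 5), (5, 6)]   # e.g. a hip -> shoulder -> elbow chain
NUM_JOINTS = 7

def confidence_weighted_gcn_layer(x, conf, weight):
    """One sketch of a GCN layer on a directed joint graph.

    x:      (J, C_in) per-joint features (e.g. embedded 2D keypoints).
    conf:   (J,) 2D-detector confidence scores in [0, 1]; low-confidence
            (occluded or missing) joints contribute less to their neighbors.
    weight: (C_in, C_out) learnable projection.
    """
    adj = np.zeros((NUM_JOINTS, NUM_JOINTS))
    for parent, child in EDGES:            # directed edge parent -> child
        adj[child, parent] = conf[parent]  # scale the message by source confidence
    adj += np.eye(NUM_JOINTS) * conf       # self-loops, also confidence-scaled
    deg = adj.sum(axis=1, keepdims=True) + 1e-8
    agg = (adj / deg) @ x                  # normalized neighborhood aggregation
    return np.maximum(agg @ weight, 0.0)   # ReLU

def velocity_consistency(poses_3d):
    """Velocity-TCN-style signal: mean joint speed between consecutive frames.

    poses_3d: (T, J, 3) person-centric 3D poses over T frames.
    Returns the per-transition mean joint displacement, the kind of quantity a
    velocity network could predict to keep consecutive 3D poses consistent.
    """
    velocities = np.diff(poses_3d, axis=0)                    # (T-1, J, 3)
    return np.linalg.norm(velocities, axis=-1).mean(axis=-1)  # (T-1,)
```

In this sketch a joint with confidence near zero is effectively removed from message passing, so its 3D estimate must be inferred from its visible neighbors, which is the intuition behind using detector confidences on a directed graph.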
Related papers
- UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues [55.69339788566899]
UPose3D is a novel approach for multi-view 3D human pose estimation.
It improves robustness and flexibility without requiring direct 3D annotations.
arXiv Detail & Related papers (2024-04-23T00:18:00Z)
- Towards Precise 3D Human Pose Estimation with Multi-Perspective Spatial-Temporal Relational Transformers [28.38686299271394]
We propose a framework for 3D sequence-to-sequence (seq2seq) human pose detection.
The spatial module represents human pose features from intra-image content, while the frame-image relation module extracts temporal relationships across frames.
Our method is evaluated on Human3.6M, a popular 3D human pose detection dataset.
arXiv Detail & Related papers (2024-01-30T03:00:25Z)
- PONet: Robust 3D Human Pose Estimation via Learning Orientations Only [116.1502793612437]
We propose a novel Pose Orientation Net (PONet) that is able to robustly estimate 3D pose by learning orientations only.
PONet estimates the 3D orientation of human limbs by taking advantage of the local image evidence to recover the 3D pose.
We evaluate our method on multiple datasets, including Human3.6M, MPII, MPI-INF-3DHP, and 3DPW.
arXiv Detail & Related papers (2021-12-21T12:48:48Z) - Learning Temporal 3D Human Pose Estimation with Pseudo-Labels [3.0954251281114513]
We present a simple, yet effective, approach for self-supervised 3D human pose estimation.
We rely on triangulating 2D body pose estimates from a multi-view camera system.
Our method achieves state-of-the-art performance in the Human3.6M and MPI-INF-3DHP benchmarks.
arXiv Detail & Related papers (2021-10-14T17:40:45Z) - Learning Dynamical Human-Joint Affinity for 3D Pose Estimation in Videos [47.601288796052714]
Graph Convolution Network (GCN) has been successfully used for 3D human pose estimation in videos.
Our Dynamical Graph Network (DGNet) can estimate 3D pose by adaptively learning spatial and temporal joint relations from videos.
arXiv Detail & Related papers (2021-09-15T15:06:19Z) - Self-Attentive 3D Human Pose and Shape Estimation from Videos [82.63503361008607]
We present a video-based learning algorithm for 3D human pose and shape estimation.
We exploit temporal information in videos and propose a self-attention module.
We evaluate our method on the 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
arXiv Detail & Related papers (2021-03-26T00:02:19Z)
- Residual Pose: A Decoupled Approach for Depth-based 3D Human Pose Estimation [18.103595280706593]
We leverage recent advances in reliable 2D pose estimation with CNN to estimate the 3D pose of people from depth images.
Our approach achieves very competitive results both in accuracy and speed on two public datasets.
arXiv Detail & Related papers (2020-11-10T10:08:13Z)
- Multi-Scale Networks for 3D Human Pose Estimation with Inference Stage Optimization [33.02708860641971]
Estimating 3D human poses from a monocular video is still a challenging task.
Many existing methods degrade when the target person is occluded by other objects, or when the motion is too fast or too slow relative to the scale and speed of the training data.
We introduce a spatio-temporal network for robust 3D human pose estimation.
arXiv Detail & Related papers (2020-10-13T15:24:28Z)
- Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [58.72192168935338]
Generalizability of human pose estimation models developed using supervision on large-scale in-studio datasets remains questionable.
We propose a novel kinematic-structure-preserved unsupervised 3D pose estimation framework, which is not restrained by any paired or unpaired weak supervisions.
Our proposed model employs three consecutive differentiable transformations named as forward-kinematics, camera-projection and spatial-map transformation.
arXiv Detail & Related papers (2020-06-24T23:56:33Z)
- Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [72.34794624243281]
We propose a self-supervised learning framework to disentangle variations from unlabeled video frames.
Our differentiable formalization, bridging the representation gap between the 3D pose and spatial part maps, allows us to operate on videos with diverse camera movements.
arXiv Detail & Related papers (2020-04-09T07:55:01Z) - Anatomy-aware 3D Human Pose Estimation with Bone-based Pose
Decomposition [92.99291528676021]
Instead of directly regressing the 3D joint locations, we decompose the task into bone direction prediction and bone length prediction.
Our motivation is the fact that the bone lengths of a human skeleton remain consistent across time.
Our full model outperforms the previous best results on Human3.6M and MPI-INF-3DHP datasets.
arXiv Detail & Related papers (2020-02-24T15:49:37Z)
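The bone-based decomposition described in the last entry can be sketched as follows. The parent array and root joint below are illustrative assumptions, not the paper's exact skeleton; the point is only that 3D joints factor into per-bone unit directions and lengths, and that lengths can be held consistent across time.

```python
import numpy as np

# Illustrative parent array: joint i's parent is PARENTS[i]; joint 0 is the root.
# Parents are listed before their children, so a single forward pass rebuilds the pose.
PARENTS = [-1, 0, 1, 2, 0, 4, 5]

def decompose(joints):
    """Split (J, 3) joint positions into per-bone unit directions and lengths."""
    dirs, lens = {}, {}
    for j, p in enumerate(PARENTS):
        if p < 0:
            continue                      # skip the root, which has no bone
        bone = joints[j] - joints[p]
        lens[j] = np.linalg.norm(bone)
        dirs[j] = bone / (lens[j] + 1e-8)
    return dirs, lens

def recompose(root, dirs, lens):
    """Rebuild joints from a root position plus bone directions and lengths."""
    joints = np.zeros((len(PARENTS), 3))
    joints[0] = root
    for j, p in enumerate(PARENTS):
        if p >= 0:
            joints[j] = joints[p] + dirs[j] * lens[j]
    return joints
```

Under this factorization a model can predict directions per frame while sharing (or averaging) the lengths across a whole sequence, which is one way to exploit the fact that skeleton bone lengths do not change over time.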
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.