Mutual Adaptive Reasoning for Monocular 3D Multi-Person Pose Estimation
- URL: http://arxiv.org/abs/2207.07900v1
- Date: Sat, 16 Jul 2022 10:54:40 GMT
- Title: Mutual Adaptive Reasoning for Monocular 3D Multi-Person Pose Estimation
- Authors: Juze Zhang, Jingya Wang, Ye Shi, Fei Gao, Lan Xu, Jingyi Yu
- Abstract summary: Most existing bottom-up methods treat camera-centric 3D human pose estimation as two unrelated subtasks.
We propose a unified model that leverages the mutual benefits of these two subtasks.
Our model runs much faster than existing bottom-up and top-down methods.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Inter-person occlusion and depth ambiguity make estimating the
camera-centric 3D poses of multiple persons from a monocular image a challenging problem.
Typical top-down frameworks suffer from high computational redundancy with an
additional detection stage. By contrast, the bottom-up methods enjoy low
computational costs as they are less affected by the number of humans. However,
most existing bottom-up methods treat camera-centric 3D human pose estimation
as two unrelated subtasks: 2.5D pose estimation and camera-centric depth
estimation. In this paper, we propose a unified model that leverages the mutual
benefits of both these subtasks. Within the framework, a robust structured 2.5D
pose estimation is designed to recognize inter-person occlusion based on depth
relationships. Additionally, we develop an end-to-end geometry-aware depth
reasoning method that exploits the mutual benefits of both 2.5D pose and
camera-centric root depths. This method first uses 2.5D pose and geometry
information to infer camera-centric root depths in a forward pass, and then
exploits the root depths to further improve representation learning of 2.5D
pose estimation in a backward pass. Further, we design an adaptive fusion
scheme that leverages both visual perception and body geometry to alleviate
inherent depth ambiguity issues. Extensive experiments demonstrate the
superiority of our proposed model over a wide range of bottom-up methods. Our
accuracy is even competitive with top-down counterparts. Notably, our model
runs much faster than existing bottom-up and top-down methods.
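The geometry-aware depth reasoning described above rests on the basic pinhole relation: depth ≈ focal length × real-world length / pixel length. A minimal sketch of the forward pass under that relation (the torso-length prior, function name, and parameters are illustrative assumptions, not the paper's actual implementation):

```python
import numpy as np

def root_depth_from_torso(neck_2d, pelvis_2d, torso_len_m, focal_px):
    """Estimate a camera-centric root depth from a 2.5D pose via the
    pinhole model: depth = focal_px * real_length / pixel_length.
    Assumes an approximately fronto-parallel torso of known length."""
    pixel_len = np.linalg.norm(np.asarray(neck_2d, dtype=float)
                               - np.asarray(pelvis_2d, dtype=float))
    if pixel_len < 1e-6:
        raise ValueError("degenerate 2D torso length")
    return focal_px * torso_len_m / pixel_len
```

With a 1000 px focal length, a 0.5 m torso spanning 100 px implies a root depth of 5 m; the paper's adaptive fusion scheme would combine such geometric estimates with learned visual cues to handle the ambiguity this naive version ignores.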
Related papers
- DiffuPose: Monocular 3D Human Pose Estimation via Denoising Diffusion Probabilistic Model (arXiv, 2022-12-06)
  This paper focuses on reconstructing a 3D pose from a single 2D keypoint detection.
  We build a novel diffusion-based framework to effectively sample diverse 3D poses from an off-the-shelf 2D detector.
  We evaluate our method on the widely adopted Human3.6M and HumanEva-I datasets.
- Monocular 3D Object Detection with Depth from Motion (arXiv, 2022-07-26)
  We take advantage of camera ego-motion for accurate object depth estimation and detection.
  Our framework, named Depth from Motion (DfM), uses the established geometry to lift 2D image features to 3D space and detects 3D objects there.
  Our framework outperforms state-of-the-art methods by a large margin on the KITTI benchmark.
- Learning Geometry-Guided Depth via Projective Modeling for Monocular 3D Object Detection (arXiv, 2021-07-29)
  We learn geometry-guided depth estimation with projective modeling to advance monocular 3D object detection.
  Specifically, a principled geometry formula with projective modeling of 2D and 3D depth predictions is devised within the monocular 3D object detection network.
  Our method improves the detection performance of the state-of-the-art monocular method by 2.80% on the moderate test setting without extra data.
- Residual Pose: A Decoupled Approach for Depth-based 3D Human Pose Estimation (arXiv, 2020-11-10)
  We leverage recent advances in reliable CNN-based 2D pose estimation to estimate the 3D pose of people from depth images.
  Our approach achieves very competitive results in both accuracy and speed on two public datasets.
- Synthetic Training for Monocular Human Mesh Recovery (arXiv, 2020-10-27)
  This paper aims to estimate the 3D mesh of multiple body parts with large-scale differences from a single RGB image.
  The main challenge is the lack of training data with complete 3D annotations of all body parts in 2D images.
  We propose a depth-to-scale (D2S) projection that incorporates the depth difference into the projection function to derive per-joint scale variants.
- SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation (arXiv, 2020-08-26)
  We propose a novel system that first regresses a set of 2.5D representations of body parts and then reconstructs the 3D absolute poses from these 2.5D representations with a depth-aware part association algorithm.
  Such a single-shot bottom-up scheme allows the system to better learn and reason about the inter-person depth relationship, improving both 3D and 2D pose estimation.
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry (arXiv, 2020-07-21)
  Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
  Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowds.
  In this paper, we depart from the multi-person 3D pose estimation formulation and instead reformulate it as crowd pose estimation.
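The depth-aware part association idea that SMAP describes can be illustrated with a toy scoring rule combining 2D distance and root-depth consistency (the function, cost weights, and greedy assignment below are assumptions for illustration, not SMAP's actual algorithm):

```python
import numpy as np

def assign_parts_to_roots(part_xy, part_depth, roots_xy, roots_depth,
                          depth_weight=50.0):
    """Toy depth-aware part association: score each (part, root) pair by
    2D pixel distance plus a weighted depth mismatch, then greedily pick
    the cheapest root for each part. Returns one root index per part."""
    roots_xy = np.asarray(roots_xy, dtype=float)
    roots_depth = np.asarray(roots_depth, dtype=float)
    assignments = []
    for xy, d in zip(part_xy, part_depth):
        dist2d = np.linalg.norm(roots_xy - np.asarray(xy, dtype=float), axis=1)
        cost = dist2d + depth_weight * np.abs(roots_depth - d)
        assignments.append(int(np.argmin(cost)))
    return assignments
```

For a part equidistant in 2D from two root candidates, the depth term breaks the tie toward the person at a consistent depth, which is the intuition behind resolving inter-person occlusion with depth relationships.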
This list is automatically generated from the titles and abstracts of the papers on this site.