MonStereo: When Monocular and Stereo Meet at the Tail of 3D Human Localization
- URL: http://arxiv.org/abs/2008.10913v2
- Date: Mon, 22 Mar 2021 16:59:49 GMT
- Title: MonStereo: When Monocular and Stereo Meet at the Tail of 3D Human Localization
- Authors: Lorenzo Bertoni, Sven Kreiss, Taylor Mordan, Alexandre Alahi
- Abstract summary: We propose a novel unified learning framework that leverages the strengths of both monocular and stereo cues for 3D human localization.
Our method jointly (i) associates humans in left-right images, (ii) deals with occluded and distant cases in stereo settings, and (iii) tackles the intrinsic ambiguity of monocular perspective projection.
- Score: 89.71926844164268
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular and stereo visions are cost-effective solutions for 3D human
localization in the context of self-driving cars or social robots. However,
they are usually developed independently and have their respective strengths
and limitations. We propose a novel unified learning framework that leverages
the strengths of both monocular and stereo cues for 3D human localization. Our
method jointly (i) associates humans in left-right images, (ii) deals with
occluded and distant cases in stereo settings by relying on the robustness of
monocular cues, and (iii) tackles the intrinsic ambiguity of monocular
perspective projection by exploiting prior knowledge of the human height
distribution. We specifically evaluate outliers as well as challenging
instances, such as occluded and far-away pedestrians, by analyzing the entire
error distribution and by estimating calibrated confidence intervals. Finally,
we critically review the official KITTI 3D metrics and propose a practical 3D
localization metric tailored for humans.
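The two cues the abstract combines can be sketched with the standard pinhole and stereo depth formulas: a height prior resolves monocular depth via z = f·H/h, while stereo gives z = f·B/d from disparity. The snippet below is an illustrative sketch, not the authors' code; the function names, the mean-height value, and the KITTI-like focal length and baseline are assumptions for the example.

```python
# Hedged sketch (not the paper's implementation): how a human-height
# prior disambiguates monocular depth, and how stereo disparity yields
# depth. Parameter values are illustrative assumptions.

def depth_from_height_prior(pixel_height, focal_px, mean_height_m=1.75):
    """Pinhole model: a person of real height H appearing h pixels tall
    lies at roughly z = f * H / h (focal length f in pixels)."""
    return focal_px * mean_height_m / pixel_height

def depth_from_disparity(disparity_px, focal_px, baseline_m=0.54):
    """Stereo triangulation: z = f * B / d, with baseline B
    (0.54 m is roughly the KITTI stereo rig)."""
    return focal_px * baseline_m / disparity_px

# Example with a KITTI-like focal length of ~721 px:
f = 721.0
z_mono = depth_from_height_prior(pixel_height=90.0, focal_px=f)  # ~14.0 m
z_stereo = depth_from_disparity(disparity_px=27.8, focal_px=f)   # ~14.0 m
```

The monocular estimate degrades gracefully with distance (pixel height shrinks, but the prior still bounds the answer), whereas the stereo estimate degrades as disparity approaches zero, which is why the paper falls back on monocular cues for far-away and occluded pedestrians.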
Related papers
- Multi-view Pose Fusion for Occlusion-Aware 3D Human Pose Estimation [3.442372522693843]
We present a novel approach for robust 3D human pose estimation in the context of human-robot collaboration.
Our approach outperforms state-of-the-art multi-view human pose estimation techniques.
arXiv Detail & Related papers (2024-08-28T14:10:57Z)
- 3D Human Pose Perception from Egocentric Stereo Videos [67.9563319914377]
We propose a new transformer-based framework to improve egocentric stereo 3D human pose estimation.
Our method is able to accurately estimate human poses even in challenging scenarios, such as crouching and sitting.
We will release UnrealEgo2, UnrealEgo-RW, and trained models on our project page.
arXiv Detail & Related papers (2023-12-30T21:21:54Z)
- Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views [9.476008200056082]
Ego3DPose is a highly accurate binocular egocentric 3D pose reconstruction system.
We propose a two-path network architecture with a path that estimates pose per limb independently with its binocular heatmaps.
We propose a new perspective-aware representation using trigonometry, enabling the network to estimate the 3D orientation of limbs.
arXiv Detail & Related papers (2023-09-21T10:34:35Z)
- JOTR: 3D Joint Contrastive Learning with Transformers for Occluded Human Mesh Recovery [84.67823511418334]
This paper presents 3D JOint contrastive learning with TRansformers framework for handling occluded 3D human mesh recovery.
Our method includes an encoder-decoder transformer architecture to fuse 2D and 3D representations for achieving 2D-3D aligned results.
arXiv Detail & Related papers (2023-07-31T02:58:58Z)
- Non-Local Latent Relation Distillation for Self-Adaptive 3D Human Pose Estimation [63.199549837604444]
3D human pose estimation approaches leverage different forms of strong (2D/3D pose) or weak (multi-view or depth) paired supervision.
We cast 3D pose learning as a self-supervised adaptation problem that aims to transfer the task knowledge from a labeled source domain to a completely unpaired target.
We evaluate different self-adaptation settings and demonstrate state-of-the-art 3D human pose estimation performance on standard benchmarks.
arXiv Detail & Related papers (2022-04-05T03:52:57Z)
- THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers [67.8628917474705]
THUNDR is a transformer-based deep neural network methodology to reconstruct the 3d pose and shape of people.
We show state-of-the-art results on Human3.6M and 3DPW, for both the fully-supervised and the self-supervised models.
We observe very solid 3d reconstruction performance for difficult human poses collected in the wild.
arXiv Detail & Related papers (2021-06-17T09:09:24Z)
- Monocular 3D Multi-Person Pose Estimation by Integrating Top-Down and Bottom-Up Networks [33.974241749058585]
In multi-person pose estimation, human detection can be erroneous and human-joint grouping unreliable.
Existing top-down methods rely on human detection and thus suffer from these problems.
We propose the integration of top-down and bottom-up approaches to exploit their strengths.
arXiv Detail & Related papers (2021-04-05T07:05:21Z)
- Perceiving Humans: from Monocular 3D Localization to Social Distancing [93.03056743850141]
We present a new cost-effective vision-based method that perceives humans' locations in 3D and their body orientation from a single image.
We show that it is possible to rethink the concept of "social distancing" as a form of social interaction in contrast to a simple location-based rule.
arXiv Detail & Related papers (2020-09-01T10:12:30Z)