Metric-Scale Truncation-Robust Heatmaps for 3D Human Pose Estimation
- URL: http://arxiv.org/abs/2003.02953v1
- Date: Thu, 5 Mar 2020 22:38:13 GMT
- Title: Metric-Scale Truncation-Robust Heatmaps for 3D Human Pose Estimation
- Authors: István Sárándi and Timm Linder and Kai O. Arras and Bastian
Leibe
- Abstract summary: We propose metric-scale truncation-robust volumetric heatmaps, whose dimensions are defined in metric 3D space near the subject.
We train a fully-convolutional network to estimate such heatmaps from monocular RGB in an end-to-end manner.
As our method is simple and fast, it can become a useful component for real-time top-down multi-person pose estimation systems.
- Score: 16.463390330757132
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Heatmap representations have formed the basis of 2D human pose estimation
systems for many years, but their generalizations for 3D pose have only
recently been considered. This includes 2.5D volumetric heatmaps, whose X and Y
axes correspond to image space and the Z axis to metric depth around the
subject. To obtain metric-scale predictions, these methods must include a
separate, explicit post-processing step to resolve scale ambiguity. Further,
they cannot encode body joint positions outside of the image boundaries,
leading to incomplete pose estimates in case of image truncation. We address
these limitations by proposing metric-scale truncation-robust (MeTRo)
volumetric heatmaps, whose dimensions are defined in metric 3D space near the
subject, instead of being aligned with image space. We train a
fully-convolutional network to estimate such heatmaps from monocular RGB in an
end-to-end manner. This reinterpretation of the heatmap dimensions allows us to
estimate complete metric-scale poses without test-time knowledge of the focal
length or person distance and without relying on anthropometric heuristics in
post-processing. Furthermore, as the image space is decoupled from the heatmap
space, the network can learn to reason about joints beyond the image boundary.
Using ResNet-50 without any additional learned layers, we obtain
state-of-the-art results on the Human3.6M and MPI-INF-3DHP benchmarks. As our
method is simple and fast, it can become a useful component for real-time
top-down multi-person pose estimation systems. We make our code publicly
available to facilitate further research (see
https://vision.rwth-aachen.de/metro-pose3d).
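The decoding step implied by the abstract (reading metric joint coordinates out of a volumetric heatmap whose axes span a fixed metric cube around the subject) can be sketched with a soft-argmax. This is an illustrative NumPy sketch, not the paper's code; the cube side length and the `(J, D, H, W)` layout are assumptions.

```python
import numpy as np

def decode_metric_heatmap(heatmaps, cube_side_m=2.2):
    """Soft-argmax decoding of metric-scale volumetric heatmaps.

    heatmaps: array of shape (J, D, H, W) -- one volume per body joint,
    covering a cube of `cube_side_m` meters centered on the subject
    (cube size is an assumed hyperparameter for this sketch).
    Returns joint coordinates in meters, shape (J, 3), relative to the
    cube center.
    """
    J, D, H, W = heatmaps.shape

    # Softmax over each joint's volume to get a probability distribution.
    flat = heatmaps.reshape(J, -1)
    probs = np.exp(flat - flat.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(J, D, H, W)

    # Metric coordinate of each voxel center along each axis.
    zs = (np.arange(D) + 0.5) / D * cube_side_m - cube_side_m / 2
    ys = (np.arange(H) + 0.5) / H * cube_side_m - cube_side_m / 2
    xs = (np.arange(W) + 0.5) / W * cube_side_m - cube_side_m / 2

    # Expected (soft-argmax) coordinate per joint: marginalize, then average.
    x = (probs.sum(axis=(1, 2)) * xs).sum(axis=1)
    y = (probs.sum(axis=(1, 3)) * ys).sum(axis=1)
    z = (probs.sum(axis=(2, 3)) * zs).sum(axis=1)
    return np.stack([x, y, z], axis=1)
```

Because the heatmap axes are metric rather than image-aligned, the decoded coordinates are already in meters and can lie outside the image frustum, which is what makes the representation truncation-robust.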
Related papers
- Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences [21.057940424318314]
Given two images, we can estimate the relative camera pose between them by establishing image-to-image correspondences.
We present MicKey, a keypoint matching pipeline that is able to predict metric correspondences in 3D camera space.
arXiv Detail & Related papers (2024-04-09T14:22:50Z)
- Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation [74.28509379811084]
Metric3D v2 is a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image.
We propose solutions for both metric depth estimation and surface normal estimation.
Our method enables the accurate recovery of metric 3D structures on randomly collected internet images.
arXiv Detail & Related papers (2024-03-22T02:30:46Z)
- Semi-Perspective Decoupled Heatmaps for 3D Robot Pose Estimation from Depth Maps [66.24554680709417]
Knowing the exact 3D location of workers and robots in a collaborative environment enables several real-world applications.
We propose a non-invasive framework based on depth devices and deep neural networks to estimate the 3D pose of robots from an external camera.
arXiv Detail & Related papers (2022-07-06T08:52:12Z)
- Category-Level Metric Scale Object Shape and Pose Estimation [73.92460712829188]
We propose a framework that jointly estimates a metric scale shape and pose from a single RGB image.
We validated our method on both synthetic and real-world datasets to evaluate category-level object pose and shape.
arXiv Detail & Related papers (2021-09-01T12:16:46Z)
- VoxelTrack: Multi-Person 3D Human Pose Estimation and Tracking in the Wild [98.69191256693703]
We present VoxelTrack for multi-person 3D pose estimation and tracking from a few cameras separated by wide baselines.
It employs a multi-branch network to jointly estimate 3D poses and re-identification (Re-ID) features for all people in the environment.
It outperforms the state-of-the-art methods by a large margin on three public datasets including Shelf, Campus and CMU Panoptic.
arXiv Detail & Related papers (2021-08-05T08:35:44Z)
- SMAP: Single-Shot Multi-Person Absolute 3D Pose Estimation [46.85865451812981]
We propose a novel system that first regresses a set of 2.5D representations of body parts and then reconstructs the 3D absolute poses based on these 2.5D representations with a depth-aware part association algorithm.
Such a single-shot bottom-up scheme allows the system to better learn and reason about the inter-person depth relationship, improving both 3D and 2D pose estimation.
arXiv Detail & Related papers (2020-08-26T09:56:07Z)
- HDNet: Human Depth Estimation for Multi-Person Camera-Space Localization [83.57863764231655]
We propose the Human Depth Estimation Network (HDNet), an end-to-end framework for absolute root joint localization.
A skeleton-based Graph Neural Network (GNN) is utilized to propagate features among joints.
We evaluate our HDNet on the root joint localization and root-relative 3D pose estimation tasks with two benchmark datasets.
arXiv Detail & Related papers (2020-07-17T12:44:23Z)
- MeTRAbs: Metric-Scale Truncation-Robust Heatmaps for Absolute 3D Human Pose Estimation [16.463390330757132]
We propose metric-scale truncation-robust (MeTRo) volumetric heatmaps, whose dimensions are all defined in metric 3D space, instead of being aligned with image space.
This reinterpretation of heatmap dimensions allows us to directly estimate complete, metric-scale poses without test-time knowledge of distance or relying on anthropometrics, such as bone lengths.
We find that supervision via absolute pose loss is crucial for accurate non-root-relative localization.
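The absolute pose loss mentioned above can be illustrated as a combination of an absolute-coordinate error and a root-relative error. This is a hedged sketch: the L1 metric, loss weights, and root joint index are assumptions for illustration, not the values used in MeTRAbs.

```python
import numpy as np

def combined_pose_loss(pred_abs, gt_abs, w_abs=1.0, w_rel=1.0, root_idx=0):
    """Illustrative combined absolute + root-relative pose loss.

    pred_abs, gt_abs: (J, 3) joint coordinates in camera space (meters).
    The weights and root index are assumed hyperparameters.
    """
    # Absolute term: penalizes errors in camera-space localization.
    abs_err = np.abs(pred_abs - gt_abs).mean()

    # Root-relative term: penalizes errors in pose shape around the root joint.
    pred_rel = pred_abs - pred_abs[root_idx]
    gt_rel = gt_abs - gt_abs[root_idx]
    rel_err = np.abs(pred_rel - gt_rel).mean()

    return w_abs * abs_err + w_rel * rel_err
```

A prediction that is correct up to a constant translation incurs zero root-relative error but a nonzero absolute error, which is why the absolute term is needed for non-root-relative localization.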
arXiv Detail & Related papers (2020-07-12T11:52:09Z)
- Fusing Wearable IMUs with Multi-View Images for Human Pose Estimation: A Geometric Approach [76.10879433430466]
We propose to estimate 3D human pose from multi-view images and a few IMUs attached to a person's limbs.
It operates by firstly detecting 2D poses from the two signals, and then lifting them to the 3D space.
The simple two-step approach reduces the error of the state-of-the-art by a large margin on a public dataset.
arXiv Detail & Related papers (2020-03-25T00:26:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.