Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation
- URL: http://arxiv.org/abs/2004.00329v1
- Date: Wed, 1 Apr 2020 10:37:39 GMT
- Title: Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation
- Authors: Matteo Fabbri, Fabio Lanzi, Simone Calderara, Stefano Alletto, Rita Cucchiara
- Abstract summary: We present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images.
We propose a simple and effective compression method to drastically reduce the size of this representation.
Our method performs favorably when compared to state of the art on both multi-person and single-person 3D human pose estimation datasets.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper we present a novel approach for bottom-up multi-person 3D human
pose estimation from monocular RGB images. We propose to use high resolution
volumetric heatmaps to model joint locations, devising a simple and effective
compression method to drastically reduce the size of this representation. At
the core of the proposed method lies our Volumetric Heatmap Autoencoder, a
fully-convolutional network tasked with the compression of ground-truth
heatmaps into a dense intermediate representation. A second model, the Code
Predictor, is then trained to predict these codes, which can be decompressed at
test time to re-obtain the original representation. Our experimental evaluation
shows that our method performs favorably when compared to state of the art on
both multi-person and single-person 3D human pose estimation datasets and,
thanks to our novel compression strategy, can process full-HD images at the
constant runtime of 8 fps regardless of the number of subjects in the scene.
Code and models are available at https://github.com/fabbrimatteo/LoCO.
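The two-stage pipeline described in the abstract (compress ground-truth volumetric heatmaps into a dense code, then train a Code Predictor to regress that code and decompress it at test time) can be sketched numerically. The dimensions below are illustrative assumptions, not the paper's actual configuration:

```python
import numpy as np

# Illustrative dimensions -- assumptions for this sketch, not the
# paper's actual configuration.
num_joints = 14                # joints modelled by the volumetric heatmaps
vol_shape = (64, 128, 128)     # (depth, height, width) voxels per joint heatmap
code_channels = 32             # channels of the dense intermediate code
code_spatial = (32, 32)        # spatial resolution of the code

# Full high-resolution volumetric heatmap representation (one value per voxel).
full_size = num_joints * int(np.prod(vol_shape))

# Compressed code emitted by the Volumetric Heatmap Autoencoder's encoder;
# at test time the Code Predictor regresses this code from the image and
# the decoder re-expands it into the full heatmaps.
code_size = code_channels * int(np.prod(code_spatial))

ratio = full_size / code_size
print(f"heatmaps: {full_size:,} values, code: {code_size:,} values "
      f"({ratio:.0f}x smaller)")
```

Under these assumed sizes the code is 448x smaller than the raw heatmaps, which is what makes predicting codes (rather than full volumes) cheap enough for constant-runtime inference.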
Related papers
- No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images [100.80376573969045]
NoPoSplat is a feed-forward model capable of reconstructing 3D scenes parameterized by 3D Gaussians from multi-view images.
Our model achieves real-time 3D Gaussian reconstruction during inference.
This work makes significant advances in pose-free generalizable 3D reconstruction and demonstrates its applicability to real-world scenarios.
arXiv Detail & Related papers (2024-10-31T17:58:22Z)
- FAMOUS: High-Fidelity Monocular 3D Human Digitization Using View Synthesis [51.193297565630886]
The challenge of accurately inferring texture remains, particularly in obscured areas such as the back of a person in frontal-view images.
This limitation in texture prediction largely stems from the scarcity of large-scale and diverse 3D datasets.
We propose leveraging extensive 2D fashion datasets to enhance both texture and shape prediction in 3D human digitization.
arXiv Detail & Related papers (2024-10-13T01:25:05Z)
- CrowdRec: 3D Crowd Reconstruction from Single Color Images [17.662273473398592]
We exploit the crowd features and propose a crowd-constrained optimization to improve the common single-person method on crowd images.
With the optimization, we can obtain accurate body poses and shapes with reasonable absolute positions from a large-scale crowd image.
arXiv Detail & Related papers (2023-10-10T06:03:39Z)
- $PC^2$: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction [97.06927852165464]
Reconstructing the 3D shape of an object from a single RGB image is a long-standing and highly challenging problem in computer vision.
We propose a novel method for single-image 3D reconstruction which generates a sparse point cloud via a conditional denoising diffusion process.
arXiv Detail & Related papers (2023-02-21T13:37:07Z)
- Coordinates Are NOT Lonely -- Codebook Prior Helps Implicit Neural 3D Representations [29.756718435405983]
Implicit neural 3D representation has achieved impressive results in surface or scene reconstruction and novel view synthesis.
Existing approaches, such as Neural Radiance Field (NeRF) and its variants, usually require dense input views.
We introduce a novel coordinate-based model, CoCo-INR, for implicit neural 3D representation.
arXiv Detail & Related papers (2022-10-20T11:13:50Z)
- H3D-Net: Few-Shot High-Fidelity 3D Head Reconstruction [27.66008315400462]
Recent learning approaches that implicitly represent surface geometry have shown impressive results in the problem of multi-view 3D reconstruction.
We tackle these limitations for the specific problem of few-shot full 3D head reconstruction.
We learn a shape model of 3D heads from thousands of incomplete raw scans using implicit representations.
arXiv Detail & Related papers (2021-07-26T23:04:18Z)
- 3D Scene Compression through Entropy Penalized Neural Representation Functions [19.277502420759653]
Novel visual media enable the viewer to explore a 3D scene from arbitrary viewpoints by interpolating between a discrete set of original views.
These types of applications require much larger amounts of storage space, which we seek to reduce.
Existing approaches for compressing 3D scenes are based on a separation of compression and rendering.
We unify these steps by directly compressing an implicit representation of the scene, a function that maps spatial coordinates to a radiance vector field, which can then be queried to render arbitrary viewpoints.
Our method significantly outperforms a state-of-the-art conventional approach for scene compression, achieving higher-quality reconstruction.
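A back-of-the-envelope comparison illustrates why compressing the implicit scene function itself, rather than the stored views, is attractive. The view count, resolution, and MLP size below are assumed purely for illustration:

```python
# Assumed numbers for illustration only -- not taken from the paper.
num_views, H, W = 100, 1080, 1920

# Storing the discrete set of original views directly (RGB values).
view_values = num_views * H * W * 3

# A small coordinate->radiance MLP (3 inputs, 8 layers of 256 units,
# 4 outputs): its weights are what an entropy penalty would compress.
mlp_params = (3 * 256 + 256) + 7 * (256 * 256 + 256) + (256 * 4 + 4)

print(f"views: {view_values:,} values, MLP: {mlp_params:,} parameters "
      f"({view_values / mlp_params:.0f}x fewer)")
```

Even before any entropy coding, the function's parameters are orders of magnitude fewer than the raw view data they can re-render.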
arXiv Detail & Related papers (2021-04-26T10:36:47Z)
- Monocular, One-stage, Regression of Multiple 3D People [105.3143785498094]
We propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP)
Our method simultaneously predicts a Body Center heatmap and a Mesh map, which can jointly describe the 3D body mesh on the pixel level.
Compared with state-of-the-art methods, ROMP achieves superior performance on the challenging multi-person benchmarks.
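The center-heatmap + mesh-map readout described above can be illustrated with a toy example. The map shapes and the per-pixel parameter count are assumptions for the sketch, not ROMP's actual configuration:

```python
import numpy as np

# Toy shapes -- assumptions for illustration, not ROMP's actual configuration.
H, W = 64, 64
n_params = 145                       # hypothetical per-pixel mesh parameter size

center_heatmap = np.zeros((H, W))    # peaks mark detected body centers
center_heatmap[20, 33] = 1.0         # one person detected at pixel (20, 33)
mesh_map = np.random.rand(H, W, n_params)  # mesh parameters at every pixel

# A body's mesh parameters are read out at the peak of the center heatmap:
# the two maps jointly describe the 3D body mesh at the pixel level.
cy, cx = np.unravel_index(np.argmax(center_heatmap), center_heatmap.shape)
body_params = mesh_map[cy, cx]
print(f"center at ({cy}, {cx}), mesh parameter vector shape {body_params.shape}")
```

This is what makes the approach one-stage: no separate person detector is needed, since each heatmap peak directly indexes a full set of mesh parameters.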
arXiv Detail & Related papers (2020-08-27T17:21:47Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
- Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2D detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z)
- Metric-Scale Truncation-Robust Heatmaps for 3D Human Pose Estimation [16.463390330757132]
We propose metric-scale truncation-robust volumetric heatmaps, whose dimensions are defined in metric 3D space near the subject.
We train a fully-convolutional network to estimate such heatmaps from monocular RGB in an end-to-end manner.
As our method is simple and fast, it can become a useful component for real-time top-down multi-person pose estimation systems.
arXiv Detail & Related papers (2020-03-05T22:38:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.