CrowdRec: 3D Crowd Reconstruction from Single Color Images
- URL: http://arxiv.org/abs/2310.06332v1
- Date: Tue, 10 Oct 2023 06:03:39 GMT
- Title: CrowdRec: 3D Crowd Reconstruction from Single Color Images
- Authors: Buzhen Huang, Jingyi Ju, Yangang Wang
- Abstract summary: We exploit crowd features and propose a crowd-constrained optimization that improves a common single-person method on crowd images.
With the optimization, we can obtain accurate body poses and shapes with reasonable absolute positions from a large-scale crowd image.
- Score: 17.662273473398592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This is a technical report for the GigaCrowd challenge. Reconstructing 3D
crowds from monocular images is a challenging problem due to mutual occlusions,
severe depth ambiguity, and complex spatial distribution. Since no large-scale
3D crowd dataset can be used to train a robust model, the current multi-person
mesh recovery methods can hardly achieve satisfactory performance in crowded
scenes. In this paper, we exploit crowd features and propose a
crowd-constrained optimization to improve a common single-person method on
crowd images. To avoid scale variations, we first detect human bounding-boxes
and 2D poses from the original images with off-the-shelf detectors. Then, we
train a single-person mesh recovery network using existing in-the-wild image
datasets. To promote a more reasonable spatial distribution, we further propose
a crowd constraint to refine the single-person network parameters. With the
optimization, we can obtain accurate body poses and shapes with reasonable
absolute positions from a large-scale crowd image using a single-person
backbone. The code will be publicly available
at https://github.com/boycehbz/CrowdRec.
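The pipeline above is concrete enough to illustrate. The following is a minimal PyTorch sketch, not the paper's actual implementation: it assumes a pinhole camera, takes root-relative joints from a single-person backbone, and refines per-person root translations against detected 2D keypoints while a hypothetical crowd term pushes apart roots that come implausibly close. The loss weights, the 0.3 m margin, and the penalty form are illustrative assumptions.

```python
# Illustrative sketch only: the exact CrowdRec losses and parameterization
# are not reproduced here. We assume a pinhole camera and optimize each
# person's root translation against detected 2D keypoints, plus a
# hypothetical crowd term that keeps neighboring roots apart.
import torch

def project(joints_3d, focal=1000.0, center=(512.0, 512.0)):
    """Pinhole projection of (P, J, 3) camera-space joints to pixels."""
    z = joints_3d[..., 2].clamp(min=1e-6)
    x = focal * joints_3d[..., 0] / z + center[0]
    y = focal * joints_3d[..., 1] / z + center[1]
    return torch.stack([x, y], dim=-1)

def crowd_refine(joints_local, trans_init, kpts_2d, conf, steps=200, lr=1e-2):
    """Refine per-person root translations (P, 3) against 2D evidence.

    joints_local: (P, J, 3) root-relative joints from a single-person network
    kpts_2d:      (P, J, 2) detected 2D keypoints, conf: (P, J) confidences
    """
    trans = trans_init.clone().requires_grad_(True)
    opt = torch.optim.Adam([trans], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        joints_cam = joints_local + trans[:, None, :]
        # Reprojection term: keep each person consistent with 2D detections.
        loss_2d = (conf[..., None] * (project(joints_cam) - kpts_2d) ** 2).mean()
        # Hypothetical crowd constraint: penalize root pairs closer than 0.3 m.
        dist = torch.cdist(trans, trans) + 1e3 * torch.eye(trans.shape[0])
        loss_crowd = torch.relu(0.3 - dist).pow(2).sum()
        (loss_2d + 10.0 * loss_crowd).backward()
        opt.step()
    return trans.detach()

# Toy usage: two people with synthetic detections.
P, J = 2, 17
joints = torch.randn(P, J, 3) * 0.3
init = torch.tensor([[0.0, 0.0, 5.0], [0.2, 0.0, 5.0]])
kpts = project(joints + init[:, None, :]) + torch.randn(P, J, 2)
print(crowd_refine(joints, init, kpts, torch.ones(P, J)))
```

In the actual method the refinement updates the single-person network parameters rather than raw translations, so this sketch should be read as the optimization idea in its simplest form.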
Related papers
- Crowd3D++: Robust Monocular Crowd Reconstruction with Upright Space [55.77397543011443]
This paper aims to reconstruct hundreds of people's 3D poses, shapes, and locations from a single image with unknown camera parameters.
Crowd3D is proposed to convert the complex 3D human localization into 2D-pixel localization with robust camera and ground estimation.
Crowd3D++ eliminates the influence of camera parameters and the cropping operation by the proposed canonical upright space and ground-aware normalization transform.
arXiv Detail & Related papers (2024-11-09T16:49:59Z)
- Crowd3D: Towards Hundreds of People Reconstruction from a Single Image [57.58149031283827]
We propose Crowd3D, the first framework to reconstruct the 3D poses, shapes and locations of hundreds of people with global consistency from a single large-scene image.
To deal with a large number of persons and various human sizes, we also design an adaptive human-centric cropping scheme.
arXiv Detail & Related papers (2023-01-23T11:45:27Z)
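Crowd3D's adaptive human-centric cropping can be illustrated with a short, hedged sketch: each detection gets a square window scaled to the person's extent, so that people of very different pixel sizes reach the network at a roughly canonical scale. The padding factor and output resolution below are illustrative choices, not values from the paper.

```python
# Hedged sketch of adaptive human-centric cropping in the spirit of Crowd3D:
# crop windows are centered on each detection and scaled to the person's
# size, so every crop shows people at a roughly canonical scale.
def adaptive_crops(boxes, pad=1.5, out_size=512):
    """boxes: list of (x1, y1, x2, y2) detections in a large image.

    Returns (cx, cy, side, scale) crop windows, where scale maps each
    crop to an out_size x out_size network input.
    """
    crops = []
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
        side = pad * max(x2 - x1, y2 - y1)   # window adapts to person size
        crops.append((cx, cy, side, out_size / side))
    return crops

# A small near-camera person and a distant one get very different windows.
print(adaptive_crops([(100, 50, 160, 250), (4000, 3000, 4040, 3120)]))
```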
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
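The view-synthesis entry above rests on standard volume rendering: an MLP conditioned on a learned per-point feature predicts density and color, and samples along each ray are alpha-composited. The sketch below shows that quadrature with a stand-in feature vector; the network sizes and the feature source are illustrative assumptions.

```python
# Minimal volume-rendering sketch: an MLP, conditioned on a per-point
# feature (here a stand-in random vector), predicts density and color,
# and ray samples are composited with the standard NeRF quadrature.
import torch
import torch.nn as nn

class CondNeRF(nn.Module):
    def __init__(self, feat_dim=32, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),              # density + RGB
        )

    def forward(self, pts, feats):
        out = self.mlp(torch.cat([pts, feats], dim=-1))
        sigma = torch.relu(out[..., :1])       # non-negative density
        rgb = torch.sigmoid(out[..., 1:])      # colors in [0, 1]
        return sigma, rgb

def render_rays(model, pts, feats, deltas):
    """pts: (R, S, 3) samples per ray, deltas: (R, S, 1) step sizes."""
    sigma, rgb = model(pts, feats)
    alpha = 1.0 - torch.exp(-sigma * deltas)   # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=1)
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=1)
    weights = alpha * trans                    # compositing weights
    return (weights * rgb).sum(dim=1)          # (R, 3) pixel colors

model = CondNeRF()
R, S = 4, 16
pts, feats = torch.rand(R, S, 3), torch.rand(R, S, 32)
print(render_rays(model, pts, feats, torch.full((R, S, 1), 0.1)))
```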
- KeypointNeRF: Generalizing Image-based Volumetric Avatars using Relative Spatial Encoding of Keypoints [28.234772596912165]
We propose a highly effective approach to modeling high-fidelity volumetric avatars from sparse views.
One of the key ideas is to encode relative spatial 3D information via sparse 3D keypoints.
Our experiments show that a majority of errors in prior work stem from an inappropriate choice of spatial encoding.
arXiv Detail & Related papers (2022-05-10T15:57:03Z)
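KeypointNeRF's central idea, encoding a query point relative to sparse 3D keypoints, can be sketched compactly: each query is described by its offsets and distances to the anchors, which makes the code independent of where the subject stands. The Gaussian locality weight and the exact feature layout below are illustrative; the paper's encoding is more elaborate.

```python
# Hedged sketch of relative spatial encoding: a query point is described
# by its offsets and distances to a sparse set of 3D keypoints, so the
# encoding is invariant to the subject's global position.
import torch

def relative_encoding(query, keypoints, sigma=0.1):
    """query: (N, 3) points, keypoints: (K, 3) anchors -> (N, K * 4) codes."""
    offsets = query[:, None, :] - keypoints[None, :, :]   # (N, K, 3)
    dist = offsets.norm(dim=-1, keepdim=True)             # (N, K, 1)
    weight = torch.exp(-dist ** 2 / (2 * sigma ** 2))     # locality prior
    code = torch.cat([offsets * weight, dist], dim=-1)    # (N, K, 4)
    return code.flatten(1)

kps = torch.rand(13, 3)                 # e.g., sparse body keypoints
print(relative_encoding(torch.rand(5, 3), kps).shape)  # torch.Size([5, 52])
```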
- NeuralReshaper: Single-image Human-body Retouching with Deep Neural Networks [50.40798258968408]
We present NeuralReshaper, a novel method for semantic reshaping of human bodies in single images using deep generative networks.
Our approach follows a fit-then-reshape pipeline, which first fits a parametric 3D human model to a source human image.
To deal with the lack of paired training data, we introduce a novel self-supervised strategy to train our network.
arXiv Detail & Related papers (2022-03-20T09:02:13Z)
- Monocular, One-stage, Regression of Multiple 3D People [105.3143785498094]
We propose to Regress all meshes in a One-stage fashion for Multiple 3D People (termed ROMP).
Our method simultaneously predicts a Body Center heatmap and a Mesh map, which jointly describe the 3D body mesh at the pixel level.
Compared with state-of-the-art methods, ROMP achieves superior performance on challenging multi-person benchmarks.
arXiv Detail & Related papers (2020-08-27T17:21:47Z)
- Coherent Reconstruction of Multiple Humans from a Single Image [68.3319089392548]
In this work, we address the problem of multi-person 3D pose estimation from a single image.
A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently.
Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene.
arXiv Detail & Related papers (2020-06-15T17:51:45Z)
- Compressed Volumetric Heatmaps for Multi-Person 3D Pose Estimation [33.71628590745982]
We present a novel approach for bottom-up multi-person 3D human pose estimation from monocular RGB images.
We propose a simple and effective compression method to drastically reduce the size of this representation.
Our method performs favorably when compared to state of the art on both multi-person and single-person 3D human pose estimation datasets.
arXiv Detail & Related papers (2020-04-01T10:37:39Z)
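The last entry's compression idea can be sketched as a small autoencoder over the volumetric heatmap channels: J joints times D depth planes are squeezed into a few feature maps and reconstructed on the decoder side. The layer widths and the compression ratio below are illustrative assumptions, not the paper's architecture.

```python
# Hedged sketch of compressing a volumetric heatmap: a volumetric joint
# representation (J joints x D depth planes, each H x W) is squeezed to a
# few channels by a small encoder and restored by a decoder.
import torch
import torch.nn as nn

J, D, H, W = 14, 16, 32, 32            # joints, depth planes, spatial size

encoder = nn.Sequential(
    nn.Conv2d(J * D, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 8, 1),               # 224 -> 8 channels: ~28x compression
)
decoder = nn.Sequential(
    nn.Conv2d(8, 64, 1), nn.ReLU(),
    nn.Conv2d(64, J * D, 3, padding=1),
)

heatmap = torch.rand(1, J * D, H, W)
code = encoder(heatmap)                # compact code, (1, 8, H, W)
recon = decoder(code)
print(code.shape, recon.shape, nn.functional.mse_loss(recon, heatmap).item())
```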
This list is automatically generated from the titles and abstracts of the papers in this site.