Crowd3D: Towards Hundreds of People Reconstruction from a Single Image
- URL: http://arxiv.org/abs/2301.09376v2
- Date: Sat, 1 Apr 2023 14:17:26 GMT
- Title: Crowd3D: Towards Hundreds of People Reconstruction from a Single Image
- Authors: Hao Wen, Jing Huang, Huili Cui, Haozhe Lin, YuKun Lai, Lu Fang and Kun
Li
- Abstract summary: We propose Crowd3D, the first framework to reconstruct the 3D poses, shapes and locations of hundreds of people with global consistency from a single large-scene image.
To deal with a large number of persons and various human sizes, we also design an adaptive human-centric cropping scheme.
- Score: 57.58149031283827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image-based multi-person reconstruction in wide-field large scenes is
critical for crowd analysis and security alert. However, existing methods
cannot deal with large scenes containing hundreds of people, which encounter
the challenges of large number of people, large variations in human scale, and
complex spatial distribution. In this paper, we propose Crowd3D, the first
framework to reconstruct the 3D poses, shapes and locations of hundreds of
people with global consistency from a single large-scene image. The core of our
approach is to convert the problem of complex crowd localization into pixel
localization with the help of our newly defined concept, Human-scene Virtual
Interaction Point (HVIP). To reconstruct the crowd with global consistency, we
propose a progressive reconstruction network based on HVIP by pre-estimating a
scene-level camera and a ground plane. To deal with a large number of persons
and various human sizes, we also design an adaptive human-centric cropping
scheme. Besides, we contribute a benchmark dataset, LargeCrowd, for crowd
reconstruction in a large scene. Experimental results demonstrate the
effectiveness of the proposed method. The code and datasets will be made
public.
Related papers
- Crowd3D++: Robust Monocular Crowd Reconstruction with Upright Space [55.77397543011443]
This paper aims to reconstruct hundreds of people's 3D poses, shapes, and locations from a single image with unknown camera parameters.
Crowd3D is proposed to convert the complex 3D human localization into 2D-pixel localization with robust camera and ground estimation.
Crowd3D++ eliminates the influence of camera parameters and the cropping operation by the proposed canonical upright space and ground-aware normalization transform.
arXiv Detail & Related papers (2024-11-09T16:49:59Z) - Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery [51.73680703579997]
We present a neural radiance field method for urban-scale semantic and building-level instance segmentation from aerial images.
objects in urban aerial images exhibit substantial variations in size, including buildings, cars, and roads.
We introduce a scale-adaptive semantic label fusion strategy that enhances the segmentation of objects of varying sizes.
We then introduce a novel cross-view instance label grouping strategy to mitigate the multi-view inconsistency problem in the 2D instance labels.
arXiv Detail & Related papers (2024-03-18T14:15:39Z) - CrowdRec: 3D Crowd Reconstruction from Single Color Images [17.662273473398592]
We exploit the crowd features and propose a crowd-constrained optimization to improve the common single-person method on crowd images.
With the optimization, we can obtain accurate body poses and shapes with reasonable absolute positions from a large-scale crowd image.
arXiv Detail & Related papers (2023-10-10T06:03:39Z) - SHERF: Generalizable Human NeRF from a Single Image [59.10589479808622]
SHERF is the first generalizable Human NeRF model for recovering animatable 3D humans from a single input image.
We propose a bank of 3D-aware hierarchical features, including global, point-level, and pixel-aligned features, to facilitate informative encoding.
arXiv Detail & Related papers (2023-03-22T17:59:12Z) - Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera.
We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks.
In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z) - MVP-Human Dataset for 3D Human Avatar Reconstruction from Unconstrained
Frames [59.37430649840777]
We present 3D Avatar Reconstruction in the wild (ARwild), which first reconstructs the implicit skinning fields in a multi-level manner.
We contribute a large-scale dataset, MVP-Human, which contains 400 subjects, each of which has 15 scans in different poses.
Overall, benefits from the specific network architecture and the diverse data, the trained model enables 3D avatar reconstruction from unconstrained frames.
arXiv Detail & Related papers (2022-04-24T03:57:59Z) - Body Size and Depth Disambiguation in Multi-Person Reconstruction from
Single Images [44.96633481495911]
We address the problem of multi-person 3D body pose and shape estimation from a single image.
We devise a novel optimization scheme that learns the appropriate body scale and relative camera pose, by enforcing the feet of all people to remain on the ground floor.
A thorough evaluation on MuPoTS-3D and 3DPW datasets demonstrates that our approach is able to robustly estimate the body translation and shape of multiple people while retrieving their spatial arrangement.
arXiv Detail & Related papers (2021-11-02T20:42:41Z) - Multi-person Implicit Reconstruction from a Single Image [37.6877421030774]
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image.
Existing multi-person methods suffer from two main drawbacks: they are often model-based and cannot capture accurate 3D models of people with loose clothing and hair.
arXiv Detail & Related papers (2021-04-19T13:21:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.