Related papers: Crowd3D: Towards Hundreds of People Reconstruction from a Single Image

URL: http://arxiv.org/abs/2301.09376v2
Date: Sat, 1 Apr 2023 14:17:26 GMT
Title: Crowd3D: Towards Hundreds of People Reconstruction from a Single Image
Authors: Hao Wen, Jing Huang, Huili Cui, Haozhe Lin, YuKun Lai, Lu Fang and Kun Li
Abstract summary: We propose Crowd3D, the first framework to reconstruct the 3D poses, shapes and locations of hundreds of people with global consistency from a single large-scene image. To deal with a large number of persons and various human sizes, we also design an adaptive human-centric cropping scheme.
Score: 57.58149031283827
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Image-based multi-person reconstruction in wide-field large scenes is critical for crowd analysis and security alerts. However, existing methods cannot handle large scenes containing hundreds of people, which pose the challenges of a large number of people, large variations in human scale, and complex spatial distributions. In this paper, we propose Crowd3D, the first framework to reconstruct the 3D poses, shapes and locations of hundreds of people with global consistency from a single large-scene image. The core of our approach is to convert the problem of complex crowd localization into pixel localization with the help of our newly defined concept, Human-scene Virtual Interaction Point (HVIP). To reconstruct the crowd with global consistency, we propose a progressive reconstruction network based on HVIP by pre-estimating a scene-level camera and a ground plane. To deal with a large number of persons and various human sizes, we also design an adaptive human-centric cropping scheme. In addition, we contribute a benchmark dataset, LargeCrowd, for crowd reconstruction in a large scene. Experimental results demonstrate the effectiveness of the proposed method. The code and datasets will be made public.
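
The abstract describes turning 3D localization into pixel localization once a scene-level camera and a ground plane are known. The sketch below illustrates only the generic geometry behind that idea: intersecting the camera ray through a pixel with a ground plane. It is not the authors' HVIP formulation, which the abstract does not spell out. The function name `pixel_to_ground`, the pinhole intrinsics `K`, and the plane convention n·X + d = 0 in camera coordinates are all assumptions made for illustration.

```python
import numpy as np


def pixel_to_ground(pixel, K, plane_normal, plane_offset):
    """Intersect the camera ray through `pixel` with the plane n.X + d = 0.

    Hypothetical helper: assumes a pinhole camera with intrinsics K and a
    plane expressed in camera coordinates. Returns the 3D point on the plane.
    """
    u, v = pixel
    # Back-project the pixel to a ray direction in camera coordinates.
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])
    denom = float(np.dot(plane_normal, ray))
    if abs(denom) < 1e-9:
        raise ValueError("ray is parallel to the ground plane")
    # Solve n.(t * ray) + d = 0 for the ray parameter t.
    t = -plane_offset / denom
    if t <= 0:
        raise ValueError("intersection lies behind the camera")
    return t * ray


# Illustrative values (assumed): y axis points down, ground is 1.5 m below the camera.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
ground_normal = np.array([0.0, 1.0, 0.0])
ground_offset = -1.5
point = pixel_to_ground((960.0, 840.0), K, ground_normal, ground_offset)
```

With these assumed values, the pixel 300 rows below the principal point maps to a ground point roughly 5 m in front of the camera. Once the camera and plane are fixed, each ground-contact pixel determines a unique 3D location, which is why a single scene-level estimate can keep all reconstructed people globally consistent.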

Related papers

FRESA: Feedforward Reconstruction of Personalized Skinned Avatars from Few Images [74.86864398919467]
We present a novel method for reconstructing personalized 3D human avatars with realistic animation from only a few images. We learn a universal prior from over a thousand clothed humans to achieve instant feedforward generation and zero-shot generalization. Our method generates more authentic reconstruction and animation than state-of-the-art methods, and can be directly generalized to inputs from casually taken phone photos.
arXiv Detail & Related papers (2025-03-24T23:20:47Z)
MEAT: Multiview Diffusion Model for Human Generation on Megapixels with Mesh Attention [83.56588173102594]
We introduce a solution called mesh attention to enable training at 1024x1024 resolution. This approach significantly reduces the complexity of multiview attention while maintaining cross-view consistency. Building on this foundation, we devise a mesh attention block and combine it with keypoint conditioning to create our human-specific multiview diffusion model, MEAT.
arXiv Detail & Related papers (2025-03-11T17:50:59Z)
Reconstructing People, Places, and Cameras [57.81696692335401]
"Humans and Structure from Motion" (HSfM) is a method for jointly reconstructing multiple human meshes, scene point clouds, and camera parameters in a metric world coordinate system. Our results show that incorporating human data into the SfM pipeline improves camera pose estimation.
arXiv Detail & Related papers (2024-12-23T18:58:34Z)
Crowd3D++: Robust Monocular Crowd Reconstruction with Upright Space [55.77397543011443]
This paper aims to reconstruct hundreds of people's 3D poses, shapes, and locations from a single image with unknown camera parameters. Crowd3D is proposed to convert the complex 3D human localization into 2D-pixel localization with robust camera and ground estimation. Crowd3D++ eliminates the influence of camera parameters and the cropping operation by the proposed canonical upright space and ground-aware normalization transform.
arXiv Detail & Related papers (2024-11-09T16:49:59Z)
Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery [51.73680703579997]
We present a neural radiance field method for urban-scale semantic and building-level instance segmentation from aerial images. Objects in urban aerial images exhibit substantial variations in size, including buildings, cars, and roads. We introduce a scale-adaptive semantic label fusion strategy that enhances the segmentation of objects of varying sizes. We then introduce a novel cross-view instance label grouping strategy to mitigate the multi-view inconsistency problem in the 2D instance labels.
arXiv Detail & Related papers (2024-03-18T14:15:39Z)
CrowdRec: 3D Crowd Reconstruction from Single Color Images [17.662273473398592]
We exploit the crowd features and propose a crowd-constrained optimization to improve the common single-person method on crowd images. With the optimization, we can obtain accurate body poses and shapes with reasonable absolute positions from a large-scale crowd image.
arXiv Detail & Related papers (2023-10-10T06:03:39Z)
SHERF: Generalizable Human NeRF from a Single Image [59.10589479808622]
SHERF is the first generalizable Human NeRF model for recovering animatable 3D humans from a single input image. We propose a bank of 3D-aware hierarchical features, including global, point-level, and pixel-aligned features, to facilitate informative encoding.
arXiv Detail & Related papers (2023-03-22T17:59:12Z)
Scene-Aware 3D Multi-Human Motion Capture from a Single Camera [83.06768487435818]
We consider the problem of estimating the 3D position of multiple humans in a scene as well as their body shape and articulation from a single RGB video recorded with a static camera. We leverage recent advances in computer vision using large-scale pre-trained models for a variety of modalities, including 2D body joints, joint angles, normalized disparity maps, and human segmentation masks. In particular, we estimate the scene depth and unique person scale from normalized disparity predictions using the 2D body joints and joint angles.
arXiv Detail & Related papers (2023-01-12T18:01:28Z)
MVP-Human Dataset for 3D Human Avatar Reconstruction from Unconstrained Frames [59.37430649840777]
We present 3D Avatar Reconstruction in the wild (ARwild), which first reconstructs the implicit skinning fields in a multi-level manner. We contribute a large-scale dataset, MVP-Human, which contains 400 subjects, each of which has 15 scans in different poses. Overall, benefiting from the specific network architecture and the diverse data, the trained model enables 3D avatar reconstruction from unconstrained frames.
arXiv Detail & Related papers (2022-04-24T03:57:59Z)
Body Size and Depth Disambiguation in Multi-Person Reconstruction from Single Images [44.96633481495911]
We address the problem of multi-person 3D body pose and shape estimation from a single image. We devise a novel optimization scheme that learns the appropriate body scale and relative camera pose, by enforcing the feet of all people to remain on the ground floor. A thorough evaluation on MuPoTS-3D and 3DPW datasets demonstrates that our approach is able to robustly estimate the body translation and shape of multiple people while retrieving their spatial arrangement.
arXiv Detail & Related papers (2021-11-02T20:42:41Z)
Multi-person Implicit Reconstruction from a Single Image [37.6877421030774]
We present a new end-to-end learning framework to obtain detailed and spatially coherent reconstructions of multiple people from a single image. Existing multi-person methods suffer from two main drawbacks: they are often model-based and cannot capture accurate 3D models of people with loose clothing and hair.
arXiv Detail & Related papers (2021-04-19T13:21:55Z)

This list is automatically generated from the titles and abstracts of the papers on this site.