Multiview Detection with Cardboard Human Modeling
- URL: http://arxiv.org/abs/2207.02013v1
- Date: Tue, 5 Jul 2022 12:47:26 GMT
- Title: Multiview Detection with Cardboard Human Modeling
- Authors: Jiahao Ma, Zicheng Duan, Yunzhong Hou, Liang Zheng, Chuong Nguyen
- Abstract summary: We propose a new pedestrian representation scheme based on human point clouds modeling.
Specifically, using ray tracing for holistic human depth estimation, we model pedestrians as upright, thin cardboard point clouds on the ground.
- Score: 23.072791405965415
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multiview detection uses multiple calibrated cameras with overlapping fields
of view to locate occluded pedestrians. In this field, existing methods
typically adopt a "human modeling - aggregation" strategy. To find robust
pedestrian representations, some intuitively use locations of detected 2D
bounding boxes, while others use entire frame features projected to the ground
plane. However, the former does not consider human appearance and leads to many
ambiguities, and the latter suffers from projection errors due to the lack of
accurate height of the human torso and head. In this paper, we propose a new
pedestrian representation scheme based on human point clouds modeling.
Specifically, using ray tracing for holistic human depth estimation, we model
pedestrians as upright, thin cardboard point clouds on the ground. Then, we
aggregate the point clouds of the pedestrian cardboard across multiple views
for a final decision. Compared with existing representations, the proposed
method explicitly leverages human appearance and reduces projection errors
significantly by relatively accurate height estimation. On two standard
evaluation benchmarks, the proposed method achieves very competitive results.
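As a rough illustration of the cardboard idea (a sketch under stated assumptions, not the authors' implementation), the snippet below back-projects a detected foot pixel onto the ground plane via a pinhole camera model, then fills an upright, thin "cardboard" of 3D points above it. All function names, the default height/width, and the sampling densities are illustrative assumptions.

```python
import numpy as np

def ground_intersection(K, R, t, pixel):
    """Back-project a pixel ray and intersect it with the ground plane z = 0.
    Assumes the world-to-camera convention x_cam = R @ x_world + t."""
    d = R.T @ np.linalg.inv(K) @ np.array([pixel[0], pixel[1], 1.0])  # ray in world frame
    c = -R.T @ t                      # camera center in world coordinates
    s = -c[2] / d[2]                  # scale so the point satisfies z = 0
    return c + s * d

def cardboard_points(foot_xy, height=1.8, width=0.5, n_h=6, n_w=3):
    """Model a pedestrian as an upright, thin 'cardboard' of 3D points
    standing on the ground at foot_xy (world frame, z up)."""
    xs = np.linspace(-width / 2, width / 2, n_w) + foot_xy[0]
    zs = np.linspace(0.0, height, n_h)
    X, Z = np.meshgrid(xs, zs)        # grid over the vertical plane
    Y = np.full_like(X, foot_xy[1])   # thin: constant in the depth direction
    return np.stack([X.ravel(), Y.ravel(), Z.ravel()], axis=-1)
```

In the paper's pipeline the per-view cardboard point clouds would then be aggregated on the ground plane for the final occupancy decision; the sketch stops at the single-view representation.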
Related papers
- Zolly: Zoom Focal Length Correctly for Perspective-Distorted Human Mesh Reconstruction [66.10717041384625]
Zolly is the first 3DHMR method focusing on perspective-distorted images.
We propose a new camera model and a novel 2D representation, termed distortion image, which describes the 2D dense distortion scale of the human body.
We extend two real-world datasets tailored for this task, all containing perspective-distorted human images.
arXiv Detail & Related papers (2023-03-24T04:22:41Z)
- 3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization [6.929027496437192]
The proposed 3DROM method has a greatly improved performance in comparison with the state-of-the-art deep-learning based methods for multi-view pedestrian detection.
arXiv Detail & Related papers (2022-07-22T06:15:20Z)
- Dual networks based 3D Multi-Person Pose Estimation from Monocular Video [42.01876518017639]
Multi-person 3D pose estimation is more challenging than single pose estimation.
Existing top-down and bottom-up approaches to pose estimation suffer from detection errors.
We propose the integration of top-down and bottom-up approaches to exploit their strengths.
arXiv Detail & Related papers (2022-05-02T08:53:38Z)
- Generalizable Multi-Camera 3D Pedestrian Detection [1.8303072203996347]
We present a multi-camera 3D pedestrian detection method that does not need to train using data from the target scene.
We estimate pedestrian locations on the ground plane using a novel method based on human body poses and bounding boxes from an off-the-shelf monocular detector.
We then project these locations onto the world ground plane and fuse them with a new formulation of a clique cover problem.
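The clique-cover fusion referred to above can be pictured with a small greedy sketch: treat each per-view ground-plane detection as a graph node, allow an edge only between detections from different views that lie close on the ground, and greedily cover the graph with cliques, averaging each clique into one pedestrian. The paper formulates this as an optimization; the function name, the greedy strategy, and the distance threshold here are illustrative assumptions.

```python
import numpy as np

def fuse_detections(points, views, radius=0.5):
    """Greedy clique-cover-style fusion of per-view ground-plane detections.
    points: (N, 2) ground-plane locations; views: camera index per detection.
    Two detections may belong to the same pedestrian only if they come from
    different views and every pair in the group lies within `radius`."""
    unassigned = set(range(len(points)))
    fused = []
    while unassigned:
        i = min(unassigned)
        clique, used_views = [i], {views[i]}
        for j in sorted(unassigned - {i}):
            if views[j] in used_views:
                continue  # one detection per camera per pedestrian
            if all(np.linalg.norm(points[j] - points[k]) <= radius for k in clique):
                clique.append(j)
                used_views.add(views[j])
        unassigned -= set(clique)
        fused.append(np.mean(points[clique], axis=0))  # one location per clique
    return np.array(fused)
```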
arXiv Detail & Related papers (2021-04-12T20:58:25Z)
- Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z)
- Wide-Depth-Range 6D Object Pose Estimation in Space [124.94794113264194]
6D pose estimation in space poses unique challenges that are not commonly encountered in the terrestrial setting.
One of the most striking differences is the lack of atmospheric scattering, allowing objects to be visible from a great distance.
We propose a single-stage hierarchical end-to-end trainable network that is more robust to scale variations.
arXiv Detail & Related papers (2021-04-01T08:39:26Z)
- Synthetic Training for Monocular Human Mesh Recovery [100.38109761268639]
This paper aims to estimate 3D mesh of multiple body parts with large-scale differences from a single RGB image.
The main challenge is lacking training data that have complete 3D annotations of all body parts in 2D images.
We propose a depth-to-scale (D2S) projection to incorporate the depth difference into the projection function to derive per-joint scale variants.
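A minimal sketch of the depth-to-scale idea, assuming a simple pinhole model: instead of one weak-perspective scale shared by the whole body, each joint is projected with its own scale f / z_j, so depth differences between joints change their projected size. The function name and the focal-length value are illustrative assumptions, not the paper's API.

```python
import numpy as np

def d2s_project(joints_3d, focal=1000.0):
    """Project 3D joints with a per-joint scale f / z_j (depth-to-scale style)
    rather than one weak-perspective scale for the whole body.
    joints_3d: (J, 3) array of joint positions in camera coordinates."""
    z = joints_3d[:, 2]
    scale = focal / z                       # per-joint perspective scale
    uv = joints_3d[:, :2] * scale[:, None]  # scaled 2D projection
    return uv, scale
```

A joint twice as close to the camera receives twice the scale, which is exactly the per-joint scale variation a single shared scale cannot express.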
arXiv Detail & Related papers (2020-10-27T03:31:35Z)
- Error Bounds of Projection Models in Weakly Supervised 3D Human Pose Estimation [27.289210415215067]
We present a detailed analysis of the most commonly used simplified projection models.
Our results show that both projection models lead to an inherent minimal error between 19.3mm and 54.7mm, even after alignment in position and scale.
We show how these models can be replaced by a normalized perspective projection to avoid this guaranteed minimal error.
arXiv Detail & Related papers (2020-10-23T11:48:13Z)
- Multi-person 3D Pose Estimation in Crowded Scenes Based on Multi-View Geometry [62.29762409558553]
Epipolar constraints are at the core of feature matching and depth estimation in multi-person 3D human pose estimation methods.
Despite the satisfactory performance of this formulation in sparser crowd scenes, its effectiveness is frequently challenged under denser crowd circumstances.
In this paper, we depart from the multi-person 3D pose estimation formulation, and instead reformulate it as crowd pose estimation.
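The epipolar constraint at the core of such feature matching reduces to a point-to-line distance: a candidate point in a second view should lie on (or near) the epipolar line of its match in the first view. A minimal sketch, where the fundamental matrix and the sample points are illustrative assumptions:

```python
import numpy as np

def epipolar_distance(F, x1, x2):
    """Distance from point x2 (second view) to the epipolar line of x1.
    F: 3x3 fundamental matrix; x1, x2: homogeneous image points (3,)."""
    l = F @ x1                          # epipolar line of x1 in the second image
    return abs(l @ x2) / np.hypot(l[0], l[1])
```

In dense crowds many candidates fall near the same epipolar line, which is precisely why this paper argues the constraint becomes unreliable and reformulates the problem as crowd pose estimation.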
arXiv Detail & Related papers (2020-07-21T17:59:36Z)
- Coherent Reconstruction of Multiple Humans from a Single Image [68.3319089392548]
In this work, we address the problem of multi-person 3D pose estimation from a single image.
A typical regression approach in the top-down setting of this problem would first detect all humans and then reconstruct each one of them independently.
Our goal is to train a single network that learns to avoid these problems and generate a coherent 3D reconstruction of all the humans in the scene.
arXiv Detail & Related papers (2020-06-15T17:51:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.