3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera
Pedestrian Localization
- URL: http://arxiv.org/abs/2207.10895v2
- Date: Mon, 25 Jul 2022 17:27:35 GMT
- Title: 3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera
Pedestrian Localization
- Authors: Rui Qiu, Ming Xu, Yuyao Yan, Jeremy S. Smith and Xi Yang
- Abstract summary: The proposed 3DROM method has a greatly improved performance in comparison with the state-of-the-art deep-learning based methods for multi-view pedestrian detection.
- Score: 6.929027496437192
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although deep-learning based methods for monocular pedestrian detection have
made great progress, they are still vulnerable to heavy occlusions. Using
multi-view information fusion is a potential solution but has limited
applications, due to the lack of annotated training samples in existing
multi-view datasets, which increases the risk of overfitting. To address this
problem, a data augmentation method is proposed to randomly generate 3D
cylinder occlusions, on the ground plane, which are of the average size of
pedestrians and projected to multiple views, to relieve the impact of
overfitting in the training. Moreover, the feature map of each view is
projected to multiple parallel planes at different heights, by using
homographies, which allows the CNNs to fully utilize the features across the
height of each pedestrian to infer the locations of pedestrians on the ground
plane. The proposed 3DROM method has a greatly improved performance in
comparison with the state-of-the-art deep-learning based methods for multi-view
pedestrian detection.
Related papers
- MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z) - CrossDTR: Cross-view and Depth-guided Transformers for 3D Object
Detection [10.696619570924778]
We propose Cross-view and Depth-guided Transformers for 3D Object Detection, CrossDTR.
Our method hugely surpassed existing multi-camera methods by 10 percent in pedestrian detection and about 3 percent in overall mAP and NDS metrics.
arXiv Detail & Related papers (2022-09-27T16:23:12Z) - Scatter Points in Space: 3D Detection from Multi-view Monocular Images [8.71944437852952]
3D object detection from monocular image(s) is a challenging and long-standing problem of computer vision.
Recent methods tend to aggregate multiview feature by sampling regular 3D grid densely in space.
We propose a learnable keypoints sampling method, which scatters pseudo surface points in 3D space, in order to keep data sparsity.
arXiv Detail & Related papers (2022-08-31T09:38:05Z) - Efficient View Clustering and Selection for City-Scale 3D Reconstruction [1.1011268090482573]
A novel approach for scaling Multi-View Stereo (MVS) algorithms up to arbitrarily large collections of images is proposed.
The presented method exploits an approximately uniform distribution of poses and geometry and builds a set of overlapping clusters.
Since clustering is independent from pairwise visibility information, the proposed algorithm runs faster than existing literature and allows for a massive parallelization.
arXiv Detail & Related papers (2022-07-18T08:33:52Z) - STCrowd: A Multimodal Dataset for Pedestrian Perception in Crowded
Scenes [78.95447086305381]
Accurately detecting and tracking pedestrians in 3D space is challenging due to large variations in rotations, poses and scales.
Existing benchmarks either only provide 2D annotations, or have limited 3D annotations with low-density pedestrian distribution.
We introduce a large-scale multimodal dataset, STCrowd, to better evaluate pedestrian perception algorithms in crowded scenarios.
arXiv Detail & Related papers (2022-04-03T08:26:07Z) - On Triangulation as a Form of Self-Supervision for 3D Human Pose
Estimation [57.766049538913926]
Supervised approaches to 3D pose estimation from single images are remarkably effective when labeled data is abundant.
Much of the recent attention has shifted towards semi and (or) weakly supervised learning.
We propose to impose multi-view geometrical constraints by means of a differentiable triangulation and to use it as form of self-supervision during training when no labels are available.
arXiv Detail & Related papers (2022-03-29T19:11:54Z) - Multi-View Multi-Person 3D Pose Estimation with Plane Sweep Stereo [71.59494156155309]
Existing approaches for multi-view 3D pose estimation explicitly establish cross-view correspondences to group 2D pose detections from multiple camera views.
We present our multi-view 3D pose estimation approach based on plane sweep stereo to jointly address the cross-view fusion and 3D pose reconstruction in a single shot.
arXiv Detail & Related papers (2021-04-06T03:49:35Z) - Lightweight Multi-View 3D Pose Estimation through Camera-Disentangled
Representation [57.11299763566534]
We present a solution to recover 3D pose from multi-view images captured with spatially calibrated cameras.
We exploit 3D geometry to fuse input images into a unified latent representation of pose, which is disentangled from camera view-points.
Our architecture then conditions the learned representation on camera projection operators to produce accurate per-view 2d detections.
arXiv Detail & Related papers (2020-04-05T12:52:29Z) - 3D Crowd Counting via Geometric Attention-guided Multi-View Fusion [50.520192402702015]
We propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps.
Compared to 2D fusion, the 3D fusion extracts more information of the people along the z-dimension (height), which helps to address the scale variations across multiple views.
The 3D density maps still preserve the 2D density maps property that the sum is the count, while also providing 3D information about the crowd density.
arXiv Detail & Related papers (2020-03-18T11:35:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.