3D Crowd Counting via Multi-View Fusion with 3D Gaussian Kernels
- URL: http://arxiv.org/abs/2003.08162v1
- Date: Wed, 18 Mar 2020 11:35:11 GMT
- Title: 3D Crowd Counting via Multi-View Fusion with 3D Gaussian Kernels
- Authors: Qi Zhang and Antoni B. Chan
- Abstract summary: An end-to-end multi-view crowd counting method called multi-view multi-scale (MVMS) was recently proposed.
Unlike MVMS, we propose to solve the crowd counting task through 3D feature fusion with 3D scene-level density maps, instead of the 2D ground-plane ones.
The proposed method is tested on three multi-view counting datasets and achieves counting performance better than or comparable to the state-of-the-art.
- Score: 56.964614522968226
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowd counting has been studied for decades, and many works have
achieved good performance, especially DNN-based density map estimation methods.
Most existing crowd counting works focus on single-view counting, while few
have studied multi-view counting for large, wide scenes where multiple cameras
are used. Recently, an end-to-end multi-view crowd counting method called
multi-view multi-scale (MVMS) was proposed, which fuses multiple camera views
using a CNN to predict a 2D scene-level density map on the ground plane. Unlike
MVMS, we propose to solve the multi-view crowd counting task through 3D feature
fusion with 3D scene-level density maps, instead of the 2D ground-plane ones.
Compared to 2D fusion, 3D fusion extracts more information about people along
the z-dimension (height), which helps handle the scale variations across
multiple views. The 3D density maps preserve the 2D density maps' property that
the sum equals the count, while also providing 3D information about the crowd
density. We also exploit the projection consistency between the 3D prediction
and the ground truth in the 2D views to further enhance counting performance.
The proposed method is tested on three multi-view counting datasets and
achieves counting performance better than or comparable to the
state-of-the-art.
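The abstract's key property of the 3D density maps is that each person contributes a 3D Gaussian kernel that integrates to one, so summing the whole volume recovers the head count. The paper's own ground-truth construction is not reproduced here; the following is a minimal NumPy sketch of that idea, with the function name, voxel coordinates, and fixed `sigma` being illustrative assumptions:

```python
import numpy as np

def gaussian_3d_density_map(points, shape, sigma=2.0):
    """Place a normalized, truncated 3D Gaussian at each head position.

    points: (N, 3) array of (x, y, z) voxel coordinates (hypothetical layout).
    shape:  (X, Y, Z) size of the scene-level density volume.
    Because each kernel is renormalized to sum to 1, the sum of the
    resulting density map equals the number of annotated people.
    """
    density = np.zeros(shape, dtype=np.float64)
    radius = int(3 * sigma)  # truncate the kernel at 3 sigma
    # Precompute one truncated 3D Gaussian kernel.
    ax = np.arange(-radius, radius + 1)
    xx, yy, zz = np.meshgrid(ax, ax, ax, indexing="ij")
    kernel = np.exp(-(xx**2 + yy**2 + zz**2) / (2 * sigma**2))
    kernel /= kernel.sum()  # normalize: each person contributes exactly 1
    for x, y, z in points.astype(int):
        # Clip the kernel footprint to the volume boundaries.
        x0, x1 = max(x - radius, 0), min(x + radius + 1, shape[0])
        y0, y1 = max(y - radius, 0), min(y + radius + 1, shape[1])
        z0, z1 = max(z - radius, 0), min(z + radius + 1, shape[2])
        kx0, ky0, kz0 = x0 - (x - radius), y0 - (y - radius), z0 - (z - radius)
        patch = kernel[kx0:kx0 + (x1 - x0),
                       ky0:ky0 + (y1 - y0),
                       kz0:kz0 + (z1 - z0)]
        # Renormalize clipped kernels so the sum-is-count property still holds.
        density[x0:x1, y0:y1, z0:z1] += patch / patch.sum()
    return density
```

Note that summing such a volume along the z-axis (`density.sum(axis=2)`) yields a 2D ground-plane density map with the same total count, which is the relationship the abstract's projection-consistency idea builds on.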
Related papers
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving [26.03800936700545]
We propose to regulate intermediate dense 3D features with the help of volume rendering.
Experimental results on the Occ3D and nuScenes datasets demonstrate that Vampire facilitates fine-grained and appropriate extraction of dense 3D features.
arXiv Detail & Related papers (2023-12-19T04:09:05Z)
- SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving [98.74706005223685]
3D scene understanding plays a vital role in vision-based autonomous driving.
We propose a SurroundOcc method to predict the 3D occupancy with multi-camera images.
arXiv Detail & Related papers (2023-03-16T17:59:08Z)
- MVTN: Learning Multi-View Transformations for 3D Understanding [60.15214023270087]
We introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal view-points for 3D shape recognition.
MVTN can be trained end-to-end with any multi-view network for 3D shape recognition.
Our approach demonstrates state-of-the-art performance in 3D classification and shape retrieval on several benchmarks.
arXiv Detail & Related papers (2022-12-27T12:09:16Z)
- Multi-View Transformer for 3D Visual Grounding [64.30493173825234]
We propose a Multi-View Transformer (MVT) for 3D visual grounding.
We project the 3D scene to a multi-view space, in which the position information of the 3D scene under different views is modeled simultaneously and aggregated.
arXiv Detail & Related papers (2022-04-05T12:59:43Z)
- Wide-Area Crowd Counting: Multi-View Fusion Networks for Counting in Large Scenes [50.744452135300115]
We propose a deep neural network framework for multi-view crowd counting.
Our methods achieve state-of-the-art results compared to other multi-view counting baselines.
arXiv Detail & Related papers (2020-12-02T03:20:30Z)
- Virtual Multi-view Fusion for 3D Semantic Segmentation [11.259694096475766]
We show that our virtual views enable more effective training of 2D semantic segmentation networks than previous multi-view approaches.
When the 2D per-pixel predictions are aggregated on 3D surfaces, our virtual multi-view fusion method achieves significantly better 3D semantic segmentation results.
arXiv Detail & Related papers (2020-07-26T14:46:55Z)
- Light3DPose: Real-time Multi-Person 3D Pose Estimation from Multiple Views [5.510992382274774]
We present an approach to perform 3D pose estimation of multiple people from a few calibrated camera views.
Our architecture aggregates feature-maps from a 2D pose estimator backbone into a comprehensive representation of the 3D scene.
The proposed method is inherently efficient: as a pure bottom-up approach, it is computationally independent of the number of people in the scene.
arXiv Detail & Related papers (2020-04-06T14:12:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.