Wide-Area Crowd Counting: Multi-View Fusion Networks for Counting in
Large Scenes
- URL: http://arxiv.org/abs/2012.00946v1
- Date: Wed, 2 Dec 2020 03:20:30 GMT
- Title: Wide-Area Crowd Counting: Multi-View Fusion Networks for Counting in
Large Scenes
- Authors: Qi Zhang, Antoni B. Chan
- Abstract summary: We propose a deep neural network framework for multi-view crowd counting.
Our methods achieve state-of-the-art results compared to other multi-view counting baselines.
- Score: 50.744452135300115
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Crowd counting in single-view images has achieved outstanding performance on
existing counting datasets. However, single-view counting is not applicable to
large and wide scenes (e.g., public parks, long subway platforms, or event
spaces) because a single camera cannot capture the whole scene in adequate
detail for counting, e.g., when the scene is too large to fit into the
field-of-view of the camera, too long so that the resolution is too low on
faraway crowds, or when there are too many large objects that occlude large
portions of the crowd. Therefore, solving the wide-area counting task requires
multiple cameras with overlapping fields-of-view. In this paper, we propose a
deep neural network framework for multi-view crowd counting, which fuses
information from multiple camera views to predict a scene-level density map on
the ground-plane of the 3D world. We consider three versions of the fusion
framework: the late fusion model fuses camera-view density maps; the naive early
fusion model fuses camera-view feature maps; and the multi-view multi-scale
early fusion model ensures that features aligned to the same ground-plane point
have consistent scales. A rotation selection module further ensures consistent
rotation alignment of the features. We test our three fusion models on three multi-view
counting datasets, PETS2009, DukeMTMC, and a newly collected multi-view
counting dataset containing a crowded street intersection. Our methods achieve
state-of-the-art results compared to other multi-view counting baselines.
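All three fusion variants share the same skeleton: project each camera view onto the ground plane, then combine the projected maps. Below is a minimal numpy sketch of the late-fusion case, assuming the per-view density maps and ground-to-image homographies are already given (the paper's actual models use learned projection and fusion layers; the nearest-neighbour warp and mean combination here are simplifications for illustration).

```python
import numpy as np

def warp_to_ground(view_map, H, out_shape):
    """Backward-warp a camera-view map onto the ground plane.
    H maps homogeneous ground-plane coords -> image coords
    (nearest-neighbour sampling; a simplification of the
    geometric projection used in the paper)."""
    out = np.zeros(out_shape, dtype=view_map.dtype)
    ys, xs = np.meshgrid(np.arange(out_shape[0]),
                         np.arange(out_shape[1]), indexing="ij")
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])  # (3, N)
    src = H @ pts                         # ground -> image coordinates
    src = (src[:2] / src[2]).round().astype(int)
    valid = ((src[0] >= 0) & (src[0] < view_map.shape[1]) &
             (src[1] >= 0) & (src[1] < view_map.shape[0]))
    out[ys.ravel()[valid], xs.ravel()[valid]] = view_map[src[1][valid],
                                                         src[0][valid]]
    return out

def late_fusion(view_density_maps, homographies, out_shape):
    """Late fusion: project each per-camera *density* map to the
    ground plane, then combine (mean here; the paper learns a
    fusion network instead). Early fusion would warp feature maps
    before any density prediction."""
    warped = [warp_to_ground(d, H, out_shape)
              for d, H in zip(view_density_maps, homographies)]
    return np.mean(warped, axis=0)
```

With an identity homography and a single view, the "fused" ground-plane map is just the input density map, which makes the projection step easy to sanity-check.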
Related papers
- NWPU-MOC: A Benchmark for Fine-grained Multi-category Object Counting in
Aerial Images [64.92809155168595]
This paper introduces a Multi-category Object Counting task to estimate the numbers of different objects in an aerial image.
Considering the absence of a dataset for this task, a large-scale dataset is collected, consisting of 3,416 scenes with a resolution of 1024 × 1024 pixels.
The paper presents a multi-spectrum, multi-category object counting framework, which employs a dual-attention module to fuse the features of RGB and NIR.
arXiv Detail & Related papers (2024-01-19T07:12:36Z)
- ConsistNet: Enforcing 3D Consistency for Multi-view Images Diffusion [61.37481051263816]
Given a single image of a 3D object, this paper proposes a method (named ConsistNet) that is able to generate multiple images of the same object.
Our method effectively learns 3D consistency over a frozen Zero123 backbone and can generate 16 surrounding views of the object within 40 seconds on a single A100 GPU.
arXiv Detail & Related papers (2023-10-16T12:29:29Z)
- DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model [19.288610627281102]
We propose DrivingDiffusion to generate realistic multi-view videos controlled by 3D layout.
Our model can generate large-scale realistic multi-camera driving videos in complex urban scenes.
arXiv Detail & Related papers (2023-10-11T18:00:08Z)
- 3M3D: Multi-view, Multi-path, Multi-representation for 3D Object Detection [0.5156484100374059]
We propose 3M3D: A Multi-view, Multi-path, Multi-representation for 3D Object Detection.
We update both multi-view features and query features to enhance the representation of the scene in both fine panoramic view and coarse global view.
We show performance improvements on nuScenes benchmark dataset on top of our baselines.
arXiv Detail & Related papers (2023-02-16T11:28:30Z)
- Cross-View Cross-Scene Multi-View Crowd Counting [56.83882084112913]
Multi-view crowd counting has been previously proposed to utilize multiple cameras to extend the field-of-view of a single camera.
We propose a cross-view cross-scene (CVCS) multi-view crowd counting paradigm, where the training and testing occur on different scenes with arbitrary camera layouts.
arXiv Detail & Related papers (2022-05-03T15:03:44Z)
- Multiview Detection with Feature Perspective Transformation [59.34619548026885]
We propose a novel multiview detection system, MVDet.
We take an anchor-free approach to aggregate multiview information by projecting feature maps onto the ground plane.
Our entire model is end-to-end learnable and achieves 88.2% MODA on the standard Wildtrack dataset.
arXiv Detail & Related papers (2020-07-14T17:58:30Z)
- 3D Crowd Counting via Geometric Attention-guided Multi-View Fusion [50.520192402702015]
We propose to solve the multi-view crowd counting task through 3D feature fusion with 3D scene-level density maps.
Compared to 2D fusion, the 3D fusion extracts more information of the people along the z-dimension (height), which helps to address the scale variations across multiple views.
The 3D density maps still preserve the 2D density maps property that the sum is the count, while also providing 3D information about the crowd density.
arXiv Detail & Related papers (2020-03-18T11:35:11Z)
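The "sum is the count" property mentioned in the last summary follows from how density maps are built: each annotated person contributes a kernel that integrates to one, so summing the whole map recovers the head count. A toy 3D construction (illustrative only, not the cited paper's actual method):

```python
import numpy as np

def density_map_3d(points, shape, sigma=1.0):
    """Build a toy 3D density map: one Gaussian blob per person
    location, each normalized to integrate to 1, so the map's
    total sum equals the number of people."""
    grid = np.stack(np.meshgrid(*[np.arange(s) for s in shape],
                                indexing="ij"), axis=-1)   # (X, Y, Z, 3)
    dm = np.zeros(shape)
    for p in points:
        blob = np.exp(-np.sum((grid - p) ** 2, axis=-1) / (2 * sigma ** 2))
        dm += blob / blob.sum()   # normalize: each person contributes exactly 1
    return dm
```

Because every blob is explicitly normalized, a map built from three people sums to exactly 3, regardless of sigma or grid size.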
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.