Multi-Stream Networks and Ground-Truth Generation for Crowd Counting
- URL: http://arxiv.org/abs/2002.09951v3
- Date: Wed, 11 Mar 2020 20:47:00 GMT
- Title: Multi-Stream Networks and Ground-Truth Generation for Crowd Counting
- Authors: Rodolfo Quispe, Darwin Ttito, Adín Ramírez Rivera, Helio Pedrini
- Abstract summary: A Multi-Stream Convolutional Neural Network is developed and evaluated in this work.
It receives an image as input and produces a density map that represents the spatial distribution of people in an end-to-end fashion.
In addition, we investigate the influence of the two most common approaches to generating ground truths and propose a hybrid method.
- Score: 0.5161531917413708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowd scene analysis has received a lot of attention recently due to the wide
variety of applications, for instance, forensic science, urban planning,
surveillance and security. In this context, a challenging task is known as
crowd counting, whose main purpose is to estimate the number of people present
in a single image. A Multi-Stream Convolutional Neural Network is developed and
evaluated in this work, which receives an image as input and produces a density
map that represents the spatial distribution of people in an end-to-end
fashion. In order to address complex crowd counting issues, such as extremely
unconstrained scale and perspective changes, the network architecture utilizes
receptive fields with different size filters for each stream. In addition, we
investigate the influence of the two most common approaches to generating
ground truths and propose a hybrid method based on tiny face detection and
scale interpolation. Experiments conducted on two challenging datasets,
UCF-CC-50 and ShanghaiTech, demonstrate that using our ground truth generation
methods achieves superior results.
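
The abstract does not say how its ground-truth density maps are computed. As a rough illustration of the general idea only, the sketch below uses a fixed-width Gaussian per annotated head position, a common baseline in crowd counting, so that the map sums to the number of people. The function names, the `sigma` value, and the (row, col) point format are assumptions made for this example, not the authors' method.

```python
import numpy as np


def gaussian_kernel_1d(sigma):
    """Normalised 1-D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def density_map(points, height, width, sigma=4.0):
    """Build a density map from (row, col) head annotations.

    Hypothetical helper: each in-bounds point adds a Gaussian blob whose
    mass is renormalised to 1, so the map's sum equals the head count.
    """
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    dmap = np.zeros((height, width), dtype=np.float64)
    for row, col in points:
        row, col = int(round(row)), int(round(col))
        if not (0 <= row < height and 0 <= col < width):
            continue  # skip annotations that fall outside the image
        # Clip the kernel window to the image borders.
        r0, r1 = max(0, row - r), min(height, row + r + 1)
        c0, c1 = max(0, col - r), min(width, col + r + 1)
        patch = np.outer(k[r0 - row + r:r1 - row + r],
                         k[c0 - col + r:c1 - col + r])
        # Renormalise so a clipped blob still contributes exactly one person.
        dmap[r0:r1, c0:c1] += patch / patch.sum()
    return dmap


if __name__ == "__main__":
    heads = [(10, 10), (0, 0), (30, 45)]
    dmap = density_map(heads, height=40, width=50)
    print(dmap.shape, dmap.sum())
```

Summing the map recovers the count, which is why a network trained to regress such maps can report both where people are and how many there are.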
Related papers
- Diffusion-based Data Augmentation for Object Counting Problems [62.63346162144445]
We develop a pipeline that utilizes a diffusion model to generate extensive training data.
We are the first to generate images conditioned on a location dot map with a diffusion model.
Our proposed counting loss for the diffusion model effectively minimizes the discrepancies between the location dot map and the crowd images generated.
arXiv Detail & Related papers (2024-01-25T07:28:22Z)
- ZoomNeXt: A Unified Collaborative Pyramid Network for Camouflaged Object Detection [70.11264880907652]
Recent camouflaged object detection (COD) attempts to segment objects visually blended into their surroundings, which is extremely complex and difficult in real-world scenarios.
We propose an effective unified collaborative pyramid network that mimics human behavior when observing vague images, namely zooming in and out.
Our framework consistently outperforms existing state-of-the-art methods in image and video COD benchmarks.
arXiv Detail & Related papers (2023-10-31T06:11:23Z) - General-Purpose Multimodal Transformer meets Remote Sensing Semantic
Segmentation [35.100738362291416]
Multimodal AI seeks to exploit complementary data sources, particularly for complex tasks like semantic segmentation.
Recent trends in general-purpose multimodal networks have shown great potential to achieve state-of-the-art performance.
We propose a UNet-inspired module that employs 3D convolution to encode vital local information and learn cross-modal features simultaneously.
arXiv Detail & Related papers (2023-07-07T04:58:34Z) - Redesigning Multi-Scale Neural Network for Crowd Counting [68.674652984003]
We introduce a hierarchical mixture of density experts, which hierarchically merges multi-scale density maps for crowd counting.
Within the hierarchical structure, an expert competition and collaboration scheme is presented to encourage contributions from all scales.
Experiments show that our method achieves the state-of-the-art performance on five public datasets.
arXiv Detail & Related papers (2022-08-04T21:49:29Z) - Scene-Adaptive Attention Network for Crowd Counting [31.29858034122248]
This paper proposes a scene-adaptive attention network, termed SAANet.
We design a deformable attention in-built Transformer backbone, which learns adaptive feature representations with deformable sampling locations and dynamic attention weights.
We conduct extensive experiments on four challenging crowd counting benchmarks, demonstrating that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-31T15:03:17Z) - Aerial Images Meet Crowdsourced Trajectories: A New Approach to Robust
Road Extraction [110.61383502442598]
We introduce a novel neural network framework termed Cross-Modal Message Propagation Network (CMMPNet).
CMMPNet is composed of two deep Auto-Encoders for modality-specific representation learning and a tailor-designed Dual Enhancement Module for cross-modal representation refinement.
Experiments on three real-world benchmarks demonstrate the effectiveness of our CMMPNet for robust road extraction.
arXiv Detail & Related papers (2021-11-30T04:30:10Z) - Bidirectional Multi-scale Attention Networks for Semantic Segmentation
of Oblique UAV Imagery [30.524771772192757]
We propose the novel bidirectional multi-scale attention networks, which fuse features from multiple scales bidirectionally for more adaptive and effective feature extraction.
Our model achieved the state-of-the-art (SOTA) result with a mean intersection over union (mIoU) score of 70.80%.
arXiv Detail & Related papers (2021-02-05T11:02:15Z) - A Flow Base Bi-path Network for Cross-scene Video Crowd Understanding in
Aerial View [93.23947591795897]
In this paper, we strive to tackle the challenges and automatically understand the crowd from the visual data collected from drones.
To alleviate the background noise generated in cross-scene testing, a double-stream crowd counting model is proposed.
To tackle the crowd density estimation problem in extremely dark environments, we introduce synthetic data generated with the game Grand Theft Auto V (GTAV).
arXiv Detail & Related papers (2020-09-29T01:48:24Z) - Shallow Feature Based Dense Attention Network for Crowd Counting [103.67446852449551]
We propose a Shallow feature based Dense Attention Network (SDANet) for crowd counting from still images.
Our method outperforms other existing methods by a large margin, as is evident from a remarkable 11.9% Mean Absolute Error (MAE) drop of our SDANet.
arXiv Detail & Related papers (2020-06-17T13:34:42Z)
This list is automatically generated from the titles and abstracts of the papers in this site.