Related papers: Bayesian Multi-Scale Neural Network for Crowd Counting

Bayesian Multi-Scale Neural Network for Crowd Counting

URL: http://arxiv.org/abs/2007.14245v4
Date: Wed, 09 Jul 2025 13:07:00 GMT
Title: Bayesian Multi-Scale Neural Network for Crowd Counting
Authors: Abhinav Sagar,
Abstract summary: Crowd counting is a challenging yet critical task in computer vision.<n>Recent advances using Convolutional Neural Networks (CNNs) that estimate density maps have shown significant success.<n>We propose a novel deep learning architecture that effectively addresses these challenges.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Crowd counting is a challenging yet critical task in computer vision with applications ranging from public safety to urban planning. Recent advances using Convolutional Neural Networks (CNNs) that estimate density maps have shown significant success. However, accurately counting individuals in highly congested scenes remains an open problem due to severe occlusions, scale variations, and perspective distortions, where people appear at drastically different sizes across the image. In this work, we propose a novel deep learning architecture that effectively addresses these challenges. Our network integrates a ResNet-based feature extractor for capturing rich hierarchical representations, followed by a downsampling block employing dilated convolutions to preserve spatial resolution while expanding the receptive field. An upsampling block using transposed convolutions reconstructs the high-resolution density map. Central to our architecture is a novel Perspective-aware Aggregation Module (PAM) designed to enhance robustness to scale and perspective variations by adaptively aggregating multi-scale contextual information. We detail the training procedure, including the loss functions and optimization strategies used. Our method is evaluated on three widely used benchmark datasets using Mean Absolute Error (MAE) and Mean Squared Error (MSE) as evaluation metrics. Experimental results demonstrate that our model achieves superior performance compared to existing state-of-the-art methods. Additionally, we incorporate principled Bayesian inference techniques to provide uncertainty estimates along with the crowd count predictions, offering a measure of confidence in the model's outputs.

Related papers

Diffusion-based Data Augmentation for Object Counting Problems [62.63346162144445]
We develop a pipeline that utilizes a diffusion model to generate extensive training data. We are the first to generate images conditioned on a location dot map with a diffusion model. Our proposed counting loss for the diffusion model effectively minimizes the discrepancies between the location dot map and the crowd images generated.
arXiv Detail & Related papers (2024-01-25T07:28:22Z)
RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images. We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
A Geometrical Approach to Evaluate the Adversarial Robustness of Deep Neural Networks [52.09243852066406]
Adversarial Converging Time Score (ACTS) measures the converging time as an adversarial robustness metric. We validate the effectiveness and generalization of the proposed ACTS metric against different adversarial attacks on the large-scale ImageNet dataset.
arXiv Detail & Related papers (2023-10-10T09:39:38Z)
Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs. We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z)
Redesigning Multi-Scale Neural Network for Crowd Counting [68.674652984003]
We introduce a hierarchical mixture of density experts, which hierarchically merges multi-scale density maps for crowd counting. Within the hierarchical structure, an expert competition and collaboration scheme is presented to encourage contributions from all scales. Experiments show that our method achieves the state-of-the-art performance on five public datasets.
arXiv Detail & Related papers (2022-08-04T21:49:29Z)
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection [0.0]
We propose a mixed-scale triplet network, bf ZoomNet, which mimics the behavior of humans when observing vague images. Specifically, our ZoomNet employs the zoom strategy to learn the discriminative mixed-scale semantics by the designed scale integration unit and hierarchical mixed-scale unit. Our proposed highly task-friendly model consistently surpasses the existing 23 state-of-the-art methods on four public datasets.
arXiv Detail & Related papers (2022-03-05T09:13:52Z)
PDC-Net+: Enhanced Probabilistic Dense Correspondence Network [161.76275845530964]
Enhanced Probabilistic Dense Correspondence Network, PDC-Net+, capable of estimating accurate dense correspondences. We develop an architecture and an enhanced training strategy tailored for robust and generalizable uncertainty prediction. Our approach obtains state-of-the-art results on multiple challenging geometric matching and optical flow datasets.
arXiv Detail & Related papers (2021-09-28T17:56:41Z)
Point-Cloud Deep Learning of Porous Media for Permeability Prediction [0.0]
We propose a novel deep learning framework for predicting permeability of porous media from their digital images. We model the boundary between solid matrix and pore spaces as point clouds and feed them as inputs to a neural network based on the PointNet architecture.
arXiv Detail & Related papers (2021-07-18T22:59:21Z)
PSCNet: Pyramidal Scale and Global Context Guided Network for Crowd Counting [44.306790250158954]
This paper proposes a novel crowd counting approach based on pyramidal scale module (PSM) and global context module (GCM) PSM is used to adaptively capture multi-scale information, which can identify a fine boundary of crowds with different image scales. GCM is devised with low-complexity and lightweight manner, to make the interactive information across the channels of the feature maps more efficient.
arXiv Detail & Related papers (2020-12-07T11:35:56Z)
Monocular Depth Estimation Using Multi Scale Neural Network And Feature Fusion [0.0]
Our network uses two different blocks, first which uses different filter sizes for convolution and merges all the individual feature maps. The second block uses dilated convolutions in place of fully connected layers thus reducing computations and increasing the receptive field. We train and test our network on Make 3D dataset, NYU Depth V2 dataset and Kitti dataset using standard evaluation metrics for depth estimation comprised of RMSE loss and SILog loss.
arXiv Detail & Related papers (2020-09-11T18:08:52Z)
Shallow Feature Based Dense Attention Network for Crowd Counting [103.67446852449551]
We propose a Shallow feature based Dense Attention Network (SDANet) for crowd counting from still images. Our method outperforms other existing methods by a large margin, as is evident from a remarkable 11.9% Mean Absolute Error (MAE) drop of our SDANet.
arXiv Detail & Related papers (2020-06-17T13:34:42Z)
JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method [92.15895515035795]
We introduce a new large scale unconstrained crowd counting dataset (JHU-CROWD++) that contains "4,372" images with "1.51 million" annotations. We propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation.
arXiv Detail & Related papers (2020-04-07T14:59:35Z)
Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction. We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data. Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting. HSRNet models rich contextual dependencies and recalibrating multiple scale-associated information. Our approach can ignore various noises selectively and focus on appropriate crowd scales automatically.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)

This list is automatically generated from the titles and abstracts of the papers in this site.