Shallow Feature Based Dense Attention Network for Crowd Counting
- URL: http://arxiv.org/abs/2006.09853v1
- Date: Wed, 17 Jun 2020 13:34:42 GMT
- Title: Shallow Feature Based Dense Attention Network for Crowd Counting
- Authors: Yunqi Miao, Zijia Lin, Guiguang Ding, Jungong Han
- Abstract summary: We propose a Shallow feature based Dense Attention Network (SDANet) for crowd counting from still images.
Our method outperforms other existing methods by a large margin, with a remarkable 11.9% Mean Absolute Error (MAE) drop on the challenging UCF CC 50 dataset.
- Score: 103.67446852449551
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While the performance of crowd counting via deep learning has improved
dramatically in recent years, it remains a challenging problem due to
cluttered backgrounds and varying scales of people within an image. In this
paper, we propose a Shallow feature based Dense Attention Network (SDANet) for
crowd counting from still images, which diminishes the impact of backgrounds
via involving a shallow feature based attention model, and meanwhile, captures
multi-scale information via densely connecting hierarchical image features.
Specifically, inspired by the observation that backgrounds and human crowds
generally have noticeably different responses in shallow features, we decide to
build our attention model upon shallow-feature maps, which results in accurate
background-pixel detection. Moreover, considering that the most representative
features of people across different scales can appear in different layers of a
feature extraction network, to better keep them all, we propose to densely
connect hierarchical image features of different layers and subsequently encode
them for estimating crowd density. Experimental results on three benchmark
datasets clearly demonstrate the superiority of SDANet when dealing with
different scenarios. Particularly, on the challenging UCF CC 50 dataset, our
method outperforms other existing methods by a large margin, as is evident from
a remarkable 11.9% Mean Absolute Error (MAE) drop of our SDANet.
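As a rough illustration of the two ideas in the abstract, the PyTorch-style sketch below gates backbone features with an attention mask computed from shallow features and densely concatenates hierarchical features before regressing a density map. All module names, depths, and channel widths here are illustrative assumptions, not the authors' implementation.
```python
import torch
import torch.nn as nn


class ShallowAttention(nn.Module):
    """Predicts a per-pixel foreground mask from shallow (early-layer) features."""

    def __init__(self, in_ch):
        super().__init__()
        self.mask = nn.Sequential(
            nn.Conv2d(in_ch, in_ch // 2, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch // 2, 1, 1), nn.Sigmoid(),
        )

    def forward(self, shallow_feat):
        return self.mask(shallow_feat)  # (N, 1, H, W); low values mark background


class DenselyConnectedDecoder(nn.Module):
    """Resizes hierarchical features to a common resolution, concatenates them,
    and regresses a single-channel density map."""

    def __init__(self, chs=(64, 128, 256)):
        super().__init__()
        self.head = nn.Sequential(
            nn.Conv2d(sum(chs), 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, 1), nn.ReLU(inplace=True),  # densities are non-negative
        )

    def forward(self, feats):
        size = feats[0].shape[-2:]
        up = [nn.functional.interpolate(f, size=size, mode="bilinear",
                                        align_corners=False) for f in feats]
        return self.head(torch.cat(up, dim=1))


# Toy usage with random stand-ins for shallow/mid/deep backbone features.
f1 = torch.randn(1, 64, 96, 96)
f2 = torch.randn(1, 128, 48, 48)
f3 = torch.randn(1, 256, 24, 24)
mask = ShallowAttention(64)(f1)                           # background suppression
density = DenselyConnectedDecoder()([f1 * mask, f2, f3])  # multi-scale density map
count = density.sum()                                     # predicted crowd count
```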
Related papers
- Robust Network Learning via Inverse Scale Variational Sparsification [55.64935887249435]
We introduce an inverse scale variational sparsification framework within a time-continuous inverse scale space formulation.
Unlike frequency-based methods, our approach removes noise by smoothing small-scale features.
We show the efficacy of our approach through enhanced robustness against various noise types.
arXiv Detail & Related papers (2024-09-27T03:17:35Z) - Multi-scale Unified Network for Image Classification [33.560003528712414]
CNNs face notable challenges in performance and computational efficiency when dealing with real-world, multi-scale image inputs.
We propose the Multi-scale Unified Network (MUSN), which consists of multiple scales, a unified network, and a scale-invariant constraint.
MUSN yields an accuracy increase of up to 44.53% and reduces FLOPs by 7.01-16.13% in multi-scale scenarios.
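A scale-invariant constraint can take many forms; the sketch below shows one plausible, generic version (a feature-consistency loss between an image and its half-scale copy). This is an assumption for illustration, not MUSN's actual loss.
```python
import torch
import torch.nn.functional as F


def scale_invariance_loss(encoder, images):
    """Penalizes disagreement between features of an image and its half-scale copy."""
    feat_full = encoder(images)
    small = F.interpolate(images, scale_factor=0.5, mode="bilinear", align_corners=False)
    feat_small = encoder(small)
    feat_small = F.interpolate(feat_small, size=feat_full.shape[-2:],
                               mode="bilinear", align_corners=False)
    return F.mse_loss(feat_small, feat_full)


encoder = torch.nn.Conv2d(3, 16, 3, padding=1)  # stand-in feature extractor
loss = scale_invariance_loss(encoder, torch.randn(2, 3, 64, 64))
```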
arXiv Detail & Related papers (2024-03-27T06:40:26Z) - CLAD: A Contrastive Learning based Approach for Background Debiasing [43.0296255565593]
We introduce a contrastive learning-based approach to mitigate the background bias in CNNs.
We achieve state-of-the-art results on the Background Challenge dataset, outperforming the previous benchmark by a margin of 4.1%.
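For illustration, a generic InfoNCE-style contrastive objective is sketched below, where embeddings of the same object on two different backgrounds form the positive pair. This conveys the general idea of contrastive background debiasing and is not CLAD's exact formulation.
```python
import torch
import torch.nn.functional as F


def info_nce(z_a, z_b, temperature=0.1):
    """z_a[i] and z_b[i] embed the same object rendered on two different backgrounds."""
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature   # (N, N) cosine-similarity logits
    targets = torch.arange(z_a.size(0))    # matching pairs sit on the diagonal
    return F.cross_entropy(logits, targets)


loss = info_nce(torch.randn(8, 128), torch.randn(8, 128))
```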
arXiv Detail & Related papers (2022-10-06T08:33:23Z) - Crowd counting with segmentation attention convolutional neural network [20.315829094519128]
We propose a novel convolutional neural network architecture called SegCrowdNet.
SegCrowdNet adaptively highlights the human head region and suppresses the non-head region by segmentation.
SegCrowdNet achieves excellent performance compared with state-of-the-art methods.
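A minimal sketch of segmentation-style attention for counting is given below: a predicted head-region mask gates the features used for density regression. Layer sizes and module names are assumptions, not SegCrowdNet's actual configuration.
```python
import torch
import torch.nn as nn


class SegAttentionHead(nn.Module):
    """A segmentation branch predicts a head-region mask that gates the features
    fed to the density regressor, suppressing non-head pixels."""

    def __init__(self, in_ch=256):
        super().__init__()
        self.seg = nn.Sequential(nn.Conv2d(in_ch, 1, 1), nn.Sigmoid())   # head-region mask
        self.density = nn.Sequential(nn.Conv2d(in_ch, 1, 1), nn.ReLU())  # density map

    def forward(self, feat):
        mask = self.seg(feat)                   # (N, 1, H, W)
        return self.density(feat * mask), mask  # gated features -> density


density, mask = SegAttentionHead()(torch.randn(1, 256, 64, 64))
```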
arXiv Detail & Related papers (2022-04-15T08:40:38Z) - PGGANet: Pose Guided Graph Attention Network for Person Re-identification [0.0]
Person re-identification (ReID) aims at retrieving a person from images captured by different cameras.
It has been shown that using local features together with the global feature of a person image helps to produce robust feature representations for person retrieval.
We propose a pose guided graph attention network, a multi-branch architecture consisting of one branch for global features, one for mid-granular body features, and one for fine-granular key-point features.
arXiv Detail & Related papers (2021-11-29T09:47:39Z) - Bayesian Multi Scale Neural Network for Crowd Counting [0.0]
We propose a new network that uses a ResNet-based feature extractor, a downsampling block built on dilated convolutions, and an upsampling block built on transposed convolutions.
We present a novel aggregation module which makes our network robust to the perspective view problem.
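The block structure described above can be sketched as follows; a tiny stand-in backbone replaces the ResNet feature extractor, and all channel widths and kernel sizes are assumptions rather than the paper's configuration.
```python
import torch
import torch.nn as nn

backbone = nn.Sequential(                        # stand-in for a ResNet feature extractor
    nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True),
)
dilated_block = nn.Sequential(                   # enlarges receptive field without pooling
    nn.Conv2d(128, 128, 3, padding=2, dilation=2), nn.ReLU(inplace=True),
    nn.Conv2d(128, 128, 3, padding=4, dilation=4), nn.ReLU(inplace=True),
)
upsample_block = nn.Sequential(                  # transposed convolutions back to input scale
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
    nn.ConvTranspose2d(64, 1, 4, stride=2, padding=1), nn.ReLU(inplace=True),
)

x = torch.randn(1, 3, 256, 256)
density = upsample_block(dilated_block(backbone(x)))  # (1, 1, 256, 256) density map
```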
arXiv Detail & Related papers (2020-07-11T21:43:20Z) - Focus Longer to See Better: Recursively Refined Attention for Fine-Grained Image Classification [148.4492675737644]
Deep neural networks have made great strides in coarse-grained image classification.
In this paper, we try to focus on these marginal differences to extract more representative features.
Our network repetitively focuses on parts of images to spot small discriminative parts among the classes.
arXiv Detail & Related papers (2020-05-22T03:14:18Z) - Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
arXiv Detail & Related papers (2020-04-14T16:29:42Z) - JHU-CROWD++: Large-Scale Crowd Counting Dataset and A Benchmark Method [92.15895515035795]
We introduce a new large-scale unconstrained crowd counting dataset (JHU-CROWD++) that contains 4,372 images with 1.51 million annotations.
We propose a novel crowd counting network that progressively generates crowd density maps via residual error estimation.
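Residual-error-based refinement of a density map can be sketched as below: a coarse estimate is predicted first, and a second branch predicts a correction. The module sizes are illustrative and do not follow the paper's architecture.
```python
import torch
import torch.nn as nn


class ResidualRefiner(nn.Module):
    """Predicts a coarse density map, then a residual correction conditioned on
    the features and the coarse estimate."""

    def __init__(self, in_ch=128):
        super().__init__()
        self.coarse = nn.Conv2d(in_ch, 1, 1)
        self.residual = nn.Sequential(
            nn.Conv2d(in_ch + 1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 1, 1),
        )

    def forward(self, feat):
        coarse = torch.relu(self.coarse(feat))
        refined = coarse + self.residual(torch.cat([feat, coarse], dim=1))
        return torch.relu(refined)  # refined, non-negative density map


density = ResidualRefiner()(torch.randn(1, 128, 64, 64))
```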
arXiv Detail & Related papers (2020-04-07T14:59:35Z) - Learning Enriched Features for Real Image Restoration and Enhancement [166.17296369600774]
Convolutional neural networks (CNNs) have achieved dramatic improvements over conventional approaches for image restoration tasks.
We present a novel architecture that maintains spatially precise, high-resolution representations through the entire network.
Our approach learns an enriched set of features that combines contextual information from multiple scales, while simultaneously preserving the high-resolution spatial details.
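A minimal sketch of this goal is shown below: a full-resolution stream is kept intact while context from lower-resolution streams is fused back in at the original resolution. Branch widths and the fusion rule are assumptions, not the paper's modules.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleFusion(nn.Module):
    """Keeps a full-resolution stream and fuses in context from 1/2- and
    1/4-resolution streams at the original resolution."""

    def __init__(self, ch=64):
        super().__init__()
        self.full = nn.Conv2d(ch, ch, 3, padding=1)
        self.half = nn.Conv2d(ch, ch, 3, padding=1)
        self.quarter = nn.Conv2d(ch, ch, 3, padding=1)
        self.fuse = nn.Conv2d(3 * ch, ch, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        up = lambda f: F.interpolate(f, size=(h, w), mode="bilinear", align_corners=False)
        f1 = self.full(x)
        f2 = up(self.half(F.avg_pool2d(x, 2)))
        f4 = up(self.quarter(F.avg_pool2d(x, 4)))
        return self.fuse(torch.cat([f1, f2, f4], dim=1))


out = MultiScaleFusion()(torch.randn(1, 64, 128, 128))
```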
arXiv Detail & Related papers (2020-03-15T11:04:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.