Coarse- and Fine-grained Attention Network with Background-aware Loss
for Crowd Density Map Estimation
- URL: http://arxiv.org/abs/2011.03721v1
- Date: Sat, 7 Nov 2020 08:05:54 GMT
- Title: Coarse- and Fine-grained Attention Network with Background-aware Loss
for Crowd Density Map Estimation
- Authors: Liangzi Rong, Chunping Li
- Abstract summary: CFANet is a novel method for generating high-quality crowd density maps and estimating people counts.
We devise a coarse-to-fine progressive attention mechanism by integrating a Crowd Region Recognizer (CRR) branch and a Density Level Estimator (DLE) branch.
Our method not only outperforms previous state-of-the-art methods in count accuracy but also improves the image quality of the density maps and reduces the false recognition ratio.
- Score: 2.690502103971799
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present the Coarse- and Fine-grained Attention
Network (CFANet), a novel method for generating high-quality crowd density maps
and estimating people counts that incorporates attention maps to better focus
on the crowd area. Because generating accurate fine-grained attention maps
directly is normally difficult, we devise a coarse-to-fine progressive
attention mechanism that integrates a Crowd Region Recognizer (CRR) branch and
a Density Level Estimator (DLE) branch; it suppresses the influence of
irrelevant background and assigns attention weights according to crowd density
levels. We also employ a multi-level supervision mechanism to assist gradient
backpropagation and reduce overfitting. In addition, we propose a
Background-aware Structural Loss (BSL) to reduce the false recognition ratio
while improving structural similarity to the ground truth. Extensive
experiments on commonly used datasets show that our method not only outperforms
previous state-of-the-art methods in count accuracy but also improves the image
quality of the density maps and reduces the false recognition ratio.
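The coarse-to-fine idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a hypothetical coarse map (a per-pixel crowd-region score standing in for a CRR-style output) and a hypothetical fine map (a per-pixel weight standing in for a DLE-style output), combines them by element-wise multiplication, and applies the result to a feature map. The function names, the multiplicative combination, and all values are assumptions made for illustration only.

```python
def combine_attention(coarse, fine):
    """Element-wise product of a coarse crowd-region map and a fine density-level map."""
    return [[c * f for c, f in zip(c_row, f_row)]
            for c_row, f_row in zip(coarse, fine)]


def apply_attention(features, attention):
    """Scale each feature value by the attention weight at the same position."""
    return [[x * a for x, a in zip(x_row, a_row)]
            for x_row, a_row in zip(features, attention)]


# Hypothetical 2x2 inputs: the right column is crowd, the left column is background.
coarse = [[0.0, 1.0],
          [0.0, 1.0]]
# Hypothetical fine weights; background positions may carry spurious values.
fine = [[0.9, 0.5],
        [0.2, 1.0]]
features = [[2.0, 2.0],
            [2.0, 2.0]]

attention = combine_attention(coarse, fine)
attended = apply_attention(features, attention)
```

In this toy setting the coarse map zeroes out background positions regardless of the fine weights there, so the fine map only has to rank regions that were already recognized as crowd.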
Related papers
- Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation [51.66997548477913]
We propose a novel feature-level consistency learning framework named Density-Descending Feature Perturbation (DDFP).
Inspired by the low-density separation assumption in semi-supervised learning, our key insight is that feature density can shed light on the most promising direction for the segmentation classifier to explore.
The proposed DDFP outperforms other designs for feature-level perturbation and achieves state-of-the-art performance on both the Pascal VOC and Cityscapes datasets.
(2024-03-11T06:59:05Z)
- Diffusion-based Data Augmentation for Object Counting Problems [62.63346162144445]
We develop a pipeline that utilizes a diffusion model to generate extensive training data.
We are the first to generate images conditioned on a location dot map with a diffusion model.
Our proposed counting loss for the diffusion model effectively minimizes the discrepancies between the location dot map and the generated crowd images.
(2024-01-25T07:28:22Z)
- $CrowdDiff$: Multi-hypothesis Crowd Density Estimation using Diffusion Models [26.55769846846542]
Crowd counting is a fundamental problem in crowd analysis which is typically accomplished by estimating a crowd density map and summing over the density values.
We present $CrowdDiff$, which generates the crowd density map through a reverse diffusion process.
In addition, owing to the nature of the diffusion model, we produce multiple density maps to improve counting performance.
(2023-03-22T17:58:01Z)
- Semi-supervised Crowd Counting via Density Agency [57.3635501421658]
First, we build a learnable auxiliary structure, namely the density agency, to bring the recognized foreground regional features close to the corresponding density sub-classes.
Second, we propose a density-guided contrastive learning loss to consolidate the backbone feature extractor.
Third, we build a regression head by using a transformer structure to refine the foreground features further.
(2022-09-07T06:34:00Z)
- Redesigning Multi-Scale Neural Network for Crowd Counting [68.674652984003]
We introduce a hierarchical mixture of density experts, which hierarchically merges multi-scale density maps for crowd counting.
Within the hierarchical structure, an expert competition and collaboration scheme is presented to encourage contributions from all scales.
Experiments show that our method achieves state-of-the-art performance on five public datasets.
(2022-08-04T21:49:29Z)
- Cascaded Residual Density Network for Crowd Counting [63.714719914701014]
We propose a novel Cascaded Residual Density Network (CRDNet) that follows a coarse-to-fine approach to generate high-quality density maps for more accurate crowd counting.
A novel additional local count loss is presented to refine the accuracy of crowd counting.
(2021-07-29T03:07:11Z)
- CAMERAS: Enhanced Resolution And Sanity preserving Class Activation Mapping for image saliency [61.40511574314069]
Backpropagation image saliency aims at explaining model predictions by estimating model-centric importance of individual pixels in the input.
We propose CAMERAS, a technique to compute high-fidelity backpropagation saliency maps without requiring any external priors.
(2021-06-20T08:20:56Z)
- In-sample Contrastive Learning and Consistent Attention for Weakly Supervised Object Localization [18.971497314227275]
Weakly supervised object localization (WSOL) aims to localize the target object using only image-level supervision.
Recent methods encourage the model to activate feature maps over the entire object by dropping the most discriminative parts.
We consider the background as an important cue that guides the feature activation to cover the sophisticated object region.
(2020-09-25T07:24:46Z)
- Recurrent Distillation based Crowd Counting [23.4315417286694]
We propose a simple yet effective crowd counting framework that achieves state-of-the-art performance on various crowded scenes.
In experiments, we demonstrate that, with our simple convolutional neural network architecture strengthened by our proposed training algorithm, our model is able to outperform or be comparable with the state-of-the-art methods.
(2020-06-14T01:04:52Z)
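Several entries in this list rely on the same convention: a crowd density map whose values sum to the people count, often derived from a dot map of annotated head locations. The sketch below is a minimal illustration of that convention, not the procedure of any paper listed here. It assumes a fixed Gaussian width `sigma` and normalizes each person's kernel over the image so that every annotated point contributes exactly 1 to the total; the function names and the example coordinates are hypothetical.

```python
import math


def density_map_from_dots(points, height, width, sigma=1.5):
    """Build a density map by placing one normalized Gaussian per annotated point."""
    density = [[0.0] * width for _ in range(height)]
    for py, px in points:
        # Unnormalized Gaussian centered on the annotated location (row py, column px).
        kernel = [[math.exp(-((y - py) ** 2 + (x - px) ** 2) / (2.0 * sigma ** 2))
                   for x in range(width)]
                  for y in range(height)]
        # Normalize over the image so this person adds exactly 1 to the total,
        # even when the point lies on the image border.
        total = sum(sum(row) for row in kernel)
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density


def count_from_density(density):
    """Estimate the count by summing all density values."""
    return sum(sum(row) for row in density)


# Hypothetical annotations as (row, column) pairs inside an 8x8 image.
points = [(2, 3), (5, 5), (0, 0)]
density = density_map_from_dots(points, height=8, width=8)
estimated_count = count_from_density(density)
```

Normalizing each kernel inside the image is one design choice among several; other pipelines truncate the kernel or adapt `sigma` to local crowd density, which changes how mass near the borders is handled.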
This list is automatically generated from the titles and abstracts of the papers on this site.