Multi-Resolution Fusion and Multi-scale Input Priors Based Crowd Counting
- URL: http://arxiv.org/abs/2010.01664v1
- Date: Sun, 4 Oct 2020 19:30:13 GMT
- Title: Multi-Resolution Fusion and Multi-scale Input Priors Based Crowd Counting
- Authors: Usman Sajid, Wenchi Ma, Guanghui Wang
- Abstract summary: The paper proposes a new multi-resolution fusion based end-to-end crowd counting network.
Three input priors are introduced to serve as an efficient and effective alternative to the PRM module.
The proposed approach also generalizes better, achieving the best results in the cross-dataset experiments.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowd counting in still images is a challenging problem in practice due to
huge crowd-density variations, large perspective changes, severe occlusion, and
variable lighting conditions. State-of-the-art approaches based on the patch
rescaling module (PRM) have proven very effective in improving crowd counting
performance. However, the PRM module requires an additional crowd-density
classification process, which can compromise performance. To address these
issues and challenges, the paper proposes a new multi-resolution fusion based
end-to-end crowd counting network. It employs three deep-layered columns
(branches), each catering to a respective crowd-density scale. These columns
regularly fuse (share) information with one another. The network is divided
into three phases, each containing one or more columns. Three input priors are
introduced to serve as an efficient and effective alternative to the PRM
module, without requiring any additional classification operations. Along with
the final crowd count regression head, the network contains three auxiliary
crowd estimation regression heads, strategically placed at the end of each
phase to boost the overall performance. Comprehensive experiments on three
benchmark datasets demonstrate that the proposed approach outperforms all
state-of-the-art models under the RMSE evaluation metric. The proposed approach
also generalizes better, achieving the best results in the cross-dataset
experiments.
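The abstract says the three columns regularly fuse (share) information but does not specify the fusion operator. A minimal sketch, assuming an HRNet-style exchange: column i works at 1/2**i of the highest resolution, all columns have the same channel count, and each fused output is the sum of every column resampled to that column's resolution (average pooling to shrink, nearest-neighbour repetition to enlarge). The functions `downsample`, `upsample`, and `fuse` are hypothetical illustrations, not the authors' code; a trained network would normally apply learned convolutions while resampling.

```python
import numpy as np


def downsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool a (C, H, W) map by an integer factor; H and W must divide evenly."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def upsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsample a (C, H, W) map by an integer factor."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def fuse(columns):
    """Return one fused map per column: the sum of every column resampled to its resolution.

    Assumption: column i is at 1/2**i of column 0's resolution, with equal channel counts.
    """
    fused = []
    for i, target in enumerate(columns):
        acc = np.zeros_like(target, dtype=float)
        for j, src in enumerate(columns):
            if j < i:    # higher-resolution source: shrink it to column i
                acc += downsample(src, 2 ** (i - j))
            elif j > i:  # lower-resolution source: enlarge it to column i
                acc += upsample(src, 2 ** (j - i))
            else:        # same column: pass through unchanged
                acc += src
        fused.append(acc)
    return fused


rng = np.random.default_rng(0)
cols = [rng.random((4, 32, 32)), rng.random((4, 16, 16)), rng.random((4, 8, 8))]
print([f.shape for f in fuse(cols)])  # [(4, 32, 32), (4, 16, 16), (4, 8, 8)]
```

Each column keeps its own resolution after fusion, so the three branches can continue processing at their respective crowd-density scales.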
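The final regression head and the three auxiliary heads placed at the end of each phase can be trained jointly with a combined objective. A minimal sketch, assuming each head regresses a scalar crowd count, each is penalized with squared error, and all auxiliary terms share one hypothetical weight `aux_weight`; the paper's actual loss and weighting may differ.

```python
def squared_error(pred: float, target: float) -> float:
    """Squared difference between a predicted and a ground-truth count."""
    return (pred - target) ** 2


def total_loss(final_pred: float, aux_preds: list, target: float,
               aux_weight: float = 0.5) -> float:
    """Final-head loss plus weighted losses of the auxiliary (per-phase) heads.

    aux_weight is a hypothetical hyperparameter; the text describes three auxiliary heads.
    """
    loss = squared_error(final_pred, target)
    for pred in aux_preds:
        loss += aux_weight * squared_error(pred, target)
    return loss


# Final head predicts 100 people, the three auxiliary heads predict 95, 98 and 101;
# the ground-truth count is 102.
print(total_loss(100.0, [95.0, 98.0, 101.0], 102.0))  # 37.0
```

The auxiliary terms give each phase its own supervision signal, which is one common way intermediate heads are used to boost the final estimate.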
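For reference, the MAE and RMSE metrics commonly used on crowd counting benchmarks are computed over per-image counts as sketched here. Because RMSE squares each error before averaging, a single badly miscounted image raises it more than it raises MAE.

```python
import math


def mae(pred_counts, gt_counts):
    """Mean absolute error between predicted and ground-truth per-image counts."""
    if len(pred_counts) != len(gt_counts) or not gt_counts:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)


def rmse(pred_counts, gt_counts):
    """Root mean squared error between predicted and ground-truth per-image counts."""
    if len(pred_counts) != len(gt_counts) or not gt_counts:
        raise ValueError("need two non-empty sequences of equal length")
    mse = sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts)) / len(gt_counts)
    return math.sqrt(mse)


preds = [50, 80, 120, 204]
truth = [50, 80, 120, 200]
print(mae(preds, truth))   # 1.0
print(rmse(preds, truth))  # 2.0
```

Here one image is off by 4 and the rest are exact: MAE averages that to 1.0, while RMSE reports 2.0.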
Related papers
- Multi-modal Crowd Counting via a Broker Modality
Multi-modal crowd counting involves estimating crowd density from both visual and thermal/depth images.
We propose a novel approach by introducing an auxiliary broker modality and frame the task as a triple-modal learning problem.
We devise a fusion-based method to generate this broker modality, leveraging a non-diffusion, lightweight counterpart of modern denoising diffusion-based fusion models.
(2024-07-10T10:13:11Z)
- SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion
Time series forecasting plays a crucial role in various fields such as finance, traffic management, energy, and healthcare.
Several methods use mechanisms such as attention or mixers to address this by capturing channel correlations.
This paper presents an efficient model, the Series-cOre Fused Time Series forecaster (SOFTS).
(2024-04-22T14:06:35Z)
- Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
We investigate how to integrate the evaluations of importance and sparsity scores into a single stage.
We present OFB, a cost-efficient approach that simultaneously evaluates both importance and sparsity scores.
Experiments demonstrate that OFB can achieve superior compression performance over state-of-the-art searching-based and pruning-based methods.
(2024-03-23T13:22:36Z)
- Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation
Correspondence matching plays a crucial role in numerous robotics applications.
This paper addresses the limitations of deep feature matching (DFM), a state-of-the-art (SoTA) plug-and-play correspondence matching approach.
Our proposed method achieves an overall performance in terms of mean matching accuracy of 0.68, 0.92, and 0.95 with respect to the tolerances of 1, 3, and 5 pixels, respectively.
(2024-03-08T15:32:18Z)
- Feature Decoupling-Recycling Network for Fast Interactive Segmentation
Recent interactive segmentation methods iteratively take the source image, user guidance, and previously predicted mask as input.
We propose the Feature Decoupling-Recycling Network (FDRN), which decouples the modeling components based on their intrinsic discrepancies.
(2023-08-07T12:26:34Z)
- Towards More Effective PRM-based Crowd Counting via A Multi-resolution Fusion and Attention Network
We propose a new PRM based multi-resolution and multi-task crowd counting network.
The proposed model consists of three deep-layered branches with each branch generating feature maps of different resolutions.
The integration of these deep branches with the PRM module and the early-attended blocks proves to be more effective than the original PRM based schemes.
(2021-12-17T18:17:02Z)
- Audio-Visual Transformer Based Crowd Counting
The paper proposes a new audiovisual multi-task network to address the critical challenges in crowd counting.
The proposed network introduces the notion of auxiliary and explicit image patch-importance ranking (PIR) and patch-wise crowd estimate (PCE) information.
To acquire rich visual features, we propose a multi-branch structure with transformer-style fusion in-between.
(2021-09-04T20:25:35Z)
- Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting
We introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people.
To facilitate multimodal crowd counting, we propose a cross-modal collaborative representation learning framework.
Experiments conducted on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting.
(2020-12-08T16:18:29Z)
- Crowd Counting via Hierarchical Scale Recalibration Network
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting.
HSRNet models rich contextual dependencies and recalibrates multiple scale-associated information.
Our approach can ignore various noises selectively and focus on appropriate crowd scales automatically.
(2020-03-07T10:06:47Z)
- Plug-and-Play Rescaling Based Crowd Counting in Static Images
We propose a new image patch rescaling module (PRM) and three independent PRM-employed crowd counting methods.
The proposed frameworks use the PRM module to rescale the image regions (patches) that require special treatment, whereas the classification process helps recognize and discard any cluttered crowd-like background regions that may otherwise cause overestimation.
(2020-01-06T21:43:25Z)
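The PRM pipeline summarized in the plug-and-play rescaling entry routes each patch according to a crowd-density classification before counting. A rough sketch, assuming hypothetical class names and rescale factors (0.5 for low, 1.0 for medium, 2.0 for high density) and nearest-neighbour resizing; `classify` and `regress` stand in for the trained classifier and count regressor, and the factors actually used in the PRM paper may differ. Patches classified as background are discarded, so they add nothing to the count.

```python
import numpy as np

# Hypothetical rescale factors per density class (not taken from the PRM paper).
RESCALE = {"low": 0.5, "medium": 1.0, "high": 2.0}


def nearest_resize(patch: np.ndarray, factor: float) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W) or (H, W, C) patch by `factor`."""
    h, w = patch.shape[:2]
    nh = max(1, int(round(h * factor)))
    nw = max(1, int(round(w * factor)))
    rows = np.arange(nh) * h // nh  # source row index for each output row
    cols = np.arange(nw) * w // nw  # source column index for each output column
    return patch[rows][:, cols]


def rescale_patch(patch: np.ndarray, density_class: str):
    """Return the rescaled patch, or None if the patch is discarded as background."""
    if density_class == "background":
        return None
    return nearest_resize(patch, RESCALE[density_class])


def count_image(patches, classify, regress) -> float:
    """Sum the regressed counts of all non-background patches after rescaling."""
    total = 0.0
    for patch in patches:
        rescaled = rescale_patch(patch, classify(patch))
        if rescaled is not None:
            total += regress(rescaled)
    return total


patch = np.arange(16.0).reshape(4, 4)
print(rescale_patch(patch, "high").shape)  # (8, 8)
print(rescale_patch(patch, "low").shape)   # (2, 2)
print(rescale_patch(patch, "background"))  # None
```

Discarding background patches before regression is what guards against the crowd-like clutter that would otherwise inflate the count.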