Towards More Effective PRM-based Crowd Counting via A Multi-resolution
Fusion and Attention Network
- URL: http://arxiv.org/abs/2112.09664v1
- Date: Fri, 17 Dec 2021 18:17:02 GMT
- Title: Towards More Effective PRM-based Crowd Counting via A Multi-resolution
Fusion and Attention Network
- Authors: Usman Sajid, Guanghui Wang
- Abstract summary: We propose a new PRM-based multi-resolution and multi-task crowd counting network.
The proposed model consists of three deep-layered branches, each generating feature maps of a different resolution.
The integration of these deep branches with the PRM module and the early-attended blocks proves more effective than the original PRM-based schemes.
- Score: 22.235440703471518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The paper focuses on improving the recent plug-and-play patch rescaling
module (PRM) based approaches for crowd counting. To make full use of the PRM's
potential and obtain more reliable and accurate results on challenging images
with crowd variation, large perspective, extreme occlusions, and cluttered
background regions, we propose a new PRM-based multi-resolution and multi-task
crowd counting network that exploits the PRM module more effectively. The
proposed model consists of three deep-layered branches, each generating feature
maps of a different resolution. These branches perform feature-level fusion
across each other to build the vital collective knowledge used for the final
crowd estimate. Additionally, early-stage feature maps undergo visual attention
to strengthen the later-stage channels' understanding of the foreground regions.
The integration of these deep branches with the PRM module and the
early-attended blocks proves more effective than the original PRM-based schemes,
as shown through extensive numerical and visual evaluations on four benchmark
datasets. The proposed approach yields a significant improvement of 12.6% in
terms of the RMSE evaluation criterion, and also outperforms state-of-the-art
methods in cross-dataset evaluations.
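The three-branch design described in the abstract can be sketched in miniature. The pure-Python toy below is an illustrative stand-in, not the paper's actual architecture: the function names, the 2x2 average pooling, the nearest-neighbour upsampling, and the sigmoid gating are all assumptions chosen to show the general pattern of branches at full, 1/2, and 1/4 resolution, feature-level fusion at full resolution, and an early-stage map attending the fused result before it is summed into a count.

```python
import math

def avg_pool2(x):
    """Halve resolution by 2x2 average pooling (grid as list of lists)."""
    h, w = len(x), len(x[0])
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)] for i in range(0, h, 2)]

def upsample2(x):
    """Double resolution by nearest-neighbour repetition."""
    out = []
    for row in x:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def attend(early, late):
    """Gate a later-stage map with an early-stage map (early attention)."""
    return [[l * sigmoid(e) for e, l in zip(er, lr)]
            for er, lr in zip(early, late)]

def three_branch_estimate(img):
    """Three branches at full, 1/2, and 1/4 resolution, fused at feature
    level; the fused density-style map is summed into a count estimate."""
    b1 = img                  # full-resolution branch
    b2 = avg_pool2(b1)        # 1/2-resolution branch
    b3 = avg_pool2(b2)        # 1/4-resolution branch
    # feature-level fusion: bring every branch back to full resolution
    fused = [[(a + b + c) / 3.0 for a, b, c in zip(r1, r2, r3)]
             for r1, r2, r3 in zip(b1, upsample2(b2),
                                   upsample2(upsample2(b3)))]
    fused = attend(b1, fused)  # early-stage map attends the fused map
    return sum(sum(row) for row in fused)
```

In a real network each branch would be a deep convolutional stack producing many-channel feature maps; here a single scalar grid per branch keeps the fusion and attention pattern visible at a glance.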
Related papers
- SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion [59.96233305733875]
Time series forecasting plays a crucial role in various fields such as finance, traffic management, energy, and healthcare.
Several methods utilize mechanisms like attention or mixer to address this by capturing channel correlations.
This paper presents an efficient model, the Series-cOre Fused Time Series forecaster (SOFTS).
arXiv Detail & Related papers (2024-04-22T14:06:35Z)
- MCU-Net: A Multi-prior Collaborative Deep Unfolding Network with Gates-controlled Spatial Attention for Accelerated MR Image Reconstruction [9.441882492801174]
Deep unfolding networks (DUNs) have demonstrated significant potential in accelerating magnetic resonance imaging (MRI).
However, they often encounter high computational costs and slow convergence rates.
We propose a multi-prior collaborative DUN, termed MCU-Net, to address these limitations.
arXiv Detail & Related papers (2024-02-04T07:29:00Z)
- Spatial Attention-based Distribution Integration Network for Human Pose Estimation [0.8052382324386398]
We present the Spatial Attention-based Distribution Integration Network (SADI-NET) to improve the accuracy of localization.
Our network consists of three efficient modules: the receptive fortified module (RFM), spatial fusion module (SFM), and distribution learning module (DLM).
Our model obtained a remarkable 92.10% accuracy on the MPII test dataset, demonstrating significant improvements over existing models and establishing state-of-the-art performance.
arXiv Detail & Related papers (2023-11-09T12:43:01Z)
- MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training.
arXiv Detail & Related papers (2022-01-24T17:48:04Z)
- Deep Co-supervision and Attention Fusion Strategy for Automatic COVID-19 Lung Infection Segmentation on CT Images [1.898617934078969]
In this paper, a novel segmentation scheme is proposed for the infections of COVID-19 on CT images.
A deep collaborative supervision scheme is proposed to guide the network learning the features of edges and semantics.
The effectiveness of the proposed scheme is demonstrated on four different COVID-19 CT datasets.
arXiv Detail & Related papers (2021-12-20T07:32:39Z)
- PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework is able to adjust the receptive field by the dilated convolution parameters according to the input image, which helps the model to extract more discriminative features for each local region.
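The receptive-field arithmetic behind adjusting dilated convolutions is standard and easy to verify. The snippet below is not the PANet authors' code; it simply computes the textbook effective kernel extent k + (k-1)(d-1) and the receptive field of a stack of layers, showing how raising the dilation widens the region each output sees without adding parameters.

```python
def effective_kernel(k, d):
    """Effective spatial extent of a k-wide kernel with dilation d."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """Receptive field of stacked conv layers.

    Each layer is a (kernel, stride, dilation) triple; `jump` tracks the
    spacing between adjacent output positions in input coordinates.
    """
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf
```

For three 3-wide stride-1 layers, dilations (1, 1, 1) give a receptive field of 7, while dilations (1, 2, 4) give 15 from the same parameter count.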
arXiv Detail & Related papers (2021-10-31T04:43:05Z)
- Learning to Perform Downlink Channel Estimation in Massive MIMO Systems [72.76968022465469]
We study downlink (DL) channel estimation in a Massive multiple-input multiple-output (MIMO) system.
A common approach is to use the mean value as the estimate, motivated by channel hardening.
We propose two novel estimation methods.
arXiv Detail & Related papers (2021-09-06T13:42:32Z)
- Crowd Counting via Perspective-Guided Fractional-Dilation Convolution [75.36662947203192]
This paper proposes a novel convolutional neural network-based crowd counting method, termed the Perspective-guided Fractional-Dilation Network (PFDNet).
By modeling continuous scale variations, the proposed PFDNet is able to select the proper fractional dilation kernels to adapt to different spatial locations.
It significantly improves on the flexibility of state-of-the-art methods that only consider discrete representative scales.
arXiv Detail & Related papers (2021-07-08T07:57:00Z)
- Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting [109.32927895352685]
We introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people.
To facilitate the multimodal crowd counting, we propose a cross-modal collaborative representation learning framework.
Experiments conducted on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting.
arXiv Detail & Related papers (2020-12-08T16:18:29Z)
- Multi-Resolution Fusion and Multi-scale Input Priors Based Crowd Counting [20.467558675556173]
The paper proposes a new multi-resolution fusion based end-to-end crowd counting network.
Three input priors are introduced to serve as an efficient and effective alternative to the PRM module.
The proposed approach also has better generalization capability with the best results during the cross-dataset experiments.
arXiv Detail & Related papers (2020-10-04T19:30:13Z)
- Plug-and-Play Rescaling Based Crowd Counting in Static Images [24.150701096083242]
We propose a new image patch rescaling module (PRM) and three independent crowd counting methods employing the PRM.
The proposed frameworks use the PRM module to rescale the image regions (patches) that require special treatment, whereas the classification process helps recognize and discard cluttered crowd-like background regions that may otherwise result in overestimation.
arXiv Detail & Related papers (2020-01-06T21:43:25Z)
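The PRM's patch-routing idea can be caricatured in a few lines. In the sketch below, the thresholds, scale factors, and the density-based decision rule are all illustrative stand-ins: the actual module drives rescaling with a learned classifier and also rejects background patches, both of which this toy omits.

```python
def rescale_factor(density, low=0.05, high=0.5):
    """Toy PRM-style decision for one patch.

    Thresholds `low`/`high` and the factors 0.5/2.0 are illustrative,
    not values from the paper.
    """
    if density > high:
        return 2.0   # dense patch: zoom in so individuals separate
    if density < low:
        return 0.5   # sparse patch: shrink to widen context
    return 1.0       # ordinary patch: leave unchanged

def route_patches(densities):
    """Map each patch's estimated crowd density to a rescaling factor."""
    return [rescale_factor(d) for d in densities]
```

Each rescaled patch would then be counted by the same backbone network, which is what makes the module plug-and-play.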
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.