Towards More Effective PRM-based Crowd Counting via A Multi-resolution
Fusion and Attention Network
- URL: http://arxiv.org/abs/2112.09664v1
- Date: Fri, 17 Dec 2021 18:17:02 GMT
- Title: Towards More Effective PRM-based Crowd Counting via A Multi-resolution
Fusion and Attention Network
- Authors: Usman Sajid, Guanghui Wang
- Abstract summary: We propose a new PRM-based multi-resolution and multi-task crowd counting network.
The proposed model consists of three deep-layered branches, each generating feature maps of a different resolution.
The integration of these deep branches with the PRM module and the early-attended blocks proves more effective than the original PRM-based schemes.
- Score: 22.235440703471518
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The paper focuses on improving the recent plug-and-play patch rescaling
module (PRM) based approaches for crowd counting. To make full use of the PRM's
potential and obtain more reliable and accurate results on challenging images
with crowd variation, large perspective, extreme occlusions, and cluttered
background regions, we propose a new PRM-based multi-resolution and multi-task
crowd counting network that exploits the PRM module more effectively. The
proposed model consists of three deep-layered branches, each generating feature
maps of a different resolution. These branches perform feature-level fusion
across each other to build the vital collective knowledge used for the final
crowd estimate. Additionally, early-stage feature maps undergo visual attention
to strengthen the later-stage channels' understanding of the foreground regions.
The integration of these deep branches with the PRM module and the
early-attended blocks proves more effective than the original PRM-based schemes,
as shown through extensive numerical and visual evaluations on four benchmark
datasets. The proposed approach yields a significant improvement of 12.6% in
terms of the RMSE evaluation criterion, and also outperforms state-of-the-art
methods in cross-dataset evaluations.
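The three-branch design described in the abstract can be sketched in miniature. The pure-Python toy below is an illustrative stand-in, not the paper's actual architecture: the function names, the 2x2 average pooling, the nearest-neighbour upsampling, and the sigmoid gating are all assumptions chosen to show the general pattern of branches at full, 1/2, and 1/4 resolution, feature-level fusion at full resolution, and an early-stage map attending the fused result before it is summed into a count.

```python
import math

def avg_pool2(x):
    """Halve resolution by 2x2 average pooling (grid as list of lists)."""
    h, w = len(x), len(x[0])
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)] for i in range(0, h, 2)]

def upsample2(x):
    """Double resolution by nearest-neighbour repetition."""
    out = []
    for row in x:
        wide = [v for v in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def attend(early, late):
    """Gate a later-stage map with an early-stage map (early attention)."""
    return [[l * sigmoid(e) for e, l in zip(er, lr)]
            for er, lr in zip(early, late)]

def three_branch_estimate(img):
    """Three branches at full, 1/2, and 1/4 resolution, fused at feature
    level; the fused density-style map is summed into a count estimate."""
    b1 = img                  # full-resolution branch
    b2 = avg_pool2(b1)        # 1/2-resolution branch
    b3 = avg_pool2(b2)        # 1/4-resolution branch
    # feature-level fusion: bring every branch back to full resolution
    fused = [[(a + b + c) / 3.0 for a, b, c in zip(r1, r2, r3)]
             for r1, r2, r3 in zip(b1, upsample2(b2),
                                   upsample2(upsample2(b3)))]
    fused = attend(b1, fused)  # early-stage map attends the fused map
    return sum(sum(row) for row in fused)
```

In a real network each branch would be a deep convolutional stack producing many-channel feature maps; here a single scalar grid per branch keeps the fusion and attention pattern visible at a glance.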
Related papers
- SOFTS: Efficient Multivariate Time Series Forecasting with Series-Core Fusion [59.96233305733875]
Time series forecasting plays a crucial role in various fields such as finance, traffic management, energy, and healthcare.
Several methods utilize mechanisms like attention or mixer to address this by capturing channel correlations.
This paper presents an efficient model, the Series-cOre Fused Time Series forecaster (SOFTS).
arXiv Detail & Related papers (2024-04-22T14:06:35Z)
- MCU-Net: A Multi-prior Collaborative Deep Unfolding Network with Gates-controlled Spatial Attention for Accelerated MR Image Reconstruction [9.441882492801174]
Deep unfolding networks (DUNs) have demonstrated significant potential in accelerating magnetic resonance imaging (MRI).
However, they often encounter high computational costs and slow convergence rates.
We propose a multi-prior collaborative DUN, termed MCU-Net, to address these limitations.
arXiv Detail & Related papers (2024-02-04T07:29:00Z)
- Spatial Attention-based Distribution Integration Network for Human Pose Estimation [0.8052382324386398]
We present the Spatial Attention-based Distribution Integration Network (SADI-NET) to improve the accuracy of localization.
Our network consists of three efficient modules: the receptive fortified module (RFM), spatial fusion module (SFM), and distribution learning module (DLM).
Our model obtained a remarkable 92.10% accuracy on the MPII test dataset, demonstrating significant improvements over existing models and establishing state-of-the-art performance.
arXiv Detail & Related papers (2023-11-09T12:43:01Z)
- MMLatch: Bottom-up Top-down Fusion for Multimodal Sentiment Analysis [84.7287684402508]
Current deep learning approaches for multimodal fusion rely on bottom-up fusion of high and mid-level latent modality representations.
Models of human perception highlight the importance of top-down fusion, where high-level representations affect the way sensory inputs are perceived.
We propose a neural architecture that captures top-down cross-modal interactions, using a feedback mechanism in the forward pass during network training.
arXiv Detail & Related papers (2022-01-24T17:48:04Z)
- Deep Co-supervision and Attention Fusion Strategy for Automatic COVID-19 Lung Infection Segmentation on CT Images [1.898617934078969]
In this paper, a novel segmentation scheme is proposed for the infections of COVID-19 on CT images.
A deep collaborative supervision scheme is proposed to guide the network learning the features of edges and semantics.
The effectiveness of the proposed scheme is demonstrated on four different COVID-19 CT datasets.
arXiv Detail & Related papers (2021-12-20T07:32:39Z)
- PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework is able to adjust the receptive field by the dilated convolution parameters according to the input image, which helps the model to extract more discriminative features for each local region.
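The receptive-field arithmetic behind adjusting dilated convolutions is standard and easy to verify. The snippet below is not the PANet authors' code; it simply computes the textbook effective kernel extent k + (k-1)(d-1) and the receptive field of a stack of layers, showing how raising the dilation widens the region each output sees without adding parameters.

```python
def effective_kernel(k, d):
    """Effective spatial extent of a k-wide kernel with dilation d."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """Receptive field of stacked conv layers.

    Each layer is a (kernel, stride, dilation) triple; `jump` tracks the
    spacing between adjacent output positions in input coordinates.
    """
    rf, jump = 1, 1
    for k, s, d in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf
```

For three 3-wide stride-1 layers, dilations (1, 1, 1) give a receptive field of 7, while dilations (1, 2, 4) give 15 from the same parameter count.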
arXiv Detail & Related papers (2021-10-31T04:43:05Z)
- Learning to Perform Downlink Channel Estimation in Massive MIMO Systems [72.76968022465469]
We study downlink (DL) channel estimation in a Massive multiple-input multiple-output (MIMO) system.
A common approach is to use the mean value as the estimate, motivated by channel hardening.
We propose two novel estimation methods.
arXiv Detail & Related papers (2021-09-06T13:42:32Z)
- Crowd Counting via Perspective-Guided Fractional-Dilation Convolution [75.36662947203192]
This paper proposes a novel convolutional neural network-based crowd counting method, termed the Perspective-guided Fractional-Dilation Network (PFDNet).
By modeling continuous scale variations, the proposed PFDNet is able to select the proper fractional dilation kernels to adapt to different spatial locations.
It significantly improves on the flexibility of state-of-the-art methods that only consider discrete representative scales.
arXiv Detail & Related papers (2021-07-08T07:57:00Z)
- Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT Benchmark for Crowd Counting [109.32927895352685]
We introduce a large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030 pairs of RGB-thermal images with 138,389 annotated people.
To facilitate the multimodal crowd counting, we propose a cross-modal collaborative representation learning framework.
Experiments conducted on the RGBT-CC benchmark demonstrate the effectiveness of our framework for RGBT crowd counting.
arXiv Detail & Related papers (2020-12-08T16:18:29Z)
- Multi-Resolution Fusion and Multi-scale Input Priors Based Crowd Counting [20.467558675556173]
The paper proposes a new multi-resolution fusion based end-to-end crowd counting network.
Three input priors are introduced to serve as an efficient and effective alternative to the PRM module.
The proposed approach also has better generalization capability with the best results during the cross-dataset experiments.
arXiv Detail & Related papers (2020-10-04T19:30:13Z)
- Plug-and-Play Rescaling Based Crowd Counting in Static Images [24.150701096083242]
We propose a new image patch rescaling module (PRM) and three independent crowd counting methods employing the PRM.
The proposed frameworks use the PRM module to rescale the image regions (patches) that require special treatment, whereas the classification process helps recognize and discard cluttered crowd-like background regions that may otherwise result in overestimation.
arXiv Detail & Related papers (2020-01-06T21:43:25Z)
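The PRM's patch-routing idea can be caricatured in a few lines. In the sketch below, the thresholds, scale factors, and the density-based decision rule are all illustrative stand-ins: the actual module drives rescaling with a learned classifier and also rejects background patches, both of which this toy omits.

```python
def rescale_factor(density, low=0.05, high=0.5):
    """Toy PRM-style decision for one patch.

    Thresholds `low`/`high` and the factors 0.5/2.0 are illustrative,
    not values from the paper.
    """
    if density > high:
        return 2.0   # dense patch: zoom in so individuals separate
    if density < low:
        return 0.5   # sparse patch: shrink to widen context
    return 1.0       # ordinary patch: leave unchanged

def route_patches(densities):
    """Map each patch's estimated crowd density to a rescaling factor."""
    return [rescale_factor(d) for d in densities]
```

Each rescaled patch would then be counted by the same backbone network, which is what makes the module plug-and-play.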
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.