Multi-scale Feature Aggregation for Crowd Counting
- URL: http://arxiv.org/abs/2208.05256v2
- Date: Thu, 11 Aug 2022 13:41:28 GMT
- Title: Multi-scale Feature Aggregation for Crowd Counting
- Authors: Xiaoheng Jiang, Xinyi Wu, Hisham Cholakkal, Rao Muhammad Anwer, Jiale
Cao, Mingliang Xu, Bing Zhou, Yanwei Pang and Fahad Shahbaz Khan
- Abstract summary: We propose a multi-scale feature aggregation network (MSFANet).
MSFANet consists of two feature aggregation modules: the short aggregation (ShortAgg) and the skip aggregation (SkipAgg).
- Score: 84.45773306711747
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional Neural Network (CNN) based crowd counting methods have achieved
promising results in the past few years. However, the scale variation problem
is still a huge challenge for accurate count estimation. In this paper, we
propose a multi-scale feature aggregation network (MSFANet) that can alleviate
this problem to some extent. Specifically, our approach consists of two feature
aggregation modules: the short aggregation (ShortAgg) and the skip aggregation
(SkipAgg). The ShortAgg module aggregates the features of the adjacent
convolution blocks. Its purpose is to make features with different receptive
fields fused gradually from the bottom to the top of the network. The SkipAgg
module directly propagates features with small receptive fields to features
with much larger receptive fields. Its purpose is to promote the fusion of
features with small and large receptive fields. In particular, the SkipAgg module
introduces the local self-attention features from the Swin Transformer blocks
to incorporate rich spatial information. Furthermore, we present a
local-and-global based counting loss by considering the non-uniform crowd
distribution. Extensive experiments on four challenging datasets (ShanghaiTech
dataset, UCF_CC_50 dataset, UCF-QNRF dataset, WorldExpo'10 dataset) demonstrate
that the proposed easy-to-implement MSFANet can achieve promising results when
compared with the previous state-of-the-art approaches.
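The abstract does not give the formula for the local-and-global counting loss, so the following is only a minimal sketch of one plausible reading: a global term that compares total counts, plus a local term that compares counts inside a grid of non-overlapping regions, so that density placed in the wrong part of the image is still penalized. The names `region_counts`, `local_global_count_loss`, `cells`, and `local_weight` are hypothetical and are not taken from the paper.

```python
from typing import List

Grid = List[List[float]]


def region_counts(density: Grid, cells: int) -> List[float]:
    """Sum a density map over a cells x cells grid of non-overlapping regions."""
    height, width = len(density), len(density[0])
    counts = []
    for gy in range(cells):
        y0, y1 = gy * height // cells, (gy + 1) * height // cells
        for gx in range(cells):
            x0, x1 = gx * width // cells, (gx + 1) * width // cells
            counts.append(sum(sum(row[x0:x1]) for row in density[y0:y1]))
    return counts


def local_global_count_loss(
    pred: Grid, target: Grid, cells: int = 2, local_weight: float = 1.0
) -> float:
    """Assumed form: |global count error| + local_weight * mean |region count error|."""
    # Global term: difference between the total predicted and true counts.
    global_term = abs(sum(map(sum, pred)) - sum(map(sum, target)))
    # Local term: average count difference over the grid regions.
    pred_local = region_counts(pred, cells)
    target_local = region_counts(target, cells)
    local_term = sum(abs(p - t) for p, t in zip(pred_local, target_local)) / len(pred_local)
    return global_term + local_weight * local_term


# Two people in opposite corners; the prediction puts them in the other two corners.
target = [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
pred = [[0, 0, 0, 1], [0, 0, 0, 0], [0, 0, 0, 0], [1, 0, 0, 0]]
loss = local_global_count_loss(pred, target)
```

In this toy case the total counts match, so the global term is zero and only the local term contributes. This is the motivation the abstract hints at when it mentions non-uniform crowd distribution, though the actual loss in the paper may weight or define the terms differently.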
Related papers
- Sequential Signal Mixing Aggregation for Message Passing Graph Neural Networks [2.7719338074999547]
We introduce Sequential Signal Mixing Aggregation (SSMA), a novel plug-and-play aggregation for MPGNNs.
SSMA treats the neighbor features as 2D discrete signals and sequentially convolves them, inherently enhancing the ability to mix features attributed to distinct neighbors.
We show that when combining SSMA with well-established MPGNN architectures, we achieve substantial performance gains across various benchmarks.
arXiv Detail & Related papers (2024-09-28T17:13:59Z)
- Alleviating Over-Smoothing via Aggregation over Compact Manifolds [19.559230417122826]
Graph neural networks (GNNs) have achieved significant success in various applications.
Most GNNs learn the node features with information aggregation of its neighbors and feature transformation in each layer.
However, the node features become indistinguishable after many layers, leading to performance deterioration.
arXiv Detail & Related papers (2024-07-27T11:02:12Z)
- M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection [22.60675416709486]
M$^3$Net is an attention network for Salient Object Detection.
It uses a cross-attention approach to achieve interaction between multilevel features.
Its Mixed Attention Block models context at both global and local levels.
A multilevel supervision strategy optimizes the aggregated features stage by stage.
arXiv Detail & Related papers (2023-09-15T12:46:14Z)
- Mutual-Guided Dynamic Network for Image Fusion [51.615598671899335]
We propose a novel mutual-guided dynamic network (MGDN) for image fusion, which allows for effective information utilization across different locations and inputs.
Experimental results on five benchmark datasets demonstrate that our proposed method outperforms existing methods on four image fusion tasks.
arXiv Detail & Related papers (2023-08-24T03:50:37Z)
- PARFormer: Transformer-based Multi-Task Network for Pedestrian Attribute Recognition [23.814762073093153]
We propose a pure transformer-based multi-task PAR network named PARFormer, which includes four modules.
In the feature extraction module, we build a strong baseline for feature extraction, which achieves competitive results on several PAR benchmarks.
In the viewpoint perception module, we explore the impact of viewpoints on pedestrian attributes, and propose a multi-view contrastive loss.
In the attribute recognition module, we alleviate the negative-positive imbalance problem to generate the attribute predictions.
arXiv Detail & Related papers (2023-04-14T16:27:56Z)
- CloudAttention: Efficient Multi-Scale Attention Scheme For 3D Point Cloud Learning [81.85951026033787]
We set transformers in this work and incorporate them into a hierarchical framework for shape classification and part and scene segmentation.
We also compute efficient and dynamic global cross attentions by leveraging sampling and grouping at each iteration.
The proposed hierarchical model achieves state-of-the-art shape classification in mean accuracy and yields results on par with the previous segmentation methods.
arXiv Detail & Related papers (2022-07-31T21:39:15Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver, generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- Crowd Counting via Hierarchical Scale Recalibration Network [61.09833400167511]
We propose a novel Hierarchical Scale Recalibration Network (HSRNet) to tackle the task of crowd counting.
HSRNet models rich contextual dependencies and recalibrates multiple scale-associated information.
Our approach can ignore various noises selectively and focus on appropriate crowd scales automatically.
arXiv Detail & Related papers (2020-03-07T10:06:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.