Region-Aware Network: Model Human's Top-Down Visual Perception Mechanism
for Crowd Counting
- URL: http://arxiv.org/abs/2106.12163v1
- Date: Wed, 23 Jun 2021 05:11:58 GMT
- Title: Region-Aware Network: Model Human's Top-Down Visual Perception Mechanism
for Crowd Counting
- Authors: Yuehai Chen, Jing Yang, Dong Zhang, Kun Zhang, Badong Chen and Shaoyi
Du
- Abstract summary: Background noise and scale variation are common problems that have long been recognized in crowd counting.
We propose a novel feedback network with a Region-Aware block, called RANet, that models the human top-down visual perception mechanism.
Our method outperforms state-of-the-art crowd counting methods on several public datasets.
- Score: 33.09330894823192
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background noise and scale variation are common problems that have long been
recognized in crowd counting. Humans glance at a crowd image and instantly know
the approximate number of people and where they are by attending to the crowd
regions and to the congestion degree of those regions with a global receptive
field. Hence, in this paper, we propose a novel feedback network with a
Region-Aware block, called RANet, that models the human top-down visual perception
mechanism. First, we introduce a feedback architecture to generate priority
maps that provide a prior about candidate crowd regions in input images. The
prior enables RANet to pay more attention to crowd regions. Then we design a
Region-Aware block that adaptively encodes contextual information into
input images through a global receptive field. More specifically, we scan the
whole input image and its priority maps in the form of column vectors to obtain
a relevance matrix estimating their similarity. The relevance matrix is then
used to build global relationships between pixels. Our method
outperforms state-of-the-art crowd counting methods on several public datasets.
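The relevance-matrix step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dot-product similarity, the row-wise softmax, and the function names (`flatten`, `relevance_matrix`, `region_aware_aggregate`) are all assumptions chosen for illustration, in the style of a generic non-local attention block. It flattens a feature map and a priority map into per-pixel vectors, scores every pixel pair, and uses the scores to mix features globally.

```python
import math

def flatten(grid):
    """Turn an H x W grid of C-dim vectors into a list of N = H*W vectors."""
    return [pixel for row in grid for pixel in row]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relevance_matrix(features, priority):
    """N x N matrix: row i holds the normalized similarity between
    feature pixel i and every priority-map pixel j (assumed dot product)."""
    x = flatten(features)
    p = flatten(priority)
    return [
        softmax([sum(a * b for a, b in zip(xi, pj)) for pj in p])
        for xi in x
    ]

def region_aware_aggregate(features, priority):
    """Re-encode each pixel as a relevance-weighted sum of all pixels,
    which gives every output pixel a global receptive field."""
    x = flatten(features)
    rel = relevance_matrix(features, priority)
    channels = len(x[0])
    return [
        [sum(rel[i][j] * x[j][c] for j in range(len(x))) for c in range(channels)]
        for i in range(len(x))
    ]

# Toy 2 x 2 input with 2 channels; the priority map marks the left column
# as a candidate crowd region (values are made up for illustration).
features = [[[1.0, 0.0], [0.0, 1.0]],
            [[1.0, 1.0], [0.0, 0.0]]]
priority = [[[1.0, 0.0], [0.0, 0.0]],
            [[1.0, 0.0], [0.0, 0.0]]]

rel = relevance_matrix(features, priority)
out = region_aware_aggregate(features, priority)
```

In this toy setup, pixels whose features align with the priority map put more weight on the prioritized positions, while a pixel with an all-zero feature falls back to a uniform average over the image.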
Related papers
- CrowdRec: 3D Crowd Reconstruction from Single Color Images [17.662273473398592]
We exploit the crowd features and propose a crowd-constrained optimization to improve the common single-person method on crowd images.
With the optimization, we can obtain accurate body poses and shapes with reasonable absolute positions from a large-scale crowd image.
arXiv Detail & Related papers (2023-10-10T06:03:39Z)
- R-MAE: Regions Meet Masked Autoencoders [113.73147144125385]
We explore regions as a potential visual analogue of words for self-supervised image representation learning.
Inspired by Masked Autoencoding (MAE), a generative pre-training baseline, we propose masked region autoencoding to learn from groups of pixels or regions.
arXiv Detail & Related papers (2023-06-08T17:56:46Z)
- Crowd3D: Towards Hundreds of People Reconstruction from a Single Image [57.58149031283827]
We propose Crowd3D, the first framework to reconstruct the 3D poses, shapes and locations of hundreds of people with global consistency from a single large-scene image.
To deal with a large number of persons and various human sizes, we also design an adaptive human-centric cropping scheme.
arXiv Detail & Related papers (2023-01-23T11:45:27Z)
- Scene-Adaptive Attention Network for Crowd Counting [31.29858034122248]
This paper proposes a scene-adaptive attention network, termed SAANet.
We design a deformable attention in-built Transformer backbone, which learns adaptive feature representations with deformable sampling locations and dynamic attention weights.
We conduct extensive experiments on four challenging crowd counting benchmarks, demonstrating that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-31T15:03:17Z)
- Fine-Grained Crowd Counting [59.63412475367119]
Current crowd counting algorithms are only concerned with the number of people in an image.
We propose fine-grained crowd counting, which differentiates a crowd into categories based on the low-level behavior attributes of the individuals.
arXiv Detail & Related papers (2020-07-13T01:31:12Z)
- Shallow Feature Based Dense Attention Network for Crowd Counting [103.67446852449551]
We propose a Shallow feature based Dense Attention Network (SDANet) for crowd counting from still images.
Our method outperforms other existing methods by a large margin, as is evident from a remarkable 11.9% Mean Absolute Error (MAE) drop of our SDANet.
arXiv Detail & Related papers (2020-06-17T13:34:42Z)
- Over-crowdedness Alert! Forecasting the Future Crowd Distribution [87.12694319017346]
We formulate a novel crowd analysis problem, in which we aim to predict the crowd distribution in the near future given sequential frames of a crowd video without any identity annotations.
To solve this problem, we propose a global-residual two-stream recurrent network, which leverages the consecutive crowd video frames as inputs and their corresponding density maps as auxiliary information.
arXiv Detail & Related papers (2020-06-09T08:59:54Z)
- Relevant Region Prediction for Crowd Counting [43.85415960107145]
We propose Relevant Region Prediction (RRP) for crowd counting.
RRP consists of the Count Map and the Region Relation-Aware Module (RRAM).
Based on the Graph Convolutional Network (GCN), Region Relation-Aware Module is proposed to capture and exploit the important region dependency.
arXiv Detail & Related papers (2020-05-20T01:53:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.