Relevant Region Prediction for Crowd Counting
- URL: http://arxiv.org/abs/2005.09816v1
- Date: Wed, 20 May 2020 01:53:24 GMT
- Title: Relevant Region Prediction for Crowd Counting
- Authors: Xinya Chen, Yanrui Bin, Changxin Gao, Nong Sang, and Hao Tang
- Abstract summary: We propose Relevant Region Prediction (RRP) for crowd counting.
RRP consists of the Count Map and the Region Relation-Aware Module (RRAM).
Based on the Graph Convolutional Network (GCN), the Region Relation-Aware Module is proposed to capture and exploit the important region dependency.
- Score: 43.85415960107145
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowd counting is a widely studied and challenging task in computer vision.
Existing density map based methods excessively focus on the individuals'
localization which harms the crowd counting performance in highly congested
scenes. In addition, the dependency between the regions of different density is
also ignored. In this paper, we propose Relevant Region Prediction (RRP) for
crowd counting, which consists of the Count Map and the Region Relation-Aware
Module (RRAM). Each pixel in the count map represents the number of heads
falling into the corresponding local area in the input image, which discards
the detailed spatial information and forces the network to pay more attention to
counting rather than localizing individuals. Based on the Graph Convolutional
Network (GCN), the Region Relation-Aware Module is proposed to capture and exploit
the important region dependency. The module builds a fully connected directed
graph between the regions of different density, where each node (region) is
represented by a weighted global pooled feature, and a GCN is learned to map this
region graph to a set of relation-aware region representations. Experimental
results on three datasets show that our method clearly outperforms other
existing state-of-the-art methods.
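The two components described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not give the block size, how the region weights are produced, or how edge weights are computed, so `block=8`, the soft region masks passed as `weights`, the bilinear edge scoring via `w_adj`, and all function names are assumptions made only for illustration.

```python
import numpy as np

def count_map(points, height, width, block=8):
    """Sum head annotations (x, y) over non-overlapping block x block cells.

    Each output pixel holds the number of heads that fall into the
    corresponding local area of the input image. The block size is an assumption.
    """
    rows, cols = -(-height // block), -(-width // block)  # ceiling division
    cmap = np.zeros((rows, cols), dtype=np.float32)
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:
            cmap[int(y) // block, int(x) // block] += 1.0
    return cmap

def region_nodes(features, weights, eps=1e-6):
    """Weighted global pooling: features (C, H, W), weights (K, H, W) -> (K, C).

    Each of the K rows is one region node. How the weights are obtained
    is not stated in the abstract; here they are simply given.
    """
    num = np.einsum("khw,chw->kc", weights, features)
    den = weights.sum(axis=(1, 2))[:, None] + eps
    return num / den

def gcn_layer(nodes, w_adj, w_out):
    """One graph-convolution step over a fully connected directed graph.

    Edge weights come from an asymmetric bilinear score (an assumption),
    so the weight from i to j can differ from the weight from j to i.
    """
    logits = nodes @ w_adj @ nodes.T               # (K, K) edge scores
    logits = logits - logits.max(axis=1, keepdims=True)
    adj = np.exp(logits)
    adj = adj / adj.sum(axis=1, keepdims=True)     # row-normalised adjacency
    return np.maximum(adj @ nodes @ w_out, 0.0)    # ReLU(A X W)

# Count map for four hypothetical head annotations in a 32 x 32 image.
heads = [(3, 4), (10, 2), (11, 3), (30, 30)]
cmap = count_map(heads, height=32, width=32, block=8)

# Relation-aware representations for K = 3 hypothetical density regions.
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 4, 4))   # (C, H, W) backbone features
weights = rng.random((3, 4, 4))          # (K, H, W) soft region masks
nodes = region_nodes(features, weights)
relation_aware = gcn_layer(
    nodes,
    w_adj=rng.normal(size=(16, 16)) * 0.1,
    w_out=rng.normal(size=(16, 8)) * 0.1,
)
```

Summing the count map recovers the total number of annotated heads, which is why the representation keeps the count while discarding exact positions inside each cell.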
Related papers
- Diffusion-based Data Augmentation for Object Counting Problems [62.63346162144445]
We develop a pipeline that utilizes a diffusion model to generate extensive training data.
We are the first to generate images conditioned on a location dot map with a diffusion model.
Our proposed counting loss for the diffusion model effectively minimizes the discrepancies between the location dot map and the crowd images generated.
arXiv Detail & Related papers (2024-01-25T07:28:22Z)
- Attentive Graph Enhanced Region Representation Learning [7.4106801792345705]
Representing urban regions accurately and comprehensively is essential for various urban planning and analysis tasks.
We propose the Attentive Graph Enhanced Region Representation Learning (ATGRL) model, which aims to capture comprehensive dependencies from multiple graphs and learn rich semantic representations of urban regions.
arXiv Detail & Related papers (2023-07-06T16:38:43Z)
- Redesigning Multi-Scale Neural Network for Crowd Counting [68.674652984003]
We introduce a hierarchical mixture of density experts, which hierarchically merges multi-scale density maps for crowd counting.
Within the hierarchical structure, an expert competition and collaboration scheme is presented to encourage contributions from all scales.
Experiments show that our method achieves the state-of-the-art performance on five public datasets.
arXiv Detail & Related papers (2022-08-04T21:49:29Z)
- Urban Region Profiling via A Multi-Graph Representation Learning Framework [0.0]
We propose a multi-graph representation learning framework, called Region2Vec, for urban region profiling.
Experiments on real-world datasets show that Region2Vec can be employed in three applications and outperforms all state-of-the-art baselines.
arXiv Detail & Related papers (2022-02-04T11:05:37Z)
- PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework is able to adjust the receptive field by the dilated convolution parameters according to the input image, which helps the model to extract more discriminative features for each local region.
arXiv Detail & Related papers (2021-10-31T04:43:05Z)
- Cross-Region Building Counting in Satellite Imagery using Counting Consistency [8.732274235941974]
Estimating the number of buildings in any geographical region is a vital component of urban analysis, disaster management, and public policy decisions.
Deep learning methods for building localization and counting in satellite imagery can serve as a viable and cheap alternative.
However, these algorithms suffer performance degradation when applied to the regions on which they have not been trained.
arXiv Detail & Related papers (2021-10-26T10:36:56Z)
- An attention-driven hierarchical multi-scale representation for visual recognition [3.3302293148249125]
Convolutional Neural Networks (CNNs) have revolutionized the understanding of visual content.
We propose a method to capture high-level long-range dependencies by exploring Graph Convolutional Networks (GCNs).
Our approach is simple yet extremely effective in solving both the fine-grained and generic visual classification problems.
arXiv Detail & Related papers (2021-10-23T09:22:22Z)
- Learning Neighborhood Representation from Multi-Modal Multi-Graph: Image, Text, Mobility Graph and Beyond [20.014906526266795]
We propose a novel approach to integrate multi-modal geotagged inputs as either node or edge features of a multi-graph.
Specifically, we use street view images and POI features to characterize neighborhoods (nodes) and use human mobility to characterize the relationship between neighborhoods (directed edges).
The embedding we trained outperforms the ones using only unimodal data as regional inputs.
arXiv Detail & Related papers (2021-05-06T07:44:05Z)
- Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization aims to spot images of the same geographic target from different platforms.
Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center.
We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
arXiv Detail & Related papers (2020-08-26T16:06:11Z)
- LRC-Net: Learning Discriminative Features on Point Clouds by Encoding Local Region Contexts [65.79931333193016]
We present a novel Local-Region-Context Network (LRC-Net) to learn discriminative features on point clouds.
LRC-Net encodes fine-grained contexts inside and among local regions simultaneously.
Results show LRC-Net is competitive with state-of-the-art methods in shape classification and shape segmentation applications.
arXiv Detail & Related papers (2020-03-18T14:34:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.