PANet: Perspective-Aware Network with Dynamic Receptive Fields and
Self-Distilling Supervision for Crowd Counting
- URL: http://arxiv.org/abs/2111.00406v1
- Date: Sun, 31 Oct 2021 04:43:05 GMT
- Title: PANet: Perspective-Aware Network with Dynamic Receptive Fields and
Self-Distilling Supervision for Crowd Counting
- Authors: Xiaoshuang Chen, Yiru Zhao, Yu Qin, Fei Jiang, Mingyuan Tao, Xiansheng
Hua, Hongtao Lu
- Abstract summary: We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that object sizes vary greatly within a single image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework adjusts the receptive field by tuning the dilated convolution parameters according to the input image, which helps the model extract more discriminative features for each local region.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Crowd counting aims to learn the crowd density distributions and estimate the
number of objects (e.g. persons) in images. The perspective effect, which
significantly influences the distribution of data points, plays an important
role in crowd counting. In this paper, we propose a novel perspective-aware
approach called PANet to address the perspective problem. Based on the
observation that object sizes vary greatly within a single image due to the
perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework adjusts the receptive field by tuning the dilated convolution
parameters according to the input image, which helps the model extract more
discriminative features for each local region. Different from most previous
works which use Gaussian kernels to generate the density map as the supervised
information, we propose the self-distilling supervision (SDS) training method.
The ground-truth density maps are refined from the first training stage and the
perspective information is distilled to the model in the second stage. The
experimental results on ShanghaiTech Part_A and Part_B, UCF_QNRF, and UCF_CC_50
datasets demonstrate that our proposed PANet outperforms the state-of-the-art
methods by a large margin.
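The DRF idea described in the abstract, adjusting a convolution's dilation rate so the receptive field matches local object scale, rests on dilated convolution. The following is a minimal sketch of that mechanism, not the paper's implementation; the function name, valid-mode padding, and the fixed 3x3 kernel are illustrative assumptions.

```python
# Minimal sketch of dilated convolution, the mechanism DRF tunes.
# Not the paper's implementation; names and shapes are illustrative.
import numpy as np

def dilated_conv2d(img, kernel, dilation=1):
    """Valid-mode 2D cross-correlation with a dilated kernel.

    A k x k kernel with dilation d covers an effective window of
    (k-1)*d + 1 pixels per side, so a larger dilation gives a larger
    receptive field with the same number of weights.
    """
    kh, kw = kernel.shape
    eh = (kh - 1) * dilation + 1  # effective kernel height
    ew = (kw - 1) * dilation + 1  # effective kernel width
    H, W = img.shape
    out = np.zeros((H - eh + 1, W - ew + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sample the input with stride `dilation` inside the window.
            patch = img[i:i + eh:dilation, j:j + ew:dilation]
            out[i, j] = float((patch * kernel).sum())
    return out

img = np.arange(64, dtype=float).reshape(8, 8)
kernel = np.ones((3, 3))
print(dilated_conv2d(img, kernel, dilation=1).shape)  # (6, 6)
print(dilated_conv2d(img, kernel, dilation=2).shape)  # (4, 4)
```

A DRF-style model would predict `dilation` from the input image (e.g. a larger rate for regions containing larger, nearby people) rather than fixing it globally.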
Related papers
- FSD-BEV: Foreground Self-Distillation for Multi-view 3D Object Detection [33.225938984092274]
We propose a Foreground Self-Distillation (FSD) scheme that effectively avoids the issue of distribution discrepancies.
We also design two Point Cloud Intensification (PCI) strategies to compensate for the sparsity of point clouds.
We develop a Multi-Scale Foreground Enhancement (MSFE) module to extract and fuse multi-scale foreground features.
arXiv Detail & Related papers (2024-07-14T09:39:44Z)
- Diffusion-based Data Augmentation for Object Counting Problems [62.63346162144445]
We develop a pipeline that utilizes a diffusion model to generate extensive training data.
We are the first to generate images conditioned on a location dot map with a diffusion model.
Our proposed counting loss for the diffusion model effectively minimizes the discrepancies between the location dot map and the crowd images generated.
arXiv Detail & Related papers (2024-01-25T07:28:22Z)
- Harnessing Diffusion Models for Visual Perception with Meta Prompts [68.78938846041767]
We propose a simple yet effective scheme to harness a diffusion model for visual perception tasks.
We introduce learnable embeddings (meta prompts) to the pre-trained diffusion models to extract proper features for perception.
Our approach achieves new performance records in depth estimation on NYU Depth V2 and KITTI, and in semantic segmentation on Cityscapes.
arXiv Detail & Related papers (2023-12-22T14:40:55Z)
- RadOcc: Learning Cross-Modality Occupancy Knowledge through Rendering Assisted Distillation [50.35403070279804]
3D occupancy prediction is an emerging task that aims to estimate the occupancy states and semantics of 3D scenes using multi-view images.
We propose RadOcc, a Rendering assisted distillation paradigm for 3D Occupancy prediction.
arXiv Detail & Related papers (2023-12-19T03:39:56Z)
- Focus for Free in Density-Based Counting [56.961229110268036]
We introduce two methods that repurpose the available point annotations to enhance counting performance.
The first is a counting-specific augmentation that leverages point annotations to simulate occluded objects in both input and density images.
The second method, foreground distillation, generates foreground masks from the point annotations, from which we train an auxiliary network on images with blacked-out backgrounds.
arXiv Detail & Related papers (2023-06-08T11:54:37Z)
- Crowd Counting via Perspective-Guided Fractional-Dilation Convolution [75.36662947203192]
This paper proposes a novel convolutional neural network-based crowd counting method, termed the Perspective-guided Fractional-Dilation Network (PFDNet).
By modeling continuous scale variations, the proposed PFDNet can select the proper fractional dilation kernels to adapt to different spatial locations.
It significantly improves on state-of-the-art methods, which consider only discrete representative scales.
arXiv Detail & Related papers (2021-07-08T07:57:00Z)
- PSCNet: Pyramidal Scale and Global Context Guided Network for Crowd Counting [44.306790250158954]
This paper proposes a novel crowd counting approach based on a pyramidal scale module (PSM) and a global context module (GCM).
PSM adaptively captures multi-scale information, which helps identify fine crowd boundaries across different image scales.
GCM is designed to be lightweight and low-complexity, making the exchange of information across feature-map channels more efficient.
arXiv Detail & Related papers (2020-12-07T11:35:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.