Crowd Counting via Perspective-Guided Fractional-Dilation Convolution
- URL: http://arxiv.org/abs/2107.03665v1
- Date: Thu, 8 Jul 2021 07:57:00 GMT
- Title: Crowd Counting via Perspective-Guided Fractional-Dilation Convolution
- Authors: Zhaoyi Yan, Ruimao Zhang, Hongzhi Zhang, Qingfu Zhang, and Wangmeng Zuo
- Abstract summary: This paper proposes a novel convolutional neural network-based crowd counting method, termed Perspective-guided Fractional-Dilation Network (PFDNet).
By modeling the continuous scale variations, the proposed PFDNet is able to select the proper fractional dilation kernels for adapting to different spatial locations.
It is significantly more flexible than state-of-the-art methods that consider only discrete representative scales.
- Score: 75.36662947203192
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Crowd counting is critical for numerous video surveillance scenarios. One of
the main issues in this task is how to handle the dramatic scale variations of
pedestrians caused by the perspective effect. To address this issue, this paper
proposes a novel convolutional neural network-based crowd counting method, termed
Perspective-guided Fractional-Dilation Network (PFDNet). By modeling the
continuous scale variations, the proposed PFDNet is able to select the proper
fractional dilation kernels for adapting to different spatial locations. It
is significantly more flexible than state-of-the-art methods that consider only
discrete representative scales. In addition, by avoiding the
multi-scale or multi-column architectures used in other methods, it is
computationally more efficient. In practice, the proposed PFDNet is constructed
by stacking multiple Perspective-guided Fractional-Dilation Convolutions (PFC)
on a VGG16-BN backbone. By introducing a novel generalized dilation convolution
operation, the PFC can handle fractional dilation ratios in the spatial domain
under the guidance of perspective annotations, achieving continuous scale
modeling of pedestrians. To deal with the problem of unavailable perspective
information in some cases, we further introduce an effective perspective
estimation branch to the proposed PFDNet, which can be trained in either a
supervised or a weakly supervised setting once the branch has been pre-trained.
Extensive experiments show that the proposed PFDNet outperforms
state-of-the-art methods on the ShanghaiTech A, ShanghaiTech B, WorldExpo'10,
UCF-QNRF, UCF_CC_50, and TRANCOS datasets, achieving MAEs of 53.8, 6.5, 6.8, 84.3,
205.8, and 3.06, respectively.
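The abstract does not spell out how a fractional dilation ratio is applied, so the following is only a minimal sketch of one common way to do it: a 3x3 kernel whose taps are placed at real-valued offsets and read with bilinear interpolation. This is an assumption for illustration, not necessarily the formulation used in PFDNet. The names `bilinear_sample`, `fractional_dilated_conv`, and `dilation_map` are hypothetical, and the linear top-to-bottom dilation map merely stands in for rates derived from a perspective map.

```python
import numpy as np


def bilinear_sample(feat, y, x):
    """Read feat (H, W) at a real-valued position; out-of-range taps count as 0."""
    h, w = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    wy, wx = y - y0, x - x0
    corners = (
        (0, 0, (1 - wy) * (1 - wx)),
        (0, 1, (1 - wy) * wx),
        (1, 0, wy * (1 - wx)),
        (1, 1, wy * wx),
    )
    value = 0.0
    for dy, dx, weight in corners:
        yy, xx = y0 + dy, x0 + dx
        if 0 <= yy < h and 0 <= xx < w:
            value += weight * feat[yy, xx]
    return value


def fractional_dilated_conv(feat, kernel, dilation_map):
    """Apply a 3x3 kernel with a real-valued dilation rate per output pixel.

    feat:         (H, W) single-channel feature map
    kernel:       (3, 3) weights (cross-correlation convention, no flipping)
    dilation_map: (H, W) dilation rate for each output location (hypothetical
                  stand-in for rates derived from perspective values)
    """
    h, w = feat.shape
    out = np.zeros((h, w), dtype=np.float64)
    for i in range(h):
        for j in range(w):
            rate = dilation_map[i, j]
            acc = 0.0
            for ki in (-1, 0, 1):
                for kj in (-1, 0, 1):
                    # Tap positions are scaled by the (possibly fractional) rate.
                    sample = bilinear_sample(feat, i + ki * rate, j + kj * rate)
                    acc += kernel[ki + 1, kj + 1] * sample
            out[i, j] = acc
    return out


# Illustrative usage: dilation grows from 1.0 at the top row to 3.0 at the
# bottom row, mimicking people appearing larger closer to the camera.
feature = np.random.default_rng(0).random((16, 16))
rates = np.linspace(1.0, 3.0, 16)[:, None] * np.ones((1, 16))
smoothed = fractional_dilated_conv(feature, np.full((3, 3), 1.0 / 9.0), rates)
```

With an integer rate everywhere, this reduces to an ordinary zero-padded dilated convolution; non-integer rates fill in the scales between them, which is the sense in which the scale modeling becomes continuous. The explicit loops are for readability only and would be far too slow for real feature maps.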
Related papers
- One-Shot Federated Learning with Bayesian Pseudocoresets [19.53527340816458]
We show that distributed function-space inference is tightly related to learning Bayesian pseudocoresets.
We show that this approach achieves prediction performance competitive to state-of-the-art while showing a striking reduction in communication cost of up to two orders of magnitude.
arXiv Detail & Related papers (2024-06-04T10:14:39Z)
- Adapting to Length Shift: FlexiLength Network for Trajectory Prediction [53.637837706712794]
Trajectory prediction plays an important role in various applications, including autonomous driving, robotics, and scene understanding.
Existing approaches mainly focus on developing compact neural networks to increase prediction precision on public datasets, typically employing a standardized input duration.
We introduce a general and effective framework, the FlexiLength Network (FLN), to enhance the robustness of existing trajectory prediction against varying observation periods.
arXiv Detail & Related papers (2024-03-31T17:18:57Z)
- Diffusion-based Data Augmentation for Object Counting Problems [62.63346162144445]
We develop a pipeline that utilizes a diffusion model to generate extensive training data.
We are the first to generate images conditioned on a location dot map with a diffusion model.
Our proposed counting loss for the diffusion model effectively minimizes the discrepancies between the location dot map and the crowd images generated.
arXiv Detail & Related papers (2024-01-25T07:28:22Z)
- CFDP: Common Frequency Domain Pruning [0.3021678014343889]
We introduce a novel end-to-end pipeline for model pruning via the frequency domain.
We have achieved state-of-the-art results on CIFAR-10 with GoogLeNet reaching an accuracy of 95.25%, that is, +0.2% from the original model.
In addition to notable performances, models produced via CFDP exhibit robustness to a variety of configurations.
arXiv Detail & Related papers (2023-06-07T04:49:26Z)
- DDP: Diffusion Model for Dense Visual Prediction [71.55770562024782]
We propose a simple, efficient, yet powerful framework for dense visual predictions based on the conditional diffusion pipeline.
The method, called DDP, efficiently extends the denoising diffusion process into the modern perception pipeline.
DDP shows attractive properties such as dynamic inference and uncertainty awareness, in contrast to previous single-step discriminative methods.
arXiv Detail & Related papers (2023-03-30T17:26:50Z)
- PANet: Perspective-Aware Network with Dynamic Receptive Fields and Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework is able to adjust the receptive field by the dilated convolution parameters according to the input image, which helps the model to extract more discriminative features for each local region.
arXiv Detail & Related papers (2021-10-31T04:43:05Z)
- Contextual Pyramid Attention Network for Building Segmentation in Aerial Imagery [12.241693880896348]
Building extraction from aerial images has several applications in problems such as urban planning, change detection, and disaster management.
We propose to improve building segmentation of different sizes by capturing long-range dependencies using contextual pyramid attention (CPA).
Our method improves on current state-of-the-art methods by 1.8 points and on existing baselines by 12.6 points without any post-processing.
arXiv Detail & Related papers (2020-04-15T11:36:26Z)
- Deep Semantic Matching with Foreground Detection and Cycle-Consistency [103.22976097225457]
We address weakly supervised semantic matching based on a deep network.
We explicitly estimate the foreground regions to suppress the effect of background clutter.
We develop cycle-consistent losses to enforce the predicted transformations across multiple images to be geometrically plausible and consistent.
arXiv Detail & Related papers (2020-03-31T22:38:09Z)
- Correspondence Networks with Adaptive Neighbourhood Consensus [22.013820169455812]
We propose a convolutional neural network architecture, called adaptive neighbourhood consensus network (ANC-Net).
ANC-Net can be trained end-to-end with sparse key-point annotations to handle this challenge.
We thoroughly evaluate the effectiveness of our method on various benchmarks, where it substantially outperforms state-of-the-art methods.
arXiv Detail & Related papers (2020-03-26T17:58:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.