Deep Rank-Consistent Pyramid Model for Enhanced Crowd Counting
- URL: http://arxiv.org/abs/2201.04819v2
- Date: Wed, 22 Nov 2023 11:32:46 GMT
- Title: Deep Rank-Consistent Pyramid Model for Enhanced Crowd Counting
- Authors: Jiaqi Gao, Zhizhong Huang, Yiming Lei, Hongming Shan, James Z. Wang,
Fei-Yue Wang, Junping Zhang
- Abstract summary: We propose a Deep Rank-consistEnt pyrAmid Model (DREAM), which makes full use of rank consistency across coarse-to-fine pyramid features in latent spaces for enhanced crowd counting with massive unlabeled images.
In addition, we have collected a new unlabeled crowd counting dataset, FUDAN-UCC, comprising 4,000 images for training purposes.
- Score: 48.15210212256114
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most conventional crowd counting methods utilize a fully-supervised learning
framework to establish a mapping between scene images and crowd density maps.
They usually rely on a large quantity of costly and time-intensive pixel-level
annotations for training supervision. One way to mitigate the intensive
labeling effort and improve counting accuracy is to leverage large amounts of
unlabeled images. This is attributed to the inherent self-structural
information and rank consistency within a single image, offering additional
qualitative relation supervision during training. Contrary to earlier methods
that utilized rank relations at the original image level, we explore such
rank-consistency relations within latent feature spaces. This approach
enables the incorporation of numerous pyramid partial orders, strengthening
the model's representation capability. A notable advantage is that it can
also increase the utilization ratio of unlabeled samples. Specifically, we
propose a
Deep Rank-consistEnt pyrAmid Model (DREAM), which makes full use of rank
consistency across coarse-to-fine pyramid features in latent spaces for
enhanced crowd counting with massive unlabeled images. In addition, we have
collected a new unlabeled crowd counting dataset, FUDAN-UCC, comprising 4,000
images for training purposes. Extensive experiments on four benchmark datasets,
namely UCF-QNRF, ShanghaiTech PartA and PartB, and UCF-CC-50, show the
effectiveness of our method compared with previous semi-supervised methods. The
codes are available at https://github.com/bridgeqiqi/DREAM.
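The partial orders the abstract refers to follow from a simple containment argument: a crop of an image can never hold more people than a region enclosing it, so nested coarse-to-fine regions induce a ranking that unlabeled images supply for free. A minimal sketch of such a pairwise ranking constraint is given below; the function name, the hinge formulation, and the `margin` parameter are illustrative assumptions, not the paper's exact loss.

```python
def rank_consistency_loss(region_scores, margin=0.0):
    """Pairwise hinge loss over nested regions.

    region_scores: predicted counts (or pooled latent scores) ordered
    from the largest region to the smallest crop, so each region
    contains every region that follows it. A containing region should
    never score lower than a region it contains.
    """
    violations = []
    n = len(region_scores)
    for i in range(n):
        for j in range(i + 1, n):
            # Region i contains region j: expect score[i] >= score[j];
            # penalize any violation beyond the margin.
            violations.append(max(0.0, region_scores[j] - region_scores[i] + margin))
    return sum(violations) / len(violations)

# Toy usage: scores for four nested crops of one unlabeled image.
scores = [10.0, 7.0, 8.0, 3.0]  # the third crop violates the partial order
loss = rank_consistency_loss(scores)  # only the (7.0, 8.0) pair contributes
```

Because the loss depends only on the model's own predictions over crops, it can be evaluated on any unlabeled image, which is what makes large unlabeled sets like FUDAN-UCC usable for training.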
Related papers
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs [49.88461345825586]
This paper proposes a new framework to enhance the fine-grained image understanding abilities of MLLMs.
We present a new method for constructing the instruction tuning dataset at a low cost by leveraging annotations in existing datasets.
We show that our model exhibits a 5.2% accuracy improvement over Qwen-VL and surpasses the accuracy of Kosmos-2 by 24.7%.
arXiv Detail & Related papers (2023-10-01T05:53:15Z)
- Scale Attention for Learning Deep Face Representation: A Study Against Visual Scale Variation [69.45176408639483]
We reform the conv layer by resorting to the scale-space theory.
We build a novel network named SCale AttentioN Conv Neural Network (SCAN-CNN).
As a single-shot scheme, the inference is more efficient than multi-shot fusion.
arXiv Detail & Related papers (2022-09-19T06:35:04Z)
- Glance to Count: Learning to Rank with Anchors for Weakly-supervised Crowd Counting [43.446730359817515]
Crowd image is arguably one of the most laborious data to annotate.
We propose a novel weakly-supervised setting, in which we leverage the binary ranking of two images with high-contrast crowd counts as training guidance.
We conduct extensive experiments to study various combinations of supervision, and we show that the proposed method outperforms existing weakly-supervised methods by a large margin.
arXiv Detail & Related papers (2022-05-29T13:39:34Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Spatial Consistency Loss for Training Multi-Label Classifiers from Single-Label Annotations [39.69823105183408]
Multi-label classification is more applicable "in the wild" than single-label classification.
We show that adding a consistency loss is a simple yet effective method to train multi-label classifiers in a weakly supervised setting.
We also demonstrate improved multi-label classification mAP on ImageNet-1K using the ReaL multi-label validation set.
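The abstract does not spell out the consistency loss itself; as a purely hypothetical illustration of one spatial consistency constraint in a weakly supervised multi-label setting, the sketch below penalizes a crop-level class probability that exceeds the corresponding full-image probability, since any object visible in a crop is also present in the full image. All names here are invented, not taken from the paper.

```python
def spatial_consistency_penalty(full_probs, crop_probs):
    """Hypothetical consistency term for weakly supervised multi-label
    training: per-class probabilities predicted on a crop should not
    exceed those predicted on the full image containing it.

    full_probs, crop_probs: equal-length lists of per-class probabilities.
    """
    if len(full_probs) != len(crop_probs):
        raise ValueError("probability vectors must have the same length")
    # Hinge per class: only crop > full counts as an inconsistency.
    return sum(max(0.0, c - f) for f, c in zip(full_probs, crop_probs)) / len(full_probs)

# Toy usage: class 2 is predicted more confidently on the crop than on
# the full image, which the penalty flags.
penalty = spatial_consistency_penalty([0.9, 0.2, 0.4], [0.8, 0.5, 0.1])
```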
arXiv Detail & Related papers (2022-03-11T17:54:20Z)
- AugNet: End-to-End Unsupervised Visual Representation Learning with Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent the image in a low-dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z)
- Multi-Scale Context Aggregation Network with Attention-Guided for Crowd Counting [23.336181341124746]
Crowd counting aims to predict the number of people and generate the density map in the image.
There are many challenges, including varying head scales, the diversity of crowd distribution across images and cluttered backgrounds.
We propose a multi-scale context aggregation network (MSCANet) based on single-column encoder-decoder architecture for crowd counting.
arXiv Detail & Related papers (2021-04-06T02:24:06Z)
- Completely Self-Supervised Crowd Counting via Distribution Matching [92.09218454377395]
We propose a complete self-supervision approach to training models for dense crowd counting.
The only input required to train, apart from a large set of unlabeled crowd images, is the approximate upper limit of the crowd count.
Our method dwells on the idea that natural crowds follow a power law distribution, which could be leveraged to yield error signals for backpropagation.
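The abstract states the power-law prior only at a high level; as a hedged sketch of how such a prior could yield a label-free error signal, one option is to compare sorted batch predictions against deterministic quantiles of a truncated power law. The function names, the quantile-matching choice, and the default `alpha` below are all assumptions for illustration, not the paper's actual distribution-matching method.

```python
def power_law_quantiles(n, c_max, alpha=2.0):
    """Deterministic quantiles of a truncated power-law prior over crowd
    counts on [1, c_max], via the inverse CDF at midpoint probabilities.
    For p(c) proportional to c**(-alpha) with alpha != 1:
        F_inv(u) = (1 + u * (c_max**(1 - alpha) - 1)) ** (1 / (1 - alpha))
    """
    return [
        (1 + ((k + 0.5) / n) * (c_max ** (1 - alpha) - 1)) ** (1 / (1 - alpha))
        for k in range(n)
    ]

def distribution_matching_error(pred_counts, c_max, alpha=2.0):
    """Label-free error signal: sort a batch of predicted counts and
    measure how far they sit from the power-law quantiles."""
    target = power_law_quantiles(len(pred_counts), c_max, alpha)
    pred = sorted(pred_counts)
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

# Toy usage: ten predicted counts compared against a power-law prior
# truncated at an assumed upper limit of 1000 people per image.
error = distribution_matching_error(
    [3.0, 5.0, 2.0, 40.0, 8.0, 1.0, 12.0, 6.0, 150.0, 20.0], 1000.0)
```

Note that only the approximate upper limit `c_max` must be supplied, which matches the abstract's claim that this is the sole input beyond the unlabeled images themselves.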
arXiv Detail & Related papers (2020-09-14T13:20:12Z)
- Active Crowd Counting with Limited Supervision [13.09054893296829]
We present an active learning framework which enables accurate crowd counting with limited supervision.
We first introduce an active labeling strategy to annotate the most informative images in the dataset and learn the counting model upon them.
In the last cycle, when the labeling budget is met, the large amount of unlabeled data is also utilized.
arXiv Detail & Related papers (2020-07-13T12:07:25Z)
- Towards Reading Beyond Faces for Sparsity-Aware 4D Affect Recognition [55.15661254072032]
We present a sparsity-aware deep network for automatic 4D facial expression recognition (FER).
We first propose a novel augmentation method to combat the data limitation problem for deep learning.
We then present a sparsity-aware deep network to compute the sparse representations of convolutional features over multi-views.
arXiv Detail & Related papers (2020-02-08T13:09:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.