CrowdFormer: Weakly-supervised Crowd counting with Improved
Generalizability
- URL: http://arxiv.org/abs/2203.03768v1
- Date: Mon, 7 Mar 2022 23:10:40 GMT
- Title: CrowdFormer: Weakly-supervised Crowd counting with Improved
Generalizability
- Authors: Siddharth Singh Savner, Vivek Kanhangad
- Abstract summary: We propose a weakly-supervised method for crowd counting using a pyramid vision transformer.
Our method is comparable to the state-of-the-art on the benchmark crowd datasets.
- Score: 2.8174125805742416
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convolutional neural networks (CNNs) have dominated the field of computer
vision for nearly a decade due to their strong ability to learn local features.
However, due to their limited receptive field, CNNs fail to model the global
context. On the other hand, transformer, an attention-based architecture can
model the global context easily. Despite this, there are limited studies that
investigate the effectiveness of transformers in crowd counting. In addition,
the majority of the existing crowd counting methods are based on the regression
of density maps which requires point-level annotation of each person present in
the scene. This annotation task is laborious and also error-prone. This has led
to increased focus on weakly-supervised crowd counting methods which require
only the count-level annotations. In this paper, we propose a weakly-supervised
method for crowd counting using a pyramid vision transformer. We have conducted
extensive evaluations to validate the effectiveness of the proposed method. Our
method is comparable to the state-of-the-art on the benchmark crowd datasets.
More importantly, it shows remarkable generalizability.
Related papers
- Robust Zero-Shot Crowd Counting and Localization With Adaptive Resolution SAM [55.93697196726016]
We propose a simple yet effective crowd counting method by utilizing the Segment-Everything-Everywhere Model (SEEM)
We show that SEEM's performance in dense crowd scenes is limited, primarily due to the omission of many persons in high-density areas.
Our proposed method achieves the best unsupervised performance in crowd counting, while also being comparable to some supervised methods.
arXiv Detail & Related papers (2024-02-27T13:55:17Z) - Joint CNN and Transformer Network via weakly supervised Learning for
efficient crowd counting [22.040942519355628]
We propose a Joint CNN and Transformer Network (JCTNet) via weakly supervised learning for crowd counting.
JCTNet can effectively focus on the crowd regions and obtain superior weakly supervised counting performance on five mainstream datasets.
arXiv Detail & Related papers (2022-03-12T09:40:29Z) - Boosting Crowd Counting via Multifaceted Attention [109.89185492364386]
Large-scale variations often exist within crowd images.
Neither fixed-size convolution kernel of CNN nor fixed-size attention of recent vision transformers can handle this kind of variation.
We propose a Multifaceted Attention Network (MAN) to improve transformer models in local spatial relation encoding.
arXiv Detail & Related papers (2022-03-05T01:36:43Z) - Reinforcing Local Feature Representation for Weakly-Supervised Dense
Crowd Counting [21.26385035473938]
We propose a self-adaptive feature similarity learning network and a global-local consistency loss to reinforce local representation.
Our proposed method based on different backbones narrows the gap between weakly-supervised and fully-supervised dense crowd counting.
arXiv Detail & Related papers (2022-02-22T05:53:51Z) - Scene-Adaptive Attention Network for Crowd Counting [31.29858034122248]
This paper proposes a scene-adaptive attention network, termed SAANet.
We design a deformable attention in-built Transformer backbone, which learns adaptive feature representations with deformable sampling locations and dynamic attention weights.
We conduct extensive experiments on four challenging crowd counting benchmarks, demonstrating that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2021-12-31T15:03:17Z) - CCTrans: Simplifying and Improving Crowd Counting with Transformer [7.597392692171026]
We propose a simple approach called CCTrans to simplify the design pipeline.
Specifically, we utilize a pyramid vision transformer backbone to capture the global crowd information.
Our method achieves new state-of-the-art results on several benchmarks both in weakly and fully-supervised crowd counting.
arXiv Detail & Related papers (2021-09-29T15:13:10Z) - TransCrowd: Weakly-Supervised Crowd Counting with Transformer [56.84516562735186]
We propose TransCrowd, which reformulates the weakly-supervised crowd counting problem from the perspective of sequence-to-count based on Transformer.
Experiments on five benchmark datasets demonstrate that the proposed TransCrowd achieves superior performance compared with all the weakly-supervised CNN-based counting methods.
arXiv Detail & Related papers (2021-04-19T08:12:50Z) - Completely Self-Supervised Crowd Counting via Distribution Matching [92.09218454377395]
We propose a complete self-supervision approach to training models for dense crowd counting.
The only input required to train, apart from a large set of unlabeled crowd images, is the approximate upper limit of the crowd count.
Our method dwells on the idea that natural crowds follow a power law distribution, which could be leveraged to yield error signals for backpropagation.
arXiv Detail & Related papers (2020-09-14T13:20:12Z) - Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks [50.78037828213118]
This paper tackles the semi-supervised crowd counting problem from the perspective of feature learning.
We propose a novel semi-supervised crowd counting method which is built upon two innovative components.
arXiv Detail & Related papers (2020-07-07T05:30:53Z) - Towards Using Count-level Weak Supervision for Crowd Counting [55.58468947486247]
This paper studies the problem of weakly-supervised crowd counting which learns a model from only a small amount of location-level annotations (fully-supervised) but a large amount of count-level annotations (weakly-supervised)
We devise a simple-yet-effective training strategy, namely Multiple Auxiliary Tasks Training (MATT), to construct regularizes for restricting the freedom of the generated density maps.
arXiv Detail & Related papers (2020-02-29T02:58:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.