TransCrowd: Weakly-Supervised Crowd Counting with Transformer
- URL: http://arxiv.org/abs/2104.09116v1
- Date: Mon, 19 Apr 2021 08:12:50 GMT
- Title: TransCrowd: Weakly-Supervised Crowd Counting with Transformer
- Authors: Dingkang Liang, Xiwu Chen, Wei Xu, Yu Zhou, Xiang Bai
- Abstract summary: We propose TransCrowd, which reformulates the weakly-supervised crowd counting problem from the perspective of sequence-to-count based on Transformer.
Experiments on five benchmark datasets demonstrate that the proposed TransCrowd achieves superior performance compared with all the weakly-supervised CNN-based counting methods.
- Score: 56.84516562735186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The mainstream crowd counting methods usually utilize a convolutional
neural network (CNN) to regress a density map, requiring point-level annotations.
However, annotating each person with a point is an expensive and laborious
process. During the testing phase, the point-level annotations are not
considered to evaluate the counting accuracy, which means the point-level
annotations are redundant. Hence, it is desirable to develop weakly-supervised
counting methods that rely only on count-level annotations, a more economical
way of labeling. Current weakly-supervised counting methods adopt a CNN to
regress a total count of the crowd by an image-to-count paradigm. However,
having limited receptive fields for context modeling is an intrinsic limitation
of these weakly-supervised CNN-based methods. These methods thus cannot
achieve satisfactory performance, which limits their applications in the real world. The
Transformer is a popular sequence-to-sequence prediction model in NLP, which
has a global receptive field. In this paper, we propose TransCrowd, which
reformulates the weakly-supervised crowd counting problem from the perspective
of sequence-to-count based on Transformer. We observe that the proposed
TransCrowd can effectively extract the semantic crowd information by using the
self-attention mechanism of Transformer. To the best of our knowledge, this is
the first work to adopt a pure Transformer for crowd counting research.
Experiments on five benchmark datasets demonstrate that the proposed TransCrowd
achieves superior performance compared with all the weakly-supervised CNN-based
counting methods and gains highly competitive counting performance compared
with some popular fully-supervised counting methods. Code is available at
https://github.com/dk-liang/TransCrowd.
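To make the sequence-to-count idea concrete, here is a minimal sketch of such a model: patch tokens go through a standard Transformer encoder, and a pooled representation is regressed to a single scalar, supervised only by the image-level count. This is a hedged reconstruction, not the released code (see the repository above); the patch size, depth, and widths are illustrative guesses.

```python
import torch
import torch.nn as nn

class SequenceToCount(nn.Module):
    """Sequence-to-count sketch: image patches -> Transformer encoder -> scalar.

    Hyperparameters (patch size, depth, widths) are illustrative guesses,
    not the settings used by TransCrowd.
    """
    def __init__(self, img_size=384, patch_size=16, dim=256, depth=4, heads=8):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # A strided conv is equivalent to linearly projecting flattened patches.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Regress one scalar count from the pooled token sequence.
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, x):                            # x: (B, 3, H, W)
        tokens = self.patch_embed(x)                 # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)   # (B, N, dim)
        tokens = self.encoder(tokens + self.pos_embed)
        pooled = tokens.mean(dim=1)                  # global average pooling
        return self.head(pooled).squeeze(-1)         # (B,) predicted counts

# Weak supervision: only the total count per image is needed.
model = SequenceToCount()
images = torch.randn(2, 3, 384, 384)
gt_counts = torch.tensor([57.0, 213.0])
loss = nn.functional.l1_loss(model(images), gt_counts)
loss.backward()
```

The paper compares a learnable regression-token variant with global average pooling over the output sequence; the sketch takes the simpler pooling route.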
Related papers
- ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention.
Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count.
The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
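As a toy illustration of the clustered key/value idea, the sketch below uses plain k-means as a stand-in for the paper's clustering procedure and keeps shapes single-head and single-image for clarity; queries attend to C aggregated tokens instead of all N, cutting the attention cost from O(N^2) to O(NC).

```python
import torch
import torch.nn.functional as F

def kmeans(x, num_clusters, iters=10):
    """Plain k-means over tokens. x: (N, D) -> centroids: (C, D)."""
    centroids = x[torch.randperm(x.size(0))[:num_clusters]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, centroids).argmin(dim=1)   # nearest centroid
        for c in range(num_clusters):
            members = x[assign == c]
            if members.numel() > 0:
                centroids[c] = members.mean(dim=0)
    return centroids

def clustered_attention(q, k, v, num_clusters=64):
    """Attend to C aggregated key/value tokens instead of all N tokens."""
    centroids = kmeans(k, num_clusters)                    # (C, D)
    assign = torch.cdist(k, centroids).argmin(dim=1)       # shared assignment
    ck, cv = centroids.clone(), torch.zeros(num_clusters, v.size(1))
    for c in range(num_clusters):
        mask = assign == c
        if mask.any():                                     # empty clusters keep
            ck[c] = k[mask].mean(dim=0)                    # their centroid key
            cv[c] = v[mask].mean(dim=0)                    # and a zero value
    attn = F.softmax(q @ ck.t() / q.size(-1) ** 0.5, dim=-1)  # (N, C)
    return attn @ cv                                           # (N, D)

tokens = torch.randn(1024, 64)   # 1024 tokens, 64-dim; single head for clarity
out = clustered_attention(tokens, tokens, tokens, num_clusters=32)
```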
arXiv Detail & Related papers (2022-08-28T04:18:27Z)
- Joint CNN and Transformer Network via weakly supervised Learning for efficient crowd counting [22.040942519355628]
We propose a Joint CNN and Transformer Network (JCTNet) via weakly supervised learning for crowd counting.
JCTNet can effectively focus on the crowd regions and obtain superior weakly supervised counting performance on five mainstream datasets.
arXiv Detail & Related papers (2022-03-12T09:40:29Z)
- CrowdFormer: Weakly-supervised Crowd counting with Improved Generalizability [2.8174125805742416]
We propose a weakly-supervised method for crowd counting using a pyramid vision transformer.
Our method is comparable to the state-of-the-art on the benchmark crowd datasets.
arXiv Detail & Related papers (2022-03-07T23:10:40Z)
- CCTrans: Simplifying and Improving Crowd Counting with Transformer [7.597392692171026]
We propose a simple approach called CCTrans to simplify the design pipeline.
Specifically, we utilize a pyramid vision transformer backbone to capture the global crowd information.
Our method achieves new state-of-the-art results on several benchmarks in both weakly- and fully-supervised crowd counting.
arXiv Detail & Related papers (2021-09-29T15:13:10Z)
- Wisdom of (Binned) Crowds: A Bayesian Stratification Paradigm for Crowd Counting [16.09823718637455]
We analyze the performance of crowd counting approaches across standard datasets at the per-stratum level and in aggregate.
Our contributions represent a nuanced, statistically balanced and fine-grained characterization of performance for crowd counting approaches.
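To illustrate the binned-evaluation idea in the simplest possible form (the paper's actual stratification is Bayesian and more involved; the bin edges below are hypothetical), one can bucket test images by ground-truth count and report MAE per stratum alongside the aggregate:

```python
import numpy as np

def stratified_mae(gt_counts, pred_counts, bin_edges=(0, 50, 500, 5000, np.inf)):
    """MAE per count stratum and overall. Bin edges are hypothetical."""
    gt = np.asarray(gt_counts, dtype=float)
    err = np.abs(gt - np.asarray(pred_counts, dtype=float))
    report = {"aggregate": err.mean()}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (gt >= lo) & (gt < hi)
        if mask.any():                     # skip strata with no test images
            report[f"[{lo}, {hi})"] = err[mask].mean()
    return report

print(stratified_mae([12, 80, 950, 2400], [15, 60, 1100, 2000]))
```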
arXiv Detail & Related papers (2021-08-19T16:50:31Z)
- HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
We take advantage of both CNNs and Transformers for image-based person Re-ID with high performance.
This is the first work to exploit both CNNs and Transformers for image-based person Re-ID.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
- Completely Self-Supervised Crowd Counting via Distribution Matching [92.09218454377395]
We propose a complete self-supervision approach to training models for dense crowd counting.
The only input required to train, apart from a large set of unlabeled crowd images, is the approximate upper limit of the crowd count.
Our method dwells on the idea that natural crowds follow a power law distribution, which could be leveraged to yield error signals for backpropagation.
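One toy way to turn a power-law prior into an error signal is quantile matching against samples drawn from that prior. This sketch is a hedged illustration of that idea only; the exponent, truncation, and sampling scheme are assumptions, and the paper's actual distribution-matching formulation differs in detail.

```python
import torch

def power_law_prior_loss(pred_counts, max_count, alpha=2.0):
    """Quantile-match predicted counts to a truncated power-law prior.

    Assumes p(c) ~ c^(-alpha) on [1, max_count]; alpha and the sampling
    scheme are illustrative, not the paper's settings.
    """
    u = torch.rand(pred_counts.numel())
    a = 1.0 - alpha
    # Inverse-CDF sampling from the truncated power law (alpha != 1).
    prior = ((1.0 - u) + u * float(max_count) ** a) ** (1.0 / a)
    # Compare order statistics of predictions and prior samples.
    return torch.abs(pred_counts.sort().values - prior.sort().values).mean()

preds = torch.rand(256) * 3000   # stand-in for per-image model outputs
loss = power_law_prior_loss(preds, max_count=3000)
```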
arXiv Detail & Related papers (2020-09-14T13:20:12Z)
- Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks [50.78037828213118]
This paper tackles the semi-supervised crowd counting problem from the perspective of feature learning.
We propose a novel semi-supervised crowd counting method which is built upon two innovative components.
arXiv Detail & Related papers (2020-07-07T05:30:53Z)
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing [112.2208052057002]
We propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one.
With comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks.
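The compression step can be pictured as pooling between encoder blocks. A minimal sketch, assuming mean pooling that halves the sequence before each block (the real model pools only the queries inside the compressing block's attention and can later up-sample for token-level tasks):

```python
import torch
import torch.nn as nn

class FunnelStage(nn.Module):
    """One funnel stage sketch: halve the sequence with mean pooling,
    then run a Transformer block on the shorter sequence."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                dim_feedforward=4 * dim,
                                                batch_first=True)

    def forward(self, h):                      # h: (B, L, dim)
        h = nn.functional.avg_pool1d(h.transpose(1, 2), 2).transpose(1, 2)
        return self.block(h)                   # now (B, L/2, dim)

# Each stage halves the length, so later blocks cost far fewer FLOPs.
h = torch.randn(2, 512, 256)
for stage in [FunnelStage(), FunnelStage(), FunnelStage()]:
    h = stage(h)
print(h.shape)   # torch.Size([2, 64, 256])
```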
arXiv Detail & Related papers (2020-06-05T05:16:23Z)