TransCrowd: Weakly-Supervised Crowd Counting with Transformer
- URL: http://arxiv.org/abs/2104.09116v1
- Date: Mon, 19 Apr 2021 08:12:50 GMT
- Title: TransCrowd: Weakly-Supervised Crowd Counting with Transformer
- Authors: Dingkang Liang, Xiwu Chen, Wei Xu, Yu Zhou, Xiang Bai
- Abstract summary: We propose TransCrowd, which reformulates the weakly-supervised crowd counting problem from the perspective of sequence-to-count based on Transformer.
Experiments on five benchmark datasets demonstrate that the proposed TransCrowd achieves superior performance compared with all the weakly-supervised CNN-based counting methods.
- Score: 56.84516562735186
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The mainstream crowd counting methods usually utilize a convolutional
neural network (CNN) to regress a density map, requiring point-level annotations.
However, annotating each person with a point is an expensive and laborious
process. During the testing phase, the point-level annotations are not
considered to evaluate the counting accuracy, which means the point-level
annotations are redundant. Hence, it is desirable to develop weakly-supervised
counting methods that rely only on count-level annotations, a more economical
way of labeling. Current weakly-supervised counting methods adopt a CNN to
regress a total count of the crowd by an image-to-count paradigm. However,
having limited receptive fields for context modeling is an intrinsic limitation
of these weakly-supervised CNN-based methods. These methods thus cannot
achieve satisfactory performance, which limits their applications in the real world. The
Transformer is a popular sequence-to-sequence prediction model in NLP, which
has a global receptive field. In this paper, we propose TransCrowd, which
reformulates the weakly-supervised crowd counting problem from the perspective
of sequence-to-count based on Transformer. We observe that the proposed
TransCrowd can effectively extract the semantic crowd information by using the
self-attention mechanism of Transformer. To the best of our knowledge, this is
the first work to adopt a pure Transformer for crowd counting research.
Experiments on five benchmark datasets demonstrate that the proposed TransCrowd
achieves superior performance compared with all the weakly-supervised CNN-based
counting methods and gains highly competitive counting performance compared
with some popular fully-supervised counting methods. Code is available at
https://github.com/dk-liang/TransCrowd.
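To make the sequence-to-count idea concrete, here is a minimal sketch of such a model: patch tokens go through a standard Transformer encoder, and a pooled representation is regressed to a single scalar, supervised only by the image-level count. This is a hedged reconstruction, not the released code (see the repository above); the patch size, depth, and widths are illustrative guesses.

```python
import torch
import torch.nn as nn

class SequenceToCount(nn.Module):
    """Sequence-to-count sketch: image patches -> Transformer encoder -> scalar.

    Hyperparameters (patch size, depth, widths) are illustrative guesses,
    not the settings used by TransCrowd.
    """
    def __init__(self, img_size=384, patch_size=16, dim=256, depth=4, heads=8):
        super().__init__()
        num_patches = (img_size // patch_size) ** 2
        # A strided conv is equivalent to linearly projecting flattened patches.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch_size, stride=patch_size)
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        # Regress one scalar count from the pooled token sequence.
        self.head = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, x):                            # x: (B, 3, H, W)
        tokens = self.patch_embed(x)                 # (B, dim, H/p, W/p)
        tokens = tokens.flatten(2).transpose(1, 2)   # (B, N, dim)
        tokens = self.encoder(tokens + self.pos_embed)
        pooled = tokens.mean(dim=1)                  # global average pooling
        return self.head(pooled).squeeze(-1)         # (B,) predicted counts

# Weak supervision: only the total count per image is needed.
model = SequenceToCount()
images = torch.randn(2, 3, 384, 384)
gt_counts = torch.tensor([57.0, 213.0])
loss = nn.functional.l1_loss(model(images), gt_counts)
loss.backward()
```

The paper compares a learnable regression-token variant with global average pooling over the output sequence; the sketch takes the simpler pooling route.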
Related papers
- ClusTR: Exploring Efficient Self-attention via Clustering for Vision Transformers [70.76313507550684]
We propose a content-based sparse attention method, as an alternative to dense self-attention.
Specifically, we cluster and then aggregate key and value tokens, as a content-based method of reducing the total token count.
The resulting clustered-token sequence retains the semantic diversity of the original signal, but can be processed at a lower computational cost.
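As a toy illustration of the clustered key/value idea, the sketch below uses plain k-means as a stand-in for the paper's clustering procedure and keeps shapes single-head and single-image for clarity; queries attend to C aggregated tokens instead of all N, cutting the attention cost from O(N^2) to O(NC).

```python
import torch
import torch.nn.functional as F

def kmeans(x, num_clusters, iters=10):
    """Plain k-means over tokens. x: (N, D) -> centroids: (C, D)."""
    centroids = x[torch.randperm(x.size(0))[:num_clusters]].clone()
    for _ in range(iters):
        assign = torch.cdist(x, centroids).argmin(dim=1)   # nearest centroid
        for c in range(num_clusters):
            members = x[assign == c]
            if members.numel() > 0:
                centroids[c] = members.mean(dim=0)
    return centroids

def clustered_attention(q, k, v, num_clusters=64):
    """Attend to C aggregated key/value tokens instead of all N tokens."""
    centroids = kmeans(k, num_clusters)                    # (C, D)
    assign = torch.cdist(k, centroids).argmin(dim=1)       # shared assignment
    ck, cv = centroids.clone(), torch.zeros(num_clusters, v.size(1))
    for c in range(num_clusters):
        mask = assign == c
        if mask.any():                                     # empty clusters keep
            ck[c] = k[mask].mean(dim=0)                    # their centroid key
            cv[c] = v[mask].mean(dim=0)                    # and a zero value
    attn = F.softmax(q @ ck.t() / q.size(-1) ** 0.5, dim=-1)  # (N, C)
    return attn @ cv                                           # (N, D)

tokens = torch.randn(1024, 64)   # 1024 tokens, 64-dim; single head for clarity
out = clustered_attention(tokens, tokens, tokens, num_clusters=32)
```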
arXiv Detail & Related papers (2022-08-28T04:18:27Z)
- Joint CNN and Transformer Network via weakly supervised Learning for efficient crowd counting [22.040942519355628]
We propose a Joint CNN and Transformer Network (JCTNet) via weakly supervised learning for crowd counting.
JCTNet can effectively focus on the crowd regions and obtain superior weakly supervised counting performance on five mainstream datasets.
arXiv Detail & Related papers (2022-03-12T09:40:29Z)
- CrowdFormer: Weakly-supervised Crowd counting with Improved Generalizability [2.8174125805742416]
We propose a weakly-supervised method for crowd counting using a pyramid vision transformer.
Our method is comparable to the state-of-the-art on the benchmark crowd datasets.
arXiv Detail & Related papers (2022-03-07T23:10:40Z)
- CCTrans: Simplifying and Improving Crowd Counting with Transformer [7.597392692171026]
We propose a simple approach called CCTrans to simplify the design pipeline.
Specifically, we utilize a pyramid vision transformer backbone to capture the global crowd information.
Our method achieves new state-of-the-art results on several benchmarks in both weakly- and fully-supervised crowd counting.
arXiv Detail & Related papers (2021-09-29T15:13:10Z)
- Wisdom of (Binned) Crowds: A Bayesian Stratification Paradigm for Crowd Counting [16.09823718637455]
We analyze the performance of crowd counting approaches across standard datasets at the per-stratum level and in aggregate.
Our contributions represent a nuanced, statistically balanced and fine-grained characterization of performance for crowd counting approaches.
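To illustrate the binned-evaluation idea in the simplest possible form (the paper's actual stratification is Bayesian and more involved; the bin edges below are hypothetical), one can bucket test images by ground-truth count and report MAE per stratum alongside the aggregate:

```python
import numpy as np

def stratified_mae(gt_counts, pred_counts, bin_edges=(0, 50, 500, 5000, np.inf)):
    """MAE per count stratum and overall. Bin edges are hypothetical."""
    gt = np.asarray(gt_counts, dtype=float)
    err = np.abs(gt - np.asarray(pred_counts, dtype=float))
    report = {"aggregate": err.mean()}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (gt >= lo) & (gt < hi)
        if mask.any():                     # skip strata with no test images
            report[f"[{lo}, {hi})"] = err[mask].mean()
    return report

print(stratified_mae([12, 80, 950, 2400], [15, 60, 1100, 2000]))
```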
arXiv Detail & Related papers (2021-08-19T16:50:31Z)
- HAT: Hierarchical Aggregation Transformers for Person Re-identification [87.02828084991062]
We take advantage of both CNNs and Transformers for image-based person Re-ID with high performance.
This is the first work to exploit both CNNs and Transformers for image-based person Re-ID.
arXiv Detail & Related papers (2021-07-13T09:34:54Z)
- Completely Self-Supervised Crowd Counting via Distribution Matching [92.09218454377395]
We propose a complete self-supervision approach to training models for dense crowd counting.
The only input required to train, apart from a large set of unlabeled crowd images, is the approximate upper limit of the crowd count.
Our method dwells on the idea that natural crowds follow a power law distribution, which could be leveraged to yield error signals for backpropagation.
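One toy way to turn a power-law prior into an error signal is quantile matching against samples drawn from that prior. This sketch is a hedged illustration of that idea only; the exponent, truncation, and sampling scheme are assumptions, and the paper's actual distribution-matching formulation differs in detail.

```python
import torch

def power_law_prior_loss(pred_counts, max_count, alpha=2.0):
    """Quantile-match predicted counts to a truncated power-law prior.

    Assumes p(c) ~ c^(-alpha) on [1, max_count]; alpha and the sampling
    scheme are illustrative, not the paper's settings.
    """
    u = torch.rand(pred_counts.numel())
    a = 1.0 - alpha
    # Inverse-CDF sampling from the truncated power law (alpha != 1).
    prior = ((1.0 - u) + u * float(max_count) ** a) ** (1.0 / a)
    # Compare order statistics of predictions and prior samples.
    return torch.abs(pred_counts.sort().values - prior.sort().values).mean()

preds = torch.rand(256) * 3000   # stand-in for per-image model outputs
loss = power_law_prior_loss(preds, max_count=3000)
```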
arXiv Detail & Related papers (2020-09-14T13:20:12Z)
- Semi-Supervised Crowd Counting via Self-Training on Surrogate Tasks [50.78037828213118]
This paper tackles the semi-supervised crowd counting problem from the perspective of feature learning.
We propose a novel semi-supervised crowd counting method which is built upon two innovative components.
arXiv Detail & Related papers (2020-07-07T05:30:53Z)
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing [112.2208052057002]
We propose Funnel-Transformer which gradually compresses the sequence of hidden states to a shorter one.
With comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks.
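The compression step can be pictured as pooling between encoder blocks. A minimal sketch, assuming mean pooling that halves the sequence before each block (the real model pools only the queries inside the compressing block's attention and can later up-sample for token-level tasks):

```python
import torch
import torch.nn as nn

class FunnelStage(nn.Module):
    """One funnel stage sketch: halve the sequence with mean pooling,
    then run a Transformer block on the shorter sequence."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                dim_feedforward=4 * dim,
                                                batch_first=True)

    def forward(self, h):                      # h: (B, L, dim)
        h = nn.functional.avg_pool1d(h.transpose(1, 2), 2).transpose(1, 2)
        return self.block(h)                   # now (B, L/2, dim)

# Each stage halves the length, so later blocks cost far fewer FLOPs.
h = torch.randn(2, 512, 256)
for stage in [FunnelStage(), FunnelStage(), FunnelStage()]:
    h = stage(h)
print(h.shape)   # torch.Size([2, 64, 256])
```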
arXiv Detail & Related papers (2020-06-05T05:16:23Z)