Related papers: Counting Varying Density Crowds Through Density Guided Adaptive Selection CNN and Transformer Estimation

URL: http://arxiv.org/abs/2206.10075v1
Date: Tue, 21 Jun 2022 02:05:41 GMT
Title: Counting Varying Density Crowds Through Density Guided Adaptive Selection CNN and Transformer Estimation
Authors: Yuehai Chen, Jing Yang, Badong Chen and Shaoyi Du
Abstract summary: Humans tend to locate and count targets in low-density regions and to reason about the number in high-density regions. We propose a CNN and Transformer Adaptive Selection Network (CTASNet), which can adaptively select the appropriate counting branch for different density regions.
Score: 25.050801798414263
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In real-world crowd counting applications, crowd densities within an image vary greatly. When facing density variation, humans tend to locate and count targets in low-density regions and to reason about the number in high-density regions. We observe that a CNN focuses on local information correlation using fixed-size convolution kernels, while a Transformer can effectively extract semantic crowd information using the global self-attention mechanism. Thus, a CNN can locate and estimate crowds accurately in low-density regions but struggles to perceive density properly in high-density regions. In contrast, a Transformer is highly reliable in high-density regions but fails to locate targets in sparse regions. Neither a CNN nor a Transformer alone can deal well with this kind of density variation. To address this problem, we propose a CNN and Transformer Adaptive Selection Network (CTASNet), which can adaptively select the appropriate counting branch for different density regions. First, CTASNet generates the prediction results of the CNN and the Transformer. Then, considering that the CNN and the Transformer are appropriate for low-density and high-density regions respectively, a density-guided Adaptive Selection Module is designed to automatically combine the predictions of the two branches. Moreover, to reduce the influence of annotation noise, we introduce a Correntropy-based Optimal Transport loss. Extensive experiments on four challenging crowd counting datasets validate the proposed method.
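Illustrative sketch (not from the paper): the abstract says that predictions from a CNN branch and a Transformer branch are combined per region according to density, without giving the formula. The Python below is one hypothetical way to express that idea. It uses a soft sigmoid gate that favors the CNN map in sparse regions and the Transformer map in dense ones. The function names, the region size, the threshold, the temperature and the use of the averaged branch counts as the density guide are all assumptions made for illustration; they are not CTASNet's actual Adaptive Selection Module.

```python
import math


def region_sum(density, r0, c0, size):
    """Sum a 2D density map over the square region whose top-left corner is (r0, c0)."""
    return sum(
        density[r][c]
        for r in range(r0, min(r0 + size, len(density)))
        for c in range(c0, min(c0 + size, len(density[0])))
    )


def select_by_density(cnn_map, trans_map, region=2, threshold=1.0, temperature=0.25):
    """Blend two density maps region by region (hypothetical gating rule).

    The gate weight w approaches 0 in sparse regions (keep the CNN branch)
    and approaches 1 in dense regions (keep the Transformer branch).
    """
    rows, cols = len(cnn_map), len(cnn_map[0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r0 in range(0, rows, region):
        for c0 in range(0, cols, region):
            # Assumed density guide: mean of the two branch counts in this region.
            guide = 0.5 * (
                region_sum(cnn_map, r0, c0, region)
                + region_sum(trans_map, r0, c0, region)
            )
            w = 1.0 / (1.0 + math.exp(-(guide - threshold) / temperature))
            for r in range(r0, min(r0 + region, rows)):
                for c in range(c0, min(c0 + region, cols)):
                    fused[r][c] = (1.0 - w) * cnn_map[r][c] + w * trans_map[r][c]
    return fused


def total_count(density):
    """The estimated count is the sum of the density map."""
    return sum(sum(row) for row in density)


# Toy 2x4 maps: the left 2x2 region is sparse, the right 2x2 region is dense.
cnn_map = [[0.25, 0.0, 1.0, 1.0],
           [0.0, 0.25, 1.0, 1.0]]
trans_map = [[0.0, 0.0, 2.0, 2.0],
             [0.0, 0.0, 2.0, 2.0]]

fused = select_by_density(cnn_map, trans_map)
print(round(total_count(fused), 3))
```

In this toy input the fused map stays close to the CNN prediction on the sparse left half and close to the Transformer prediction on the dense right half. A learned selection module would replace the fixed threshold and temperature with trained parameters.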

Related papers

OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation [70.17681136234202]
We reexamine the design distinctions and test the limits of what a sparse CNN can achieve. We propose two key components, i.e., adaptive receptive fields (spatially) and adaptive relation, to bridge the gap. This exploration led to the creation of Omni-Adaptive 3D CNNs (OA-CNNs), a family of networks that integrates a lightweight module.
arXiv Detail & Related papers (2024-03-21T14:06:38Z)
Diffusion-based Data Augmentation for Object Counting Problems [62.63346162144445]
We develop a pipeline that utilizes a diffusion model to generate extensive training data. We are the first to generate images conditioned on a location dot map with a diffusion model. Our proposed counting loss for the diffusion model effectively minimizes the discrepancies between the location dot map and the crowd images generated.
arXiv Detail & Related papers (2024-01-25T07:28:22Z)
SwinV2DNet: Pyramid and Self-Supervision Compounded Feature Learning for Remote Sensing Images Change Detection [12.727650696327878]
We propose an end-to-end compounded dense network SwinV2DNet to inherit advantages of transformer and CNN. It captures the change relationship features through the densely connected Swin V2 backbone. It provides the low-level pre-changed and post-changed features through a CNN branch.
arXiv Detail & Related papers (2023-08-22T03:31:52Z)
Studying inductive biases in image classification task [0.0]
Self-attention (SA) structures have locally independent filters and can use large kernels, which contradicts the previously popular convolutional neural networks (CNNs). We show that context awareness was the crucial property; however, large local information was not necessary to construct CA parameters.
arXiv Detail & Related papers (2022-10-31T08:43:26Z)
Rethinking Spatial Invariance of Convolutional Networks for Object Counting [119.83017534355842]
We try to use locally connected Gaussian kernels to replace the original convolution filter to estimate the spatial position in the density map. Inspired by previous work, we propose a low-rank approximation accompanied with translation invariance to favorably implement the approximation of massive Gaussian convolution. Our methods significantly outperform other state-of-the-art methods and achieve promising learning of the spatial position of objects.
arXiv Detail & Related papers (2022-06-10T17:51:25Z)
Contextual Attention Network: Transformer Meets U-Net [0.0]
Convolutional neural networks (CNNs) have become the de facto standard and attained immense success in medical image segmentation. However, CNN-based methods fail to build long-range dependencies and global context connections. Recent articles have exploited Transformer variants for medical image segmentation tasks.
arXiv Detail & Related papers (2022-03-02T21:10:24Z)
TransCrowd: Weakly-Supervised Crowd Counting with Transformer [56.84516562735186]
We propose TransCrowd, which reformulates the weakly-supervised crowd counting problem from the perspective of sequence-to-count based on Transformer. Experiments on five benchmark datasets demonstrate that the proposed TransCrowd achieves superior performance compared with all the weakly-supervised CNN-based counting methods.
arXiv Detail & Related papers (2021-04-19T08:12:50Z)
Uncertainty Estimation and Sample Selection for Crowd Counting [87.29137075538213]
We present a method for image-based crowd counting that can predict a crowd density map together with the uncertainty values pertaining to the predicted density map. A key advantage of our method over existing crowd counting methods is its ability to quantify the uncertainty of its predictions. We show that our sample selection strategy drastically reduces the amount of labeled data needed to adapt a counting network trained on a source domain to the target domain.
arXiv Detail & Related papers (2020-09-30T03:40:07Z)
Region Growing with Convolutional Neural Networks for Biomedical Image Segmentation [1.5469452301122177]
We present a methodology that uses convolutional neural networks (CNNs) for segmentation by iteratively growing predicted mask regions in each coordinate direction. We use a threshold on the CNN probability scores to determine whether pixels are added to the region and the iteration continues until no new pixels are added to the region. Our method is able to achieve high segmentation accuracy and preserve biologically realistic morphological features while leveraging small amounts of training data and maintaining computational efficiency.
arXiv Detail & Related papers (2020-09-23T17:53:00Z)
Approximation and Non-parametric Estimation of ResNet-type Convolutional Neural Networks [52.972605601174955]
We show a ResNet-type CNN can attain the minimax optimal error rates in important function classes. We derive approximation and estimation error rates of the aforementioned type of CNNs for the Barron and Hölder classes.
arXiv Detail & Related papers (2019-03-24T19:42:39Z)

This list is automatically generated from the titles and abstracts of the papers on this site.