The Effectiveness of a Simplified Model Structure for Crowd Counting
- URL: http://arxiv.org/abs/2404.07847v3
- Date: Tue, 18 Jun 2024 13:16:55 GMT
- Title: The Effectiveness of a Simplified Model Structure for Crowd Counting
- Authors: Lei Chen, Xinghang Gao, Fei Chao, Xiang Chang, Chih Min Lin, Xingen Gao, Shaopeng Lin, Hongyi Zhang, Juqiang Lin
- Abstract summary: This paper discusses how to construct high-performance crowd counting models using only simple structures.
We propose the Fuss-Free Network (FFNet), characterized by its simple and efficient structure, consisting of only a backbone network and a multi-scale feature fusion structure.
Our proposed crowd counting model is trained and evaluated on four widely used public datasets, and it achieves accuracy that is comparable to that of existing complex models.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the field of crowd counting research, many recent deep learning based methods have demonstrated robust capabilities for accurately estimating crowd sizes. However, the enhancement in their performance often arises from an increase in the complexity of the model structure. This paper discusses how to construct high-performance crowd counting models using only simple structures. We propose the Fuss-Free Network (FFNet), characterized by its simple and efficient structure, consisting of only a backbone network and a multi-scale feature fusion structure. The multi-scale feature fusion structure is itself simple, consisting of three branches, each equipped with only a focus transition module, and it combines the features from these branches through a concatenation operation. Our proposed crowd counting model is trained and evaluated on four widely used public datasets, and it achieves accuracy comparable to that of existing complex models. Furthermore, we conduct a comprehensive evaluation by replacing the existing backbones of various models, such as FFNet and CCTrans, with different networks, including MobileNet-v3, ConvNeXt-Tiny, and Swin-Transformer-Small. The experimental results further indicate that excellent crowd counting performance can be achieved with the simplified structure we propose.
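The abstract does not specify the internals of the focus transition module, so the sketch below is a hypothetical illustration of the overall fusion layout only: three branches operating at different scales, a stand-in "focus transition" (here just a pointwise channel mix), and channel-wise concatenation. The pooling factors, channel counts, and function names are assumptions, not the authors' implementation.

```python
import numpy as np

def avg_pool2d(x, k):
    """Downsample a (C, H, W) feature map by factor k with average pooling."""
    c, h, w = x.shape
    return x[:, :h - h % k, :w - w % k].reshape(c, h // k, k, w // k, k).mean(axis=(2, 4))

def upsample2d(x, k):
    """Nearest-neighbour upsample of a (C, H, W) feature map by factor k."""
    return x.repeat(k, axis=1).repeat(k, axis=2)

def focus_transition(x, w):
    """Stand-in for the focus transition module: a 1x1 channel mix.
    w has shape (C_out, C_in) and is applied at every spatial position."""
    c, h, width = x.shape
    return (w @ x.reshape(c, -1)).reshape(w.shape[0], h, width)

def multi_scale_fusion(feat, weights):
    """Three branches at scales 1, 2, and 4 (assumed values); each branch
    applies a focus transition, is upsampled back to the input resolution,
    and the branch outputs are concatenated along the channel axis."""
    branches = []
    for scale, w in zip((1, 2, 4), weights):
        y = avg_pool2d(feat, scale) if scale > 1 else feat
        y = focus_transition(y, w)
        y = upsample2d(y, scale) if scale > 1 else y
        branches.append(y)
    return np.concatenate(branches, axis=0)

# Example: an 8-channel 16x16 backbone feature map, each branch mapping to 4 channels.
feat = np.random.default_rng(0).normal(size=(8, 16, 16))
weights = [np.random.default_rng(i).normal(size=(4, 8)) for i in range(3)]
fused = multi_scale_fusion(feat, weights)  # shape (12, 16, 16)
```

The point of the sketch is that the fusion stage adds no cross-branch attention or learned fusion weights: each branch is processed independently and the only combination step is concatenation.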
Related papers
- MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval [73.77101139365912]
We propose MUSE, a multi-scale mamba with linear computational complexity for efficient cross-resolution modeling.
Specifically, the multi-scale representations are generated by applying a feature pyramid on the last single-scale feature map.
We employ the Mamba structure as an efficient multi-scale learner to jointly learn scale-wise representations.
arXiv Detail & Related papers (2024-08-20T06:30:37Z) - ComboStoc: Combinatorial Stochasticity for Diffusion Generative Models [65.82630283336051]
We show that the space spanned by the combination of dimensions and attributes is insufficiently sampled by the existing training schemes of diffusion generative models.
We present a simple fix to this problem by constructing processes that fully exploit the structures, hence the name ComboStoc.
arXiv Detail & Related papers (2024-05-22T15:23:10Z) - Unified Multi-modal Unsupervised Representation Learning for Skeleton-based Action Understanding [62.70450216120704]
Unsupervised pre-training has shown great success in skeleton-based action understanding.
We propose a Unified Multimodal Unsupervised Representation Learning framework, called UmURL.
UmURL exploits an efficient early-fusion strategy to jointly encode the multi-modal features in a single-stream manner.
arXiv Detail & Related papers (2023-11-06T13:56:57Z) - Modular Blended Attention Network for Video Question Answering [1.131316248570352]
We present an approach to video question answering built on reusable and composable neural units.
We have conducted experiments on three commonly used datasets.
arXiv Detail & Related papers (2023-11-02T14:22:17Z) - Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems [29.53535556926066]
Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems.
This work introduces a simple yet highly effective framework, Feature Multiplexing, where one single representation space is used across many different categorical features.
We propose a highly practical approach called Unified Embedding with three major benefits: simplified feature configuration, strong adaptation to dynamic data distributions, and compatibility with modern hardware.
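The summary above describes a single representation space shared across many categorical features. A minimal sketch of that idea, assuming a hash-based lookup into one shared table (the table size, dimension, and function names here are illustrative assumptions, not the paper's actual design):

```python
import numpy as np

# Hypothetical sketch: all categorical features share one embedding table,
# indexed by hashing the (feature name, value) pair. This is what makes the
# feature configuration trivial: adding a feature needs no new table.
TABLE_SIZE, DIM = 1024, 16
rng = np.random.default_rng(0)
table = rng.normal(size=(TABLE_SIZE, DIM))

def lookup(feature_name: str, value: str) -> np.ndarray:
    """Map a (feature, value) pair into the single shared representation space."""
    idx = hash((feature_name, value)) % TABLE_SIZE
    return table[idx]

# Different categorical features resolve into the same table.
user_country = lookup("country", "US")
item_category = lookup("category", "books")
```

Hash collisions between different (feature, value) pairs are possible by construction; tolerating them in exchange for a fixed memory footprint is the trade-off such shared-table schemes make.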
arXiv Detail & Related papers (2023-05-20T05:35:40Z) - StrAE: Autoencoding for Pre-Trained Embeddings using Explicit Structure [5.2869308707704255]
StrAE is a Structured Autoencoder framework that, through strict adherence to explicit structure, enables effective learning of multi-level representations.
We show that our results are directly attributable to the informativeness of the structure provided as input, and show that this is not the case for existing tree models.
We then extend StrAE to allow the model to define its own compositions using a simple localised-merge algorithm.
arXiv Detail & Related papers (2023-05-09T16:20:48Z) - Discrete Latent Structure in Neural Networks [32.41642110537956]
This text explores three broad strategies for learning with discrete latent structure.
We show how most consist of the same small set of fundamental building blocks, but use them differently, leading to substantially different applicability and properties.
arXiv Detail & Related papers (2023-01-18T12:30:44Z) - Part-Based Models Improve Adversarial Robustness [57.699029966800644]
We show that combining human prior knowledge with end-to-end learning can improve the robustness of deep neural networks.
Our model combines a part segmentation model with a tiny classifier and is trained end-to-end to simultaneously segment objects into parts and classify them.
Our experiments indicate that these models also reduce texture bias and yield better robustness against common corruptions and spurious correlations.
arXiv Detail & Related papers (2022-09-15T15:41:47Z) - Learning Prototype-oriented Set Representations for Meta-Learning [85.19407183975802]
Learning from set-structured data is a fundamental problem that has recently attracted increasing attention.
This paper provides a novel optimal transport based way to improve existing summary networks.
We further instantiate it to the cases of few-shot classification and implicit meta generative modeling.
arXiv Detail & Related papers (2021-10-18T09:49:05Z) - SetVAE: Learning Hierarchical Composition for Generative Modeling of Set-Structured Data [27.274328701618]
We propose SetVAE, a hierarchical variational autoencoder for sets.
Motivated by recent progress in set encoding, we build SetVAE upon attentive modules that first partition the set and project the partition back to the original cardinality.
We demonstrate that our model generalizes to unseen set sizes and learns interesting subset relations without supervision.
arXiv Detail & Related papers (2021-03-29T14:01:18Z) - ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.