Learning Discriminative Features for Crowd Counting
- URL: http://arxiv.org/abs/2311.04509v2
- Date: Tue, 18 Jun 2024 01:50:34 GMT
- Title: Learning Discriminative Features for Crowd Counting
- Authors: Yuehai Chen, Qingzhong Wang, Jing Yang, Badong Chen, Haoyi Xiong, Shaoyi Du,
- Abstract summary: We propose a learning discriminative features framework for crowd counting.
The framework is composed of a masked feature prediction module and a supervised pixel-level contrastive learning module.
The proposed modules can be beneficial in various computer vision tasks, such as crowd counting and object detection.
- Score: 46.80091947443114
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Crowd counting models in highly congested areas confront two main challenges: weak localization ability and difficulty in differentiating between foreground and background, leading to inaccurate estimations. The reason is that objects in highly congested areas are normally small and high level features extracted by convolutional neural networks are less discriminative to represent small objects. To address these problems, we propose a learning discriminative features framework for crowd counting, which is composed of a masked feature prediction module (MPM) and a supervised pixel-level contrastive learning module (CLM). The MPM randomly masks feature vectors in the feature map and then reconstructs them, allowing the model to learn about what is present in the masked regions and improving the model's ability to localize objects in high density regions. The CLM pulls targets close to each other and pushes them far away from background in the feature space, enabling the model to discriminate foreground objects from background. Additionally, the proposed modules can be beneficial in various computer vision tasks, such as crowd counting and object detection, where dense scenes or cluttered environments pose challenges to accurate localization. The proposed two modules are plug-and-play, incorporating the proposed modules into existing models can potentially boost their performance in these scenarios.
Related papers
- Locally Grouped and Scale-Guided Attention for Dense Pest Counting [1.9580473532948401]
This study introduces a new dense pest counting problem to predict densely distributed pests captured by digital traps.
To address these problems, it is essential to incorporate the local attention mechanism.
This study presents a novel design that integrates locally grouped and scale-guided attention into a multiscale CenterNet framework.
arXiv Detail & Related papers (2024-08-29T13:02:01Z) - Mixture-of-Noises Enhanced Forgery-Aware Predictor for Multi-Face Manipulation Detection and Localization [52.87635234206178]
This paper proposes a new framework, namely MoNFAP, specifically tailored for multi-face manipulation detection and localization.
The framework incorporates two novel modules: the Forgery-aware Unified Predictor (FUP) Module and the Mixture-of-Noises Module (MNM)
arXiv Detail & Related papers (2024-08-05T08:35:59Z) - SIGMA:Sinkhorn-Guided Masked Video Modeling [69.31715194419091]
Sinkhorn-guided Masked Video Modelling ( SIGMA) is a novel video pretraining method.
We distribute features of space-time tubes evenly across a limited number of learnable clusters.
Experimental results on ten datasets validate the effectiveness of SIGMA in learning more performant, temporally-aware, and robust video representations.
arXiv Detail & Related papers (2024-07-22T08:04:09Z) - PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs [55.8550939439138]
Vision-Language Models (VLMs) have shown immense potential by integrating large language models with vision systems.
These models face challenges in the fundamental computer vision task of object localisation, due to their training on multimodal data containing mostly captions.
We introduce an input-agnostic Positional Insert (PIN), a learnable spatial prompt, containing a minimal set of parameters that are slid inside the frozen VLM.
Our PIN module is trained with a simple next-token prediction task on synthetic data without requiring the introduction of new output heads.
arXiv Detail & Related papers (2024-02-13T18:39:18Z) - A bioinspired three-stage model for camouflaged object detection [8.11866601771984]
We propose a three-stage model that enables coarse-to-fine segmentation in a single iteration.
Our model employs three decoders to sequentially process subsampled features, cropped features, and high-resolution original features.
Our network surpasses state-of-the-art CNN-based counterparts without unnecessary complexities.
arXiv Detail & Related papers (2023-05-22T02:01:48Z) - Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo
Labeling and Multi-scale Feature Grouping [40.07070188661184]
Weakly-Supervised Concealed Object (WSCOS) aims to segment objects well blended with surrounding environments.
It is hard to distinguish concealed objects from the background due to the intrinsic similarity.
We propose a new WSCOS method to address these two challenges.
arXiv Detail & Related papers (2023-05-18T14:31:34Z) - DuAT: Dual-Aggregation Transformer Network for Medical Image
Segmentation [21.717520350930705]
Transformer-based models have been widely demonstrated to be successful in computer vision tasks.
However, they are often dominated by features of large patterns leading to the loss of local details.
We propose a Dual-Aggregation Transformer Network called DuAT, which is characterized by two innovative designs.
Our proposed model outperforms state-of-the-art methods in the segmentation of skin lesion images, and polyps in colonoscopy images.
arXiv Detail & Related papers (2022-12-21T07:54:02Z) - DQnet: Cross-Model Detail Querying for Camouflaged Object Detection [54.82390534024954]
A convolutional neural network (CNN) for camouflaged object detection tends to activate local discriminative regions while ignoring complete object extent.
In this paper, we argue that partial activation is caused by the intrinsic characteristics of CNN.
In order to obtain feature maps that could activate full object extent, a novel framework termed Cross-Model Detail Querying network (DQnet) is proposed.
arXiv Detail & Related papers (2022-12-16T06:23:58Z) - AF$_2$: Adaptive Focus Framework for Aerial Imagery Segmentation [86.44683367028914]
Aerial imagery segmentation has some unique challenges, the most critical one among which lies in foreground-background imbalance.
We propose Adaptive Focus Framework (AF$), which adopts a hierarchical segmentation procedure and focuses on adaptively utilizing multi-scale representations.
AF$ has significantly improved the accuracy on three widely used aerial benchmarks, as fast as the mainstream method.
arXiv Detail & Related papers (2022-02-18T10:14:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.