One-Shot Crowd Counting With Density Guidance For Scene Adaptation
- URL: http://arxiv.org/abs/2602.07955v1
- Date: Sun, 08 Feb 2026 12:58:47 GMT
- Title: One-Shot Crowd Counting With Density Guidance For Scene Adaptation
- Authors: Jiwei Chen, Qi Wang, Junyu Gao, Jing Zhang, Dingyi Li, Jing-Jia Luo
- Abstract summary: Existing crowd models have limited generalization for unseen surveillance scenes. To improve the generalization of the model, we regard different surveillance scenes as different category scenes. We propose to leverage local and global density characteristics to guide the model of crowd counting for unseen surveillance scenes.
- Score: 25.032009219268875
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Crowd scenes captured by cameras at different locations vary greatly, and existing crowd models have limited generalization for unseen surveillance scenes. To improve the generalization of the model, we regard different surveillance scenes as different category scenes, and introduce few-shot learning to make the model adapt to an unseen surveillance scene that belongs to the given exemplar category scene. To this end, we propose to leverage local and global density characteristics to guide the crowd counting model for unseen surveillance scenes. Specifically, to enable the model to adapt to the varying density distributions in the target scene, we propose a multiple local density learner that learns multiple prototypes representing different density distributions in the support scene. These multiple local density similarity matrices are then encoded and used to guide the model in a local way. To further adapt to the global density in the target scene, global density features are extracted from the support image and used to guide the model in a global way. Experiments on three surveillance datasets show that the proposed method can adapt to unseen surveillance scenes and outperforms recent state-of-the-art methods in few-shot crowd counting.
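The abstract's local guidance step (learn density prototypes from the support scene, then compute one similarity map per prototype for the target) can be sketched as follows. This is a minimal illustrative stand-in, not the paper's architecture: the function names, the quantile-based prototype extraction, and the exponential similarity are all assumptions for the sketch.

```python
import numpy as np

def density_prototypes(support_density, k=3):
    """Summarize a support-scene density map into k scalar prototypes,
    a crude stand-in for the paper's multiple local density learner."""
    h, w = support_density.shape
    # Average the density over non-overlapping 8x8 patches.
    patches = (support_density[:h - h % 8, :w - w % 8]
               .reshape(h // 8, 8, w // 8, 8)
               .mean(axis=(1, 3))
               .ravel())
    # Split the sorted patch densities into k quantile groups;
    # each group's mean acts as one density prototype.
    groups = np.array_split(np.sort(patches), k)
    return np.array([g.mean() for g in groups])

def similarity_matrices(target_density, prototypes):
    """One similarity map per prototype: values near 1 where the local
    target density resembles that prototype's density level."""
    return [np.exp(-np.abs(target_density - p)) for p in prototypes]
```

In the paper these similarity matrices are encoded and injected into the counting network as local guidance; here they are left as plain arrays to keep the sketch self-contained.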
Related papers
- Video Individual Counting for Moving Drones [51.429771128144964]
Video Individual Counting (VIC) has received increasing attention for its importance in intelligent video surveillance. Previous datasets are captured with fixed or rarely moving cameras with relatively sparse individuals, restricting evaluation for a highly varying view and time in crowded scenes. To address these issues, we introduce the MovingDroneCrowd dataset, featuring videos captured by fast-moving drones in crowded scenes under diverse illuminations, shooting heights and angles.
arXiv Detail & Related papers (2025-03-12T07:09:33Z) - Bilevel Fast Scene Adaptation for Low-Light Image Enhancement [50.639332885989255]
Enhancing images captured in low-light scenes is a challenging but widely studied task in computer vision.
The main obstacle lies in modeling the distribution discrepancy across different scenes.
We introduce the bilevel paradigm to model the above latent correspondence.
A bilevel learning framework is constructed to endow the scene-irrelevant generality of the encoder towards diverse scenes.
arXiv Detail & Related papers (2023-06-02T08:16:21Z) - CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph
Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z) - Learning to Fuse Monocular and Multi-view Cues for Multi-frame Depth
Estimation in Dynamic Scenes [51.20150148066458]
We propose a novel method to learn to fuse the multi-view and monocular cues encoded as volumes without needing heuristically crafted masks.
Experiments on real-world datasets demonstrate the effectiveness and generalization ability of the proposed method.
arXiv Detail & Related papers (2023-04-18T13:55:24Z) - PANet: Perspective-Aware Network with Dynamic Receptive Fields and
Self-Distilling Supervision for Crowd Counting [63.84828478688975]
We propose a novel perspective-aware approach called PANet to address the perspective problem.
Based on the observation that the size of the objects varies greatly in one image due to the perspective effect, we propose the dynamic receptive fields (DRF) framework.
The framework is able to adjust the receptive field by the dilated convolution parameters according to the input image, which helps the model to extract more discriminative features for each local region.
arXiv Detail & Related papers (2021-10-31T04:43:05Z) - Congested Crowd Instance Localization with Dilated Convolutional Swin
Transformer [119.72951028190586]
Crowd localization is a new computer vision task, evolved from crowd counting.
In this paper, we focus on how to achieve precise instance localization in high-density crowd scenes.
We propose a Dilated Convolutional Swin Transformer (DCST) for congested crowd scenes.
arXiv Detail & Related papers (2021-08-02T01:27:53Z) - Learning Independent Instance Maps for Crowd Localization [44.6430092887941]
We propose an end-to-end and straightforward framework for crowd localization, named Independent Instance Map segmentation (IIM).
IIM segments crowds into independent connected components, from which the positions and the crowd counts are obtained.
To improve the segmentation quality for different density regions, we present a differentiable Binarization Module (BM).
BM brings two advantages to localization models: 1) it adaptively learns a threshold map for each image to detect instances more accurately; 2) it allows the model to be trained directly with a loss on binary predictions and labels.
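A learned threshold map can be made differentiable by replacing the hard step function with a steep sigmoid, so gradients flow through the binarization. The sketch below shows this standard trick; the function name and the steepness constant `k` are assumptions, not details taken from the IIM paper.

```python
import numpy as np

def differentiable_binarization(prob_map, threshold_map, k=50.0):
    """Soft, differentiable approximation of (prob_map > threshold_map):
    a steep sigmoid centered on the per-pixel learned threshold."""
    return 1.0 / (1.0 + np.exp(-k * (prob_map - threshold_map)))
```

Because the output is nearly binary but smooth, a loss computed on it against binary labels still yields usable gradients for both the probability map and the threshold map.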
arXiv Detail & Related papers (2020-12-08T02:17:19Z) - AdaCrowd: Unlabeled Scene Adaptation for Crowd Counting [14.916045549353987]
We propose a new problem called unlabeled scene-adaptive crowd counting.
Given a new target scene, we would like to have a crowd counting model specifically adapted to this particular scene.
In this paper, we propose to use one or more unlabeled images from the target scene to perform the adaptation.
arXiv Detail & Related papers (2020-10-23T03:20:42Z) - Few-Shot Scene Adaptive Crowd Counting Using Meta-Learning [13.149654626505741]
We consider the problem of few-shot scene adaptive crowd counting.
Given a target camera scene, our goal is to adapt a model to this specific scene with only a few labeled images of that scene.
We address this challenge by taking inspiration from the recently introduced learning-to-learn paradigm in the context of the few-shot regime.
arXiv Detail & Related papers (2020-02-01T19:41:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.