Density-based Object Detection in Crowded Scenes
- URL: http://arxiv.org/abs/2504.09819v1
- Date: Mon, 14 Apr 2025 02:41:49 GMT
- Title: Density-based Object Detection in Crowded Scenes
- Authors: Chenyang Zhao, Jia Wan, Antoni B. Chan
- Abstract summary: We propose density-guided anchors (DGA) and density-guided NMS (DG-NMS). DGA uses object density maps to compute optimal anchor assignments and re-weighting, while DG-NMS adaptively adjusts the NMS threshold. Experiments on the challenging CrowdHuman and Citypersons datasets demonstrate that our proposed density-guided detector is effective and robust to crowdedness.
- Score: 54.037103707572136
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Compared with generic scenes, crowded scenes contain highly-overlapped instances, which result in: 1) more ambiguous anchors during training of object detectors, and 2) more predictions that are likely to be mistakenly suppressed in post-processing during inference. To address these problems, we propose two new strategies, density-guided anchors (DGA) and density-guided NMS (DG-NMS), which use object density maps to jointly compute optimal anchor assignments and re-weighting, as well as an adaptive NMS. Concretely, based on an unbalanced optimal transport (UOT) problem, the density owned by each ground-truth object is transported to each anchor position at a minimal transport cost. The density accumulated on the anchors forms an instance-specific density distribution, from which DGA decodes the optimal anchor assignment and re-weighting strategy. Meanwhile, DG-NMS utilizes the predicted density map to adaptively adjust the NMS threshold and reduce mistaken suppressions. In the UOT, a novel overlap-aware transport cost is specifically designed for ambiguous anchors caused by overlapped neighboring objects. Extensive experiments on the challenging CrowdHuman and Citypersons datasets demonstrate that our proposed density-guided detector is effective and robust to crowdedness. The code and pre-trained models will be made available later.
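Since the code is not yet released, the following is a minimal Python sketch of the two mechanisms the abstract describes, under stated assumptions: a generic KL-relaxed unbalanced Sinkhorn solver stands in for the paper's UOT step (the overlap-aware transport cost is not reproduced; a plain cost matrix is used), and a simple density-to-threshold mapping stands in for DG-NMS. All function names and hyper-parameters are illustrative, not the authors' implementation.

```python
# Illustrative sketch, NOT the authors' code: a generic entropic unbalanced
# Sinkhorn solver stands in for the paper's UOT step, and a simple
# density-to-threshold rule stands in for DG-NMS.
import numpy as np

def unbalanced_sinkhorn(a, b, C, eps=0.05, tau=1.0, n_iters=200):
    """Entropic unbalanced OT with KL-relaxed marginals (scaling iterations).

    a: (m,) mass per ground-truth object; b: (n,) reference mass per anchor;
    C: (m, n) transport cost. Returns the (m, n) transport plan, i.e. how
    much of each object's density lands on each anchor position.
    """
    K = np.exp(-C / eps)                      # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    rho = tau / (tau + eps)                   # marginal-relaxation exponent
    for _ in range(n_iters):
        u = (a / (K @ v + 1e-12)) ** rho
        v = (b / (K.T @ u + 1e-12)) ** rho
    return u[:, None] * K * v[None, :]

def iou_one_vs_many(box, boxes):
    """IoU between one box and an (n, 4) array of [x1, y1, x2, y2] boxes."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter + 1e-12)

def density_guided_nms(boxes, scores, density, base_thr=0.5, max_thr=0.9):
    """Greedy NMS whose IoU threshold rises with the local predicted density,
    so highly-overlapped true positives in crowded regions survive longer.

    density: (n,) predicted crowd density sampled at each box, assumed in [0, 1].
    """
    order = np.argsort(-scores)
    suppressed = np.zeros(len(boxes), dtype=bool)
    keep = []
    for i in order:
        if suppressed[i]:
            continue
        keep.append(i)
        thr = min(max_thr, max(base_thr, float(density[i])))  # denser -> laxer
        suppressed |= iou_one_vs_many(boxes[i], boxes) > thr
    return keep

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy UOT: 3 objects, 8 anchors; uniform object mass, uniform anchor prior.
    P = unbalanced_sinkhorn(np.ones(3), np.full(8, 3 / 8), rng.random((3, 8)))
    print("density transported to anchors:\n", P.round(3))
```

In the paper, DGA further decodes a per-instance assignment and anchor re-weighting from the transported density, and the transport cost is made overlap-aware; neither detail is modeled in this sketch.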
Related papers
- DenSe-AdViT: A novel Vision Transformer for Dense SAR Object Detection [6.132395411070981]
Vision Transformer (ViT) has achieved remarkable results in object detection for synthetic aperture radar (SAR) images.
However, it struggles with the extraction of multi-scale local features, leading to limited performance in detecting small targets.
We propose Density-Sensitive Vision Transformer with Adaptive Tokens (DenSe-AdViT) for dense SAR target detection.
arXiv Detail & Related papers (2025-04-18T11:25:49Z) - Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark [52.339936954958034]
The dynamic imbalance between foreground and background is a major challenge in video object counting.
We propose a density-embedded Efficient Masked Autoencoder Counting (E-MAC) framework in this paper.
In addition, we introduce DroneBird, the first large-scale video bird counting dataset captured in natural scenarios, to support migratory bird protection.
arXiv Detail & Related papers (2024-11-20T06:08:21Z) - Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation [51.66997548477913]
We propose a novel feature-level consistency learning framework named Density-Descending Feature Perturbation (DDFP).
Inspired by the low-density separation assumption in semi-supervised learning, our key insight is that feature density can shed light on the most promising direction for the segmentation classifier to explore.
The proposed DDFP outperforms other feature-level perturbation designs and achieves state-of-the-art performance on both the Pascal VOC and Cityscapes datasets.
arXiv Detail & Related papers (2024-03-11T06:59:05Z) - Semantic Segmentation on 3D Point Clouds with High Density Variations [44.467561618769714]
HDVNet contains a nested set of encoder-decoder pathways, each handling a specific point density range.
By effectively handling input density variations, HDVNet outperforms state-of-the-art models in segmentation accuracy on real point clouds with inconsistent density.
arXiv Detail & Related papers (2023-07-04T05:44:13Z) - HDNet: A Hierarchically Decoupled Network for Crowd Counting [11.530565995318696]
We propose a Hierarchically Decoupled Network (HDNet) to solve the above two problems within a unified framework.
HDNet achieves state-of-the-art performance on several popular counting benchmarks.
arXiv Detail & Related papers (2022-12-12T06:01:26Z) - Semi-supervised Crowd Counting via Density Agency [57.3635501421658]
We build a learnable auxiliary structure, namely the density agency, to bring the recognized foreground regional features close to their corresponding density sub-classes.
Second, we propose a density-guided contrastive learning loss to consolidate the backbone feature extractor.
Third, we build a regression head by using a transformer structure to refine the foreground features further.
arXiv Detail & Related papers (2022-09-07T06:34:00Z) - Efficient LiDAR Point Cloud Geometry Compression Through Neighborhood Point Attention [25.054578678654796]
This work proposes neighborhood point attention (NPA) to tackle these issues.
We first use k-nearest neighbors (kNN) to construct an adaptive local neighborhood.
We then leverage the self-attention mechanism to dynamically aggregate information within this neighborhood (a minimal sketch of this pattern appears after this list).
arXiv Detail & Related papers (2022-08-26T10:44:30Z) - Cascaded Residual Density Network for Crowd Counting [63.714719914701014]
We propose a novel Cascaded Residual Density Network (CRDNet) that follows a coarse-to-fine approach to generate high-quality density maps for more accurate crowd counting.
An additional local count loss is introduced to further refine counting accuracy.
arXiv Detail & Related papers (2021-07-29T03:07:11Z) - Tracking-by-Counting: Using Network Flows on Crowd Density Maps for Tracking Multiple Targets [96.98888948518815]
State-of-the-art multi-object tracking (MOT) methods follow the tracking-by-detection paradigm.
We propose a new MOT paradigm, tracking-by-counting, tailored for crowded scenes.
arXiv Detail & Related papers (2020-07-18T19:51:53Z)
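As a side note on the NPA entry above, the kNN-neighborhood-plus-self-attention pattern it summarizes can be sketched as follows. This is a generic single-head dot-product attention over kNN neighborhoods, assumed for illustration; the NPA paper's actual attention design is not detailed in the summary.

```python
# Generic sketch of the kNN + self-attention aggregation pattern described
# in the NPA entry above; not the NPA authors' implementation.
import numpy as np

def knn_self_attention(points, feats, k=8):
    """Aggregate each point's features over its k nearest neighbors with
    single-head dot-product attention.

    points: (n, 3) coordinates; feats: (n, d) per-point features.
    Returns (n, d) aggregated features.
    """
    n, d = feats.shape
    # Pairwise squared distances -> indices of the k nearest neighbors.
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
    nbr = np.argsort(d2, axis=1)[:, :k]              # (n, k), includes self
    kv = feats[nbr]                                  # (n, k, d) neighbor keys/values
    logits = (kv @ feats[:, :, None])[..., 0] / np.sqrt(d)  # (n, k) query.key
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)                # softmax over neighbors
    return (w[..., None] * kv).sum(axis=1)           # attention-weighted sum

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    pts, f = rng.normal(size=(32, 3)), rng.normal(size=(32, 16))
    print(knn_self_attention(pts, f, k=8).shape)     # (32, 16)
```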
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.