Progressive End-to-End Object Detection in Crowded Scenes
- URL: http://arxiv.org/abs/2203.07669v1
- Date: Tue, 15 Mar 2022 06:12:00 GMT
- Title: Progressive End-to-End Object Detection in Crowded Scenes
- Authors: Anlin Zheng, Yuang Zhang, Xiangyu Zhang, Xiaojuan Qi, Jian Sun
- Abstract summary: Previous query-based detectors suffer from two drawbacks: first, multiple predictions will be inferred for a single object, typically in crowded scenes; second, the performance saturates as the depth of the decoding stage increases.
We propose a progressive predicting method to address the above issues. Specifically, we first select accepted queries to generate true positive predictions, then refine the remaining noisy queries according to the previously accepted predictions.
Experiments show that our method can significantly boost the performance of query-based detectors in crowded scenes.
- Score: 96.92416613336096
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a new query-based detection framework for crowd
detection. Previous query-based detectors suffer from two drawbacks: first,
multiple predictions will be inferred for a single object, typically in crowded
scenes; second, the performance saturates as the depth of the decoding stage
increases. Benefiting from the nature of the one-to-one label assignment rule,
we propose a progressive predicting method to address the above issues.
Specifically, we first select accepted queries prone to generate true positive
predictions, then refine the remaining noisy queries according to the previously
accepted predictions. Experiments show that our method can significantly boost
the performance of query-based detectors in crowded scenes. Equipped with our
approach, Sparse RCNN achieves 92.0\% $\text{AP}$, 41.4\% $\text{MR}^{-2}$ and
83.2\% $\text{JI}$ on the challenging CrowdHuman \cite{shao2018crowdhuman}
dataset, outperforming the box-based method MIP \cite{chu2020detection} that
specializes in handling crowded scenarios. Moreover, the proposed method, robust
to crowdedness, can still obtain consistent improvements on moderately and
slightly crowded datasets like CityPersons \cite{zhang2017citypersons} and COCO
\cite{lin2014microsoft}. Code will be made publicly available at
https://github.com/megvii-model/Iter-E2EDET.
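The two-step idea in the abstract (accept confident queries first, then revisit the remaining noisy ones in light of what was accepted) can be illustrated with a toy sketch. This is not the authors' implementation: the paper's refinement is performed by the detector itself, whereas the version below substitutes a hand-written overlap heuristic purely to show the control flow. The names `Prediction`, `split_queries`, `refine_noisy`, and `progressive_predict`, as well as every threshold value, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Prediction:
    box: Box
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def split_queries(preds: List[Prediction], accept_thresh: float):
    """Step 1: queries with a high score are accepted, the rest are 'noisy'."""
    accepted = [p for p in preds if p.score >= accept_thresh]
    noisy = [p for p in preds if p.score < accept_thresh]
    return accepted, noisy


def refine_noisy(accepted, noisy, overlap_thresh: float, penalty: float):
    """Step 2 (assumed stand-in for the learned refinement): down-weight a
    noisy prediction when it heavily overlaps an already accepted one."""
    refined = []
    for p in noisy:
        max_iou = max((iou(p.box, a.box) for a in accepted), default=0.0)
        score = p.score * penalty if max_iou >= overlap_thresh else p.score
        refined.append(Prediction(p.box, score))
    return refined


def progressive_predict(preds, accept_thresh=0.7, overlap_thresh=0.5,
                        penalty=0.1, keep_thresh=0.3):
    """Accepted predictions plus the noisy ones that survive refinement."""
    accepted, noisy = split_queries(preds, accept_thresh)
    refined = refine_noisy(accepted, noisy, overlap_thresh, penalty)
    return accepted + [p for p in refined if p.score >= keep_thresh]


preds = [
    Prediction((0, 0, 10, 10), 0.9),    # confident: accepted in step 1
    Prediction((1, 1, 10, 10), 0.5),    # likely duplicate of the first box
    Prediction((20, 20, 30, 30), 0.4),  # low score, but no accepted overlap
]
final = progressive_predict(preds)
```

In this example the second prediction overlaps the accepted box heavily and is dropped after refinement, while the third, which overlaps nothing accepted, is kept. A real query-based detector would learn this behaviour rather than rely on fixed thresholds.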
Related papers
- Less is More: One-shot Subgraph Reasoning on Large-scale Knowledge Graphs [49.547988001231424]
We propose one-shot-subgraph link prediction to achieve efficient and adaptive prediction.
The design principle is that, instead of acting directly on the whole KG, the prediction procedure is decoupled into two steps.
We achieve improved efficiency and leading performance on five large-scale benchmarks.
arXiv Detail & Related papers (2024-03-15T12:00:12Z)
- Ranking-based Adaptive Query Generation for DETRs in Crowded Pedestrian Detection [49.27380156754935]
We find that the number of queries in DETRs must be tuned manually; otherwise, performance degrades to varying degrees.
We propose Rank-based Adaptive Query Generation (RAQG) to alleviate the problem.
Our method is simple and effective and can, in theory, be plugged into any DETR to make it query-adaptive.
arXiv Detail & Related papers (2023-10-24T11:00:56Z)
- Enhancing Hyperedge Prediction with Context-Aware Self-Supervised Learning [64.46188414653204]
We propose a novel hyperedge prediction framework (CASH).
CASH employs (1) context-aware node aggregation to capture complex relations among nodes in each hyperedge for (C1) and (2) self-supervised contrastive learning in the context of hyperedge prediction to enhance hypergraph representations for (C2).
Experiments on six real-world hypergraphs reveal that CASH consistently outperforms all competing methods in terms of the accuracy in hyperedge prediction.
arXiv Detail & Related papers (2023-09-11T20:06:00Z)
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
- SuctionNet-1Billion: A Large-Scale Benchmark for Suction Grasping [47.221326169627666]
We propose a new physical model to analytically evaluate seal formation and wrench resistance of a suction grasp.
A two-step methodology is adopted to generate annotations on a large-scale dataset collected in real-world cluttered scenarios.
A standard online evaluation system is proposed to evaluate suction poses in continuous operation space.
arXiv Detail & Related papers (2021-03-23T05:02:52Z)
- Adaptive Bi-directional Attention: Exploring Multi-Granularity Representations for Machine Reading Comprehension [29.717816161964105]
We propose a novel approach called Adaptive Bidirectional Attention, which adaptively passes source representations from different levels to the predictor.
Results are better than the previous state-of-the-art model by 2.5% EM and 2.3% F1.
arXiv Detail & Related papers (2020-12-20T09:31:35Z)
- Detection in Crowded Scenes: One Proposal, Multiple Predictions [79.28850977968833]
We propose a proposal-based object detector, aiming at detecting highly-overlapped instances in crowded scenes.
The key to our approach is to let each proposal predict a set of correlated instances rather than a single one, as in previous proposal-based frameworks.
Our detector can obtain 4.9% AP gains on the challenging CrowdHuman dataset and 1.0% $\text{MR}^{-2}$ improvements on the CityPersons dataset.
arXiv Detail & Related papers (2020-03-20T09:48:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.