Ranking-based Adaptive Query Generation for DETRs in Crowded Pedestrian
Detection
- URL: http://arxiv.org/abs/2310.15725v2
- Date: Mon, 8 Jan 2024 13:36:14 GMT
- Title: Ranking-based Adaptive Query Generation for DETRs in Crowded Pedestrian
Detection
- Authors: Feng Gao, Jiaxu Leng, Ji Gan, and Xinbo Gao
- Abstract summary: We find that the number of DETRs' queries must be tuned manually for different degrees of crowding; otherwise, performance degrades to varying extents.
We propose Rank-based Adaptive Query Generation (RAQG) to alleviate the problem.
Our method is simple and effective, and can in principle be plugged into any DETR to make it query-adaptive.
- Score: 49.27380156754935
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: DEtection TRansformer (DETR) and its variants (DETRs) have been successfully
applied to crowded pedestrian detection and achieve promising performance.
However, we find that, in scenes with different degrees of crowding, the number
of DETRs' queries must be adjusted manually; otherwise, performance degrades to
varying extents. In this paper, we first analyze the two existing query
generation methods and summarize four guidelines for designing an adaptive
query generation method. We then propose Rank-based Adaptive Query Generation
(RAQG) to alleviate the problem. Specifically, we design a rank prediction head
that predicts the rank of the lowest-confidence positive training sample
produced by the encoder. Based on the predicted rank, we design an adaptive
selection method that adaptively selects coarse detection results produced by
the encoder to generate queries. Moreover, to train the rank prediction head
better, we propose Soft Gradient L1 Loss, whose gradient is continuous and can
therefore describe the relationship between the loss value and the parameter
update at a fine granularity. Our method is simple and effective, and can in
principle be plugged into any DETR to make it query-adaptive. Experimental
results on the CrowdHuman and CityPersons datasets show that our method
adaptively generates queries for DETRs and achieves competitive results. In
particular, it achieves a state-of-the-art 39.4% MR on CrowdHuman.
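To make the described pipeline concrete, below is a minimal PyTorch-style sketch of the idea, not the authors' implementation: a small rank head regresses a normalized rank from the encoder output, that rank determines how many top-scoring coarse detections are kept as decoder queries, and the head is trained with a regression loss. The names (RankHead, select_queries, rank_loss), the MLP architecture, the query-count bounds, and the smooth-L1 stand-in for the paper's Soft Gradient L1 Loss are all illustrative assumptions.
```python
# Minimal sketch (not the authors' code) of rank-based adaptive query selection.
import torch
import torch.nn as nn
import torch.nn.functional as F


class RankHead(nn.Module):
    """Illustrative rank head: regresses a normalized rank in [0, 1] from
    pooled encoder features (the paper's exact architecture is not reproduced)."""

    def __init__(self, dim: int = 256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def forward(self, encoder_feats: torch.Tensor) -> torch.Tensor:
        # encoder_feats: (batch, num_proposals, dim)
        pooled = encoder_feats.mean(dim=1)             # (batch, dim)
        return self.mlp(pooled).sigmoid().squeeze(-1)  # (batch,) normalized rank


def select_queries(scores, feats, pred_rank, min_q=100, max_q=900):
    """Adaptively keep the top-k coarse detections per image as decoder queries,
    where k follows the predicted rank (the min_q/max_q bounds are assumptions)."""
    batch, num_proposals, _ = feats.shape
    k = (pred_rank * num_proposals).long().clamp(min_q, max_q).clamp(max=num_proposals)
    selected = []
    for b in range(batch):
        top = scores[b].topk(int(k[b])).indices  # highest-confidence proposals
        selected.append(feats[b, top])           # (k_b, dim) features -> queries
    return selected


def rank_loss(pred_rank, target_rank):
    # Stand-in regression loss; the paper's Soft Gradient L1 Loss (which has a
    # continuous gradient) is not reproduced here, so smooth-L1 is used instead.
    return F.smooth_l1_loss(pred_rank, target_rank)


# Toy usage: 2 images, 300 encoder proposals, 256-d features.
feats = torch.randn(2, 300, 256)
scores = torch.rand(2, 300)
head = RankHead(256)
queries = select_queries(scores, feats, head(feats))
print([q.shape for q in queries])  # per-image query counts differ with pred_rank
```
In a full DETR pipeline, the selected features would additionally be projected into the decoder's query embedding space, and the rank target would come from label assignment on the encoder's coarse predictions.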
Related papers
- Estimating the Hessian Matrix of Ranking Objectives for Stochastic Learning to Rank with Gradient Boosted Trees [63.18324983384337]
We introduce the first learning-to-rank method for Gradient Boosted Decision Trees (GBDTs).
Our main contribution is a novel estimator for the second-order derivatives, i.e., the Hessian matrix.
We incorporate our estimator into the existing PL-Rank framework, which was originally designed for first-order derivatives only.
arXiv Detail & Related papers (2024-04-18T13:53:32Z)
- Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement [19.277560848076984]
Two-stage selection strategies result in scale bias and redundancy due to the mismatch between selected queries and objects.
We propose hierarchical salience filtering refinement, which performs transformer encoding only on filtered discriminative queries.
The proposed Salience DETR achieves significant improvements of +4.0% AP, +0.2% AP, +4.4% AP on three challenging task-specific detection datasets.
arXiv Detail & Related papers (2024-03-24T13:01:57Z)
- Rank-DETR for High Quality Object Detection [52.82810762221516]
A highly performant object detector requires accurate ranking for the bounding box predictions.
In this work, we introduce a simple and highly performant DETR-based object detector by proposing a series of rank-oriented designs.
arXiv Detail & Related papers (2023-10-13T04:48:32Z)
- Semi-DETR: Semi-Supervised Object Detection with Detection Transformers [105.45018934087076]
We analyze the DETR-based framework on semi-supervised object detection (SSOD).
We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector.
Our method outperforms all state-of-the-art methods by clear margins.
arXiv Detail & Related papers (2023-07-16T16:32:14Z)
- D2Q-DETR: Decoupling and Dynamic Queries for Oriented Object Detection with Transformers [14.488821968433834]
We propose an end-to-end framework for oriented object detection.
Our framework is based on DETR, with the box regression head replaced with a points prediction head.
Experiments on the large-scale and challenging DOTA-v1.0 and DOTA-v1.5 datasets show that D2Q-DETR outperforms existing NMS-based and NMS-free oriented object detection methods.
arXiv Detail & Related papers (2023-03-01T14:36:19Z)
- GROOT: Corrective Reward Optimization for Generative Sequential Labeling [10.306943706927004]
We propose GROOT -- a framework for Generative Reward Optimization Of Text sequences.
GROOT works by training a generative sequential labeling model to match the decoder output distribution with that of the (black-box) reward function.
As demonstrated via extensive experiments on four public benchmarks, GROOT significantly improves all reward metrics.
arXiv Detail & Related papers (2022-09-29T11:35:47Z)
- Progressive End-to-End Object Detection in Crowded Scenes [96.92416613336096]
Previous query-based detectors suffer from two drawbacks: first, multiple predictions will be inferred for a single object, typically in crowded scenes; second, the performance saturates as the depth of the decoding stage increases.
We propose a progressive predicting method to address the above issues. Specifically, we first select accepted queries to generate true positive predictions, then refine the remaining noisy queries according to the previously accepted predictions.
Experiments show that our method can significantly boost the performance of query-based detectors in crowded scenes.
arXiv Detail & Related papers (2022-03-15T06:12:00Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)