Dense Distinct Query for End-to-End Object Detection
- URL: http://arxiv.org/abs/2303.12776v2
- Date: Wed, 5 Jul 2023 13:36:43 GMT
- Title: Dense Distinct Query for End-to-End Object Detection
- Authors: Shilong Zhang, Xinjiang Wang, Jiaqi Wang, Jiangmiao Pang, Chengqi Lyu,
Wenwei Zhang, Ping Luo, Kai Chen
- Abstract summary: One-to-one assignment in object detection has successfully obviated the need for non-maximum suppression.
This paper shows that the solution should be Dense Distinct Queries (DDQ).
DDQ blends the advantages of traditional and recent end-to-end detectors and significantly improves the performance of various detectors.
- Score: 39.32011383066249
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One-to-one label assignment in object detection has successfully obviated the
need for non-maximum suppression (NMS) as postprocessing and makes the pipeline
end-to-end. However, it triggers a new dilemma as the widely used sparse
queries cannot guarantee a high recall, while dense queries inevitably bring
more similar queries and encounter optimization difficulties. As both sparse
and dense queries are problematic, then what are the expected queries in
end-to-end object detection? This paper shows that the solution should be Dense
Distinct Queries (DDQ). Concretely, we first lay dense queries like traditional
detectors and then select distinct ones for one-to-one assignments. DDQ blends
the advantages of traditional and recent end-to-end detectors and significantly
improves the performance of various detectors including FCN, R-CNN, and DETRs.
Most impressively, DDQ-DETR achieves 52.1 AP on MS-COCO dataset within 12
epochs using a ResNet-50 backbone, outperforming all existing detectors in the
same setting. DDQ also shares the benefit of end-to-end detectors in crowded
scenes and achieves 93.8 AP on CrowdHuman. We hope DDQ can inspire researchers
to consider the complementarity between traditional methods and end-to-end
detectors. The source code can be found at
https://github.com/jshilong/DDQ.
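The recipe in the abstract is concrete enough to sketch: lay out dense, anchor-style queries as a traditional detector would, keep only a distinct subset of them, and run one-to-one assignment on that subset alone. Below is a minimal PyTorch sketch of the dense-to-distinct selection step, assuming a class-agnostic IoU filter as the distinctness criterion; the function name, threshold, and shapes are illustrative assumptions, not the reference implementation (see the repository above for that).

```python
import torch
from torchvision.ops import nms


def select_distinct_queries(boxes: torch.Tensor,
                            scores: torch.Tensor,
                            iou_threshold: float = 0.7,
                            num_queries: int = 300) -> torch.Tensor:
    """Reduce dense candidate queries to a distinct subset.

    boxes:  (N, 4) candidate boxes in (x1, y1, x2, y2) format, one per
            dense query (N is typically in the thousands).
    scores: (N,) class-agnostic objectness/confidence scores.

    Returns indices of up to `num_queries` distinct queries surviving a
    class-agnostic IoU filter; downstream, only these would enter the
    one-to-one label assignment.
    """
    keep = nms(boxes, scores, iou_threshold)  # class-agnostic suppression of near-duplicates
    return keep[:num_queries]                 # cap the number of distinct queries


if __name__ == "__main__":
    # Toy example: 1000 dense queries on a 640x640 image.
    xy = torch.rand(1000, 2) * 600
    wh = torch.rand(1000, 2) * 40 + 10
    boxes = torch.cat([xy, xy + wh], dim=1)
    scores = torch.rand(1000)
    distinct = select_distinct_queries(boxes, scores)
    print(distinct.shape)  # at most (300,)
```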
Related papers
- Efficient Decoder for End-to-End Oriented Object Detection in Remote Sensing Images [26.37802649901314]
We propose an end-to-end oriented detector equipped with an efficient decoder.
Rotated RoI attention and Selective Distinct Queries (SDQ) are proposed.
Our method achieves state-of-the-art performance on DIOR-R (67.31% mAP), DOTA-v1.5 (67.43% mAP), and DOTA-v2.0 (53.28% mAP) with the ResNet50 backbone.
arXiv Detail & Related papers (2023-11-29T13:43:17Z)
- Spatial-Temporal Graph Enhanced DETR Towards Multi-Frame 3D Object Detection [54.041049052843604]
We present STEMD, a novel end-to-end framework that enhances the DETR-like paradigm for multi-frame 3D object detection.
First, to model the inter-object spatial interaction and complex temporal dependencies, we introduce the spatial-temporal graph attention network.
Finally, it poses a challenge for the network to distinguish between the positive query and other highly similar queries that are not the best match.
arXiv Detail & Related papers (2023-07-01T13:53:14Z)
- Featurized Query R-CNN [41.40318163261041]
We present featurized object queries predicted by a query generation network in the Faster R-CNN framework.
Our Featurized Query R-CNN obtains the best speed-accuracy trade-off among all R-CNN detectors, including the recent state-of-the-art Sparse R-CNN detector.
arXiv Detail & Related papers (2022-06-13T15:40:19Z)
- What Are Expected Queries in End-to-End Object Detection? [28.393693394478724]
This paper shows that the expected queries should be Dense Distinct Queries (DDQ).
DDQ is stronger, more robust, and converges faster than previous methods.
It obtains 44.5 AP on the MS COCO detection dataset with only 12 epochs.
arXiv Detail & Related papers (2022-06-02T18:15:44Z)
- AdaMixer: A Fast-Converging Query-Based Object Detector [32.159871347459166]
We propose a fast-converging query-based object detector named AdaMixer.
AdaMixer has architectural simplicity without requiring explicit pyramid networks.
Our work sheds light on a simple, accurate, and fast converging architecture for query-based object detectors.
arXiv Detail & Related papers (2022-03-30T17:45:02Z)
- Progressive End-to-End Object Detection in Crowded Scenes [96.92416613336096]
Previous query-based detectors suffer from two drawbacks: first, multiple predictions will be inferred for a single object, typically in crowded scenes; second, the performance saturates as the depth of the decoding stage increases.
We propose a progressive predicting method to address the above issues. Specifically, we first select accepted queries to generate true positive predictions, then refine the rest noisy queries according to the previously accepted predictions.
Experiments show that our method can significantly boost the performance of query-based detectors in crowded scenes.
arXiv Detail & Related papers (2022-03-15T06:12:00Z)
- Anchor DETR: Query Design for Transformer-Based Detector [24.925317590675203]
We propose a novel query design for the transformer-based detectors.
Object queries are based on anchor points, which are widely used in CNN-based detectors.
Our design can predict multiple objects at one position to solve the difficulty: "one region, multiple objects"
arXiv Detail & Related papers (2021-09-15T06:31:55Z)
- End-to-End Object Detection with Fully Convolutional Network [71.56728221604158]
We introduce a Prediction-aware One-To-One (POTO) label assignment for classification to enable end-to-end detection.
A simple 3D Max Filtering (3DMF) is proposed to utilize the multi-scale features and improve the discriminability of convolutions in the local region.
Our end-to-end framework achieves competitive performance against many state-of-the-art detectors with NMS on COCO and CrowdHuman datasets.
arXiv Detail & Related papers (2020-12-07T09:14:55Z)
- Sparse R-CNN: End-to-End Object Detection with Learnable Proposals [77.9701193170127]
We present Sparse R-CNN, a purely sparse method for object detection in images.
Final predictions are directly output without non-maximum suppression post-procedure.
We hope our work could inspire re-thinking the convention of dense prior in object detectors.
arXiv Detail & Related papers (2020-11-25T00:01:28Z)
- FCOS: A simple and strong anchor-free object detector [111.87691210818194]
We propose a fully convolutional one-stage object detector (FCOS) to solve object detection in a per-pixel prediction fashion.
Almost all state-of-the-art object detectors such as RetinaNet, SSD, YOLOv3, and Faster R-CNN rely on pre-defined anchor boxes.
In contrast, our proposed detector FCOS is anchor box free, as well as proposal free.
arXiv Detail & Related papers (2020-06-14T01:03:39Z)
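As a companion to the FCOS entry above: "per-pixel prediction" means every feature-map location directly emits class logits, a box encoded as distances to its four sides, and a centerness score, with no anchor boxes at all. The sketch below is a deliberately simplified single-level head (shallow shared tower, no GroupNorm, no per-level scales, no target assignment) meant only to illustrate the layout, not to reproduce FCOS.

```python
import torch
import torch.nn as nn


class PerPixelHead(nn.Module):
    """Anchor-free, per-pixel detection head in the spirit of FCOS.

    Every spatial location of the input feature map predicts:
      * class logits                 (num_classes channels)
      * box distances (l, t, r, b)   (4 channels, location to box sides)
      * a centerness score           (1 channel)
    """

    def __init__(self, in_channels: int = 256, num_classes: int = 80):
        super().__init__()
        self.tower = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.ReLU(inplace=True),
        )
        self.cls_logits = nn.Conv2d(in_channels, num_classes, 3, padding=1)
        self.bbox_pred = nn.Conv2d(in_channels, 4, 3, padding=1)
        self.centerness = nn.Conv2d(in_channels, 1, 3, padding=1)

    def forward(self, feature: torch.Tensor):
        x = self.tower(feature)
        # exp() maps raw regression outputs to positive (l, t, r, b) distances.
        return self.cls_logits(x), self.bbox_pred(x).exp(), self.centerness(x)


if __name__ == "__main__":
    head = PerPixelHead()
    p3 = torch.randn(1, 256, 80, 80)         # one FPN level, stride 8 on a 640x640 image
    cls, ltrb, ctr = head(p3)
    print(cls.shape, ltrb.shape, ctr.shape)  # (1,80,80,80) (1,4,80,80) (1,1,80,80)
```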
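Similarly, for the Anchor DETR entry, the summary says object queries are tied to anchor points and that one position may predict several objects. A rough, hedged sketch of that idea follows: learnable 2D points are turned into sine embeddings and combined with a small set of "pattern" embeddings so each point yields multiple queries. The encoding function and the way patterns and point embeddings are combined here are assumptions for illustration, not the paper's exact formulation.

```python
import math
import torch
import torch.nn as nn


def sine_embed(points: torch.Tensor, d_model: int = 256) -> torch.Tensor:
    """Map normalized (x, y) anchor points to d_model-dim sine/cosine embeddings."""
    half = d_model // 4                               # frequencies per coordinate
    freqs = 10000 ** (2 * torch.arange(half, dtype=torch.float32) / (d_model // 2))
    xy = points.unsqueeze(-1) * 2 * math.pi / freqs   # (P, 2, half)
    emb = torch.cat([xy.sin(), xy.cos()], dim=-1)     # (P, 2, 2*half)
    return emb.flatten(1)                             # (P, d_model)


class AnchorPointQueries(nn.Module):
    """Build object queries from learnable anchor points plus per-point "patterns",
    so a single position can yield several predictions ("one region, multiple objects")."""

    def __init__(self, num_points: int = 300, num_patterns: int = 3, d_model: int = 256):
        super().__init__()
        self.points = nn.Parameter(torch.rand(num_points, 2))  # normalized (x, y) in [0, 1]
        self.patterns = nn.Embedding(num_patterns, d_model)

    def forward(self) -> torch.Tensor:
        pos = sine_embed(self.points, self.patterns.embedding_dim)  # (P, d_model)
        q = self.patterns.weight[:, None, :] + pos[None, :, :]      # (K, P, d_model)
        return q.flatten(0, 1)                                      # (K*P, d_model) queries


if __name__ == "__main__":
    queries = AnchorPointQueries()()
    print(queries.shape)  # (900, 256)
```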