Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment
- URL: http://arxiv.org/abs/2207.13085v3
- Date: Thu, 31 Aug 2023 04:00:18 GMT
- Title: Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment
- Authors: Qiang Chen, Xiaokang Chen, Jian Wang, Shan Zhang, Kun Yao, Haocheng
Feng, Junyu Han, Errui Ding, Gang Zeng, Jingdong Wang
- Abstract summary: One-to-many assignment, assigning one ground-truth object to multiple predictions, succeeds in detection methods such as Faster R-CNN and FCOS.
We introduce Group DETR, a simple yet efficient DETR training approach that adopts a group-wise scheme for one-to-many assignment.
Experiments show that Group DETR significantly speeds up the training convergence and improves the performance of various DETR-based models.
- Score: 80.55064790937092
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Detection transformer (DETR) relies on one-to-one assignment, assigning one
ground-truth object to one prediction, for end-to-end detection without NMS
post-processing. It is known that one-to-many assignment, assigning one
ground-truth object to multiple predictions, succeeds in detection methods such
as Faster R-CNN and FCOS. The naive one-to-many assignment, however, does not
work for DETR, and it remains challenging to apply one-to-many assignment to
DETR training. In this paper, we introduce Group DETR, a simple yet efficient
DETR training approach that adopts a group-wise scheme for one-to-many
assignment.
This approach involves using multiple groups of object queries, conducting
one-to-one assignment within each group, and performing decoder self-attention
separately. It resembles data augmentation with automatically-learned object
query augmentation. It is also equivalent to simultaneously training
parameter-sharing networks of the same architecture, introducing more
supervision and thus improving DETR training. Inference is the same as for a
normally trained DETR: it needs only one group of queries, with no architecture
modification. Group DETR is versatile and applicable to various
DETR variants. The experiments show that Group DETR significantly speeds up the
training convergence and improves the performance of various DETR-based models.
Code will be available at https://github.com/Atten4Vis/GroupDETR.
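The group-wise recipe described above (several groups of object queries,
one-to-one Hungarian matching inside each group, and decoder self-attention
restricted to each group) is straightforward to prototype. Below is a minimal
PyTorch-style sketch under assumed names (group_attn_mask, groupwise_assign,
num_groups, queries_per_group); it illustrates the described technique and is
not the authors' implementation, which the repository above will provide.

import torch
from scipy.optimize import linear_sum_assignment

def group_attn_mask(num_groups, queries_per_group):
    """Boolean mask restricting decoder self-attention to within each group.

    True entries are blocked, following the boolean attn_mask convention of
    torch.nn.MultiheadAttention.
    """
    total = num_groups * queries_per_group
    mask = torch.ones(total, total, dtype=torch.bool)
    for g in range(num_groups):
        s = g * queries_per_group
        mask[s:s + queries_per_group, s:s + queries_per_group] = False
    return mask

def groupwise_assign(cost, num_groups):
    """One-to-one (Hungarian) matching run independently inside each group.

    cost: [num_groups * queries_per_group, num_gt] matching-cost matrix.
    Returns (query_idx, gt_idx); every ground-truth object is matched to
    num_groups predictions in total, i.e. one-to-many overall.
    """
    qpg = cost.shape[0] // num_groups
    query_idx, gt_idx = [], []
    for g in range(num_groups):
        s = g * qpg
        rows, cols = linear_sum_assignment(cost[s:s + qpg].cpu().numpy())
        query_idx.extend(int(s + r) for r in rows)
        gt_idx.extend(int(c) for c in cols)
    return torch.as_tensor(query_idx), torch.as_tensor(gt_idx)

During training the decoder runs on num_groups * queries_per_group queries
with group_attn_mask applied; at inference only the first queries_per_group
queries are kept, so the model behaves exactly like a standard DETR.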
Related papers
- MS-DETR: Efficient DETR Training with Mixed Supervision [74.93329653526952]
MS-DETR applies one-to-many supervision to the object queries of the primary decoder, which is used for inference.
Our approach does not need additional decoder branches or object queries.
Experimental results show that our approach outperforms related DETR variants.
arXiv Detail & Related papers (2024-01-08T16:08:53Z)
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the combinatorial entity pair distribution.
We employ a DETR-based encoder-decoder design and leverage conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- Enhancing Few-shot NER with Prompt Ordering based Data Augmentation [59.69108119752584]
We propose a Prompt Ordering based Data Augmentation (PODA) method to improve the training of unified autoregressive generation frameworks.
Experimental results on three public NER datasets and further analyses demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-05-19T16:25:43Z)
- FeatAug-DETR: Enriching One-to-Many Matching for DETRs with Feature Augmentation [48.94488166162821]
One-to-one matching is a crucial design in DETR-like object detection frameworks.
We propose two methods that realize one-to-many matching from a different perspective, namely augmenting images or image features.
We conduct extensive experiments to evaluate the effectiveness of the proposed approach on DETR variants.
arXiv Detail & Related papers (2023-03-02T18:59:48Z)
- Team DETR: Guide Queries as a Professional Team in Detection Transformers [31.521916994653235]
We propose Team DETR, which leverages query collaboration and position constraints to attend to objects of interest more precisely.
We also dynamically cater to each query member's prediction preference, offering the query better scale and spatial priors.
In addition, the proposed Team DETR is flexible enough to be adapted to other existing DETR variants without increasing parameters or computation.
arXiv Detail & Related papers (2023-02-14T15:21:53Z)
- Pair DETR: Contrastive Learning Speeds Up DETR Training [0.6491645162078056]
We present a simple approach to address the main problem of DETR, the slow convergence.
We detect an object bounding box as a pair of keypoints, the top-left corner and the center, using two decoders.
Experiments show that Pair DETR can converge at least 10x faster than the original DETR and 1.5x faster than Conditional DETR during training.
arXiv Detail & Related papers (2022-10-29T03:02:49Z)
- DETRs with Hybrid Matching [21.63116788914251]
One-to-one set matching is a key design for DETR to establish its end-to-end capability.
We propose a hybrid matching scheme that combines the original one-to-one matching branch with an auxiliary one-to-many matching branch during training; a minimal sketch of this idea appears after this list.
arXiv Detail & Related papers (2022-07-26T17:52:14Z)
- UP-DETR: Unsupervised Pre-training for Object Detection with Transformers [11.251593386108189]
We propose a novel pretext task named random query patch detection in Unsupervised Pre-training DETR (UP-DETR).
Specifically, we randomly crop patches from the given image and then feed them as queries to the decoder.
UP-DETR significantly boosts the performance of DETR with faster convergence and higher average precision on object detection, one-shot detection and panoptic segmentation.
arXiv Detail & Related papers (2020-11-18T05:16:11Z)
- End-to-End Object Detection with Transformers [88.06357745922716]
We present a new method that views object detection as a direct set prediction problem.
Our approach streamlines the detection pipeline, effectively removing the need for many hand-designed components.
The main ingredients of the new framework, called DEtection TRansformer or DETR, are a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture.
arXiv Detail & Related papers (2020-05-26T17:06:38Z)
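As a contrast to the group-wise scheme above, the following self-contained
sketch illustrates the auxiliary one-to-many matching idea summarized in the
hybrid-matching entry: ground-truth objects are tiled T times and then matched
one-to-one against an extra query set. The names (one_to_many_match, repeat_t)
are assumptions for illustration, not that paper's code.

import numpy as np
from scipy.optimize import linear_sum_assignment

def one_to_many_match(cost, repeat_t=6):
    """cost: [num_aux_queries, num_gt] matching-cost matrix.

    Tiling the ground-truth columns repeat_t times lets one object claim up
    to repeat_t auxiliary predictions after one-to-one matching.
    """
    tiled = np.tile(cost, (1, repeat_t))   # [num_aux_queries, num_gt * repeat_t]
    rows, cols = linear_sum_assignment(tiled)
    return rows, cols % cost.shape[1]      # map tiled columns back to GT indices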