DETRs with Collaborative Hybrid Assignments Training
- URL: http://arxiv.org/abs/2211.12860v5
- Date: Wed, 9 Aug 2023 16:06:09 GMT
- Title: DETRs with Collaborative Hybrid Assignments Training
- Authors: Zhuofan Zong, Guanglu Song, Yu Liu
- Abstract summary: We present a novel collaborative hybrid assignments training scheme, namely $\mathcal{C}$o-DETR.
This training scheme can easily enhance the encoder's learning ability in end-to-end detectors.
We conduct extensive experiments to evaluate the effectiveness of the proposed approach on DETR variants.
- Score: 11.563949886871713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we provide the observation that assigning too few
queries as positive samples in DETR's one-to-one set matching leads to sparse
supervision on the encoder's output, which considerably hurts the discriminative
feature learning of the encoder, and vice versa for attention learning in the
decoder. To alleviate this, we present a novel collaborative hybrid assignments
training scheme, namely $\mathcal{C}$o-DETR, to learn more efficient and
effective DETR-based detectors from versatile label assignment manners. This
new training scheme can easily enhance the encoder's learning ability in
end-to-end detectors by training the multiple parallel auxiliary heads
supervised by one-to-many label assignments such as ATSS and Faster RCNN. In
addition, we construct extra customized positive queries by extracting the
positive coordinates from these auxiliary heads to improve the training
efficiency of positive samples in the decoder. In inference, these auxiliary
heads are discarded, and thus our method introduces no additional parameters or
computational cost to the original detector, while requiring no hand-crafted
non-maximum suppression (NMS). We conduct extensive experiments to evaluate the
effectiveness of the proposed approach on DETR variants, including DAB-DETR,
Deformable-DETR, and DINO-Deformable-DETR. The state-of-the-art
DINO-Deformable-DETR with Swin-L can be improved from 58.5% to 59.5% AP on COCO
val. Surprisingly, with a ViT-L backbone, we achieve 66.0% AP on COCO test-dev
and 67.9% AP on LVIS val, outperforming previous methods by clear margins with
much smaller model sizes. Code is available at
\url{https://github.com/Sense-X/Co-DETR}.
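To make the scheme concrete, below is a minimal, runnable PyTorch toy of the collaborative idea: a shared encoder supervised both by the one-to-one matched decoder branch and by auxiliary heads that are simply dropped at inference. All module names, sizes, and losses are illustrative placeholders, not the authors' implementation, and the customized positive queries are omitted for brevity.

```python
# Minimal runnable PyTorch toy of collaborative hybrid assignment training:
# a shared encoder is supervised by the one-to-one matched decoder branch AND
# by auxiliary heads trained densely; the aux heads are dropped at inference.
import torch
import torch.nn as nn

class ToyCoDETR(nn.Module):
    def __init__(self, dim=64, num_queries=30, num_aux=2):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)             # stands in for the DETR encoder
        self.queries = nn.Parameter(torch.randn(num_queries, dim))
        self.decoder = nn.MultiheadAttention(dim, 4, batch_first=True)
        # Aux heads would use one-to-many assignments (ATSS / Faster R-CNN
        # style in the paper); here they are just linear probes on the memory.
        self.aux_heads = nn.ModuleList(nn.Linear(dim, 5) for _ in range(num_aux))
        self.out = nn.Linear(dim, 5)                   # toy class+box outputs

    def forward(self, feats):
        memory = self.encoder(feats)                   # (B, HW, dim) shared features
        q = self.queries.expand(feats.size(0), -1, -1)
        dec, _ = self.decoder(q, memory, memory)
        preds = self.out(dec)                          # one-to-one branch
        if not self.training:
            return preds                               # aux heads discarded: no extra cost
        aux_preds = [h(memory) for h in self.aux_heads]  # dense encoder supervision
        return preds, aux_preds

model = ToyCoDETR()
preds, aux_preds = model(torch.randn(2, 100, 64))      # B=2 "images", 100 tokens
# Placeholder losses; real losses would be Hungarian (one-to-one) plus the aux
# heads' one-to-many assignment losses.
loss = preds.square().mean() + sum(a.square().mean() for a in aux_preds)
loss.backward()
print(model.encoder.weight.grad.norm() > 0)            # encoder gets every branch's gradient
```

The key property the toy reproduces is that every auxiliary branch backpropagates into the shared encoder during training while adding nothing to the inference path.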
Related papers
- Relative Difficulty Distillation for Semantic Segmentation [54.76143187709987]
We propose a pixel-level KD paradigm for semantic segmentation named Relative Difficulty Distillation (RDD).
RDD allows the teacher network to provide effective guidance on learning focus without additional optimization goals.
Our research showcases that RDD can integrate with existing KD methods to improve their upper performance bound.
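A minimal sketch of the general idea, pixel-level distillation reweighted by relative difficulty, follows; the specific weighting rule is an assumption for illustration, not RDD's exact formulation.

```python
# Sketch of pixel-level distillation reweighted by relative difficulty: pixels
# that are hard for the student but easy for the teacher get more KD weight.
# The weighting rule is an assumption for illustration, not RDD's formulation.
import torch
import torch.nn.functional as F

t_logits = torch.randn(2, 19, 64, 64)   # teacher segmentation logits (19 classes)
s_logits = torch.randn(2, 19, 64, 64)   # student logits
labels = torch.randint(0, 19, (2, 64, 64))

t_ce = F.cross_entropy(t_logits, labels, reduction="none")  # per-pixel teacher difficulty
s_ce = F.cross_entropy(s_logits, labels, reduction="none")  # per-pixel student difficulty
weight = (s_ce / (t_ce + 1e-6)).clamp(max=5.0)              # relative difficulty as focus

kd = F.kl_div(F.log_softmax(s_logits, dim=1), F.softmax(t_logits, dim=1),
              reduction="none").sum(dim=1)                  # per-pixel KD loss
loss = (weight * kd).mean()
print(loss)
```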
arXiv Detail & Related papers (2024-07-04T08:08:25Z)
- Cyclic-Bootstrap Labeling for Weakly Supervised Object Detection [134.05510658882278]
Cyclic-Bootstrap Labeling (CBL) is a novel weakly supervised object detection pipeline.
CBL uses a weighted exponential moving average strategy to take advantage of various refinement modules.
A novel class-specific ranking distillation algorithm is proposed to leverage the output of the weighted ensembled teacher network.
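The weighted exponential moving average (EMA) teacher is a standard mechanism; here is a minimal sketch, with an illustrative decay value rather than CBL's exact schedule.

```python
# Minimal sketch of a weighted exponential moving average (EMA) teacher update,
# the generic mechanism the summary refers to; the decay value is illustrative,
# not CBL's exact schedule.
import copy
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    # teacher <- decay * teacher + (1 - decay) * student, parameter-wise
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)

student = torch.nn.Linear(8, 8)
teacher = copy.deepcopy(student)   # teacher starts as a copy, then lags the student
for step in range(100):
    # ... one gradient step on the student would go here ...
    ema_update(teacher, student)
```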
arXiv Detail & Related papers (2023-08-11T07:57:17Z)
- Query Encoder Distillation via Embedding Alignment is a Strong Baseline Method to Boost Dense Retriever Online Efficiency [4.254906060165999]
We show that even a 2-layer, BERT-based query encoder can still retain 92.5% of the full dual-encoder (DE) performance on the BEIR benchmark.
We hope that our findings will encourage the community to re-evaluate the trade-offs between method complexity and performance improvements.
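A minimal sketch of distillation via embedding alignment: a shallow student query encoder is trained to reproduce a frozen teacher's embeddings. The MSE objective and toy dimensions are assumptions, not necessarily the paper's exact recipe.

```python
# Minimal sketch of query-encoder distillation via embedding alignment: a
# shallow student is trained so its query embeddings match a frozen teacher's.
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 128)).eval()
student = nn.Linear(32, 128)                      # much shallower query encoder
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

queries = torch.randn(64, 32)                     # a batch of query features
with torch.no_grad():
    target = teacher(queries)                     # frozen teacher embeddings
loss = nn.functional.mse_loss(student(queries), target)
loss.backward()
opt.step()                                        # student moves toward teacher space
```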
arXiv Detail & Related papers (2023-06-05T06:53:55Z)
- Align-DETR: Improving DETR with Simple IoU-aware BCE loss [32.13866392998818]
We propose a metric, recall of best-regressed samples, to quantitatively evaluate the misalignment problem.
The proposed loss, IA-BCE, guides the training of DETR to build a strong correlation between classification score and localization precision.
To overcome the dramatic decrease in sample quality induced by the sparsity of queries, we introduce a prime sample weighting mechanism.
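The core mechanism, supervising classification with an IoU-derived soft target, can be sketched in a few lines. This follows the quality-focal style and may differ from Align-DETR's exact IA-BCE formula.

```python
# Sketch of an IoU-aware soft classification target: positives are supervised
# with an IoU-derived target instead of a hard 1, so the classification score
# correlates with localization quality. Quality-focal style; Align-DETR's
# exact IA-BCE formula may differ.
import torch
import torch.nn.functional as F

logits = torch.randn(6)    # classification logits of matched positive samples
iou = torch.rand(6)        # IoU of each prediction with its assigned ground truth
loss = F.binary_cross_entropy_with_logits(logits, iou)  # soft target = IoU
print(loss)
```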
arXiv Detail & Related papers (2023-04-15T10:24:51Z)
- Detection Transformer with Stable Matching [48.963171068785435]
We show that the most important design is to use and only use positional metrics to supervise classification scores of positive examples.
Under the principle, we propose two simple yet effective modifications by integrating positional metrics to DETR's classification loss and matching cost.
We achieve 50.4 and 51.5 AP on the COCO detection benchmark using ResNet-50 backbones under 12 epochs and 24 epochs training settings.
arXiv Detail & Related papers (2023-04-10T17:55:37Z)
- Exploring Content Relationships for Distilling Efficient GANs [69.86835014810714]
This paper proposes content relationship distillation (CRD) to tackle over-parameterized generative adversarial networks (GANs).
In contrast to traditional instance-level distillation, we design a novel GAN compression oriented knowledge by slicing the contents of teacher outputs into multiple fine-grained granularities.
Built upon our proposed content-level distillation, we also deploy an online teacher discriminator, which keeps updating when co-trained with the teacher generator and keeps freezing when co-trained with the student generator for better adversarial training.
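The alternating update/freeze pattern for the online teacher discriminator can be sketched as follows; the modules and training phases are placeholders.

```python
# Sketch of the alternating pattern described above: the online discriminator
# keeps updating alongside the teacher generator and is frozen while the
# student generator trains. Modules and phases are placeholders.
import torch.nn as nn

def set_trainable(module: nn.Module, flag: bool):
    for p in module.parameters():
        p.requires_grad_(flag)

disc = nn.Linear(16, 1)        # stands in for the online teacher discriminator

# Phase 1: co-train with the teacher generator -> discriminator updates.
set_trainable(disc, True)
# ... adversarial step with the teacher generator would go here ...

# Phase 2: co-train with the student generator -> discriminator is frozen.
set_trainable(disc, False)
# ... distillation step with the student generator would go here ...
```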
arXiv Detail & Related papers (2022-12-21T15:38:12Z)
- Teach-DETR: Better Training DETR with Teachers [43.37671158294093]
Teach-DETR is a training scheme to learn better DETR-based detectors from versatile teacher detectors.
We improve the state-of-the-art detector DINO with Swin-Large backbone, 4 scales of feature maps and 36-epoch training schedule.
arXiv Detail & Related papers (2022-11-22T02:16:53Z)
- CCLF: A Contrastive-Curiosity-Driven Learning Framework for Sample-Efficient Reinforcement Learning [56.20123080771364]
We develop a model-agnostic Contrastive-Curiosity-Driven Learning Framework (CCLF) for reinforcement learning.
CCLF fully exploits sample importance and improves learning efficiency in a self-supervised manner.
We evaluate this approach on the DeepMind Control Suite, Atari, and MiniGrid benchmarks.
arXiv Detail & Related papers (2022-05-02T14:42:05Z)
- Recurrent Glimpse-based Decoder for Detection with Transformer [85.64521612986456]
We introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper.
In particular, the REGO employs a multi-stage recurrent processing structure to help the attention of DETR gradually focus on foreground objects.
REGO consistently boosts the performance of different DETR detectors by up to 7% relative gain at the same setting of 50 training epochs.
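A toy sketch of multi-stage recurrent refinement in the spirit of REGO follows: each stage re-attends to image features and refines the previous stage's boxes. Plain cross-attention stands in for the glimpse-based RoI step, and all modules are placeholders, not the paper's architecture.

```python
# Toy sketch of multi-stage recurrent refinement in the spirit of REGO: each
# stage re-attends to image features and refines the previous stage's boxes.
import torch
import torch.nn as nn

class RecurrentRefiner(nn.Module):
    def __init__(self, dim=64, stages=3):
        super().__init__()
        self.stages = stages
        self.attn = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.box_head = nn.Linear(dim, 4)

    def forward(self, dec_out, memory):
        boxes = self.box_head(dec_out)             # initial decoder boxes
        for _ in range(self.stages):
            dec_out, _ = self.attn(dec_out, memory, memory)  # "glimpse" at features
            boxes = boxes + self.box_head(dec_out)           # stage-wise refinement
        return boxes

refiner = RecurrentRefiner()
boxes = refiner(torch.randn(2, 30, 64), torch.randn(2, 100, 64))
print(boxes.shape)                                 # torch.Size([2, 30, 4])
```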
arXiv Detail & Related papers (2021-12-09T00:29:19Z)