D$^3$ETR: Decoder Distillation for Detection Transformer
- URL: http://arxiv.org/abs/2211.09768v1
- Date: Thu, 17 Nov 2022 18:47:24 GMT
- Title: D$^3$ETR: Decoder Distillation for Detection Transformer
- Authors: Xiaokang Chen, Jiahui Chen, Yan Liu, Gang Zeng
- Abstract summary: We focus on the transformer decoder of DETR-based detectors and explore KD methods for them.
The outputs of the transformer decoder lie in random order, which gives no direct correspondence between the predictions of the teacher and the student.
We build \textbf{D}ecoder \textbf{D}istillation for \textbf{DE}tection \textbf{TR}ansformer (D$^3$ETR), which distills knowledge in decoder predictions and attention maps from the teachers to students.
- Score: 20.493873634246512
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While various knowledge distillation (KD) methods in CNN-based detectors show
their effectiveness in improving small students, the baselines and recipes for
DETR-based detectors are yet to be built. In this paper, we focus on the
transformer decoder of DETR-based detectors and explore KD methods for them.
The outputs of the transformer decoder lie in random order, which gives no
direct correspondence between the predictions of the teacher and the student,
thus posing a challenge for knowledge distillation. To this end, we propose
MixMatcher to align the decoder outputs of DETR-based teachers and students,
which mixes two teacher-student matching strategies, i.e., Adaptive Matching
and Fixed Matching. Specifically, Adaptive Matching applies bipartite matching
to adaptively match the outputs of the teacher and the student in each decoder
layer, while Fixed Matching fixes the correspondence between the outputs of the
teacher and the student with the same object queries, with the teacher's fixed
object queries fed to the decoder of the student as an auxiliary group.
Based on MixMatcher, we build \textbf{D}ecoder \textbf{D}istillation for
\textbf{DE}tection \textbf{TR}ansformer (D$^3$ETR), which distills knowledge in
decoder predictions and attention maps from the teachers to students. D$^3$ETR
shows superior performance on various DETR-based detectors with different
backbones. For example, D$^3$ETR improves Conditional DETR-R50-C5 by
$\textbf{7.8}/\textbf{2.4}$ mAP under $12/50$ epochs training settings with
Conditional DETR-R101-C5 as the teacher.
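The abstract describes MixMatcher only in prose; below is a minimal sketch of the two matching strategies and the resulting distillation losses, assuming hypothetical decoder-output tensors (`t_logits`, `t_boxes`, `s_logits`, `s_boxes`) and a simple classification-plus-L1 box matching cost. The actual D$^3$ETR cost terms, loss weights, and attention-map handling may differ.

```python
import torch
import torch.nn.functional as F
from scipy.optimize import linear_sum_assignment


def adaptive_matching(t_logits, t_boxes, s_logits, s_boxes):
    """Adaptive Matching: Hungarian (bipartite) matching between teacher and student predictions.

    t_logits/s_logits: [num_queries, num_classes]; t_boxes/s_boxes: [num_queries, 4].
    The cost below (class agreement + L1 box distance) is an assumption for illustration.
    """
    cls_cost = -(s_logits.softmax(-1) @ t_logits.softmax(-1).t())  # [Ns, Nt]; high agreement -> low cost
    box_cost = torch.cdist(s_boxes, t_boxes, p=1)                  # [Ns, Nt]; L1 distance between boxes
    cost = (cls_cost + box_cost).detach().cpu().numpy()
    s_idx, t_idx = linear_sum_assignment(cost)                     # one-to-one assignment
    return torch.as_tensor(s_idx), torch.as_tensor(t_idx)


def prediction_distill_loss(t_logits, t_boxes, s_logits, s_boxes, t_attn=None, s_attn=None):
    """Distill predictions (and optionally attention maps) over the matched query pairs."""
    s_idx, t_idx = adaptive_matching(t_logits, t_boxes, s_logits, s_boxes)
    loss = F.kl_div(s_logits[s_idx].log_softmax(-1),
                    t_logits[t_idx].softmax(-1), reduction="batchmean")
    loss = loss + F.l1_loss(s_boxes[s_idx], t_boxes[t_idx])
    if t_attn is not None and s_attn is not None:
        loss = loss + F.mse_loss(s_attn[s_idx], t_attn[t_idx])     # attention-map distillation
    return loss


# Fixed Matching (sketch): feed the teacher's object queries to the student decoder as an
# auxiliary query group, so the i-th auxiliary student output already corresponds to the
# i-th teacher output and no bipartite matching is needed, e.g. (hypothetical names):
#   aux_logits, aux_boxes = student_decoder(memory, teacher_object_queries)
#   fixed_loss = prediction-level KD between (aux_logits, aux_boxes) and (t_logits, t_boxes)
```

In this reading, MixMatcher applies both losses at each decoder layer; the auxiliary query group is used only for distillation, so it adds no parameters or cost at inference time.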
Related papers
- How to Make Cross Encoder a Good Teacher for Efficient Image-Text Retrieval? [99.87554379608224]
The cross-modal similarity score distribution of the cross-encoder is more concentrated, while that of the dual-encoder is nearly normal.
Only the relative order between hard negatives conveys valid knowledge, while the order information between easy negatives has little significance.
We propose a novel Contrastive Partial Ranking Distillation (DCPR) method which mimics the relative order between hard negative samples with contrastive learning.
arXiv Detail & Related papers (2024-07-10T09:10:01Z)
- OD-DETR: Online Distillation for Stabilizing Training of Detection Transformer [14.714768026997534]
This paper aims to stabilize DETR training through the online distillation.
It utilizes a teacher model accumulated by Exponential Moving Average (EMA).
Experiments show that the proposed OD-DETR successfully stabilizes the training, and significantly increases the performance without bringing in more parameters.
arXiv Detail & Related papers (2024-06-09T14:07:35Z)
- Semi-DETR: Semi-Supervised Object Detection with Detection Transformers [105.45018934087076]
We analyze the DETR-based framework on semi-supervised object detection (SSOD).
We present Semi-DETR, the first transformer-based end-to-end semi-supervised object detector.
Our method outperforms all state-of-the-art methods by clear margins.
arXiv Detail & Related papers (2023-07-16T16:32:14Z)
- Detection Transformer with Stable Matching [48.963171068785435]
We show that the most important design is to use, and only use, positional metrics to supervise the classification scores of positive examples.
Under this principle, we propose two simple yet effective modifications by integrating positional metrics into DETR's classification loss and matching cost.
We achieve 50.4 and 51.5 AP on the COCO detection benchmark using ResNet-50 backbones under 12 epochs and 24 epochs training settings.
arXiv Detail & Related papers (2023-04-10T17:55:37Z)
- Noise-Robust Dense Retrieval via Contrastive Alignment Post Training [89.29256833403167]
Contrastive Alignment POst Training (CAPOT) is a highly efficient finetuning method that improves model robustness without requiring index regeneration.
CAPOT enables robust retrieval by freezing the document encoder while the query encoder learns to align noisy queries with their unaltered root.
We evaluate CAPOT on noisy variants of MSMARCO, Natural Questions, and Trivia QA passage retrieval, finding that CAPOT has a similar impact as data augmentation with none of its overhead.
arXiv Detail & Related papers (2023-04-06T22:16:53Z)
- Exploring Content Relationships for Distilling Efficient GANs [69.86835014810714]
This paper proposes content relationship distillation (CRD) to tackle over-parameterized generative adversarial networks (GANs).
In contrast to traditional instance-level distillation, we design a novel GAN-compression-oriented knowledge by slicing the contents of teacher outputs into multiple fine-grained granularities.
Built upon our proposed content-level distillation, we also deploy an online teacher discriminator, which keeps updating when co-trained with the teacher generator and stays frozen when co-trained with the student generator for better adversarial training.
arXiv Detail & Related papers (2022-12-21T15:38:12Z)
- DETRs with Collaborative Hybrid Assignments Training [11.563949886871713]
We present a novel collaborative hybrid assignments training scheme, namely $\mathcal{C}$o-DETR.
This training scheme can easily enhance the encoder's learning ability in end-to-end detectors.
We conduct extensive experiments to evaluate the effectiveness of the proposed approach on DETR variants.
arXiv Detail & Related papers (2022-11-22T16:19:52Z)
- Pair DETR: Contrastive Learning Speeds Up DETR Training [0.6491645162078056]
We present a simple approach to address the main problem of DETR, the slow convergence.
We detect an object bounding box as a pair of keypoints, the top-left corner and the center, using two decoders.
Experiments show that Pair DETR can converge at least 10x faster than original DETR and 1.5x faster than Conditional DETR during training.
arXiv Detail & Related papers (2022-10-29T03:02:49Z)
- G-DetKD: Towards General Distillation Framework for Object Detectors via Contrastive and Semantic-guided Feature Imitation [49.421099172544196]
We propose a novel semantic-guided feature imitation technique, which automatically performs soft matching between feature pairs across all pyramid levels.
We also introduce contrastive distillation to effectively capture the information encoded in the relationship between different feature regions.
Our method consistently outperforms existing detection KD techniques, and works both when components in the framework are used separately and when they are used in conjunction.
arXiv Detail & Related papers (2021-08-17T07:44:27Z)
- CoDERT: Distilling Encoder Representations with Co-learning for Transducer-based Speech Recognition [14.07385381963374]
We show that the transducer's encoder outputs naturally have a high entropy and contain rich information about acoustically similar word-piece confusions.
We introduce an auxiliary loss to distill the encoder logits from a teacher transducer's encoder, and explore training strategies where this encoder distillation works effectively.
arXiv Detail & Related papers (2021-06-14T20:03:57Z)