Q-DETR: An Efficient Low-Bit Quantized Detection Transformer
- URL: http://arxiv.org/abs/2304.00253v1
- Date: Sat, 1 Apr 2023 08:05:14 GMT
- Title: Q-DETR: An Efficient Low-Bit Quantized Detection Transformer
- Authors: Sheng Xu, Yanjing Li, Mingbao Lin, Peng Gao, Guodong Guo, Jinhu Lu,
Baochang Zhang
- Abstract summary: Through empirical analyses, we find that the bottleneck of Q-DETR lies in query information distortion.
We formulate our DRD as a bi-level optimization problem, which can be derived by generalizing the information bottleneck (IB) principle to the learning of Q-DETR.
We introduce a new foreground-aware query matching scheme to effectively transfer the teacher information to distillation-desired features to minimize the conditional information entropy.
- Score: 50.00784028552792
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The recent detection transformer (DETR) has advanced object detection, but
its application on resource-constrained devices requires massive computation
and memory resources. Quantization stands out as a solution by representing the
network in low-bit parameters and operations. However, a low-bit quantized
DETR (Q-DETR) built with existing quantization methods suffers a significant
performance drop. Through empirical analyses, we find that the bottleneck of
Q-DETR lies in query information distortion. This paper addresses the problem
with a distribution rectification distillation (DRD) method. We formulate our
DRD as a bi-level optimization problem, which can be
derived by generalizing the information bottleneck (IB) principle to the
learning of Q-DETR. At the inner level, we conduct a distribution alignment for
the queries to maximize the self-information entropy. At the upper level, we
introduce a new foreground-aware query matching scheme to effectively transfer
the teacher information to distillation-desired features to minimize the
conditional information entropy. Extensive experimental results show that our
method performs much better than prior arts. For example, the 4-bit Q-DETR can
theoretically accelerate DETR with ResNet-50 backbone by 6.6x and achieve 39.4%
AP, with a performance gap of only 2.6% from its real-valued counterpart on the
COCO dataset.
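The reported speedup comes from replacing full-precision weights and activations with low-bit ones. As a rough, generic illustration (not the paper's exact quantizer), a minimal symmetric quantize-dequantize step in PyTorch looks like this:

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    """Quantize-dequantize x on a symmetric per-tensor grid."""
    qmax = 2 ** (num_bits - 1) - 1                 # e.g. 7 for 4-bit
    scale = x.abs().max().clamp(min=1e-8) / qmax   # map observed range to the grid
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    return q * scale                               # dequantize back to float

w = torch.randn(256, 256)
w4 = fake_quantize(w, num_bits=4)
print((w - w4).abs().mean())                       # average quantization error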
Related papers
- ADFQ-ViT: Activation-Distribution-Friendly Post-Training Quantization for Vision Transformers [7.155242379236052]
Quantization of Vision Transformers (ViTs) has emerged as a promising solution to mitigate their heavy computation and memory demands.
Existing methods still suffer from significant accuracy loss at low bit-widths.
ADFQ-ViT provides significant improvements over various baselines in image classification, object detection, and instance segmentation tasks at 4-bit.
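For context, post-training quantization (PTQ) methods of this kind calibrate quantizer parameters on a handful of batches rather than retraining. A minimal sketch of the common min-max calibration baseline, assuming PyTorch and an unsigned 4-bit grid (ADFQ-ViT's actual activation-distribution-friendly quantizer is more involved):

```python
import torch

class MinMaxObserver:
    """Track activation range on calibration data, then derive scale/zero-point."""
    def __init__(self):
        self.lo, self.hi = float("inf"), float("-inf")

    def observe(self, x: torch.Tensor) -> None:
        self.lo = min(self.lo, x.min().item())
        self.hi = max(self.hi, x.max().item())

    def qparams(self, num_bits: int = 4):
        qmax = 2 ** num_bits - 1                 # unsigned grid
        scale = max(self.hi - self.lo, 1e-8) / qmax
        zero_point = round(-self.lo / scale)
        return scale, zero_point

obs = MinMaxObserver()
for _ in range(8):                               # a few calibration batches
    obs.observe(torch.randn(32, 197, 768))
print(obs.qparams(num_bits=4))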
arXiv Detail & Related papers (2024-07-03T02:41:59Z)
- Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement [19.277560848076984]
Two-stage selection strategies result in scale bias and redundancy due to a mismatch between the selected queries and objects.
We propose hierarchical salience filtering refinement, which performs transformer encoding only on filtered discriminative queries.
The proposed Salience DETR achieves significant improvements of +4.0% AP, +0.2% AP, +4.4% AP on three challenging task-specific detection datasets.
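A minimal sketch of the salience-filtering idea, assuming a precomputed salience score per candidate token; the scoring and keep ratio here are illustrative, not the paper's hierarchical design:

```python
import torch

def filter_salient_tokens(tokens: torch.Tensor, salience: torch.Tensor,
                          keep_ratio: float = 0.3):
    """tokens: (N, D) candidate features; salience: (N,) predicted scores."""
    k = max(1, int(tokens.size(0) * keep_ratio))
    idx = salience.topk(k).indices    # keep only the most discriminative candidates
    return tokens[idx], idx

tokens, salience = torch.randn(1000, 256), torch.rand(1000)
kept, idx = filter_salient_tokens(tokens, salience)  # only `kept` is encoded further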
arXiv Detail & Related papers (2024-03-24T13:01:57Z)
- Align-DETR: Improving DETR with Simple IoU-aware BCE loss [32.13866392998818]
We propose a metric, recall of best-regressed samples, to quantitatively evaluate the misalignment problem.
The proposed loss, IA-BCE, guides the training of DETR to build a strong correlation between classification score and localization precision.
To overcome the dramatic decrease in sample quality induced by the sparsity of queries, we introduce a prime sample weighting mechanism.
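One common way to realize such an IoU-aware BCE is sketched below, with the positive targets set directly to the IoU so that high classification scores require accurate boxes; the paper's exact target schedule may differ:

```python
import torch
import torch.nn.functional as F

def iou_aware_bce(logits: torch.Tensor, iou: torch.Tensor, pos_mask: torch.Tensor):
    """logits: (N,) class logits; iou: (N,) IoU with matched GT; pos_mask: (N,) bool."""
    target = torch.where(pos_mask, iou, torch.zeros_like(iou))  # soft target = IoU
    return F.binary_cross_entropy_with_logits(logits, target)

logits, iou = torch.randn(300), torch.rand(300)
loss = iou_aware_bce(logits, iou, torch.rand(300) > 0.9)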
arXiv Detail & Related papers (2023-04-15T10:24:51Z)
- RepQ-ViT: Scale Reparameterization for Post-Training Quantization of Vision Transformers [2.114921680609289]
We propose RepQ-ViT, a novel PTQ framework for vision transformers (ViTs).
RepQ-ViT decouples the quantization and inference processes.
It can outperform existing strong baselines and encouragingly improve the accuracy of 4-bit PTQ of ViTs to a usable level.
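A hedged sketch of the decoupling idea: calibrate with expressive per-channel scales, then fold the per-channel factors into the LayerNorm affine parameters and the following layer's weights so that inference needs only one shared scale. Shapes and folding targets here are assumptions, not the paper's exact procedure:

```python
import torch

def reparameterize_scales(scale_c: torch.Tensor, gamma: torch.Tensor,
                          beta: torch.Tensor, w_next: torch.Tensor):
    """scale_c: (C,) per-channel scales; gamma/beta: (C,) LayerNorm affine;
    w_next: (out, C) weights of the following linear layer."""
    shared = scale_c.mean()        # target per-tensor scale for inference
    r = scale_c / shared           # per-channel factor to absorb
    # shrinking each channel by 1/r makes one shared scale sufficient;
    # the next layer's weights absorb r so the network output is unchanged
    return shared, gamma / r, beta / r, w_next * r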
arXiv Detail & Related papers (2022-12-16T02:52:37Z)
- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer [56.87383229709899]
We develop an information rectification module (IRM) and a distribution-guided distillation scheme for fully quantized vision transformers (Q-ViT).
Our method achieves much better performance than prior arts.
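A minimal sketch of distribution-matching distillation between a full-precision teacher and a quantized student, here as a KL term over attention distributions; the paper's exact distillation targets may differ:

```python
import torch
import torch.nn.functional as F

def attention_kd(student_scores: torch.Tensor, teacher_scores: torch.Tensor,
                 tau: float = 1.0) -> torch.Tensor:
    """Both: (B, L) pre-softmax attention scores over L keys."""
    p_t = F.softmax(teacher_scores / tau, dim=-1)        # teacher distribution
    log_p_s = F.log_softmax(student_scores / tau, dim=-1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * tau ** 2

loss = attention_kd(torch.randn(8, 197), torch.randn(8, 197))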
arXiv Detail & Related papers (2022-10-13T04:00:29Z)
- Efficient Decoder-free Object Detection with Transformers [75.00499377197475]
Vision transformers (ViTs) are changing the landscape of object detection approaches.
We propose a decoder-free fully transformer-based (DFFT) object detector.
DFFT_SMALL achieves high efficiency in both training and inference stages.
arXiv Detail & Related papers (2022-06-14T13:22:19Z)
- Recurrent Glimpse-based Decoder for Detection with Transformer [85.64521612986456]
We introduce a novel REcurrent Glimpse-based decOder (REGO) in this paper.
In particular, the REGO employs a multi-stage recurrent processing structure to help the attention of DETR gradually focus on foreground objects.
REGO consistently boosts the performance of different DETR detectors by up to 7% relative gain at the same setting of 50 training epochs.
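A minimal sketch of such multi-stage recurrent refinement, with a plain attention-over-memory step standing in for REGO's glimpse extraction; dimensions and the residual box update are illustrative assumptions:

```python
import torch
import torch.nn as nn

class RecurrentRefiner(nn.Module):
    """Multi-stage refinement: each stage re-attends and nudges the boxes."""
    def __init__(self, dim: int = 256, stages: int = 3):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.refine = nn.ModuleList(nn.Linear(dim, 4) for _ in range(stages))

    def forward(self, queries, memory, boxes):
        for head in self.refine:
            queries, _ = self.attn(queries, memory, memory)  # glimpse stand-in
            boxes = boxes + head(queries)                    # residual box update
        return boxes

refiner = RecurrentRefiner()
boxes = refiner(torch.randn(2, 100, 256), torch.randn(2, 1000, 256),
                torch.zeros(2, 100, 4))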
arXiv Detail & Related papers (2021-12-09T00:29:19Z)
- Deep Structured Instance Graph for Distilling Object Detectors [82.16270736573176]
We present a simple knowledge structure to exploit and encode information inside the detection system to facilitate detector knowledge distillation.
We achieve new state-of-the-art results on the challenging COCO object detection task with diverse student-teacher pairs on both one- and two-stage detectors.
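A hedged sketch of the structured-graph idea: treat per-instance features as nodes and pairwise similarities as edges, then align the student's graph with the teacher's. The specific node/edge losses here are illustrative:

```python
import torch

def graph_distill_loss(f_s: torch.Tensor, f_t: torch.Tensor) -> torch.Tensor:
    """f_s, f_t: (N, D) per-instance features from student and teacher."""
    node_loss = (f_s - f_t).pow(2).mean()  # align node embeddings
    edge_s = torch.cosine_similarity(f_s.unsqueeze(0), f_s.unsqueeze(1), dim=-1)
    edge_t = torch.cosine_similarity(f_t.unsqueeze(0), f_t.unsqueeze(1), dim=-1)
    return node_loss + (edge_s - edge_t).pow(2).mean()  # align pairwise relations

loss = graph_distill_loss(torch.randn(16, 256), torch.randn(16, 256))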
arXiv Detail & Related papers (2021-09-27T08:26:00Z)
- Rethinking Transformer-based Set Prediction for Object Detection [57.7208561353529]
Experimental results show that the proposed methods not only converge much faster than the original DETR, but also significantly outperform DETR and other baselines in terms of detection accuracy.
arXiv Detail & Related papers (2020-11-21T21:59:42Z)
- VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization [19.66522714831141]
We develop a new quantization solution called VecQ, which can guarantee minimal direct quantization loss and better model accuracy.
In addition, to speed up the proposed quantization process during training, we accelerate it with a parameterized estimation and probability-based calculation.
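A rough sketch of the "minimal direct quantization loss" idea, using a plain grid search over candidate scales in place of the paper's parameterized estimation:

```python
import torch

def best_scale(w: torch.Tensor, num_bits: int = 4, candidates: int = 100):
    """Pick the scale minimizing the L2 loss between w and its quantized form."""
    qmax = 2 ** (num_bits - 1) - 1
    base = w.abs().max() / qmax
    best_s, best_err = base, float("inf")
    for f in torch.linspace(0.5, 1.2, candidates):
        s = base * f
        q = torch.clamp(torch.round(w / s), -qmax - 1, qmax) * s
        err = (w - q).pow(2).sum().item()
        if err < best_err:
            best_s, best_err = s, err
    return best_s

print(best_scale(torch.randn(512, 512)))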
arXiv Detail & Related papers (2020-05-18T07:38:44Z)