Task-Balanced Distillation for Object Detection
- URL: http://arxiv.org/abs/2208.03006v1
- Date: Fri, 5 Aug 2022 06:43:40 GMT
- Title: Task-Balanced Distillation for Object Detection
- Authors: Ruining Tang, Zhenyu Liu, Yangguang Li, Yiguo Song, Hui Liu, Qide
Wang, Jing Shao, Guifang Duan, Jianrong Tan
- Abstract summary: RetinaNet with ResNet-50 achieves 41.0 mAP on the COCO benchmark, outperforming the recent FGD and FRS.
A novel Task-decoupled Feature Distillation (TFD) is proposed that flexibly balances the contributions of the classification and regression tasks.
- Score: 18.939830805129787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mainstream object detectors are commonly constituted of two sub-tasks,
including classification and regression tasks, implemented by two parallel
heads. This classic design paradigm inevitably leads to inconsistent spatial
distributions between the classification score and the localization quality (IoU).
Therefore, this paper alleviates this misalignment from the perspective of knowledge
distillation. First, we observe that the massive teacher achieves a higher
proportion of harmonious predictions than the lightweight student. Based on
this intriguing observation, a novel Harmony Score (HS) is devised to estimate
the alignment of classification and regression qualities. HS models the
relationship between the two sub-tasks and serves as prior knowledge to promote
harmonious predictions for the student. Second, this spatial misalignment will
result in inharmonious region selection when distilling features. To alleviate
this problem, a novel Task-decoupled Feature Distillation (TFD) is proposed that
flexibly balances the contributions of the classification and regression tasks.
Eventually, the HS-guided distillation and TFD constitute the proposed method, named Task-Balanced
Distillation (TBD). Extensive experiments demonstrate the considerable
potential and generalization of the proposed method. Specifically, when
equipped with TBD, RetinaNet with ResNet-50 achieves 41.0 mAP under the COCO
benchmark, outperforming the recent FGD and FRS.
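The abstract does not give the exact form of the Harmony Score or of the TBD loss, so the following is only a minimal sketch of the underlying idea: reward locations where classification confidence and localization quality (IoU) agree, and use that signal to weight a feature-distillation term. The geometric-mean form of the score, the exponent alpha, the per-anchor weighting, and all function names are illustrative assumptions, not the paper's actual formulation.

```python
import torch


def harmony_score(cls_score: torch.Tensor, iou: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Toy harmony score: large only when classification confidence and
    localization quality (IoU) are BOTH high. The geometric-mean form and
    the exponent alpha are assumptions, not the paper's formula."""
    return cls_score.clamp(min=1e-6) ** alpha * iou.clamp(min=1e-6) ** (1.0 - alpha)


def harmony_weighted_feature_distillation(student_feat: torch.Tensor,
                                          teacher_feat: torch.Tensor,
                                          teacher_cls: torch.Tensor,
                                          teacher_iou: torch.Tensor) -> torch.Tensor:
    """Weight a per-anchor feature-matching term by the teacher's harmony score,
    so regions where the teacher's two sub-tasks agree contribute more."""
    w = harmony_score(teacher_cls, teacher_iou)                      # (N,)
    per_anchor = ((student_feat - teacher_feat) ** 2).mean(dim=-1)   # (N,)
    return (w * per_anchor).sum() / w.sum().clamp(min=1e-6)


if __name__ == "__main__":
    n, c = 8, 256  # anchors x feature channels (toy sizes)
    loss = harmony_weighted_feature_distillation(
        torch.randn(n, c), torch.randn(n, c), torch.rand(n), torch.rand(n))
    print(float(loss))
```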
Related papers
- Multi-Granularity Semantic Revision for Large Language Model Distillation [66.03746866578274]
We propose a multi-granularity semantic revision method for LLM distillation.
At the sequence level, we propose a sequence correction and re-generation strategy.
At the token level, we design a distribution adaptive clipping Kullback-Leibler loss as the distillation objective function (a rough sketch of this idea follows this entry).
At the span level, we leverage the span priors of a sequence to compute the probability correlations within spans, and constrain the teacher and student's probability correlations to be consistent.
arXiv Detail & Related papers (2024-07-14T03:51:49Z)
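The token-level "distribution adaptive clipping" KL objective is only named, not defined, in this summary. Below is a hedged sketch of one way such clipping could work: drop teacher probability mass below a per-token threshold, renormalise, and compute KL(teacher || student) on the kept support. The thresholding rule, the constant clip_ratio, and the function name are assumptions.

```python
import torch
import torch.nn.functional as F


def clipped_kl_distillation(student_logits: torch.Tensor,
                            teacher_logits: torch.Tensor,
                            clip_ratio: float = 0.02) -> torch.Tensor:
    """Sketch of a clipped KL distillation loss over (B, T, V) logit tensors.
    Assumption: probability mass below an adaptive per-token threshold is
    discarded and the remaining teacher distribution renormalised."""
    t_prob = teacher_logits.softmax(dim=-1)
    # Adaptive threshold: a fixed fraction of each token's maximum probability.
    thresh = clip_ratio * t_prob.max(dim=-1, keepdim=True).values
    t_clipped = torch.where(t_prob >= thresh, t_prob, torch.zeros_like(t_prob))
    t_clipped = t_clipped / t_clipped.sum(dim=-1, keepdim=True).clamp(min=1e-8)
    s_logprob = F.log_softmax(student_logits, dim=-1)
    # KL(teacher || student); zeroed entries contribute nothing to the sum.
    kl = t_clipped * (torch.log(t_clipped.clamp(min=1e-8)) - s_logprob)
    return kl.sum(dim=-1).mean()
```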
- Advancing Pre-trained Teacher: Towards Robust Feature Discrepancy for Anomaly Detection [19.099643719358692]
We propose a simple yet effective two-stage industrial anomaly detection framework, termed AAND.
In the first anomaly amplification stage, we propose a novel Residual Anomaly Amplification (RAA) module to advance the pre-trained teacher encoder.
We further employ a reverse distillation paradigm to train a student decoder, in which a novel Hard Knowledge Distillation (HKD) loss is built to better facilitate the reconstruction of normal patterns.
arXiv Detail & Related papers (2024-05-03T13:00:22Z)
- Bridging Cross-task Protocol Inconsistency for Distillation in Dense Object Detection [19.07452370081663]
We propose a novel distillation method with cross-task consistent protocols, tailored for dense object detection.
For classification distillation, we formulate the classification logit maps in both teacher and student models as multiple binary-classification maps and apply a binary-classification distillation loss to each map (sketched after this entry).
Our proposed method is simple but effective, and experimental results demonstrate its superiority over existing methods.
arXiv Detail & Related papers (2023-08-28T03:57:37Z)
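The per-map binary-classification distillation described above can be sketched in a few lines: each class channel of a dense classification logit map is treated as an independent binary problem, and the student is pulled toward the teacher's sigmoid probabilities with a soft-target BCE. The temperature and the mean reduction are assumptions of this sketch, not details from the paper.

```python
import torch
import torch.nn.functional as F


def binary_classification_distillation(student_logits: torch.Tensor,
                                       teacher_logits: torch.Tensor,
                                       temperature: float = 1.0) -> torch.Tensor:
    """Distil a dense classification map class-by-class as binary problems.
    Logit maps are assumed to have shape (B, num_classes, H, W), as produced
    by a dense detector head such as RetinaNet's classification branch."""
    soft_targets = torch.sigmoid(teacher_logits / temperature)  # per-class soft binary labels
    # One soft-target BCE term per class map, averaged over classes and locations.
    return F.binary_cross_entropy_with_logits(student_logits / temperature, soft_targets)
```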
- Improving Knowledge Distillation via Regularizing Feature Norm and Direction [16.98806338782858]
Knowledge distillation (KD) exploits a large well-trained model (i.e., teacher) to train a small student model on the same dataset for the same task.
Treating teacher features as knowledge, prevailing methods of knowledge distillation train the student by aligning its features with the teacher's, e.g., by minimizing the KL-divergence between their logits or the L2 distance between their intermediate features (see the sketch after this entry).
While it is natural to believe that better alignment of student features to the teacher better distills teacher knowledge, simply forcing this alignment does not directly contribute to the student's performance.
arXiv Detail & Related papers (2023-05-26T15:05:19Z)
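The two alignment objectives named above, KL divergence between teacher and student logits and an L2 term between intermediate features, are standard and can be written compactly as below. The temperature T, the weight beta, and the assumption that student features are already projected to the teacher's dimensionality are choices made for this sketch.

```python
import torch
import torch.nn.functional as F


def kd_alignment_loss(student_logits: torch.Tensor, teacher_logits: torch.Tensor,
                      student_feat: torch.Tensor, teacher_feat: torch.Tensor,
                      T: float = 4.0, beta: float = 1.0) -> torch.Tensor:
    """Temperature-scaled KL between logits plus L2 between intermediate
    features; student_feat is assumed to already match teacher_feat's shape."""
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * (T * T)
    l2 = F.mse_loss(student_feat, teacher_feat)
    return kl + beta * l2
```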
- Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We state that the essence of these methods is to discard the noisy information and distill the valuable information in the feature.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z)
- CORSD: Class-Oriented Relational Self Distillation [16.11986532440837]
Knowledge distillation is an effective model compression method, but it has some limitations.
We propose a novel training framework named Class-Oriented Relational Self Distillation (CORSD) to address these limitations.
arXiv Detail & Related papers (2023-04-28T16:00:31Z)
- Mind the Gap in Distilling StyleGANs [100.58444291751015]
The StyleGAN family is one of the most popular families of Generative Adversarial Networks (GANs) for unconditional generation.
This paper provides a comprehensive study of distilling from the popular StyleGAN-like architecture.
arXiv Detail & Related papers (2022-08-18T14:18:29Z)
- SEA: Bridging the Gap Between One- and Two-stage Detector Distillation via SEmantic-aware Alignment [76.80165589520385]
We name our method SEA (SEmantic-aware Alignment) distillation given the nature of abstracting dense fine-grained information.
It achieves new state-of-the-art results on the challenging object detection task on both one- and two-stage detectors.
arXiv Detail & Related papers (2022-03-02T04:24:05Z)
- Weakly Supervised Semantic Segmentation via Alternative Self-Dual Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model.
We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z)
- Class-incremental Learning with Rectified Feature-Graph Preservation [24.098892115785066]
A central theme of this paper is to learn new classes that arrive in sequential phases over time.
We propose a weighted-Euclidean regularization for old knowledge preservation.
We show how it can work with binary cross-entropy to increase class separation for effective learning of new classes (a rough sketch follows this entry).
arXiv Detail & Related papers (2020-12-15T07:26:04Z)
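The summary names a weighted-Euclidean regularizer for old-knowledge preservation combined with binary cross-entropy for class separation, but not their exact forms. A hedged sketch of that combination is shown below; the per-dimension weights, the multi-hot targets, the trade-off lam, and the function names are assumptions.

```python
import torch
import torch.nn.functional as F


def weighted_euclidean_regularizer(new_feat: torch.Tensor, old_feat: torch.Tensor,
                                   dim_weights: torch.Tensor) -> torch.Tensor:
    """Penalise drift of the new model's features from the frozen old model's
    features, weighted per feature dimension (how the weights are chosen is an
    assumption of this sketch)."""
    return (dim_weights * (new_feat - old_feat) ** 2).sum(dim=-1).mean()


def incremental_step_loss(logits: torch.Tensor, targets: torch.Tensor,
                          new_feat: torch.Tensor, old_feat: torch.Tensor,
                          dim_weights: torch.Tensor, lam: float = 1.0) -> torch.Tensor:
    """Binary cross-entropy over all seen classes for class separation, plus
    the weighted-Euclidean term for preserving old knowledge."""
    bce = F.binary_cross_entropy_with_logits(logits, targets.float())
    return bce + lam * weighted_euclidean_regularizer(new_feat, old_feat, dim_weights)
```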
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)