Feature-based One-For-All: A Universal Framework for Heterogeneous Knowledge Distillation
- URL: http://arxiv.org/abs/2501.08885v1
- Date: Wed, 15 Jan 2025 15:56:06 GMT
- Title: Feature-based One-For-All: A Universal Framework for Heterogeneous Knowledge Distillation
- Authors: Jhe-Hao Lin, Yi Yao, Chan-Feng Hsu, Hongxia Xie, Hong-Han Shuai, Wen-Huang Cheng,
- Abstract summary: Knowledge distillation (KD) involves transferring knowledge from a pre-trained heavy teacher model to a lighter student model.
We introduce a feature-based one-for-all (FOFA) KD framework to enable feature distillation across diverse architecture.
Our framework comprises two key components. First, we design prompt tuning blocks that incorporate student feedback, allowing teacher features to adapt to the student model's learning process.
- Score: 28.722795943076306
- License:
- Abstract: Knowledge distillation (KD) involves transferring knowledge from a pre-trained heavy teacher model to a lighter student model, thereby reducing the inference cost while maintaining comparable effectiveness. Prior KD techniques typically assume homogeneity between the teacher and student models. However, as technology advances, a wide variety of architectures have emerged, ranging from initial Convolutional Neural Networks (CNNs) to Vision Transformers (ViTs), and Multi-Level Perceptrons (MLPs). Consequently, developing a universal KD framework compatible with any architecture has become an important research topic. In this paper, we introduce a feature-based one-for-all (FOFA) KD framework to enable feature distillation across diverse architecture. Our framework comprises two key components. First, we design prompt tuning blocks that incorporate student feedback, allowing teacher features to adapt to the student model's learning process. Second, we propose region-aware attention to mitigate the view mismatch problem between heterogeneous architecture. By leveraging these two modules, effective distillation of intermediate features can be achieved across heterogeneous architectures. Extensive experiments on CIFAR, ImageNet, and COCO demonstrate the superiority of the proposed method.
Related papers
- TAS: Distilling Arbitrary Teacher and Student via a Hybrid Assistant [52.0297393822012]
We introduce an assistant model as a bridge to facilitate smooth feature knowledge transfer between heterogeneous teachers and students.
Within our proposed design principle, the assistant model combines the advantages of cross-architecture inductive biases and module functions.
Our proposed method is evaluated across some homogeneous model pairs and arbitrary heterogeneous combinations of CNNs, ViTs, spatial KDs.
arXiv Detail & Related papers (2024-10-16T08:02:49Z) - Generalizing Teacher Networks for Effective Knowledge Distillation Across Student Architectures [4.960025399247103]
Generic Teacher Network (GTN) is a one-off KD-aware training to create a generic teacher capable of effectively transferring knowledge to any student model sampled from a finite pool of architectures.
Our method both improves overall KD effectiveness and amortizes the minimal additional training cost of the generic teacher across students in the pool.
arXiv Detail & Related papers (2024-07-22T20:34:00Z) - Aligning in a Compact Space: Contrastive Knowledge Distillation between Heterogeneous Architectures [4.119589507611071]
We propose a Low-Frequency Components-based Contrastive Knowledge Distillation (LFCC) framework that significantly enhances the performance of feature-based distillation.
Specifically, we designe a set of multi-scale low-pass filters to extract the low-frequency components of intermediate features from both the teacher and student models.
We show that LFCC achieves superior performance on the challenging benchmarks of ImageNet-1K and CIFAR-100.
arXiv Detail & Related papers (2024-05-28T18:44:42Z) - One-for-All: Bridge the Gap Between Heterogeneous Architectures in
Knowledge Distillation [69.65734716679925]
Knowledge distillation has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme.
Most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family.
We propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.
arXiv Detail & Related papers (2023-10-30T11:13:02Z) - Enhancing Representations through Heterogeneous Self-Supervised Learning [61.40674648939691]
We propose Heterogeneous Self-Supervised Learning (HSSL), which enforces a base model to learn from an auxiliary head whose architecture is heterogeneous from the base model.
The HSSL endows the base model with new characteristics in a representation learning way without structural changes.
The HSSL is compatible with various self-supervised methods, achieving superior performances on various downstream tasks.
arXiv Detail & Related papers (2023-10-08T10:44:05Z) - Exploring Inter-Channel Correlation for Diversity-preserved
KnowledgeDistillation [91.56643684860062]
Inter-Channel Correlation for Knowledge Distillation(ICKD) is developed.
ICKD captures intrinsic distribution of the featurespace and sufficient diversity properties of features in the teacher network.
We are the first method based on knowl-edge distillation boosts ResNet18 beyond 72% Top-1 ac-curacy on ImageNet classification.
arXiv Detail & Related papers (2022-02-08T07:01:56Z) - Weakly Supervised Semantic Segmentation via Alternative Self-Dual
Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model.
We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z) - Revisiting Knowledge Distillation: An Inheritance and Exploration
Framework [153.73692961660964]
Knowledge Distillation (KD) is a popular technique to transfer knowledge from a teacher model to a student model.
We propose a novel inheritance and exploration knowledge distillation framework (IE-KD)
Our IE-KD framework is generic and can be easily combined with existing distillation or mutual learning methods for training deep neural networks.
arXiv Detail & Related papers (2021-07-01T02:20:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.