Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation
- URL: http://arxiv.org/abs/2202.03680v1
- Date: Tue, 8 Feb 2022 07:01:56 GMT
- Title: Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation
- Authors: Li Liu, Qingle Huang, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun
Chang, Xiaodan Liang
- Abstract summary: Inter-Channel Correlation for Knowledge Distillation (ICKD) is developed.
ICKD captures the intrinsic distribution of the feature space and sufficient diversity properties of features in the teacher network.
It is the first knowledge-distillation-based method to boost ResNet18 beyond 72% Top-1 accuracy on ImageNet classification.
- Score: 91.56643684860062
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Knowledge Distillation has shown very promising ability in transferring
learned representation from the larger model (teacher) to the smaller one
(student). Despite many efforts, prior methods ignore the important role of
retaining the inter-channel correlation of features, leading to the lack of
capturing the intrinsic distribution of the feature space and sufficient
diversity properties of features in the teacher network. To solve this issue,
we propose the novel Inter-Channel Correlation for Knowledge Distillation
(ICKD), with which the diversity and homology of the feature space of the
student network can align with those of the teacher network. The correlation
between two channels is interpreted as diversity if they are irrelevant to
each other, and as homology otherwise. The student is then required to mimic
the correlation within its own embedding space. In addition, we introduce
grid-level inter-channel correlation, making the method capable of dense
prediction tasks. Extensive experiments on two vision tasks, including
ImageNet classification and Pascal VOC segmentation, demonstrate the
superiority of our ICKD, which consistently outperforms many existing methods,
advancing the state-of-the-art in the field of Knowledge Distillation. To our
knowledge, this is the first knowledge-distillation-based method to boost
ResNet18 beyond 72% Top-1 accuracy on ImageNet classification. Code is
available at: https://github.com/ADLab-AutoDrive/ICKD.
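To make the core mechanism described in the abstract concrete (the student mimics the inter-channel correlation of its feature space so that it aligns with the teacher's), the snippet below gives a minimal PyTorch sketch reconstructed from the abstract only. The 1x1 projection used to match channel counts, the spatial normalization, and all names are assumptions made for illustration, not the authors' implementation; the actual code is in the repository linked above.

```python
# Hypothetical sketch of an inter-channel correlation (ICC) distillation loss,
# reconstructed from the abstract only; NOT the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class InterChannelCorrelationLoss(nn.Module):
    """Match the C x C inter-channel correlation matrix of student and teacher features."""

    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        # 1x1 projection so the student's channel count matches the teacher's
        # (an assumption; the paper may align channels differently).
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    @staticmethod
    def channel_correlation(feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        f = feat.view(b, c, h * w)              # flatten spatial dimensions
        corr = torch.bmm(f, f.transpose(1, 2))  # (B, C, C) inter-channel correlation
        return corr / (h * w)                   # normalize by spatial size (assumed)

    def forward(self, feat_s: torch.Tensor, feat_t: torch.Tensor) -> torch.Tensor:
        corr_s = self.channel_correlation(self.proj(feat_s))
        corr_t = self.channel_correlation(feat_t)
        return F.mse_loss(corr_s, corr_t)


if __name__ == "__main__":
    feat_s = torch.randn(2, 256, 14, 14)  # student feature map
    feat_t = torch.randn(2, 512, 14, 14)  # teacher feature map
    icc = InterChannelCorrelationLoss(256, 512)
    print(icc(feat_s, feat_t).item())
```

For the grid-level variant mentioned in the abstract, the same correlation would presumably be computed per cell of a spatial grid over the feature map rather than over the whole map, which is what makes the loss usable for dense prediction.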
Related papers
- I2CKD: Intra- and Inter-Class Knowledge Distillation for Semantic Segmentation [1.433758865948252]
This paper proposes a new knowledge distillation method tailored for image semantic segmentation, termed Intra- and Inter-Class Knowledge Distillation (I2CKD).
The method focuses on capturing and transferring knowledge between the intermediate layers of the teacher (cumbersome model) and the student (compact model).
arXiv Detail & Related papers (2024-03-27T12:05:22Z)
- Direct Distillation between Different Domains [97.39470334253163]
We propose a new one-stage method dubbed "Direct Distillation between Different Domains" (4Ds).
We first design a learnable adapter based on the Fourier transform to separate the domain-invariant knowledge from the domain-specific knowledge.
We then build a fusion-activation mechanism to transfer the valuable domain-invariant knowledge to the student network.
arXiv Detail & Related papers (2024-01-12T02:48:51Z)
- Distilling Efficient Vision Transformers from CNNs for Semantic Segmentation [12.177329445930276]
We propose a novel CNN-to-ViT KD framework, dubbed C2VKD.
We first propose a novel visual-linguistic feature distillation (VLFD) module that explores efficient KD among the aligned visual and linguistic-compatible representations.
We then propose a pixel-wise decoupled distillation (PDD) module to supervise the student under the combination of labels and teacher's predictions from the decoupled target and non-target classes.
arXiv Detail & Related papers (2023-10-11T07:45:37Z)
- Knowledge Distillation Meets Open-Set Semi-Supervised Learning [69.21139647218456]
We propose a novel method dedicated to distilling representational knowledge semantically from a pretrained teacher to a target student.
At the problem level, this establishes an interesting connection between knowledge distillation and open-set semi-supervised learning (SSL).
Our method significantly outperforms previous state-of-the-art knowledge distillation methods on both coarse object classification and fine face recognition tasks.
arXiv Detail & Related papers (2022-05-13T15:15:27Z)
- Channel Self-Supervision for Online Knowledge Distillation [14.033675223173933]
We propose a novel online knowledge distillation method, Channel Self-Supervision for Online Knowledge Distillation (CSS).
We construct a dual-network multi-branch structure and enhance inter-branch diversity through self-supervised learning.
Our method provides greater diversity than OKDDip and also yields a solid performance improvement, even over state-of-the-art methods such as PCL.
arXiv Detail & Related papers (2022-03-22T12:35:20Z)
- Channel Exchanging Networks for Multimodal and Multitask Dense Image Prediction [125.18248926508045]
We propose Channel-Exchanging-Network (CEN), which is self-adaptive, parameter-free, and, more importantly, applicable to both multimodal fusion and multitask learning.
CEN dynamically exchanges channels between subnetworks of different modalities.
For the application of dense image prediction, the validity of CEN is tested in four different scenarios.
arXiv Detail & Related papers (2021-12-04T05:47:54Z)
- Wasserstein Contrastive Representation Distillation [114.24609306495456]
We propose Wasserstein Contrastive Representation Distillation (WCoRD), which leverages both primal and dual forms of Wasserstein distance for knowledge distillation.
The dual form is used for global knowledge transfer, yielding a contrastive learning objective that maximizes the lower bound of mutual information between the teacher and the student networks.
Experiments demonstrate that the proposed WCoRD method outperforms state-of-the-art approaches on privileged information distillation, model compression and cross-modal transfer.
arXiv Detail & Related papers (2020-12-15T23:43:28Z)
- Channel-wise Knowledge Distillation for Dense Prediction [73.99057249472735]
We propose to align features channel-wise between the student and teacher networks (see the sketch after this list).
We consistently achieve superior performance on three benchmarks with various network structures.
arXiv Detail & Related papers (2020-11-26T12:00:38Z)
- Differentiable Feature Aggregation Search for Knowledge Distillation [47.94874193183427]
We introduce feature aggregation to imitate multi-teacher distillation in a single-teacher distillation framework.
DFA is a two-stage Differentiable Feature Aggregation search method motivated by DARTS in neural architecture search.
Experimental results show that DFA outperforms existing methods on CIFAR-100 and CINIC-10 datasets.
arXiv Detail & Related papers (2020-08-02T15:42:29Z)
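The channel-wise alignment idea referenced in the "Channel-wise Knowledge Distillation for Dense Prediction" entry above can be illustrated with a short sketch. This is a hedged example assuming one common formulation: each channel's activation map is normalized into a spatial probability distribution (softmax with a temperature `tau`), and the student matches the teacher channel by channel via a KL divergence. The exact normalization, weighting, and names are assumptions, not necessarily the paper's formulation.

```python
# Hedged sketch of channel-wise feature distillation for dense prediction:
# each channel is turned into a spatial probability distribution and the
# student matches the teacher channel-by-channel. Temperature and
# normalization choices here are illustrative assumptions.
import torch
import torch.nn.functional as F


def channelwise_kd_loss(feat_s: torch.Tensor,
                        feat_t: torch.Tensor,
                        tau: float = 4.0) -> torch.Tensor:
    """feat_s, feat_t: (B, C, H, W) feature maps with matching channel counts."""
    b, c, h, w = feat_t.shape
    # Softmax over spatial locations turns each channel into a distribution.
    p_t = F.softmax(feat_t.view(b, c, -1) / tau, dim=-1)
    log_p_s = F.log_softmax(feat_s.view(b, c, -1) / tau, dim=-1)
    # KL divergence per channel, averaged over batch and channels.
    kl = F.kl_div(log_p_s, p_t, reduction="none").sum(-1)
    return (tau ** 2) * kl.mean()


if __name__ == "__main__":
    s = torch.randn(2, 128, 32, 32)  # student feature map
    t = torch.randn(2, 128, 32, 32)  # teacher feature map
    print(channelwise_kd_loss(s, t).item())
```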
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.