Channel Self-Supervision for Online Knowledge Distillation
- URL: http://arxiv.org/abs/2203.11660v2
- Date: Wed, 23 Mar 2022 10:55:05 GMT
- Title: Channel Self-Supervision for Online Knowledge Distillation
- Authors: Shixiao Fan, Xuan Cheng, Xiaomin Wang, Chun Yang, Pan Deng, Minghui
Liu, Jiali Deng, Ming Liu
- Abstract summary: We propose a novel online knowledge distillation method, Channel Self-Supervision for Online Knowledge Distillation (CSS).
We construct a dual-network multi-branch structure and enhance inter-branch diversity through self-supervised learning.
Our method provides greater diversity than OKDDip and also yields a clear performance improvement, even over state-of-the-art methods such as PCL.
- Score: 14.033675223173933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, researchers have shown increased interest in online
knowledge distillation. Adopting a one-stage and end-to-end training fashion,
online knowledge distillation uses the aggregated intermediate predictions of
multiple peer models for training. However, the absence of a powerful teacher
model may result in the homogeneity problem between group peers, affecting the
effectiveness of group distillation adversely. In this paper, we propose a
novel online knowledge distillation method, Channel Self-Supervision for
Online Knowledge Distillation (CSS),
which structures diversity in terms of input, target, and network to alleviate
the homogenization problem. Specifically, we construct a dual-network
multi-branch structure and enhance inter-branch diversity through
self-supervised learning, adopting the feature-level transformation and
augmenting the corresponding labels. Meanwhile, the dual network structure has
a larger space of independent parameters to resist the homogenization problem
during distillation. Extensive quantitative experiments on CIFAR-100 illustrate
that our method provides greater diversity than OKDDip and also yields a clear
performance improvement, even over state-of-the-art methods such as PCL. The
results on three fine-grained datasets (Stanford Dogs, Stanford Cars,
CUB-200-2011) also show the significant generalization capability of our
approach.
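The abstract describes the method only at a high level, so the following PyTorch
sketch is an assumption-laden illustration of the general recipe rather than the
paper's exact formulation: two peer networks, each with several branches, are
trained with per-branch cross-entropy, a KL term toward the aggregated peer
prediction, and a channel-level self-supervision task. The choice of
transformation (a cyclic channel shift), the augmented label space, the head
names (ss_head_a, ss_head_b), and the loss weight beta are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiBranchNet(nn.Module):
    """Shared backbone followed by several peer branches (hypothetical layout)."""
    def __init__(self, backbone, feat_dim, num_classes, num_branches=3):
        super().__init__()
        self.backbone = backbone                        # maps images to (B, feat_dim)
        self.branches = nn.ModuleList(
            nn.Linear(feat_dim, num_classes) for _ in range(num_branches))

    def forward(self, x):
        feat = self.backbone(x)                         # (B, feat_dim)
        return feat, [branch(feat) for branch in self.branches]

def channel_self_supervision(feat, labels, num_classes, num_shifts=4):
    """Feature-level transformation with augmented labels: each cyclic channel
    shift k gets its own copy of the label space (an assumed instantiation)."""
    step = feat.size(1) // num_shifts
    feats = [torch.roll(feat, shifts=k * step, dims=1) for k in range(num_shifts)]
    aug_labels = [labels + k * num_classes for k in range(num_shifts)]
    return torch.cat(feats), torch.cat(aug_labels)

def group_distillation_loss(logits_list, labels, T=3.0):
    """Per-branch cross-entropy plus KL toward the averaged peer prediction."""
    ce = sum(F.cross_entropy(z, labels) for z in logits_list)
    with torch.no_grad():                               # aggregated soft target
        ensemble = torch.stack(logits_list).mean(dim=0)
    kd = sum(F.kl_div(F.log_softmax(z / T, dim=1),
                      F.softmax(ensemble / T, dim=1),
                      reduction="batchmean") * T * T
             for z in logits_list)
    return ce + kd

def css_training_step(net_a, net_b, ss_head_a, ss_head_b, x, y,
                      num_classes, beta=1.0):
    """One illustrative step: online distillation across both networks' branches
    plus the channel self-supervision loss on each network's features."""
    feat_a, logits_a = net_a(x)
    feat_b, logits_b = net_b(x)
    loss = group_distillation_loss(logits_a + logits_b, y)
    for feat, ss_head in ((feat_a, ss_head_a), (feat_b, ss_head_b)):
        aug_feat, aug_y = channel_self_supervision(feat, y, num_classes)
        loss = loss + beta * F.cross_entropy(ss_head(aug_feat), aug_y)
    return loss
```

Here ss_head_a and ss_head_b would be linear heads over num_shifts * num_classes
augmented classes; the actual method may transform features differently and
balance the loss terms with other weights.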
Related papers
- AICSD: Adaptive Inter-Class Similarity Distillation for Semantic
Segmentation [12.92102548320001]
This paper proposes a novel method called Inter-Class Similarity Distillation (ICSD) for the purpose of knowledge distillation.
The proposed method transfers high-order relations from the teacher network to the student network by independently computing intra-class distributions for each class from network outputs.
Experiments conducted on two well-known datasets for semantic segmentation, Cityscapes and Pascal VOC 2012, validate the effectiveness of the proposed method.
arXiv Detail & Related papers (2023-08-08T13:17:20Z) - Distribution Shift Matters for Knowledge Distillation with Webly
Collected Images [91.66661969598755]
We propose a novel method dubbed "Knowledge Distillation between Different Distributions" (KD³).
We first dynamically select useful training instances from the webly collected data according to the combined predictions of the teacher and student networks.
We also build a new contrastive learning block called MixDistribution to generate perturbed data with a new distribution for instance alignment.
arXiv Detail & Related papers (2023-07-21T10:08:58Z) - Hierarchical Contrastive Learning Enhanced Heterogeneous Graph Neural
Network [59.860534520941485]
Heterogeneous graph neural networks (HGNNs), as an emerging technique, have shown a superior capacity for dealing with heterogeneous information networks (HINs).
Recently, contrastive learning, a self-supervised method, has become one of the most exciting learning paradigms and shows great potential when no labels are available.
In this paper, we study the problem of self-supervised HGNNs and propose a novel co-contrastive learning mechanism for HGNNs, named HeCo.
arXiv Detail & Related papers (2023-04-24T16:17:21Z) - Heterogeneous-Branch Collaborative Learning for Dialogue Generation [11.124375734351826]
Collaborative learning is an effective way to conduct one-stage group distillation in the absence of a well-trained large teacher model.
Previous work suffers from a severe branch homogeneity problem due to the same training objective and identical training sets.
We propose a dual group-based knowledge distillation method, consisting of positive distillation and negative distillation, to further diversify the features of different branches in a steady and interpretable way.
arXiv Detail & Related papers (2023-03-21T06:41:50Z) - Exploring Inter-Channel Correlation for Diversity-preserved
Knowledge Distillation [91.56643684860062]
Inter-Channel Correlation for Knowledge Distillation (ICKD) is developed.
ICKD captures the intrinsic distribution of the feature space and the diversity of features in the teacher network.
Ours is the first knowledge-distillation-based method to boost ResNet18 beyond 72% Top-1 accuracy on ImageNet classification (a generic sketch of an inter-channel correlation loss is given after this list).
arXiv Detail & Related papers (2022-02-08T07:01:56Z) - Weakly Supervised Semantic Segmentation via Alternative Self-Dual
Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model.
We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z) - Online Knowledge Distillation via Multi-branch Diversity Enhancement [15.523646047674717]
We propose a new distillation method to enhance the diversity among multiple student models.
We use a Feature Fusion Module (FFM), which improves the performance of the attention mechanism in the network.
We also use a Diversification (CD) loss function to strengthen the differences between the student models.
arXiv Detail & Related papers (2020-10-02T05:52:12Z) - Differentiable Feature Aggregation Search for Knowledge Distillation [47.94874193183427]
We introduce the feature aggregation to imitate the multi-teacher distillation in the single-teacher distillation framework.
DFA is a two-stage Differentiable Feature Aggregation search method motivated by DARTS in neural architecture search.
Experimental results show that DFA outperforms existing methods on CIFAR-100 and CINIC-10 datasets.
arXiv Detail & Related papers (2020-08-02T15:42:29Z) - Learning From Multiple Experts: Self-paced Knowledge Distillation for
Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model.
We conduct extensive experiments and demonstrate that our method achieves superior performance compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
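Several of the entries above hinge on matching second-order feature statistics
between a teacher and a student. As noted in the ICKD entry, the snippet below
is a minimal, generic sketch of an inter-channel correlation loss (a channel-wise
Gram matrix followed by an MSE between teacher and student correlations); it
illustrates the general technique under assumed shapes, not necessarily ICKD's
exact formulation.

```python
import torch
import torch.nn.functional as F

def inter_channel_correlation(feat):
    """Channel-by-channel Gram matrix of a (B, C, H, W) feature map."""
    b, c, h, w = feat.shape
    flat = feat.view(b, c, h * w)                 # (B, C, HW)
    gram = torch.bmm(flat, flat.transpose(1, 2))  # (B, C, C)
    return gram / (h * w)                         # normalise by spatial size

def icc_distillation_loss(student_feat, teacher_feat):
    """MSE between student and teacher inter-channel correlation matrices.
    Assumes the two feature maps already share the channel count; otherwise
    a 1x1 convolution projection on the student side would be needed."""
    return F.mse_loss(inter_channel_correlation(student_feat),
                      inter_channel_correlation(teacher_feat).detach())
```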