Online Knowledge Distillation via Mutual Contrastive Learning for Visual
Recognition
- URL: http://arxiv.org/abs/2207.11518v2
- Date: Mon, 27 Mar 2023 14:12:20 GMT
- Title: Online Knowledge Distillation via Mutual Contrastive Learning for Visual
Recognition
- Authors: Chuanguang Yang, Zhulin An, Helong Zhou, Fuzhen Zhuang, Yongjun Xu,
Qian Zhang
- Abstract summary: We present a Mutual Contrastive Learning (MCL) framework for online Knowledge Distillation (KD).
Our MCL can aggregate cross-network embedding information and maximize a lower bound on the mutual information between two networks.
Experiments on image classification and transfer learning to visual recognition tasks show that layer-wise MCL can lead to consistent performance gains.
- Score: 27.326420185846327
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Teacher-free online Knowledge Distillation (KD) aims to train an ensemble
of multiple student models collaboratively and distill knowledge from each
other. Although existing online KD methods achieve desirable performance, they
often focus on class probabilities as the core knowledge type, ignoring the
valuable feature representational information. We present a Mutual Contrastive
Learning (MCL) framework for online KD. The core idea of MCL is to perform
mutual interaction and transfer of contrastive distributions among a cohort of
networks in an online manner. Our MCL can aggregate cross-network embedding
information and maximize a lower bound on the mutual information between two
networks. This enables each network to learn extra contrastive knowledge from
others, leading to better feature representations, thus improving the
performance of visual recognition tasks. Beyond the final layer, we extend MCL
to intermediate layers and perform an adaptive layer-matching mechanism trained
by meta-optimization. Experiments on image classification and transfer learning
to visual recognition tasks show that layer-wise MCL can lead to consistent
performance gains against state-of-the-art online KD approaches. This
superiority demonstrates that layer-wise MCL can guide the network to generate
better feature representations. Our code is publicly available at
https://github.com/winycg/L-MCL.
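To make the cross-network contrastive objective concrete, here is a minimal sketch (not the official L-MCL code; see the repository above) of an InfoNCE-style loss in which each network's embedding of a sample is contrasted against the peer network's embeddings of the whole batch. Minimizing this loss maximizes an InfoNCE lower bound on the mutual information between the two networks' embeddings; the function names and temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def mutual_contrastive_loss(z_a, z_b, temperature=0.1):
    """Generic cross-network InfoNCE loss (illustrative, not the official L-MCL code).

    z_a, z_b: [N, D] embeddings of the same N images produced by two peer networks.
    Matching rows are treated as positives; all other rows act as negatives.
    Minimizing this loss maximizes an InfoNCE lower bound on I(z_a; z_b).
    """
    z_a = F.normalize(z_a, dim=1)
    z_b = F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / temperature          # [N, N] cross-network similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    # Symmetric loss: network A contrasts against B's embeddings and vice versa.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with two peer networks trained collaboratively (online, teacher-free):
# loss = mutual_contrastive_loss(net_a(images), net_b(images))
# loss.backward()
```

With more than two networks in the cohort, the same loss can be applied pairwise; the layer-wise extension and the meta-optimized layer matching described in the abstract are beyond this sketch.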
Related papers
- Interactive Continual Learning: Fast and Slow Thinking [19.253164551254734]
This paper presents a novel Interactive Continual Learning framework, enabled by collaborative interactions among models of various sizes.
To improve memory retrieval in System1, we introduce the CL-vMF mechanism, based on the von Mises-Fisher (vMF) distribution.
Comprehensive evaluation of our proposed ICL demonstrates significant resistance to forgetting and superior performance relative to existing methods.
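For intuition, the sketch below scores a query embedding against stored class prototypes with a von Mises-Fisher log-density, which on the unit sphere reduces to concentration-scaled cosine similarity. It only illustrates the vMF distribution underlying CL-vMF; the prototype memory, names, and concentration value are assumptions, not the paper's actual mechanism.

```python
import torch
import torch.nn.functional as F

def vmf_retrieval_scores(query, prototypes, kappa=16.0):
    """Rank memory prototypes by a von Mises-Fisher log-density (illustrative only).

    query:      [D] embedding of the current sample.
    prototypes: [C, D] one stored mean direction per class (assumed memory layout).
    On the unit hypersphere, log p(x | mu, kappa) = kappa * mu^T x + const,
    so retrieval reduces to concentration-scaled cosine similarity.
    """
    q = F.normalize(query, dim=0)
    mu = F.normalize(prototypes, dim=1)
    return kappa * (mu @ q)   # [C] higher score = more likely class memory

# scores = vmf_retrieval_scores(encoder(x), stored_prototypes)
# retrieved_class = scores.argmax()
```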
arXiv Detail & Related papers (2024-03-05T03:37:28Z)
- Non-Contrastive Learning Meets Language-Image Pre-Training [145.6671909437841]
We study the validity of non-contrastive language-image pre-training (nCLIP).
We introduce xCLIP, a multi-tasking framework combining CLIP and nCLIP, and show that nCLIP aids CLIP in enhancing feature semantics.
arXiv Detail & Related papers (2022-10-17T17:57:46Z)
- Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning [92.18524491615548]
Contrastive self-supervised learning has been successfully integrated into the practice of (deep) reinforcement learning (RL).
We study how RL can be empowered by contrastive learning in a class of Markov decision processes (MDPs) and Markov games (MGs) with low-rank transitions.
Under the online setting, we propose novel upper confidence bound (UCB)-type algorithms that incorporate such a contrastive loss with online RL algorithms for MDPs or MGs.
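As a rough illustration of this recipe (not the paper's algorithm or its guarantees), the sketch below pairs an InfoNCE loss over observed transitions with the familiar elliptical bonus sqrt(phi^T Lambda^{-1} phi) used by UCB-style methods; the feature extractors, dimensions, and constants are assumptions.

```python
import torch
import torch.nn.functional as F

def transition_contrastive_loss(phi_sa, psi_next, temperature=1.0):
    """InfoNCE over observed transitions (illustrative of contrastive representation
    learning for RL, not the paper's exact loss).

    phi_sa:   [N, D] features of sampled (state, action) pairs.
    psi_next: [N, D] features of the corresponding observed next states;
              row i is the positive for row i, other rows act as negatives.
    """
    logits = (phi_sa @ psi_next.t()) / temperature
    targets = torch.arange(phi_sa.size(0), device=phi_sa.device)
    return F.cross_entropy(logits, targets)

def elliptical_ucb_bonus(phi, cov, beta=1.0, ridge=1e-3):
    """UCB-type exploration bonus beta * sqrt(phi^T Lambda^{-1} phi) computed from an
    empirical feature covariance Lambda (a standard bonus form, assumed here)."""
    lam = cov + ridge * torch.eye(cov.size(0), device=cov.device)
    inv = torch.linalg.inv(lam)
    bonus_sq = torch.einsum('nd,dk,nk->n', phi, inv, phi)
    return beta * bonus_sq.clamp(min=0).sqrt()
```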
arXiv Detail & Related papers (2022-07-29T17:29:08Z)
- Online Continual Learning with Contrastive Vision Transformer [67.72251876181497]
This paper proposes a framework, Contrastive Vision Transformer (CVT), to achieve a better stability-plasticity trade-off for online CL.
Specifically, we design a new external attention mechanism for online CL that implicitly captures previous tasks' information.
Based on the learnable focuses, we design a focal contrastive loss to rebalance contrastive learning between new and past classes and consolidate previously learned representations.
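The snippet below is a generic class-weighted supervised contrastive loss, included only to illustrate the idea of rebalancing contrastive learning between new and past classes; it is not CVT's focal contrastive loss, and the weighting scheme and names are assumptions.

```python
import torch
import torch.nn.functional as F

def rebalanced_supcon_loss(z, labels, class_weight, temperature=0.1):
    """Class-weighted supervised contrastive loss (generic illustration only,
    not CVT's focal contrastive loss).

    z:            [N, D] embeddings of a mixed batch of new-task and buffered samples.
    labels:       [N] integer class labels.
    class_weight: [num_classes] per-class weights, e.g. larger for past classes.
    """
    z = F.normalize(z, dim=1)
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = (z @ z.t() / temperature).masked_fill(eye, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    # Average log-probability over each anchor's positives, then reweight by class.
    per_anchor = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(1) / pos_mask.sum(1).clamp(min=1)
    w = class_weight[labels]
    return (w * per_anchor).sum() / w.sum()
```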
arXiv Detail & Related papers (2022-07-24T08:51:02Z)
- Deep Image Clustering with Contrastive Learning and Multi-scale Graph Convolutional Networks [58.868899595936476]
This paper presents a new deep clustering approach termed image clustering with contrastive learning and multi-scale graph convolutional networks (IcicleGCN).
Experiments on multiple image datasets demonstrate the superior clustering performance of IcicleGCN over the state-of-the-art.
arXiv Detail & Related papers (2022-07-14T19:16:56Z)
- Mutual Contrastive Learning for Visual Representation Learning [1.9355744690301404]
We present a collaborative learning method called Mutual Contrastive Learning (MCL) for general visual representation learning.
Benefiting from MCL, each model can learn extra contrastive knowledge from others, leading to more meaningful feature representations.
Experimental results on supervised and self-supervised image classification, transfer learning and few-shot learning show that MCL can lead to consistent performance gains.
arXiv Detail & Related papers (2021-04-26T13:32:33Z)
- Knowledge Distillation By Sparse Representation Matching [107.87219371697063]
We propose Sparse Representation Matching (SRM) to transfer intermediate knowledge from one Convolutional Network (CNN) to another by utilizing sparse representation.
We formulate SRM as a neural processing block, which can be efficiently optimized using gradient descent and integrated into any CNN in a plug-and-play manner.
Our experiments demonstrate that SRM is robust to architectural differences between the teacher and student networks, and outperforms other KD techniques across several datasets.
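As a loose sketch of the general idea (matching sparse codes of intermediate features), not the SRM block itself, one could encode teacher and student feature maps with a 1x1 convolution plus ReLU under an L1 penalty and align the resulting codes; every module and parameter choice here is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseCoder(nn.Module):
    """1x1-conv encoder with ReLU that produces non-negative, sparsity-encouraged codes
    of a feature map. A generic stand-in, not the actual SRM block."""
    def __init__(self, in_channels, code_channels):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, code_channels, kernel_size=1, bias=False)

    def forward(self, feat):
        return F.relu(self.proj(feat))

def sparse_matching_loss(student_codes, teacher_codes, l1_weight=1e-4):
    """Match the student's codes to the teacher's (same shape assumed) and keep them sparse."""
    match = F.mse_loss(student_codes, teacher_codes.detach())
    sparsity = l1_weight * student_codes.abs().mean()
    return match + sparsity
```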
arXiv Detail & Related papers (2021-03-31T11:47:47Z)
- Multi-level Knowledge Distillation [13.71183256776644]
We introduce Multi-level Knowledge Distillation (MLKD) to transfer richer representational knowledge from teacher to student networks.
MLKD employs three novel teacher-student similarities: individual similarity, relational similarity, and categorical similarity.
Experiments demonstrate that MLKD outperforms other state-of-the-art methods on both similar-architecture and cross-architecture tasks.
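To illustrate just the relational term, the sketch below matches pairwise similarity graphs computed over a batch, in the spirit of similarity-preserving distillation; it is not MLKD's exact formulation, and the individual and categorical terms are omitted.

```python
import torch
import torch.nn.functional as F

def relational_similarity_loss(f_student, f_teacher):
    """Match batch-level pairwise similarity structure between student and teacher
    (a generic illustration of 'relational similarity', not MLKD's exact loss).

    f_student, f_teacher: [N, D] pooled features for the same batch of N samples.
    """
    def pairwise(f):
        f = F.normalize(f, dim=1)
        return f @ f.t()                      # [N, N] cosine-similarity graph over the batch

    return F.mse_loss(pairwise(f_student), pairwise(f_teacher).detach())
```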
arXiv Detail & Related papers (2020-12-01T15:27:15Z)
- Knowledge Transfer via Dense Cross-Layer Mutual-Distillation [24.24969126783315]
We propose Dense Cross-layer Mutual-distillation (DCM) in which the teacher and student networks are trained collaboratively from scratch.
To boost KT performance, we introduce dense bidirectional KD operations between the layers with appended classifiers.
We test our method on a variety of KT tasks, showing its superiorities over related methods.
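A minimal sketch of dense bidirectional distillation between auxiliary classifiers is shown below; it applies a temperature-scaled, two-way KL divergence between every pair of stage outputs. This is a generic illustration, not DCM's exact connectivity or loss weighting.

```python
import torch
import torch.nn.functional as F

def dense_bidirectional_kd(logits_a, logits_b, temperature=3.0):
    """Bidirectional KL between auxiliary-classifier outputs of two collaboratively
    trained networks (generic sketch, not DCM's exact scheme).

    logits_a, logits_b: lists of [N, C] logits, one entry per appended classifier.
    Every stage of one network distills from every stage of the other, in both directions.
    """
    t2 = temperature ** 2
    loss = 0.0
    for la in logits_a:
        for lb in logits_b:
            pa = F.log_softmax(la / temperature, dim=1)
            pb = F.log_softmax(lb / temperature, dim=1)
            loss += t2 * (F.kl_div(pa, pb.exp().detach(), reduction='batchmean') +
                          F.kl_div(pb, pa.exp().detach(), reduction='batchmean'))
    return loss / (len(logits_a) * len(logits_b))
```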
arXiv Detail & Related papers (2020-08-18T09:25:08Z)
- Multi-view Contrastive Learning for Online Knowledge Distillation [12.250230630124758]
Previous Online Knowledge Distillation (OKD) methods often rely on mutually exchanging probability distributions.
We propose Multi-view Contrastive Learning (MCL) for OKD to implicitly capture correlations of feature embeddings encoded by multiple peer networks.
arXiv Detail & Related papers (2020-06-07T09:11:28Z)
- Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters, by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
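A minimal sketch of kernel sharing is shown below: one convolution serves both CT and MRI inputs, while per-modality batch normalization (an assumption made for this sketch, not a claim about the paper) absorbs modality-specific statistics.

```python
import torch
import torch.nn as nn

class SharedConvBlock(nn.Module):
    """Convolution kernels shared across CT and MRI, with per-modality batch norm.

    Kernel sharing follows the abstract; keeping separate normalization layers per
    modality is an assumption for illustration only.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.norm = nn.ModuleDict({'ct': nn.BatchNorm2d(out_ch),
                                   'mri': nn.BatchNorm2d(out_ch)})
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, modality):
        # The same convolution processes both modalities; only normalization differs.
        return self.act(self.norm[modality](self.conv(x)))

# y_ct = block(ct_slice, 'ct'); y_mri = block(mri_slice, 'mri')
```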
arXiv Detail & Related papers (2020-01-06T20:03:17Z)