Multi-view Contrastive Learning for Online Knowledge Distillation
- URL: http://arxiv.org/abs/2006.04093v3
- Date: Sat, 10 Apr 2021 08:49:09 GMT
- Title: Multi-view Contrastive Learning for Online Knowledge Distillation
- Authors: Chuanguang Yang, Zhulin An, Yongjun Xu
- Abstract summary: Previous Online Knowledge Distillation (OKD) often relies on mutually exchanging probability distributions.
We propose Multi-view Contrastive Learning (MCL) for OKD to implicitly capture correlations of feature embeddings encoded by multiple peer networks.
- Score: 12.250230630124758
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous Online Knowledge Distillation (OKD) often relies on mutually
exchanging probability distributions, but neglects the useful representational
knowledge. We therefore propose Multi-view Contrastive Learning (MCL) for OKD
to implicitly capture correlations of feature embeddings encoded by multiple
peer networks, which provide various views for understanding the input data
instances. Benefiting from MCL, we can learn a more discriminative
representation space for classification than previous OKD methods. Experimental
results on image classification demonstrate that our MCL-OKD outperforms other
state-of-the-art OKD methods by large margins without incurring additional
inference cost. Code is available at https://github.com/winycg/MCL-OKD.
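As a rough sketch of the core idea (not the released implementation; the peer count, projection heads, temperature, and loss weight below are assumptions), the contrastive term treats the embeddings that different peer networks produce for the same input as positive pairs and the other instances in the batch as negatives, and is added to the usual cross-entropy objective:

```python
import torch
import torch.nn.functional as F

def multi_view_contrastive_loss(embeddings, temperature=0.1):
    """InfoNCE-style loss over embeddings from multiple peer networks.

    embeddings: list of tensors, each (batch, dim); element m holds the
    features peer network m computed for the same batch of inputs.
    Same-instance embeddings from different peers are positives; the other
    instances in the batch serve as negatives.
    """
    batch = embeddings[0].size(0)
    targets = torch.arange(batch, device=embeddings[0].device)
    views = [F.normalize(z, dim=1) for z in embeddings]
    loss, pairs = 0.0, 0
    for i in range(len(views)):
        for j in range(len(views)):
            if i == j:
                continue
            logits = views[i] @ views[j].t() / temperature  # (batch, batch) similarities
            loss = loss + F.cross_entropy(logits, targets)  # diagonal entries are positives
            pairs += 1
    return loss / max(pairs, 1)

# Hypothetical joint objective for M peer networks (weighting is a placeholder):
# feats  = [net.features(x) for net in peers]
# logits = [net.classifier(f) for net, f in zip(peers, feats)]
# loss   = sum(F.cross_entropy(l, y) for l in logits) \
#          + beta * multi_view_contrastive_loss(feats)
```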
Related papers
- Dynamic Evidence Decoupling for Trusted Multi-view Learning [17.029245880233816]
We propose a Consistent and Complementary-aware trusted Multi-view Learning (CCML) method to solve this problem.
We first construct view opinions using evidential deep neural networks, which consist of belief mass vectors and uncertainty estimates.
The results validate the effectiveness of the dynamic evidence decoupling strategy and show that CCML significantly outperforms baselines on accuracy and reliability.
arXiv Detail & Related papers (2024-10-04T03:27:51Z)
- Invariant Causal Knowledge Distillation in Neural Networks [6.24302896438145]
In this paper, we introduce Invariant Consistency Distillation (ICD), a novel methodology designed to enhance knowledge distillation.
ICD ensures that the student model's representations are both discriminative and invariant with respect to the teacher's outputs.
Our results on CIFAR-100 and ImageNet ILSVRC-2012 show that ICD outperforms traditional KD techniques and surpasses state-of-the-art methods.
arXiv Detail & Related papers (2024-07-16T14:53:35Z)
- Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching [53.05954114863596]
We propose a brand-new Deep Boosting Learning (DBL) algorithm for image-text matching.
An anchor branch is first trained to provide insights into the data properties.
A target branch is concurrently tasked with more adaptive margin constraints to further enlarge the relative distance between matched and unmatched samples.
arXiv Detail & Related papers (2024-04-28T08:44:28Z)
- Data-free Knowledge Distillation for Fine-grained Visual Categorization [9.969720644789781]
We propose an approach called DFKD-FGVC that extends DFKD to fine-grained visual categorization (FGVC) tasks.
We evaluate our approach on three widely-used FGVC benchmarks (Aircraft, Cars196, and CUB200) and demonstrate its superior performance.
arXiv Detail & Related papers (2024-04-18T09:44:56Z)
- Improving Deep Representation Learning via Auxiliary Learnable Target Coding [69.79343510578877]
This paper introduces a novel learnable target coding as an auxiliary regularization of deep representation learning.
Specifically, a margin-based triplet loss and a correlation consistency loss on the proposed target codes are designed to encourage more discriminative representations.
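As a point of reference, a margin-based triplet loss of the kind mentioned here has a standard form; a minimal PyTorch version (the margin value and Euclidean distance are assumptions, not this paper's settings) is:

```python
import torch.nn.functional as F

def triplet_margin_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss on d(anchor, positive) - d(anchor, negative) + margin."""
    d_pos = F.pairwise_distance(anchor, positive)  # (batch,) Euclidean distances
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()

# PyTorch ships an equivalent built-in: torch.nn.TripletMarginLoss(margin=0.2)
```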
arXiv Detail & Related papers (2023-05-30T01:38:54Z)
- Rank Flow Embedding for Unsupervised and Semi-Supervised Manifold Learning [9.171175292808144]
We propose a novel manifold learning algorithm named Rank Flow Embedding (RFE) for unsupervised and semi-supervised scenarios.
RFE computes context-sensitive embeddings, which are refined following a rank-based processing flow.
The generated embeddings can be exploited for more effective unsupervised retrieval or semi-supervised classification.
arXiv Detail & Related papers (2023-04-24T21:02:12Z)
- Vector Quantized Wasserstein Auto-Encoder [57.29764749855623]
We study learning deep discrete representations from the generative viewpoint.
We endow discrete distributions over sequences of codewords and learn a deterministic decoder that transports the distribution over the sequences of codewords to the data distribution.
We develop further theories to connect it with the clustering viewpoint of WS distance, allowing us to have a better and more controllable clustering solution.
arXiv Detail & Related papers (2023-02-12T13:51:36Z)
- Modeling Multiple Views via Implicitly Preserving Global Consistency and Local Complementarity [61.05259660910437]
We propose a global consistency and complementarity network (CoCoNet) to learn representations from multiple views.
On the global stage, we reckon that the crucial knowledge is implicitly shared among views, and enhancing the encoder to capture such knowledge can improve the discriminability of the learned representations.
On the local stage, we propose a complementarity-factor, which joins cross-view discriminative knowledge and guides the encoders to learn not only view-wise discriminability but also cross-view complementary information.
arXiv Detail & Related papers (2022-09-16T09:24:00Z)
- Online Knowledge Distillation via Mutual Contrastive Learning for Visual Recognition [27.326420185846327]
We present a Mutual Contrastive Learning (MCL) framework for online Knowledge Distillation (KD).
Our MCL can aggregate cross-network embedding information and maximize the lower bound to the mutual information between two networks.
Experiments on image classification and transfer learning to visual recognition tasks show that layer-wise MCL can lead to consistent performance gains.
arXiv Detail & Related papers (2022-07-23T13:39:01Z)
- A Non-isotropic Probabilistic Take on Proxy-based Deep Metric Learning [49.999268109518255]
Proxy-based Deep Metric Learning learns by embedding images close to their class representatives (proxies).
In addition, proxy-based DML struggles to learn class-internal structures.
We introduce non-isotropic probabilistic proxy-based DML to address both issues.
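For context, the standard deterministic proxy-based objective being extended here can be written as a softmax over learnable class proxies; the sketch below is a ProxyNCA-style form (the scale factor and normalization are assumptions, and the non-isotropic probabilistic modelling proposed in this paper is not shown):

```python
import torch
import torch.nn.functional as F

def proxy_softmax_loss(embeddings, labels, proxies, scale=10.0):
    """Pull each embedding toward its class proxy, push it away from the rest.

    embeddings: (batch, dim) image features; proxies: (num_classes, dim)
    learnable class representatives; labels: (batch,) class indices.
    """
    emb = F.normalize(embeddings, dim=1)
    prx = F.normalize(proxies, dim=1)
    logits = scale * emb @ prx.t()          # cosine similarity to every proxy
    return F.cross_entropy(logits, labels)  # softmax over proxies

# Usage sketch: proxies = torch.nn.Parameter(torch.randn(num_classes, dim)),
# optimized jointly with the embedding network.
```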
arXiv Detail & Related papers (2022-07-08T09:34:57Z)
- Variational Distillation for Multi-View Learning [104.17551354374821]
We design several variational information bottlenecks to exploit two key characteristics for multi-view representation learning.
Under rigorously theoretical guarantee, our approach enables IB to grasp the intrinsic correlation between observations and semantic labels.
arXiv Detail & Related papers (2022-06-20T03:09:46Z)