Contrastive Knowledge Amalgamation for Unsupervised Image Classification
- URL: http://arxiv.org/abs/2307.14781v1
- Date: Thu, 27 Jul 2023 11:21:14 GMT
- Title: Contrastive Knowledge Amalgamation for Unsupervised Image Classification
- Authors: Shangde Gao, Yichao Fu, Ke Liu, Yuqiang Han
- Abstract summary: Contrastive Knowledge Amalgamation (CKA) aims to learn a compact student model to handle the joint objective from multiple teacher models.
Intra- and inter-model contrastive losses are designed to widen the distance between representations of different classes.
The alignment loss is introduced to minimize the sample-level distribution differences of teacher-student models in the common representation space.
- Score: 2.6392087010521728
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Knowledge amalgamation (KA) aims to learn a compact student model to handle
the joint objective from multiple teacher models that are specialized for
their own tasks respectively. Current methods focus on coarsely aligning
teachers and students in the common representation space, making it difficult
for the student to learn the proper decision boundaries from a set of
heterogeneous teachers. Besides, the KL divergence in previous works only
minimizes the probability distribution difference between teachers and the
student, ignoring the intrinsic characteristics of teachers. Therefore, we
propose a novel Contrastive Knowledge Amalgamation (CKA) framework, which
introduces contrastive losses and an alignment loss to achieve intra-class
cohesion and inter-class separation. Intra- and inter-model contrastive losses
are designed to widen the distance between representations of different
classes. The alignment loss is introduced to minimize the sample-level
distribution differences of teacher-student models in the common representation
space. Furthermore, the student learns heterogeneous unsupervised classification
tasks through soft targets efficiently and flexibly in the task-level
amalgamation. Extensive experiments on benchmarks demonstrate the
generalization capability of CKA in the amalgamation of a specific task as well
as multiple tasks. Comprehensive ablation studies provide further insight
into our CKA.
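As a rough illustration of the three ingredients described above, the following PyTorch-style sketch shows (i) a contrastive term of the kind that can be applied within a model (intra-model) or between teacher and student projections (inter-model), (ii) a sample-level alignment term in the common representation space, and (iii) soft-target distillation for task-level amalgamation. It is a minimal sketch under assumed inputs (projected features, teacher-derived pseudo-labels, logits), not the paper's exact losses.

```python
# Minimal sketch, not the paper's exact formulation. Assumes projected features,
# teacher-derived pseudo-labels, and teacher/student logits as inputs.
import torch
import torch.nn.functional as F

def contrastive_loss(features, pseudo_labels, temperature=0.1):
    """Pull samples sharing a pseudo-label together, push other classes apart."""
    z = F.normalize(features, dim=1)                       # (N, d) projected features
    sim = z @ z.t() / temperature                          # pairwise similarities
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    pos_mask = (pseudo_labels.unsqueeze(0) == pseudo_labels.unsqueeze(1)) & ~self_mask
    logits = sim.masked_fill(self_mask, float('-inf'))
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_log_prob = log_prob.masked_fill(~pos_mask, 0.0)    # keep only positive pairs
    return -(pos_log_prob.sum(1) / pos_mask.sum(1).clamp(min=1)).mean()

def alignment_loss(student_feats, teacher_feats):
    """Sample-level alignment in the common space (MSE used as a placeholder)."""
    return F.mse_loss(F.normalize(student_feats, dim=1),
                      F.normalize(teacher_feats, dim=1))

def soft_target_loss(student_logits, teacher_logits, T=4.0):
    """Task-level amalgamation through temperature-softened teacher targets."""
    return F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * T * T
```

In the full framework these terms would be weighted and applied per teacher over intra-model, inter-model, and alignment components; projection heads and loss weights are omitted here for brevity.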
Related papers
- TAS: Distilling Arbitrary Teacher and Student via a Hybrid Assistant [52.0297393822012]
We introduce an assistant model as a bridge to facilitate smooth feature knowledge transfer between heterogeneous teachers and students.
Within our proposed design principle, the assistant model combines the advantages of cross-architecture inductive biases and module functions.
Our proposed method is evaluated across some homogeneous model pairs and arbitrary heterogeneous combinations of CNNs, ViTs, and spatial KDs.
arXiv Detail & Related papers (2024-10-16T08:02:49Z)
- Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling [81.00825302340984]
We introduce Speculative Knowledge Distillation (SKD) to generate high-quality training data on-the-fly.
In SKD, the student proposes tokens, and the teacher replaces poorly ranked ones based on its own distribution.
We evaluate SKD on various text generation tasks, including translation, summarization, math, and instruction following.
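A rough, non-autoregressive sketch of that interleaving with toy logits (the per-position decision and rank threshold are simplifications for illustration, not the SKD procedure itself):

```python
# Toy sketch: the student proposes a token per position; proposals ranked poorly by
# the teacher are replaced with a token drawn from the teacher's own distribution.
import torch

def interleaved_sample(student_logits, teacher_logits, rank_threshold=20):
    proposals = torch.multinomial(torch.softmax(student_logits, -1), 1).squeeze(-1)
    # Rank of each proposal under the teacher (0 = teacher's top choice).
    order = teacher_logits.argsort(dim=-1, descending=True)
    ranks = (order == proposals.unsqueeze(-1)).int().argmax(dim=-1)
    teacher_pick = torch.multinomial(torch.softmax(teacher_logits, -1), 1).squeeze(-1)
    return torch.where(ranks < rank_threshold, proposals, teacher_pick)

tokens = interleaved_sample(torch.randn(4, 100), torch.randn(4, 100))  # 4 positions, vocab 100
```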
arXiv Detail & Related papers (2024-10-15T06:51:25Z)
- Competitive Ensembling Teacher-Student Framework for Semi-Supervised Left Atrium MRI Segmentation [8.338801567668233]
Semi-supervised learning has greatly advanced medical image segmentation since it effectively alleviates the need to acquire abundant annotations from experts.
In this paper, we present a simple yet efficient competitive ensembling teacher-student framework for semi-supervised left atrium segmentation from 3D MR images.
arXiv Detail & Related papers (2023-10-21T09:23:34Z)
- Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners [102.20090188997301]
We explore how to obtain a model that combines Contrastive Learning (CL) and Masked Image Modeling (MIM) strengths.
In order to better obtain both discrimination and diversity, we propose a simple but effective Hybrid Distillation strategy.
Experiment results prove that Hybrid Distill can achieve superior performance on different benchmarks.
arXiv Detail & Related papers (2023-06-28T02:19:35Z)
- Knowledge Distillation from A Stronger Teacher [44.11781464210916]
This paper presents a method dubbed DIST to distill better from a stronger teacher.
We empirically find that the discrepancy between the predictions of the student and a stronger teacher tends to be fairly severe.
Our method is simple yet practical, and extensive experiments demonstrate that it adapts well to various architectures.
arXiv Detail & Related papers (2022-05-21T08:30:58Z)
- Generalized Knowledge Distillation via Relationship Matching [53.69235109551099]
Knowledge of a well-trained deep neural network (a.k.a. the "teacher") is valuable for learning similar tasks.
Knowledge distillation extracts knowledge from the teacher and integrates it with the target model.
Instead of forcing the teacher to work on the same task as the student, we borrow the knowledge from a teacher trained on a general label space.
arXiv Detail & Related papers (2022-05-04T06:49:47Z)
- Faculty Distillation with Optimal Transport [53.69235109551099]
We propose to link teacher's task and student's task by optimal transport.
Based on the semantic relationship between their label spaces, we can bridge the support gap between output distributions.
Experiments under various settings demonstrate the succinctness and versatility of our method.
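A hedged sketch of that idea (assumed label embeddings, uniform marginals, and a few Sinkhorn iterations; a simplification rather than the paper's algorithm):

```python
# Sketch: use entropic OT between teacher and student label embeddings to map
# teacher probabilities into the student's label space as soft targets.
import torch

def sinkhorn_plan(cost, eps=0.1, n_iters=50):
    """Entropic OT plan with uniform marginals over teacher (rows) and student (cols) labels."""
    K = torch.exp(-cost / eps)                      # Gibbs kernel, shape (T, S)
    a = torch.full((cost.size(0),), 1.0 / cost.size(0))
    b = torch.full((cost.size(1),), 1.0 / cost.size(1))
    u = torch.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    return u.unsqueeze(1) * K * v.unsqueeze(0)      # transport plan

def teacher_to_student_targets(teacher_probs, label_emb_t, label_emb_s):
    """Push teacher class probabilities through the transport plan: (N, T) -> (N, S)."""
    plan = sinkhorn_plan(torch.cdist(label_emb_t, label_emb_s))
    mapping = plan / plan.sum(dim=1, keepdim=True)  # row-stochastic teacher-to-student map
    targets = teacher_probs @ mapping
    return targets / targets.sum(dim=1, keepdim=True)
```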
arXiv Detail & Related papers (2022-04-25T09:34:37Z)
- Weakly Supervised Semantic Segmentation via Alternative Self-Dual Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model.
We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z)
- Complementary Calibration: Boosting General Continual Learning with Collaborative Distillation and Self-Supervision [47.374412281270594]
General Continual Learning (GCL) aims at learning from non-independent and identically distributed stream data.
We reveal that the relation and feature deviations are crucial problems for catastrophic forgetting.
We propose a Complementary Calibration (CoCa) framework by mining the complementary model's outputs and features.
arXiv Detail & Related papers (2021-09-03T06:35:27Z)
- Active Imitation Learning from Multiple Non-Deterministic Teachers: Formulation, Challenges, and Algorithms [3.6702509833426613]
We formulate the problem of learning to imitate multiple, non-deterministic teachers with minimal interaction cost.
We first present a general framework that efficiently models and estimates such a distribution by learning continuous representations of the teacher policies.
Next, we develop Active Performance-Based Imitation Learning (APIL), an active learning algorithm for reducing the learner-teacher interaction cost.
arXiv Detail & Related papers (2020-06-14T03:06:27Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.