Leveraging Different Learning Styles for Improved Knowledge Distillation
in Biomedical Imaging
- URL: http://arxiv.org/abs/2212.02931v3
- Date: Wed, 22 Nov 2023 15:23:44 GMT
- Title: Leveraging Different Learning Styles for Improved Knowledge Distillation
in Biomedical Imaging
- Authors: Usma Niyaz, Abhishek Singh Sambyal, Deepti R. Bathula
- Abstract summary: Our work endeavors to leverage the concept of knowledge diversification to improve the performance of model compression techniques like Knowledge Distillation (KD) and Mutual Learning (ML).
We use a single-teacher and two-student network in a unified framework that not only allows for the transfer of knowledge from teacher to students (KD) but also encourages collaborative learning between students (ML).
Unlike the conventional approach, where the teacher shares the same knowledge in the form of predictions or feature representations with the student network, our proposed approach employs a more diversified strategy by training one student with predictions and the other with feature maps from the teacher.
- Score: 0.9208007322096533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning style refers to a type of training mechanism adopted by an
individual to gain new knowledge. As suggested by the VARK model, humans have
different learning preferences, like Visual (V), Auditory (A), Read/Write (R),
and Kinesthetic (K), for acquiring and effectively processing information. Our
work endeavors to leverage this concept of knowledge diversification to improve
the performance of model compression techniques like Knowledge Distillation
(KD) and Mutual Learning (ML). Consequently, we use a single-teacher and
two-student network in a unified framework that not only allows for the
transfer of knowledge from teacher to students (KD) but also encourages
collaborative learning between students (ML). Unlike the conventional approach,
where the teacher shares the same knowledge in the form of predictions or
feature representations with the student network, our proposed approach employs
a more diversified strategy by training one student with predictions and the
other with feature maps from the teacher. We further extend this knowledge
diversification by facilitating the exchange of predictions and feature maps
between the two student networks, enriching their learning experiences. We have
conducted comprehensive experiments with three benchmark datasets for both
classification and segmentation tasks using two different network architecture
combinations. These experimental results demonstrate that knowledge
diversification in a combined KD and ML framework outperforms conventional KD
or ML techniques (with similar network configuration) that only use predictions,
with an average improvement of 2%. Furthermore, consistent improvement in
performance across different tasks, with various network architectures, and
over state-of-the-art techniques establishes the robustness and
generalizability of the proposed model.
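The abstract describes the training signals (teacher predictions for one student, teacher feature maps for the other, plus an exchange between the two students) but not their exact form. Below is a minimal sketch, assuming a PyTorch setup, of how such a diversified KD + ML objective could be wired up; the temperature, the MSE feature-matching criterion, the unit loss weights, and the assumption that teacher and student feature maps share the same shape (otherwise an adaptation layer is needed) are illustrative choices, not details taken from the paper.

```python
import torch.nn.functional as F

def diversified_kd_ml_losses(t_logits, t_feat, s1_logits, s1_feat,
                             s2_logits, s2_feat, labels, T=4.0):
    """One illustrative step of a combined KD + ML objective with
    diversified knowledge: student 1 distills the teacher's predictions,
    student 2 distills the teacher's feature maps, and the two students
    additionally exchange predictions and feature maps with each other."""
    # Supervised cross-entropy on ground-truth labels for both students.
    sup1 = F.cross_entropy(s1_logits, labels)
    sup2 = F.cross_entropy(s2_logits, labels)

    # KD: student 1 <- teacher predictions (temperature-scaled KL divergence).
    kd_pred = F.kl_div(F.log_softmax(s1_logits / T, dim=1),
                       F.softmax(t_logits / T, dim=1),
                       reduction="batchmean") * T * T

    # KD: student 2 <- teacher feature maps (L2 matching; assumes equal shapes).
    kd_feat = F.mse_loss(s2_feat, t_feat)

    # ML: each student receives the *other* form of knowledge from its peer.
    ml_pred = F.kl_div(F.log_softmax(s2_logits / T, dim=1),
                       F.softmax(s1_logits.detach() / T, dim=1),
                       reduction="batchmean") * T * T
    ml_feat = F.mse_loss(s1_feat, s2_feat.detach())

    return sup1 + kd_pred + ml_feat, sup2 + kd_feat + ml_pred
```

In practice, each term would carry its own weighting coefficient, and the two student losses would be back-propagated through their respective networks within the same iteration, as is standard in mutual learning.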
Related papers
- Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and the associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
Theoretical analysis shows that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z)
- A Closer Look at Knowledge Distillation with Features, Logits, and Gradients [81.39206923719455]
Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another.
This work provides a new perspective to motivate a set of knowledge distillation strategies by approximating the classical KL-divergence criteria with different knowledge sources.
Our analysis indicates that logits are generally a more efficient knowledge source and suggests that having sufficient feature dimensions is crucial for the model design.
arXiv Detail & Related papers (2022-03-18T21:26:55Z)
- Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression [2.538209532048867]
Mutual Learning (ML) provides an alternative strategy where multiple simple student networks benefit from sharing knowledge.
We propose a single-teacher, multi-student framework that leverages both KD and ML to achieve better performance.
arXiv Detail & Related papers (2021-10-21T09:59:31Z)
- Revisiting Knowledge Distillation: An Inheritance and Exploration Framework [153.73692961660964]
Knowledge Distillation (KD) is a popular technique to transfer knowledge from a teacher model to a student model.
We propose a novel inheritance and exploration knowledge distillation framework (IE-KD).
Our IE-KD framework is generic and can be easily combined with existing distillation or mutual learning methods for training deep neural networks.
arXiv Detail & Related papers (2021-07-01T02:20:56Z)
- LENAS: Learning-based Neural Architecture Search and Ensemble for 3D Radiotherapy Dose Prediction [42.38793195337463]
We propose a novel learning-based ensemble approach named LENAS, which integrates neural architecture search with knowledge distillation for 3D radiotherapy dose prediction.
Our approach starts by exhaustively searching each block from an enormous architecture space to identify multiple architectures that exhibit promising performance.
To mitigate the complexity introduced by the model ensemble, we adopt the teacher-student paradigm, leveraging the diverse outputs from multiple learned networks as supervisory signals.
arXiv Detail & Related papers (2021-06-12T10:08:52Z)
- Collaborative Teacher-Student Learning via Multiple Knowledge Transfer [79.45526596053728]
We propose collaborative teacher-student learning via multiple knowledge transfer (CTSL-MKT).
It allows multiple students to learn knowledge from both individual instances and instance relations in a collaborative way.
The experiments and ablation studies on four image datasets demonstrate that the proposed CTSL-MKT significantly outperforms the state-of-the-art KD methods.
arXiv Detail & Related papers (2021-01-21T07:17:04Z)
- Multi-level Knowledge Distillation [13.71183256776644]
We introduce Multi-level Knowledge Distillation (MLKD) to transfer richer representational knowledge from teacher to student networks.
MLKD employs three novel teacher-student similarities: individual similarity, relational similarity, and categorical similarity.
Experiments demonstrate that MLKD outperforms other state-of-the-art methods on both similar-architecture and cross-architecture tasks.
arXiv Detail & Related papers (2020-12-01T15:27:15Z)
- Knowledge Distillation Beyond Model Compression [13.041607703862724]
Knowledge distillation (KD) is commonly regarded as an effective model compression technique in which a compact model (student) is trained under the supervision of a larger pretrained model or ensemble of models (teacher).
In this work, we provide an extensive study of nine different KD methods, covering a broad spectrum of approaches to capturing and transferring knowledge.
arXiv Detail & Related papers (2020-07-03T19:54:04Z)
- Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
- Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model.
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
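As a companion to the LFME entry above, here is a minimal sketch, again assuming PyTorch, of the general multi-teacher idea of aggregating several 'expert' models' soft predictions to supervise a single student. The simple averaging of expert outputs, the temperature, and the weighting factor are illustrative assumptions, and the self-paced scheduling that is central to LFME is omitted.

```python
import torch
import torch.nn.functional as F

def multi_expert_distillation_loss(student_logits, expert_logits_list,
                                   labels, T=2.0, alpha=0.5):
    """Illustrative multi-expert distillation: the student matches the
    averaged, temperature-softened predictions of several 'expert'
    teachers in addition to fitting the ground-truth labels."""
    # Average the experts' softened class-probability distributions.
    soft_targets = torch.stack(
        [F.softmax(e / T, dim=1) for e in expert_logits_list]).mean(dim=0)

    # KL divergence between the student's softened predictions and the target.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  soft_targets, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```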
This list is automatically generated from the titles and abstracts of the papers in this site.