Leveraging Different Learning Styles for Improved Knowledge Distillation
in Biomedical Imaging
- URL: http://arxiv.org/abs/2212.02931v3
- Date: Wed, 22 Nov 2023 15:23:44 GMT
- Title: Leveraging Different Learning Styles for Improved Knowledge Distillation
in Biomedical Imaging
- Authors: Usma Niyaz, Abhishek Singh Sambyal, Deepti R. Bathula
- Abstract summary: Our work endeavors to leverage the concept of knowledge diversification to improve the performance of model compression techniques like Knowledge Distillation (KD) and Mutual Learning (ML).
We use a single-teacher and two-student network in a unified framework that not only allows for the transfer of knowledge from teacher to students (KD) but also encourages collaborative learning between students (ML).
Unlike the conventional approach, where the teacher shares the same knowledge in the form of predictions or feature representations with the student network, our proposed approach employs a more diversified strategy by training one student with predictions and the other with feature maps from the teacher.
- Score: 0.9208007322096533
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning style refers to a type of training mechanism adopted by an
individual to gain new knowledge. As suggested by the VARK model, humans have
different learning preferences, like Visual (V), Auditory (A), Read/Write (R),
and Kinesthetic (K), for acquiring and effectively processing information. Our
work endeavors to leverage this concept of knowledge diversification to improve
the performance of model compression techniques like Knowledge Distillation
(KD) and Mutual Learning (ML). Consequently, we use a single-teacher and
two-student network in a unified framework that not only allows for the
transfer of knowledge from teacher to students (KD) but also encourages
collaborative learning between students (ML). Unlike the conventional approach,
where the teacher shares the same knowledge in the form of predictions or
feature representations with the student network, our proposed approach employs
a more diversified strategy by training one student with predictions and the
other with feature maps from the teacher. We further extend this knowledge
diversification by facilitating the exchange of predictions and feature maps
between the two student networks, enriching their learning experiences. We have
conducted comprehensive experiments with three benchmark datasets for both
classification and segmentation tasks using two different network architecture
combinations. These experimental results demonstrate that knowledge
diversification in a combined KD and ML framework outperforms conventional KD
or ML techniques (with similar network configuration) that only use predictions,
with an average improvement of 2%. Furthermore, consistent improvement in
performance across different tasks, with various network architectures, and
over state-of-the-art techniques establishes the robustness and
generalizability of the proposed model.
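The abstract describes the training signals (teacher predictions for one student, teacher feature maps for the other, plus an exchange between the two students) but not their exact form. Below is a minimal sketch, assuming a PyTorch setup, of how such a diversified KD + ML objective could be wired up; the temperature, the MSE feature-matching criterion, the unit loss weights, and the assumption that teacher and student feature maps share the same shape (otherwise an adaptation layer is needed) are illustrative choices, not details taken from the paper.

```python
import torch.nn.functional as F

def diversified_kd_ml_losses(t_logits, t_feat, s1_logits, s1_feat,
                             s2_logits, s2_feat, labels, T=4.0):
    """One illustrative step of a combined KD + ML objective with
    diversified knowledge: student 1 distills the teacher's predictions,
    student 2 distills the teacher's feature maps, and the two students
    additionally exchange predictions and feature maps with each other."""
    # Supervised cross-entropy on ground-truth labels for both students.
    sup1 = F.cross_entropy(s1_logits, labels)
    sup2 = F.cross_entropy(s2_logits, labels)

    # KD: student 1 <- teacher predictions (temperature-scaled KL divergence).
    kd_pred = F.kl_div(F.log_softmax(s1_logits / T, dim=1),
                       F.softmax(t_logits / T, dim=1),
                       reduction="batchmean") * T * T

    # KD: student 2 <- teacher feature maps (L2 matching; assumes equal shapes).
    kd_feat = F.mse_loss(s2_feat, t_feat)

    # ML: each student receives the *other* form of knowledge from its peer.
    ml_pred = F.kl_div(F.log_softmax(s2_logits / T, dim=1),
                       F.softmax(s1_logits.detach() / T, dim=1),
                       reduction="batchmean") * T * T
    ml_feat = F.mse_loss(s1_feat, s2_feat.detach())

    return sup1 + kd_pred + ml_feat, sup2 + kd_feat + ml_pred
```

In practice, each term would carry its own weighting coefficient, and the two student losses would be back-propagated through their respective networks within the same iteration, as is standard in mutual learning.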
Related papers
- Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and the associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
Theoretical analysis shows that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z)
- A Closer Look at Knowledge Distillation with Features, Logits, and Gradients [81.39206923719455]
Knowledge distillation (KD) is a substantial strategy for transferring learned knowledge from one neural network model to another.
This work provides a new perspective to motivate a set of knowledge distillation strategies by approximating the classical KL-divergence criteria with different knowledge sources.
Our analysis indicates that logits are generally a more efficient knowledge source and suggests that having sufficient feature dimensions is crucial for the model design.
arXiv Detail & Related papers (2022-03-18T21:26:55Z)
- Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For Model Compression [2.538209532048867]
Mutual Learning (ML) provides an alternative strategy where multiple simple student networks benefit from sharing knowledge.
We propose a single-teacher, multi-student framework that leverages both KD and ML to achieve better performance.
arXiv Detail & Related papers (2021-10-21T09:59:31Z)
- Revisiting Knowledge Distillation: An Inheritance and Exploration Framework [153.73692961660964]
Knowledge Distillation (KD) is a popular technique to transfer knowledge from a teacher model to a student model.
We propose a novel inheritance and exploration knowledge distillation framework (IE-KD).
Our IE-KD framework is generic and can be easily combined with existing distillation or mutual learning methods for training deep neural networks.
arXiv Detail & Related papers (2021-07-01T02:20:56Z)
- LENAS: Learning-based Neural Architecture Search and Ensemble for 3D Radiotherapy Dose Prediction [42.38793195337463]
We propose a novel learning-based ensemble approach named LENAS, which integrates neural architecture search with knowledge distillation for 3D radiotherapy dose prediction.
Our approach starts by exhaustively searching each block from an enormous architecture space to identify multiple architectures that exhibit promising performance.
To mitigate the complexity introduced by the model ensemble, we adopt the teacher-student paradigm, leveraging the diverse outputs from multiple learned networks as supervisory signals.
arXiv Detail & Related papers (2021-06-12T10:08:52Z)
- Collaborative Teacher-Student Learning via Multiple Knowledge Transfer [79.45526596053728]
We propose collaborative teacher-student learning via multiple knowledge transfer (CTSL-MKT).
It allows multiple students to learn knowledge from both individual instances and instance relations in a collaborative way.
The experiments and ablation studies on four image datasets demonstrate that the proposed CTSL-MKT significantly outperforms the state-of-the-art KD methods.
arXiv Detail & Related papers (2021-01-21T07:17:04Z)
- Multi-level Knowledge Distillation [13.71183256776644]
We introduce Multi-level Knowledge Distillation (MLKD) to transfer richer representational knowledge from teacher to student networks.
MLKD employs three novel teacher-student similarities: individual similarity, relational similarity, and categorical similarity.
Experiments demonstrate that MLKD outperforms other state-of-the-art methods on both similar-architecture and cross-architecture tasks.
arXiv Detail & Related papers (2020-12-01T15:27:15Z)
- Knowledge Distillation Beyond Model Compression [13.041607703862724]
Knowledge distillation (KD) is commonly regarded as an effective model compression technique in which a compact model (student) is trained under the supervision of a larger pretrained model or ensemble of models (teacher).
In this work, we provide an extensive study of nine different KD methods, covering a broad spectrum of approaches to capturing and transferring knowledge.
arXiv Detail & Related papers (2020-07-03T19:54:04Z)
- Knowledge Distillation Meets Self-Supervision [109.6400639148393]
Knowledge distillation involves extracting "dark knowledge" from a teacher network to guide the learning of a student network.
We show that the seemingly different self-supervision task can serve as a simple yet powerful solution.
By exploiting the similarity between those self-supervision signals as an auxiliary task, one can effectively transfer the hidden information from the teacher to the student.
arXiv Detail & Related papers (2020-06-12T12:18:52Z)
- Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME).
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model.
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
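As a companion to the LFME entry above, here is a minimal sketch, again assuming PyTorch, of the general multi-teacher idea of aggregating several 'expert' models' soft predictions to supervise a single student. The simple averaging of expert outputs, the temperature, and the weighting factor are illustrative assumptions, and the self-paced scheduling that is central to LFME is omitted.

```python
import torch
import torch.nn.functional as F

def multi_expert_distillation_loss(student_logits, expert_logits_list,
                                   labels, T=2.0, alpha=0.5):
    """Illustrative multi-expert distillation: the student matches the
    averaged, temperature-softened predictions of several 'expert'
    teachers in addition to fitting the ground-truth labels."""
    # Average the experts' softened class-probability distributions.
    soft_targets = torch.stack(
        [F.softmax(e / T, dim=1) for e in expert_logits_list]).mean(dim=0)

    # KL divergence between the student's softened predictions and the target.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  soft_targets, reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce
```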
This list is automatically generated from the titles and abstracts of the papers in this site.