Introspective Learning by Distilling Knowledge from Online
Self-explanation
- URL: http://arxiv.org/abs/2009.09140v1
- Date: Sat, 19 Sep 2020 02:05:32 GMT
- Title: Introspective Learning by Distilling Knowledge from Online
Self-explanation
- Authors: Jindong Gu and Zhiliang Wu and Volker Tresp
- Abstract summary: We propose an implementation of introspective learning by distilling knowledge from online self-explanations.
The models trained with the introspective learning procedure outperform the ones trained with the standard learning procedure.
- Score: 36.91213895208838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In recent years, many explanation methods have been proposed to explain
individual classifications of deep neural networks. However, how to leverage
the created explanations to improve the learning process has been less
explored. As privileged information, the explanations of a model can be
used to guide the learning process of the model itself. In the community,
another intensively investigated form of privileged information used to guide the
training of a model is the knowledge from a powerful teacher model. The goal of
this work is to leverage the self-explanation to improve the learning process
by borrowing ideas from knowledge distillation. We start by investigating the
effective components of the knowledge transferred from the teacher network to
the student network. Our investigation reveals that both the responses in
non-ground-truth classes and the class-similarity information in the teacher's outputs
contribute to the success of the knowledge distillation. Motivated by the
conclusion, we propose an implementation of introspective learning by
distilling knowledge from online self-explanations. The models trained with the
introspective learning procedure outperform the ones trained with the standard
learning procedure, as well as the ones trained with different regularization
methods. When compared to the models learned from peer networks or teacher
networks, our models also show competitive performance and require neither
peers nor teachers.
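For context, the sketch below shows the standard temperature-scaled distillation loss that this line of work builds on; it is a minimal illustration, not the paper's implementation. The function name and the temperature/alpha hyperparameters are assumed for illustration. In the proposed introspective procedure, the teacher logits would be replaced by a target distribution derived from the model's own online self-explanations; how that distribution is constructed is specific to the paper and not reproduced here.

```python
# Minimal sketch of a temperature-scaled knowledge-distillation loss (Hinton-style).
# The names `distillation_loss`, `temperature`, and `alpha` are illustrative choices,
# not taken from the paper; in the introspective procedure, the teacher logits would
# come from a distribution built from the model's own online self-explanations.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, targets,
                      temperature=4.0, alpha=0.9):
    # Soften both output distributions with the temperature T; the softened teacher
    # probabilities carry the non-ground-truth responses and class-similarity
    # information highlighted in the abstract.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=1)
    log_soft_student = F.log_softmax(student_logits / temperature, dim=1)

    # KL divergence between the softened distributions, scaled by T^2 so its
    # gradients keep the same magnitude as the hard-label term.
    kd_term = F.kl_div(log_soft_student, soft_teacher,
                       reduction="batchmean") * temperature ** 2

    # Standard cross-entropy on the ground-truth labels.
    ce_term = F.cross_entropy(student_logits, targets)

    return alpha * kd_term + (1.0 - alpha) * ce_term
```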
Related papers
- Knowledge Distillation for Road Detection based on cross-model Semi-Supervised Learning [17.690698736544626]
We propose an integrated approach that combines knowledge distillation and semi-supervised learning methods.
This hybrid approach leverages the robust capabilities of large models to effectively utilise large volumes of unlabelled data.
The proposed semi-supervised learning-based knowledge distillation (SSLKD) approach demonstrates a notable improvement in the performance of the student model.
arXiv Detail & Related papers (2024-02-07T22:50:47Z) - Improved Knowledge Distillation for Pre-trained Language Models via
Knowledge Selection [35.515135913846386]
We propose an actor-critic approach to selecting appropriate knowledge to transfer during the process of knowledge distillation.
Experimental results on the GLUE datasets show that our method outperforms several strong knowledge distillation baselines significantly.
arXiv Detail & Related papers (2023-02-01T13:40:19Z) - Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and the associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
A theoretical analysis shows that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z) - Learning Data Teaching Strategies Via Knowledge Tracing [5.648636668261282]
We propose a novel method, called Knowledge Augmented Data Teaching (KADT), to optimize a data teaching strategy for a student model.
The KADT method incorporates a knowledge tracing model to dynamically capture the knowledge progress of a student model in terms of latent learning concepts.
We have evaluated the performance of the KADT method on four different machine learning tasks including knowledge tracing, sentiment analysis, movie recommendation, and image classification.
arXiv Detail & Related papers (2021-11-13T10:10:48Z) - Student Network Learning via Evolutionary Knowledge Distillation [22.030934154498205]
We propose an evolutionary knowledge distillation approach to improve the transfer effectiveness of teacher knowledge.
Instead of a fixed pre-trained teacher, an evolutionary teacher is learned online and consistently transfers intermediate knowledge to supervise student network learning on-the-fly.
In this way, the student can simultaneously obtain rich internal knowledge and capture its growth process, leading to effective student network learning.
arXiv Detail & Related papers (2021-03-23T02:07:15Z) - Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z) - Collaborative Teacher-Student Learning via Multiple Knowledge Transfer [79.45526596053728]
We propose collaborative teacher-student learning via multiple knowledge transfer (CTSL-MKT).
It allows multiple students to learn knowledge from both individual instances and instance relations in a collaborative way.
The experiments and ablation studies on four image datasets demonstrate that the proposed CTSL-MKT significantly outperforms the state-of-the-art KD methods.
arXiv Detail & Related papers (2021-01-21T07:17:04Z) - Learning to Reweight with Deep Interactions [104.68509759134878]
We propose an improved data reweighting algorithm, in which the student model provides its internal states to the teacher model.
Experiments on image classification with clean/noisy labels and on neural machine translation empirically demonstrate that our algorithm yields significant improvements over previous methods.
arXiv Detail & Related papers (2020-07-09T09:06:31Z) - Explainable Active Learning (XAL): An Empirical Study of How Local
Explanations Impact Annotator Experience [76.9910678786031]
We propose a novel paradigm of explainable active learning (XAL) by introducing techniques from the rapidly growing field of explainable AI (XAI) into an active learning setting.
Our study shows the benefits of AI explanations as interfaces for machine teaching, namely supporting trust calibration and enabling rich forms of teaching feedback, as well as potential drawbacks: an anchoring effect on the model's judgment and increased cognitive workload.
arXiv Detail & Related papers (2020-01-24T22:52:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.