DE-RRD: A Knowledge Distillation Framework for Recommender System
- URL: http://arxiv.org/abs/2012.04357v1
- Date: Tue, 8 Dec 2020 11:09:22 GMT
- Title: DE-RRD: A Knowledge Distillation Framework for Recommender System
- Authors: SeongKu Kang, Junyoung Hwang, Wonbin Kweon, Hwanjo Yu
- Abstract summary: We propose a knowledge distillation framework for recommender system, called DE-RRD.
It enables the student model to learn from the latent knowledge encoded in the teacher model as well as from the teacher's predictions.
Our experiments show that DE-RRD outperforms the state-of-the-art competitors and achieves performance comparable to, or even better than, that of the teacher model with faster inference time.
- Score: 16.62204445256007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent recommender systems have started to employ knowledge distillation,
which is a model compression technique distilling knowledge from a cumbersome
model (teacher) to a compact model (student), to reduce inference latency while
maintaining performance. The state-of-the-art methods have only focused on
making the student model accurately imitate the predictions of the teacher
model. However, the teacher's predictions alone reveal its knowledge only
incompletely. In this paper, we propose a novel knowledge
distillation framework for recommender system, called DE-RRD, which enables the
student model to learn from the latent knowledge encoded in the teacher model
as well as from the teacher's predictions. Concretely, DE-RRD consists of two
methods: 1) Distillation Experts (DE) that directly transfers the latent
knowledge from the teacher model. DE exploits "experts" and a novel expert
selection strategy to effectively distill the teacher's vast knowledge to a
student with limited capacity. 2) Relaxed Ranking Distillation (RRD) that
transfers the knowledge revealed from the teacher's prediction with
consideration of the relaxed ranking orders among items. Our extensive
experiments show that DE-RRD outperforms the state-of-the-art competitors and
achieves performance comparable to, or even better than, that of the teacher
model, with faster inference time.
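
To make the two components above concrete, here is a minimal PyTorch sketch written only from this abstract: a DE-style module that reconstructs the teacher's latent representation through one of several "experts" chosen by a selection network, and an RRD-style loss that fits the student to the teacher's ordering of a few highly ranked ("interesting") items while relaxing the order among sampled low-ranked ("uninteresting") items. All names, dimensions, the Gumbel-softmax selection, and the ListMLE-style formulation are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DistillationExperts(nn.Module):
    """DE-style sketch (assumed design): reconstruct the teacher's latent
    representation from the student's, routing each sample through one of
    several small expert networks chosen by a selection network conditioned
    on the teacher representation."""

    def __init__(self, student_dim, teacher_dim, num_experts=5, tau=1.0):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(student_dim, teacher_dim),
                nn.ReLU(),
                nn.Linear(teacher_dim, teacher_dim),
            )
            for _ in range(num_experts)
        ])
        self.selector = nn.Linear(teacher_dim, num_experts)
        self.tau = tau

    def forward(self, student_repr, teacher_repr):
        # every expert's reconstruction: (batch, num_experts, teacher_dim)
        outs = torch.stack([e(student_repr) for e in self.experts], dim=1)
        # differentiable hard selection of one expert per sample
        sel = F.gumbel_softmax(self.selector(teacher_repr), tau=self.tau, hard=True)
        recon = (sel.unsqueeze(-1) * outs).sum(dim=1)
        return F.mse_loss(recon, teacher_repr)


def rrd_loss(student_scores, interesting_idx, uninteresting_idx):
    """RRD-style sketch (assumed formulation): maximize a relaxed permutation
    likelihood in which the teacher's ordering of the interesting items
    matters, but the order among sampled uninteresting items does not."""
    s_top = student_scores[interesting_idx]    # (K,) teacher-ranked, best first
    s_bot = student_scores[uninteresting_idx]  # (L,) internal order ignored
    bot_lse = torch.logsumexp(s_bot, dim=0)
    # denominator at rank k covers items ranked k..K plus all uninteresting items
    tail_lse = torch.flip(torch.logcumsumexp(torch.flip(s_top, [0]), dim=0), [0])
    log_prob = (s_top - torch.logaddexp(tail_lse, bot_lse)).sum()
    return -log_prob
```

In this reading, the hard Gumbel-softmax keeps the expert choice differentiable while still routing each sample through a single expert, and the shared `bot_lse` term in every denominator is what makes the internal order of the uninteresting items irrelevant.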
Related papers
- Improve Knowledge Distillation via Label Revision and Data Selection [37.74822443555646]
This paper proposes label revision, which rectifies the teacher's inaccurate predictions using the ground truth.
It also introduces a data selection technique to choose suitable training samples to be supervised by the teacher.
Experimental results demonstrate the effectiveness of our proposed method and show that it can be combined with other distillation approaches.
arXiv Detail & Related papers (2024-04-03T02:41:16Z)
- Comparative Knowledge Distillation [102.35425896967791]
Traditional Knowledge Distillation (KD) assumes readily available access to teacher models for frequent inference.
We propose Comparative Knowledge Distillation (CKD), which encourages student models to understand the nuanced differences in a teacher model's interpretations of samples.
CKD consistently outperforms state-of-the-art data augmentation and KD techniques.
arXiv Detail & Related papers (2023-11-03T21:55:33Z)
- AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression [26.474962405945316]
We present a novel attribution-driven knowledge distillation approach to compress pre-trained language models.
To enhance the knowledge transfer of model reasoning and generalization, we explore multi-view attribution distillation on all potential decisions of the teacher.
arXiv Detail & Related papers (2023-05-17T07:40:12Z)
- Supervision Complexity and its Role in Knowledge Distillation [65.07910515406209]
We study the generalization behavior of a distilled student.
The framework highlights a delicate interplay among the teacher's accuracy, the student's margin with respect to the teacher predictions, and the complexity of the teacher predictions.
We demonstrate the efficacy of online distillation and validate the theoretical findings on a range of image classification benchmarks and model architectures.
arXiv Detail & Related papers (2023-01-28T16:34:47Z)
- Unbiased Knowledge Distillation for Recommendation [66.82575287129728]
Knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency.
Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge to supervise the learning of a compact student model.
We find that this standard distillation paradigm incurs a serious bias issue: popular items are recommended even more heavily after distillation.
arXiv Detail & Related papers (2022-11-27T05:14:03Z)
- Better Teacher Better Student: Dynamic Prior Knowledge for Knowledge Distillation [70.92135839545314]
We propose dynamic prior knowledge (DPK), which integrates part of the teacher's features as prior knowledge before feature distillation.
Our DPK makes the performance of the student model positively correlated with that of the teacher model, which means that we can further boost the accuracy of students by applying larger teachers.
arXiv Detail & Related papers (2022-06-13T11:52:13Z)
- On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
arXiv Detail & Related papers (2022-03-14T15:02:13Z)
- Dynamic Rectification Knowledge Distillation [0.0]
Dynamic Rectification Knowledge Distillation (DR-KD) is a knowledge distillation framework.
DR-KD transforms the student into its own teacher, and if the self-teacher makes wrong predictions while distilling information, the error is rectified prior to the knowledge being distilled.
Our proposed DR-KD performs remarkably well in the absence of a sophisticated cumbersome teacher model.
arXiv Detail & Related papers (2022-01-27T04:38:01Z)
- Learning Interpretation with Explainable Knowledge Distillation [28.00216413365036]
Knowledge Distillation (KD) has been considered as a key solution in model compression and acceleration in recent years.
We propose a novel explainable knowledge distillation model, called XDistillation, through which both the performance and the explanations' information are transferred from the teacher model to the student model.
Our experiments show that models trained by XDistillation outperform those trained by conventional KD methods in terms of both predictive accuracy and faithfulness to the teacher models.
arXiv Detail & Related papers (2021-11-12T21:18:06Z)
- Dual Correction Strategy for Ranking Distillation in Top-N Recommender System [22.37864671297929]
This paper presents a Dual Correction strategy for Knowledge Distillation (DCD).
DCD transfers the ranking information from the teacher model to the student model in a more efficient manner.
Our experiments show that the proposed method outperforms the state-of-the-art baselines.
arXiv Detail & Related papers (2021-09-08T07:00:45Z)
- Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
In contrast to most existing methods, which rely on effectively training student models given pretrained teachers, we aim to learn teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z)