Unbiased Knowledge Distillation for Recommendation
- URL: http://arxiv.org/abs/2211.14729v1
- Date: Sun, 27 Nov 2022 05:14:03 GMT
- Title: Unbiased Knowledge Distillation for Recommendation
- Authors: Gang Chen, Jiawei Chen, Fuli Feng, Sheng Zhou, Xiangnan He
- Abstract summary: Knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency.
Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge to supervise the learning of a compact student model.
We find that this standard distillation paradigm incurs a serious bias issue: popular items are recommended even more heavily after distillation.
- Score: 66.82575287129728
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As a promising solution for model compression, knowledge distillation (KD)
has been applied in recommender systems (RS) to reduce inference latency.
Traditional solutions first train a full teacher model from the training data,
and then transfer its knowledge (i.e., soft labels) to supervise the
learning of a compact student model. However, we find that this standard
distillation paradigm incurs a serious bias issue: popular items are
recommended even more heavily after distillation. This effect prevents the student
model from making accurate and fair recommendations, decreasing the
effectiveness of RS.
In this work, we identify the origin of the bias in KD: it is rooted in the
biased soft labels from the teacher and is further propagated and intensified
during distillation. To rectify this, we propose a new KD method with a
stratified distillation strategy. It first partitions items into multiple
groups according to their popularity, and then extracts the ranking knowledge
within each group to supervise the learning of the student. Our method is
simple and teacher-agnostic: it operates at the distillation stage without affecting
the training of the teacher model. We conduct extensive theoretical and
empirical studies to validate the effectiveness of our proposal. We release our
code at: https://github.com/chengang95/UnKD.
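The abstract describes the stratified strategy only at a high level, so below is a minimal, hedged sketch of what popularity-stratified ranking distillation could look like in PyTorch. The equal-size popularity grouping, the BPR-style pairwise loss, and all function and variable names are illustrative assumptions rather than the authors' released implementation; refer to the linked repository for the actual UnKD code.

```python
# Hypothetical sketch of popularity-stratified ranking distillation.
# The grouping rule, loss form, and names are assumptions; the official
# implementation lives at https://github.com/chengang95/UnKD.
import torch
import torch.nn.functional as F


def partition_by_popularity(item_popularity: torch.Tensor, num_groups: int):
    """Split item ids into equal-size groups, from least to most popular."""
    order = torch.argsort(item_popularity)      # item ids, ascending popularity
    return torch.chunk(order, num_groups)       # tuple of 1-D id tensors


def stratified_distillation_loss(teacher_scores: torch.Tensor,
                                 student_scores: torch.Tensor,
                                 groups,
                                 pairs_per_group: int = 10) -> torch.Tensor:
    """Pairwise ranking distillation applied separately inside each popularity
    group, so popularity-driven cross-group comparisons never enter the
    distillation signal. Score tensors have shape [batch_users, num_items]."""
    total = teacher_scores.new_zeros(())
    for group_items in groups:
        t = teacher_scores[:, group_items]      # [B, group_size]
        s = student_scores[:, group_items]
        # Sample random item pairs within the group.
        idx = torch.randint(0, t.size(1), (2, pairs_per_group), device=t.device)
        t_a, t_b = t[:, idx[0]], t[:, idx[1]]
        s_a, s_b = s[:, idx[0]], s[:, idx[1]]
        # The teacher decides which item of each pair should rank higher;
        # the student is pushed to respect that ordering (BPR-style loss).
        sign = torch.sign(t_a - t_b)
        total = total - F.logsigmoid(sign * (s_a - s_b)).mean()
    return total / len(groups)


if __name__ == "__main__":
    torch.manual_seed(0)
    num_items, num_groups = 100, 4
    popularity = torch.randint(1, 1000, (num_items,)).float()
    groups = partition_by_popularity(popularity, num_groups)
    teacher = torch.randn(8, num_items)                      # frozen teacher scores
    student = torch.randn(8, num_items, requires_grad=True)  # stand-in for student scores
    loss = stratified_distillation_loss(teacher, student, groups)
    loss.backward()
    print(f"stratified KD loss: {loss.item():.4f}")
```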
Related papers
- Knowledge Distillation with Refined Logits [31.205248790623703]
We introduce Refined Logit Distillation (RLD) to address the limitations of current logit distillation methods.
Our approach is motivated by the observation that even high-performing teacher models can make incorrect predictions.
Our method can effectively eliminate misleading information from the teacher while preserving crucial class correlations.
arXiv Detail & Related papers (2024-08-14T17:59:32Z)
- Improve Knowledge Distillation via Label Revision and Data Selection [37.74822443555646]
For label revision, this paper rectifies the teacher's inaccurate predictions using the ground truth.
For data selection, it introduces a technique to choose suitable training samples to be supervised by the teacher.
Experiment results demonstrate the effectiveness of our proposed method, and show that our method can be combined with other distillation approaches.
arXiv Detail & Related papers (2024-04-03T02:41:16Z)
- Triplet Knowledge Distillation [73.39109022280878]
In knowledge distillation, the teacher is generally much larger than the student, which makes the teacher's solution difficult for the student to learn.
To ease the mimicking difficulty, we introduce a triplet knowledge distillation mechanism named TriKD.
arXiv Detail & Related papers (2023-05-25T12:12:31Z)
- Robust Knowledge Distillation from RNN-T Models With Noisy Training Labels Using Full-Sum Loss [32.816725317261934]
This work studies knowledge distillation (KD) and addresses its constraints for recurrent neural network transducer (RNN-T) models.
We show that a sequence-level KD, full-sum distillation, outperforms other distillation methods for RNN-T models.
We also propose a variant of full-sum distillation that distills the sequence discriminative knowledge of the teacher leading to further improvement in WER.
arXiv Detail & Related papers (2023-03-10T14:46:23Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD) which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- Dual Correction Strategy for Ranking Distillation in Top-N Recommender System [22.37864671297929]
This paper presents a Dual Correction strategy for Knowledge Distillation (DCD).
DCD transfers the ranking information from the teacher model to the student model in a more efficient manner.
Our experiments show that the proposed method outperforms the state-of-the-art baselines.
arXiv Detail & Related papers (2021-09-08T07:00:45Z)
- Undistillable: Making A Nasty Teacher That CANNOT teach students [84.6111281091602]
This paper introduces and investigates a concept called Nasty Teacher: a specially trained teacher network that yields nearly the same performance as a normal one, yet cannot be effectively distilled into a student model.
We propose a simple yet effective algorithm to build the nasty teacher, called self-undermining knowledge distillation.
arXiv Detail & Related papers (2021-05-16T08:41:30Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy to low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)
- Knowledge distillation via adaptive instance normalization [52.91164959767517]
We propose a new knowledge distillation method based on transferring feature statistics from the teacher to the student.
Our method goes beyond the standard way of enforcing the mean and variance of the student to be similar to those of the teacher.
We show that our distillation method outperforms other state-of-the-art distillation methods over a large set of experimental settings.
arXiv Detail & Related papers (2020-03-09T17:50:12Z)
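As a point of reference for the "standard way" mentioned in the summary above, here is a minimal sketch of matching the per-channel mean and variance of teacher and student feature maps. This is not the paper's adaptive-instance-normalization method; the function name and tensor shapes are assumptions for illustration.

```python
# Sketch of the baseline referenced above: matching per-channel mean and
# variance of teacher and student feature maps. This is NOT the paper's
# adaptive-instance-normalization method; names and shapes are assumptions.
import torch
import torch.nn.functional as F


def feature_stats_matching_loss(student_feat: torch.Tensor,
                                teacher_feat: torch.Tensor) -> torch.Tensor:
    """student_feat, teacher_feat: [batch, channels, height, width]."""
    # Per-sample, per-channel statistics over the spatial dimensions.
    s_mean, s_var = student_feat.mean(dim=(2, 3)), student_feat.var(dim=(2, 3))
    t_mean, t_var = teacher_feat.mean(dim=(2, 3)), teacher_feat.var(dim=(2, 3))
    return F.mse_loss(s_mean, t_mean) + F.mse_loss(s_var, t_var)


if __name__ == "__main__":
    s = torch.randn(4, 64, 8, 8, requires_grad=True)  # student features
    t = torch.randn(4, 64, 8, 8)                       # teacher features
    print(feature_stats_matching_loss(s, t).item())
```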
This list is automatically generated from the titles and abstracts of the papers in this site.