Preference-Consistent Knowledge Distillation for Recommender System
- URL: http://arxiv.org/abs/2311.04549v2
- Date: Mon, 13 Jan 2025 09:19:53 GMT
- Title: Preference-Consistent Knowledge Distillation for Recommender System
- Authors: Zhangchi Zhu, Wei Zhang
- Abstract summary: We find that due to the lack of restrictions on projectors, the process of transferring user preferences will likely be interfered with.
We propose PCKD, which consists of two regularization terms for projectors.
Focusing on items with high preference scores, we significantly mitigate preference inconsistency and improve the performance of feature-based knowledge distillation.
- Score: 4.1752785943044985
- Abstract: Feature-based knowledge distillation has been applied to compress modern recommendation models, usually with projectors that align the dimensions of the student (small) recommendation model with those of the teacher. However, existing studies have only focused on making the projected features (i.e., student features after projectors) similar to teacher features, overlooking whether user preferences can actually be transferred to the student features themselves (i.e., student features before projectors) in this manner. In this paper, we find that, because projectors are left unconstrained, the transfer of user preferences is likely to be interfered with. We refer to this phenomenon as preference inconsistency. It greatly undermines the effectiveness of feature-based knowledge distillation. To mitigate preference inconsistency, we propose PCKD, which consists of two regularization terms for projectors. We also propose a hybrid method that combines the two regularization terms. By focusing on items with high preference scores, PCKD significantly mitigates preference inconsistency and improves the performance of feature-based knowledge distillation. Extensive experiments on three public datasets and three backbones demonstrate the effectiveness of PCKD. The code of our method is provided at https://github.com/woriazzc/KDs.
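To make the setup concrete, below is a minimal PyTorch-style sketch of feature-based distillation with a projector, together with a hypothetical preference-consistency regularizer. The abstract does not spell out PCKD's two regularization terms, so preference_consistency_loss, the shared projector, and all dimensions here are illustrative assumptions, not the paper's implementation (see the repository linked above for the actual code).
```python
# Minimal sketch: feature-based KD with a projector for a recommender, plus a
# hypothetical preference-consistency style regularizer (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class Projector(nn.Module):
    """Maps student features (d_student) into the teacher's feature space (d_teacher)."""
    def __init__(self, d_student: int, d_teacher: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_student, d_teacher),
            nn.ReLU(),
            nn.Linear(d_teacher, d_teacher),
        )

    def forward(self, x):
        return self.net(x)

def feature_kd_loss(student_feat, teacher_feat, projector):
    """Standard feature-based KD: align projected student features with teacher features."""
    return F.mse_loss(projector(student_feat), teacher_feat)

def preference_consistency_loss(user_s, item_s, projector, k: int = 10):
    """Hypothetical regularizer: for each user's top-k items under the projected
    scores, encourage the pre-projection scores to induce a similar distribution."""
    scores_before = user_s @ item_s.t()                       # (U, I) student scores
    scores_after = projector(user_s) @ projector(item_s).t()  # (U, I) projected scores
    topk = scores_after.topk(k, dim=1).indices                # high-preference items
    before_topk = scores_before.gather(1, topk)
    after_topk = scores_after.gather(1, topk)
    # Match the softmax-normalized score distributions over the top-k items.
    return F.kl_div(
        F.log_softmax(before_topk, dim=1),
        F.softmax(after_topk.detach(), dim=1),
        reduction="batchmean",
    )

# Usage with random tensors standing in for user/item features; a single shared
# projector for users and items is an assumption made for brevity.
d_s, d_t, n_users, n_items = 64, 256, 32, 100
projector = Projector(d_s, d_t)
user_s, item_s = torch.randn(n_users, d_s), torch.randn(n_items, d_s)
user_t, item_t = torch.randn(n_users, d_t), torch.randn(n_items, d_t)
loss = (feature_kd_loss(user_s, user_t, projector)
        + feature_kd_loss(item_s, item_t, projector)
        + 0.1 * preference_consistency_loss(user_s, item_s, projector))
```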
Related papers
- Dual Test-time Training for Out-of-distribution Recommender System [91.15209066874694]
We propose a novel Dual Test-Time-Training framework for OOD Recommendation, termed DT3OR.
In DT3OR, we incorporate a model adaptation mechanism during the test-time phase to carefully update the recommendation model.
To the best of our knowledge, this paper is the first work to address OOD recommendation via a test-time-training strategy.
arXiv Detail & Related papers (2024-07-22T13:27:51Z)
- Understanding the Effects of Projectors in Knowledge Distillation [31.882356225974632]
Even if the student and the teacher have the same feature dimensions, adding a projector still helps to improve the distillation performance.
This paper investigates the implicit role that projectors play, which has so far been overlooked.
Motivated by the positive effects of projectors, we propose a projector ensemble-based feature distillation method to further improve distillation performance (a minimal sketch of this idea follows the arXiv reference below).
arXiv Detail & Related papers (2023-10-26T06:30:39Z)
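Both the entry above and the "Improved Feature Distillation via Projector Ensemble" entry further down describe projector-ensemble feature distillation, so a single minimal sketch is given here (PyTorch assumed). The summaries provide no implementation details; averaging the outputs of several independently initialized MLP projectors before matching the teacher features is one plausible reading, not the authors' exact method.
```python
# Minimal sketch: projector-ensemble feature distillation (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProjectorEnsemble(nn.Module):
    def __init__(self, d_student: int, d_teacher: int, n_projectors: int = 3):
        super().__init__()
        self.projectors = nn.ModuleList([
            nn.Sequential(nn.Linear(d_student, d_teacher), nn.ReLU(),
                          nn.Linear(d_teacher, d_teacher))
            for _ in range(n_projectors)
        ])

    def forward(self, student_feat):
        # Average the projections from all ensemble members.
        return torch.stack([p(student_feat) for p in self.projectors]).mean(dim=0)

def ensemble_feature_kd_loss(student_feat, teacher_feat, ensemble):
    return F.mse_loss(ensemble(student_feat), teacher_feat)

# Usage: distill 256-d teacher features into a 64-d student.
ensemble = ProjectorEnsemble(d_student=64, d_teacher=256)
loss = ensemble_feature_kd_loss(torch.randn(8, 64), torch.randn(8, 256), ensemble)
```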
- Improving Knowledge Distillation via Regularizing Feature Norm and Direction [16.98806338782858]
Knowledge distillation (KD) exploits a large well-trained model (i.e., teacher) to train a small student model on the same dataset for the same task.
Treating teacher features as knowledge, prevailing knowledge distillation methods train the student by aligning its features with the teacher's, e.g., by minimizing the KL-divergence between their logits or the L2 distance between their intermediate features.
While it is natural to believe that closer alignment of student features to the teacher's better distills teacher knowledge, simply forcing this alignment does not directly contribute to the student's performance.
arXiv Detail & Related papers (2023-05-26T15:05:19Z)
- Knowledge Diffusion for Distillation [53.908314960324915]
The representation gap between teacher and student is an emerging topic in knowledge distillation (KD).
We argue that the essence of these methods is to discard the noisy information and distill the valuable information in the features.
We propose a novel KD method dubbed DiffKD, to explicitly denoise and match features using diffusion models.
arXiv Detail & Related papers (2023-05-25T04:49:34Z)
- Understanding the Role of the Projector in Knowledge Distillation [22.698845243751293]
We revisit the efficacy of knowledge distillation as a function matching and metric learning problem.
We verify three important design decisions, namely the normalisation, soft maximum function, and projection layers.
We attain a 77.2% top-1 accuracy with DeiT-Ti on ImageNet.
arXiv Detail & Related papers (2023-03-20T13:33:31Z)
- Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning (RL) algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder (a minimal sketch follows the arXiv reference below).
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
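The slate encoding described above can be illustrated with a minimal variational auto-encoder sketch (PyTorch assumed). The slate size, dimensions, and simple MLP encoder/decoder are assumptions; the summary gives no architectural details.
```python
# Minimal sketch: encode a fixed-size slate of item embeddings into a continuous,
# low-dimensional latent space with a VAE (illustrative only).
import torch
import torch.nn as nn

class SlateVAE(nn.Module):
    def __init__(self, slate_size: int, item_dim: int, latent_dim: int = 16):
        super().__init__()
        flat = slate_size * item_dim
        self.encoder = nn.Sequential(nn.Linear(flat, 128), nn.ReLU())
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, flat))

    def forward(self, slate):                      # slate: (batch, slate_size, item_dim)
        h = self.encoder(slate.flatten(1))
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        recon = self.decoder(z).view_as(slate)
        return recon, mu, logvar

def vae_loss(recon, slate, mu, logvar):
    recon_term = ((recon - slate) ** 2).sum(dim=(1, 2)).mean()
    kl_term = (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()
    return recon_term + kl_term

# Usage: an RL policy can then select an action z in the latent space and decode it
# into a full slate of items.
vae = SlateVAE(slate_size=5, item_dim=32)
slates = torch.randn(4, 5, 32)
recon, mu, logvar = vae(slates)
loss = vae_loss(recon, slates, mu, logvar)
```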
- Unbiased Knowledge Distillation for Recommendation [66.82575287129728]
Knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency.
Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge to supervise the learning of a compact student model.
We find that such a standard distillation paradigm incurs a serious bias issue: popular items are recommended even more heavily after distillation.
arXiv Detail & Related papers (2022-11-27T05:14:03Z)
- Improved Feature Distillation via Projector Ensemble [40.86679028635297]
We propose a new feature distillation method based on a projector ensemble for further performance improvement.
We observe that the student network benefits from a projector even if the feature dimensions of the student and the teacher are the same.
We propose an ensemble of projectors to further improve the quality of student features.
arXiv Detail & Related papers (2022-10-27T09:08:40Z)
- Exploring Inconsistent Knowledge Distillation for Object Detection with Data Augmentation [66.25738680429463]
Knowledge Distillation (KD) for object detection aims to train a compact detector by transferring knowledge from a teacher model.
We propose inconsistent knowledge distillation (IKD), which aims to distill knowledge inherent in the teacher model's counter-intuitive perceptions.
Our method outperforms state-of-the-art KD baselines on one-stage, two-stage and anchor-free object detectors.
arXiv Detail & Related papers (2022-09-20T16:36:28Z)
- Knowledge Distillation with the Reused Teacher Classifier [31.22117343316628]
We show that a simple knowledge distillation technique is enough to significantly narrow down the teacher-student performance gap.
Our technique achieves state-of-the-art results at the modest cost of a reduced compression ratio due to the added projector.
arXiv Detail & Related papers (2022-03-26T06:28:46Z)
- Distilling Object Detectors with Task Adaptive Regularization [97.52935611385179]
Current state-of-the-art object detectors come at the expense of high computational costs and are hard to deploy on low-end devices.
Knowledge distillation, which aims at training a smaller student network by transferring knowledge from a larger teacher model, is one of the promising solutions for model miniaturization.
arXiv Detail & Related papers (2020-06-23T15:58:22Z)