Distillation from Heterogeneous Models for Top-K Recommendation
- URL: http://arxiv.org/abs/2303.01130v1
- Date: Thu, 2 Mar 2023 10:23:50 GMT
- Title: Distillation from Heterogeneous Models for Top-K Recommendation
- Authors: SeongKu Kang, Wonbin Kweon, Dongha Lee, Jianxun Lian, Xing Xie, Hwanjo Yu
- Abstract summary: HetComp is a framework that guides the student model by transferring sequences of knowledge from teachers' trajectories.
HetComp significantly improves the distillation quality and the generalization of the student model.
- Score: 43.83625440616829
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent recommender systems have shown remarkable performance by using an
ensemble of heterogeneous models. However, such an ensemble is exceedingly costly: its
resource usage and inference latency grow in proportion to the number of models, which
remains a bottleneck for production. Our work aims to transfer the
ensemble knowledge of heterogeneous teachers to a lightweight student model
using knowledge distillation (KD), to reduce the huge inference costs while
retaining high accuracy. Through an empirical study, we find that the efficacy
of distillation severely drops when transferring knowledge from heterogeneous
teachers. Nevertheless, we show that an important signal to ease the difficulty
can be obtained from the teachers' training trajectories. This paper proposes a
new KD framework, named HetComp, that guides the student model by transferring
easy-to-hard sequences of knowledge generated from the teachers' trajectories.
To provide guidance according to the student's learning state, HetComp uses
dynamic knowledge construction to supply progressively more difficult ranking
knowledge and adaptive knowledge transfer to gradually shift toward finer-grained
ranking information. Our comprehensive experiments show that HetComp
significantly improves the distillation quality and the generalization of the
student model.
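The abstract names two components, dynamic knowledge construction and adaptive knowledge transfer, without giving code. The snippet below is a minimal sketch of that easy-to-hard idea under stated assumptions: teacher knowledge is stored as a sequence of item rankings taken from checkpoints along the teachers' training trajectories, the student's learning state is summarized as a scalar `progress` in [0, 1], and a simple ListMLE-style listwise loss stands in for the paper's ranking-matching objective. All names (`teacher_trajectory`, `dynamic_knowledge`, `listwise_kd_loss`) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def listwise_kd_loss(student_scores, target_ranking, k):
    """ListMLE-style surrogate: push the student to reproduce the order of the
    target ranking's top-k items (illustrative stand-in for HetComp's loss)."""
    top_items = target_ranking[:k]              # item ids, best first
    logits = student_scores[top_items]          # student scores for those items
    loss = 0.0
    for i in range(k):
        # log-probability that the item at rank i beats all items ranked below it
        loss = loss - F.log_softmax(logits[i:], dim=0)[0]
    return loss / k

def dynamic_knowledge(teacher_trajectory, progress):
    """Dynamic knowledge construction (assumed form): use an earlier, 'easier'
    checkpoint ranking while the student is weak and later, sharper checkpoints
    as its learning state (progress in [0, 1]) improves."""
    idx = min(int(progress * len(teacher_trajectory)), len(teacher_trajectory) - 1)
    return teacher_trajectory[idx]

# Toy usage with random data.
num_items = 100
# Three ranking snapshots taken along the (ensembled) teachers' training trajectory.
teacher_trajectory = [torch.randperm(num_items) for _ in range(3)]
student_scores = torch.randn(num_items, requires_grad=True)

progress = 0.4                      # e.g., estimated from validation ranking quality
k = 10 + int(progress * 40)         # adaptive transfer: finer-grained ranking over time
target = dynamic_knowledge(teacher_trajectory, progress)
loss = listwise_kd_loss(student_scores, target, k)
loss.backward()
```

In this reading, early checkpoints supply coarser, easier-to-match rankings and later checkpoints sharper ones, while the growing `k` gradually exposes finer-grained ranking information as the student improves.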
Related papers
- Dynamic Guidance Adversarial Distillation with Enhanced Teacher Knowledge [17.382306203152943]
The Dynamic Guidance Adversarial Distillation (DGAD) framework tackles the challenge of differential sample importance.
DGAD employs Misclassification-Aware Partitioning (MAP) to dynamically tailor the distillation focus.
Error-corrective Label Swapping (ELS) corrects misclassifications of the teacher on both clean and adversarially perturbed inputs.
arXiv Detail & Related papers (2024-09-03T05:52:37Z)
- Robustness-Reinforced Knowledge Distillation with Correlation Distance and Network Pruning [3.1423836318272773]
Knowledge distillation (KD) improves the performance of efficient and lightweight models.
Most existing KD techniques rely on Kullback-Leibler (KL) divergence (a minimal sketch of this standard objective appears after this list).
We propose a Robustness-Reinforced Knowledge Distillation (R2KD) that leverages correlation distance and network pruning.
arXiv Detail & Related papers (2023-11-23T11:34:48Z)
- Student-friendly Knowledge Distillation [1.5469452301122173]
We propose student-friendly knowledge distillation (SKD) to simplify teacher output into new knowledge representations.
SKD contains a softening process and a learning simplifier.
The experimental results on the CIFAR-100 and ImageNet datasets show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-05-18T11:44:30Z)
- Learning Knowledge Representation with Meta Knowledge Distillation for Single Image Super-Resolution [82.89021683451432]
We propose a model-agnostic meta knowledge distillation method under the teacher-student architecture for the single image super-resolution task.
Experiments conducted on various single image super-resolution datasets demonstrate that our proposed method outperforms existing knowledge-representation-related distillation methods.
arXiv Detail & Related papers (2022-07-18T02:41:04Z)
- Parameter-Efficient and Student-Friendly Knowledge Distillation [83.56365548607863]
We present a parameter-efficient and student-friendly knowledge distillation method, namely PESF-KD, to achieve efficient and sufficient knowledge transfer.
Experiments on a variety of benchmarks show that PESF-KD can significantly reduce the training cost while obtaining competitive results compared to advanced online distillation methods.
arXiv Detail & Related papers (2022-05-28T16:11:49Z)
- On the benefits of knowledge distillation for adversarial robustness [53.41196727255314]
We show that knowledge distillation can be used directly to boost the performance of state-of-the-art models in adversarial robustness.
We present Adversarial Knowledge Distillation (AKD), a new framework to improve a model's robust performance.
arXiv Detail & Related papers (2022-03-14T15:02:13Z)
- Dynamic Rectification Knowledge Distillation [0.0]
Dynamic Rectification Knowledge Distillation (DR-KD) is a knowledge distillation framework.
DR-KD transforms the student into its own teacher, and if the self-teacher makes wrong predictions while distilling information, the error is rectified prior to the knowledge being distilled.
Our proposed DR-KD performs remarkably well in the absence of a sophisticated cumbersome teacher model.
arXiv Detail & Related papers (2022-01-27T04:38:01Z)
- Learning Student-Friendly Teacher Networks for Knowledge Distillation [50.11640959363315]
We propose a novel knowledge distillation approach to facilitate the transfer of dark knowledge from a teacher to a student.
Contrary to most of the existing methods that rely on effective training of student models given pretrained teachers, we aim to learn the teacher models that are friendly to students.
arXiv Detail & Related papers (2021-02-12T07:00:17Z)
- Transfer Heterogeneous Knowledge Among Peer-to-Peer Teammates: A Model Distillation Approach [55.83558520598304]
We propose a brand new solution to reuse experiences and transfer value functions among multiple students via model distillation.
We also describe how to design an efficient communication protocol to exploit heterogeneous knowledge.
Our proposed framework, namely Learning and Teaching Categorical Reinforcement, shows promising performance in stabilizing and accelerating learning progress.
arXiv Detail & Related papers (2020-02-06T11:31:04Z)
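Several of the papers above (e.g., R2KD, SKD, PESF-KD) modify or build on the standard temperature-softened KL-divergence distillation objective mentioned in the R2KD summary. The sketch below is a minimal, generic version of that objective for reference; the function name, temperature value, and toy data are illustrative assumptions rather than code from any of the listed papers.

```python
import torch
import torch.nn.functional as F

def kd_kl_loss(student_logits, teacher_logits, temperature=4.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes match the usual hard-label loss.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Toy usage: a batch of 8 examples scored over 100 classes/items.
student_logits = torch.randn(8, 100, requires_grad=True)
teacher_logits = torch.randn(8, 100)
loss = kd_kl_loss(student_logits, teacher_logits)
loss.backward()
```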
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.