Curriculum-scheduled Knowledge Distillation from Multiple Pre-trained Teachers for Multi-domain Sequential Recommendation
- URL: http://arxiv.org/abs/2401.00797v2
- Date: Tue, 15 Oct 2024 12:37:40 GMT
- Title: Curriculum-scheduled Knowledge Distillation from Multiple Pre-trained Teachers for Multi-domain Sequential Recommendation
- Authors: Wenqi Sun, Ruobing Xie, Junjie Zhang, Wayne Xin Zhao, Leyu Lin, Ji-Rong Wen
- Abstract summary: It is essential to explore how to use different pre-trained recommendation models efficiently in real-world systems.
We propose a novel curriculum-scheduled knowledge distillation framework that learns from multiple pre-trained teachers for multi-domain sequential recommendation.
CKD-MDSR takes full advantage of different PRMs as multiple teacher models to boost a small student recommendation model.
- Score: 102.91236882045021
- License:
- Abstract: Pre-trained recommendation models (PRMs) have received increasing interest recently. However, their intrinsically heterogeneous model structures, huge model sizes and computation costs hinder their adoption in practical recommender systems. Hence, it is essential to explore how to use different pre-trained recommendation models efficiently in real-world systems. In this paper, we propose a novel curriculum-scheduled knowledge distillation framework from multiple pre-trained teachers for multi-domain sequential recommendation, called CKD-MDSR, which takes full advantage of different PRMs as multiple teacher models to boost a small student recommendation model, integrating knowledge across multiple domains from the PRMs. Specifically, CKD-MDSR first adopts curriculum-scheduled user behavior sequence sampling and jointly distills informative knowledge from representative PRMs such as UniSRec and Recformer. The knowledge from these PRMs is then selectively integrated into the student model according to its confidence and consistency. Finally, we verify the proposed method on multi-domain sequential recommendation and further demonstrate its universality with multiple types of student models, including feature-interaction and graph-based recommendation models. Extensive experiments on five real-world datasets demonstrate the effectiveness and efficiency of CKD-MDSR, which can be viewed as an efficient shortcut to using PRMs in real-world systems.
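As a concrete illustration of the distillation recipe above, the following is a minimal PyTorch sketch of confidence- and consistency-weighted multi-teacher distillation. The weighting heuristics and all names are illustrative assumptions rather than the authors' implementation, and curriculum-scheduled sequence sampling is omitted for brevity.

```python
# Minimal sketch of confidence- and consistency-weighted multi-teacher
# distillation, in the spirit of CKD-MDSR. Weighting choices are assumptions.
import torch
import torch.nn.functional as F

def distill_step(student_logits, teacher_logits_list, labels, alpha=0.5, tau=2.0):
    """One training step combining a supervised loss with weighted KD.

    student_logits: (B, n_items) scores from the small student model.
    teacher_logits_list: list of (B, n_items) scores from frozen PRM teachers.
    """
    ce = F.cross_entropy(student_logits, labels)  # supervised signal

    teacher_probs = [F.softmax(t / tau, dim=-1) for t in teacher_logits_list]
    mean_probs = torch.stack(teacher_probs).mean(dim=0)

    weights = []
    for p in teacher_probs:
        confidence = p.max(dim=-1).values             # how sure is this teacher
        # consistency: negative KL to the teacher consensus distribution
        consistency = -F.kl_div(mean_probs.log(), p, reduction="none").sum(-1)
        weights.append(confidence + consistency)      # higher = more trusted
    weights = torch.softmax(torch.stack(weights), dim=0)  # normalize over teachers

    log_q = F.log_softmax(student_logits / tau, dim=-1)
    kd = 0.0
    for w, p in zip(weights, teacher_probs):
        per_example = F.kl_div(log_q, p, reduction="none").sum(-1)
        kd = kd + (w * per_example).mean()

    return alpha * ce + (1 - alpha) * (tau ** 2) * kd
```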
Related papers
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, the pre-trained model is tuned in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
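A minimal sketch of the prompt-tuning idea behind DPCPL's tuning stage: freeze the pre-trained encoder and learn only a small set of prompt vectors prepended to the behavior sequence. The encoder interface and prompt length are assumptions; the actual Customized Prompt Learning module is more elaborate.

```python
# Illustrative prompt-tuning for a frozen pre-trained sequential recommender.
import torch
import torch.nn as nn

class PromptTunedRecommender(nn.Module):
    def __init__(self, pretrained_encoder: nn.Module, hidden_dim: int, n_prompts: int = 4):
        super().__init__()
        self.encoder = pretrained_encoder
        for p in self.encoder.parameters():   # freeze the pre-trained model
            p.requires_grad = False
        # Only these prompt vectors are trained.
        self.prompts = nn.Parameter(torch.randn(n_prompts, hidden_dim) * 0.02)

    def forward(self, item_embs: torch.Tensor) -> torch.Tensor:
        # item_embs: (B, L, hidden_dim) embedded user behavior sequence.
        batch = item_embs.size(0)
        prompts = self.prompts.unsqueeze(0).expand(batch, -1, -1)
        return self.encoder(torch.cat([prompts, item_embs], dim=1))
```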
- Contextual Distillation Model for Diversified Recommendation [19.136439564988834]
The Contextual Distillation Model (CDM) is an efficient recommendation model that addresses diversification.
We propose a contrastive context encoder that employs attention mechanisms to model both positive and negative contexts.
During inference, ranking is performed through a linear combination of the recommendation and student model scores.
arXiv Detail & Related papers (2024-06-13T11:55:40Z)
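The inference step of CDM, as summarized above, reduces to a linear combination of scores; a tiny sketch, with the fusion weight `lam` as an assumed tunable:

```python
# Inference-time fusion: final ranking scores combine the base recommendation
# score with the distilled (diversity-aware) student score.
import numpy as np

def fused_ranking(rec_scores: np.ndarray, student_scores: np.ndarray, lam: float = 0.3):
    final = rec_scores + lam * student_scores   # (n_candidates,)
    return np.argsort(-final)                   # indices by descending score
```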
- ZhiJian: A Unifying and Rapidly Deployable Toolbox for Pre-trained Model Reuse [59.500060790983994]
This paper introduces ZhiJian, a comprehensive and user-friendly toolbox for model reuse, utilizing the PyTorch backend.
ZhiJian presents a novel paradigm that unifies diverse perspectives on model reuse, encompassing target architecture construction with a PTM, target model tuning with a PTM, and PTM-based inference.
arXiv Detail & Related papers (2023-08-17T19:12:13Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
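A hedged sketch of the geometric idea in EmbedDistill: train the student to reproduce the teacher's query-document similarity structure rather than raw logits. The loss composition, and the assumption that both models share (or are projected to) one embedding dimension, are illustrative.

```python
# Geometry-based distillation for retrieval: match the student's
# query-document similarity matrix to the teacher's.
import torch
import torch.nn.functional as F

def geometric_kd_loss(q_t, d_t, q_s, d_s):
    # q_*: (B, dim) query embeddings; d_*: (B, dim) document embeddings.
    # Assumes teacher and student embeddings share a common dimension.
    sim_teacher = q_t @ d_t.T            # (B, B) relative geometry of the teacher
    sim_student = q_s @ d_s.T
    score_match = F.mse_loss(sim_student, sim_teacher)
    # Optionally align normalized embedding directions as well.
    embed_match = (1 - F.cosine_similarity(q_s, q_t, dim=-1)).mean()
    return score_match + 0.1 * embed_match
```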
- Multiple Robust Learning for Recommendation [13.06593469196849]
In recommender systems, a common problem is the presence of various biases in the collected data.
We propose a multiple robust (MR) estimator that can take advantage of multiple candidate imputation and propensity models to achieve unbiasedness.
arXiv Detail & Related papers (2022-07-09T13:15:56Z)
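For intuition, here is the classical doubly robust (DR) estimator of recommendation risk, which the MR estimator generalizes to multiple candidate imputation and propensity models; variable names are illustrative:

```python
# Simplified doubly robust (DR) risk estimator under missing-not-at-random
# feedback; the MR estimator extends this to several candidate models.
import numpy as np

def dr_estimate(errors, imputed_errors, observed, propensity):
    """errors: true loss on observed entries (0 elsewhere);
    imputed_errors: model-imputed loss for every user-item pair;
    observed: 0/1 mask of which entries were actually collected;
    propensity: estimated probability that each entry is observed."""
    correction = observed * (errors - imputed_errors) / np.clip(propensity, 1e-6, 1.0)
    return np.mean(imputed_errors + correction)
```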
- Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language Models [37.88287077119201]
We propose a novel model reuse paradigm, Knowledge Amalgamation (KA), for PLMs.
Without human annotations available, KA aims to merge the knowledge from different teacher-PLMs, each of which specializes in a different classification problem, into a versatile student model.
Experimental results demonstrate that the proposed MUKA framework achieves substantial improvements over baselines on benchmark datasets.
arXiv Detail & Related papers (2021-12-14T12:26:24Z)
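A rough sketch of uncertainty-guided amalgamation: per-example predictive entropy decides how much each teacher's supervision is trusted, and the student's output is sliced per teacher's label set. The slicing scheme is an assumed simplification of MUKA.

```python
# Uncertainty-weighted distillation from several teachers, each owning a
# disjoint slice of the student's label space (an assumed simplification).
import torch
import torch.nn.functional as F

def amalgamation_loss(student_logits, teacher_logits_list, label_slices, tau=1.0):
    # label_slices[i], e.g. slice(0, 4), gives the student columns of teacher i.
    losses, neg_entropies = [], []
    for t_logits, sl in zip(teacher_logits_list, label_slices):
        p = F.softmax(t_logits / tau, dim=-1)
        entropy = -(p * p.clamp_min(1e-9).log()).sum(-1)    # (B,) uncertainty
        neg_entropies.append(-entropy)                      # low entropy = trusted
        log_q = F.log_softmax(student_logits[:, sl] / tau, dim=-1)
        losses.append(F.kl_div(log_q, p, reduction="none").sum(-1))
    trust = torch.softmax(torch.stack(neg_entropies), dim=0)  # over teachers
    return sum((w * l).mean() for w, l in zip(trust, torch.stack(losses)))
```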
- Scene-adaptive Knowledge Distillation for Sequential Recommendation via Differentiable Architecture Search [19.798931417466456]
Sequential recommender systems (SRS) have become a research hotspot due to their power in modeling user dynamic interests and sequential behavioral patterns.
To maximize model expressive ability, a default choice is to apply a larger and deeper network architecture.
We propose AdaRec, a framework that adaptively compresses the knowledge of a teacher model into a student model according to its recommendation scene.
arXiv Detail & Related papers (2021-07-15T07:47:46Z)
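The differentiable-architecture-search ingredient AdaRec builds on can be sketched as a DARTS-style mixed operation, where a layer's output is a softmax-weighted sum of candidate operations; the candidate set here is an assumption:

```python
# DARTS-style mixed operation: architecture weights `alpha` are learned
# jointly with model weights, making the layer choice differentiable.
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Identity(),
            nn.Linear(dim, dim),
            nn.Sequential(nn.Linear(dim, dim), nn.ReLU()),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))
```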
- S^3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization [104.87483578308526]
We propose the model S3-Rec, which stands for Self-Supervised learning for Sequential Recommendation.
For our task, we devise four auxiliary self-supervised objectives to learn the correlations among attribute, item, subsequence, and sequence.
Extensive experiments conducted on six real-world datasets demonstrate the superiority of our proposed method over existing state-of-the-art methods.
arXiv Detail & Related papers (2020-08-18T11:44:10Z)
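An InfoNCE-style objective of the kind S^3-Rec uses to maximize mutual information between paired views (e.g., an item and its attributes), with in-batch negatives; the pairing scheme is an illustrative assumption:

```python
# InfoNCE with in-batch negatives: matched pairs sit on the diagonal of the
# similarity matrix and are treated as classification targets.
import torch
import torch.nn.functional as F

def info_nce(view_a: torch.Tensor, view_b: torch.Tensor, tau: float = 0.1):
    # view_a, view_b: (B, dim) representations of matched pairs.
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.T / tau                      # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)     # diagonal = positives
```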
- MM-KTD: Multiple Model Kalman Temporal Differences for Reinforcement Learning [36.14516028564416]
This paper proposes an innovative Multiple Model Kalman Temporal Difference (MM-KTD) framework to learn optimal control policies.
An active learning method is proposed to enhance the sampling efficiency of the system.
Experimental results show the superiority of the MM-KTD framework in comparison to its state-of-the-art counterparts.
arXiv Detail & Related papers (2020-05-30T06:39:55Z)
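A single Kalman Temporal Difference update for a linear value function, the building block that MM-KTD runs with multiple filters under different noise settings; this one-filter NumPy version is only a sketch:

```python
# One KTD step for V(s) = theta . phi(s): the weights are the hidden state
# of a Kalman filter, and the reward is the scalar observation.
import numpy as np

def ktd_update(theta, P, phi_s, phi_next, reward, gamma=0.99, obs_var=1.0, proc_var=1e-4):
    P = P + proc_var * np.eye(len(theta))       # process-noise prediction step
    h = phi_s - gamma * phi_next                # TD "observation" vector
    innovation = reward - h @ theta             # TD error
    s = h @ P @ h + obs_var                     # innovation variance (scalar)
    k = P @ h / s                               # Kalman gain
    theta = theta + k * innovation
    P = P - np.outer(k, h) @ P
    return theta, P
```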
- Sequential Recommendation with Self-Attentive Multi-Adversarial Network [101.25533520688654]
We present a Multi-Factor Generative Adversarial Network (MFGAN) for explicitly modeling the effect of context information on sequential recommendation.
Our framework is flexible to incorporate multiple kinds of factor information, and is able to trace how each factor contributes to the recommendation decision over time.
arXiv Detail & Related papers (2020-05-21T12:28:59Z)