Scene-adaptive Knowledge Distillation for Sequential Recommendation via
Differentiable Architecture Search
- URL: http://arxiv.org/abs/2107.07173v1
- Date: Thu, 15 Jul 2021 07:47:46 GMT
- Title: Scene-adaptive Knowledge Distillation for Sequential Recommendation via
Differentiable Architecture Search
- Authors: Lei Chen, Fajie Yuan, Jiaxi Yang, Min Yang, and Chengming Li
- Abstract summary: Sequential recommender systems (SRS) have become a research hotspot due to their power in modeling users' dynamic interests and sequential behavioral patterns.
To maximize model expressive ability, a default choice is to apply a larger and deeper network architecture.
We propose AdaRec, a framework which compresses knowledge of a teacher model into a student model adaptively according to its recommendation scene.
- Score: 19.798931417466456
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sequential recommender systems (SRS) have become a research hotspot due to
their power in modeling users' dynamic interests and sequential behavioral
patterns. To maximize model expressive ability, a default choice is to apply a
larger and deeper network architecture, which, however, often brings high
network latency when generating online recommendations. Naturally, we argue
that compressing the heavy recommendation models into middle- or light-weight
neural networks is of great importance for practical production systems. To
realize such a goal, we propose AdaRec, a knowledge distillation (KD) framework
which compresses knowledge of a teacher model into a student model adaptively
according to its recommendation scene by using differentiable Neural
Architecture Search (NAS). Specifically, we introduce a target-oriented
distillation loss to guide the structure search process for finding the student
network architecture, and a cost-sensitive loss as constraints for model size,
which achieves a superior trade-off between recommendation effectiveness and
efficiency. In addition, we leverage Earth Mover's Distance (EMD) to realize
many-to-many layer mapping during knowledge distillation, which enables each
intermediate student layer to learn from other intermediate teacher layers
adaptively. Extensive experiments on real-world recommendation datasets
demonstrate that our model achieves competitive or better accuracy with notable
inference speedups compared to strong counterparts, while discovering diverse
neural architectures for sequential recommender models under different
recommendation scenes.
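To make the EMD-based many-to-many layer mapping concrete, here is a minimal sketch of how such a distillation term can be computed. The tensor shapes, the uniform layer weights, and the use of the POT library's ot.emd solver are illustrative assumptions, not details taken from the paper.
```python
import torch
import torch.nn.functional as F
import ot  # Python Optimal Transport (POT), assumed available

def emd_layer_distillation(teacher_states, student_states):
    """teacher_states / student_states: lists of [batch, seq_len, dim] hidden
    states, with student states already projected to the teacher dimension."""
    T, S = len(teacher_states), len(student_states)
    # Pairwise transfer cost between every teacher layer and every student layer.
    cost = torch.zeros(T, S)
    for i, h_t in enumerate(teacher_states):
        for j, h_s in enumerate(student_states):
            cost[i, j] = F.mse_loss(h_s, h_t)
    # Uniform mass over layers (an assumption; the weights could also be learned).
    a = torch.full((T,), 1.0 / T).double().numpy()
    b = torch.full((S,), 1.0 / S).double().numpy()
    # The optimal transport plan gives the many-to-many layer mapping weights.
    flow = torch.from_numpy(ot.emd(a, b, cost.detach().double().numpy()))
    # EMD-style distillation loss: transfer costs weighted by the flow.
    return (flow.to(cost) * cost).sum()
```
In AdaRec, such an intermediate-layer term would sit alongside the target-oriented distillation loss and the cost-sensitive model-size constraint that steer the differentiable architecture search.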
Related papers
- Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized visual prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z)
- One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation [69.65734716679925]
Knowledge distillation has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme.
Most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family.
We propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures.
arXiv Detail & Related papers (2023-10-30T11:13:02Z)
- EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval [83.79667141681418]
Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR).
We propose a novel distillation approach that leverages the relative geometry among queries and documents learned by the large teacher model.
We show that our approach successfully distills from both dual-encoder (DE) and cross-encoder (CE) teacher models to 1/10th size asymmetric students that can retain 95-97% of the teacher performance.
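As a rough illustration of distilling "relative geometry", the sketch below trains a student to match the teacher's in-batch query-document score distribution; the function name, dot-product scoring, and KL objective are assumptions for illustration, not EmbedDistill's exact formulation.
```python
import torch
import torch.nn.functional as F

def geometry_distillation_loss(q_teacher, d_teacher, q_student, d_student, tau=1.0):
    """q_*: [batch, dim] query embeddings; d_*: [batch, dim] document embeddings."""
    # Relative geometry: every query scored against every in-batch document.
    scores_t = q_teacher @ d_teacher.T / tau
    scores_s = q_student @ d_student.T / tau
    # Push the student's score distribution toward the teacher's.
    return F.kl_div(F.log_softmax(scores_s, dim=-1),
                    F.softmax(scores_t, dim=-1),
                    reduction="batchmean")
```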
arXiv Detail & Related papers (2023-01-27T22:04:37Z)
- Directed Acyclic Graph Factorization Machines for CTR Prediction via Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments.
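For background, the classic factorization machine pairwise-interaction term that such CTR models build on fits in a few lines; this is only the standard FM building block, not KD-DAGFM's DAG-structured architecture.
```python
import torch

def fm_pairwise_term(x, v):
    """x: [batch, n_fields] feature values; v: [n_fields, k] latent factors."""
    xv = x.unsqueeze(-1) * v                      # [batch, n_fields, k]
    square_of_sum = xv.sum(dim=1) ** 2            # (sum_i x_i v_i)^2
    sum_of_square = (xv ** 2).sum(dim=1)          # sum_i (x_i v_i)^2
    return 0.5 * (square_of_sum - sum_of_square).sum(dim=1)  # second-order term
```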
arXiv Detail & Related papers (2022-11-21T03:09:42Z)
- Rethinking Pareto Frontier for Performance Evaluation of Deep Neural Networks [2.167843405313757]
We re-define the efficiency measure using multi-objective optimization, combining competing variables of differing nature simultaneously in a single relative efficiency measure.
This allows ranking deep models that run efficiently on different computing hardware, and combines inference efficiency with training efficiency objectively.
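As a minimal illustration of the Pareto view of efficiency that this line of work re-examines (toy numbers and a two-objective setting assumed here, not the paper's full relative-efficiency measure):
```python
def pareto_frontier(models):
    """models: list of (name, accuracy, latency_ms); higher accuracy and lower
    latency are better."""
    frontier = []
    for name, acc, lat in models:
        # A model is dominated if some other model is at least as good on both
        # objectives and strictly better on at least one.
        dominated = any(a >= acc and l <= lat and (a > acc or l < lat)
                        for _, a, l in models)
        if not dominated:
            frontier.append((name, acc, lat))
    return frontier

print(pareto_frontier([("teacher", 0.91, 42.0),
                       ("student-A", 0.90, 11.0),
                       ("student-B", 0.87, 15.0)]))  # student-B is dominated
```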
arXiv Detail & Related papers (2022-02-18T15:58:17Z)
- Guided Sampling-based Evolutionary Deep Neural Network for Intelligent Fault Diagnosis [8.92307560991779]
We propose a novel evolutionary deep neural network framework that uses policy gradients to guide the evolution of the model architecture.
The effectiveness of the proposed framework has been validated on three datasets.
arXiv Detail & Related papers (2021-11-12T18:59:45Z)
- Follow Your Path: a Progressive Method for Knowledge Distillation [23.709919521355936]
We propose ProKT, a new model-agnostic method that projects the supervision signals of a teacher model into the student's parameter space.
Experiments on both image and text datasets show that our proposed ProKT consistently achieves superior performance compared to other existing knowledge distillation methods.
arXiv Detail & Related papers (2021-07-20T07:44:33Z)
- Hybrid Model with Time Modeling for Sequential Recommender Systems [0.15229257192293202]
Booking.com organized the WSDM WebTour 2021 Challenge, which aimed to benchmark models that recommend the final city in a trip.
We conducted several experiments to test different state-of-the-art deep learning architectures for recommender systems.
Our experimental results show that the improved NARM outperforms all other state-of-the-art benchmark methods.
arXiv Detail & Related papers (2021-03-07T19:28:22Z)
- Self-Supervised Reinforcement Learning for Recommender Systems [77.38665506495553]
We propose self-supervised reinforcement learning for sequential recommendation tasks.
Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL.
Based on such an approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC).
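A hedged sketch of the two-head design described above; the GRU encoder, layer sizes, and class name are illustrative assumptions, not the released SQN/SAC code.
```python
import torch
import torch.nn as nn

class TwoHeadRecommender(nn.Module):
    """Sequential encoder with a self-supervised head and an RL (Q-value) head."""
    def __init__(self, num_items, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(num_items, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        self.ce_head = nn.Linear(hidden, num_items)  # next-item (self-supervised) logits
        self.q_head = nn.Linear(hidden, num_items)   # per-item Q-values for the RL loss

    def forward(self, item_seq):                     # item_seq: [batch, seq_len]
        h, _ = self.encoder(self.embed(item_seq))
        state = h[:, -1]                             # last hidden state as the user state
        return self.ce_head(state), self.q_head(state)
```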
arXiv Detail & Related papers (2020-06-10T11:18:57Z)
- Belief Propagation Reloaded: Learning BP-Layers for Labeling Problems [83.98774574197613]
We take one of the simplest inference methods, truncated max-product belief propagation, and add what is necessary to make it a proper component of a deep learning model.
This BP-Layer can be used as the final or an intermediate block in convolutional neural networks (CNNs).
The model is applicable to a range of dense prediction problems, is well-trainable and provides parameter-efficient and robust solutions in stereo, optical flow and semantic segmentation.
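For intuition, the classic min-sum (max-product in the log domain) recursion on a 1-D chain looks as follows; the BP-Layer wraps a truncated, differentiable variant of this kind of update inside a CNN, so this sketch is background rather than the paper's layer.
```python
import numpy as np

def max_product_chain(unary, pairwise):
    """unary: [n_nodes, n_labels] costs; pairwise: [n_labels, n_labels] transition
    costs. Returns the minimum-cost (MAP) labeling of the chain."""
    n, k = unary.shape
    msg = np.zeros((n, k))               # forward messages along the chain
    back = np.zeros((n, k), dtype=int)   # argmin pointers for backtracking
    for i in range(1, n):
        # total[m, l]: cost of label m at node i-1 followed by label l at node i
        total = (unary[i - 1] + msg[i - 1])[:, None] + pairwise
        msg[i] = total.min(axis=0)
        back[i] = total.argmin(axis=0)
    labels = np.empty(n, dtype=int)
    labels[-1] = int((unary[-1] + msg[-1]).argmin())
    for i in range(n - 1, 0, -1):        # backtrack the best labeling
        labels[i - 1] = back[i, labels[i]]
    return labels
```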
arXiv Detail & Related papers (2020-03-13T13:11:35Z)