Ensemble Learning via Knowledge Transfer for CTR Prediction
- URL: http://arxiv.org/abs/2411.16122v1
- Date: Mon, 25 Nov 2024 06:14:20 GMT
- Title: Ensemble Learning via Knowledge Transfer for CTR Prediction
- Authors: Honghao Li, Yiwen Zhang, Yi Zhang, Lei Sang,
- Abstract summary: In this paper, we investigate larger ensemble networks and find three inherent limitations in commonly used ensemble learning method.
We propose a novel model-agnostic Ensemble Knowledge Transfer Framework (EKTF)
Experimental results on five real-world datasets demonstrate the effectiveness and compatibility of EKTF.
- Score: 9.891226177252653
- License:
- Abstract: Click-through rate (CTR) prediction plays a critical role in recommender systems and web searches. While many existing methods utilize ensemble learning to improve model performance, they typically limit the ensemble to two or three sub-networks, with little exploration of larger ensembles. In this paper, we investigate larger ensemble networks and find three inherent limitations in commonly used ensemble learning method: (1) performance degradation with more networks; (2) sharp decline and high variance in sub-network performance; (3) large discrepancies between sub-network and ensemble predictions. To simultaneously address the above limitations, this paper investigates potential solutions from the perspectives of Knowledge Distillation (KD) and Deep Mutual Learning (DML). Based on the empirical performance of these methods, we combine them to propose a novel model-agnostic Ensemble Knowledge Transfer Framework (EKTF). Specifically, we employ the collective decision-making of the students as an abstract teacher to guide each student (sub-network) towards more effective learning. Additionally, we encourage mutual learning among students to enable knowledge acquisition from different views. To address the issue of balancing the loss hyperparameters, we design a novel examination mechanism to ensure tailored teaching from teacher-to-student and selective learning in peer-to-peer. Experimental results on five real-world datasets demonstrate the effectiveness and compatibility of EKTF. The code, running logs, and detailed hyperparameter configurations are available at: https://github.com/salmon1802/EKTF.
Related papers
- Feature Interaction Fusion Self-Distillation Network For CTR Prediction [14.12775753361368]
Click-Through Rate (CTR) prediction plays a vital role in recommender systems, online advertising, and search engines.
We propose FSDNet, a CTR prediction framework incorporating a plug-and-play fusion self-distillation module.
arXiv Detail & Related papers (2024-11-12T03:05:03Z) - Leveraging Different Learning Styles for Improved Knowledge Distillation
in Biomedical Imaging [0.9208007322096533]
Our work endeavors to leverage the concept of knowledge diversification to improve the performance of model compression techniques like Knowledge Distillation (KD) and Mutual Learning (ML)
We use a single-teacher and two-student network in a unified framework that not only allows for the transfer of knowledge from teacher to students (KD) but also encourages collaborative learning between students (ML)
Unlike the conventional approach, where the teacher shares the same knowledge in the form of predictions or feature representations with the student network, our proposed approach employs a more diversified strategy by training one student with predictions and the other with feature maps from the teacher.
arXiv Detail & Related papers (2022-12-06T12:40:45Z) - Weakly Supervised Semantic Segmentation via Alternative Self-Dual
Teaching [82.71578668091914]
This paper establishes a compact learning framework that embeds the classification and mask-refinement components into a unified deep model.
We propose a novel alternative self-dual teaching (ASDT) mechanism to encourage high-quality knowledge interaction.
arXiv Detail & Related papers (2021-12-17T11:56:56Z) - Augmenting Knowledge Distillation With Peer-To-Peer Mutual Learning For
Model Compression [2.538209532048867]
Mutual Learning (ML) provides an alternative strategy where multiple simple student networks benefit from sharing knowledge.
We propose a single-teacher, multi-student framework that leverages both KD and ML to achieve better performance.
arXiv Detail & Related papers (2021-10-21T09:59:31Z) - Revisiting Knowledge Distillation: An Inheritance and Exploration
Framework [153.73692961660964]
Knowledge Distillation (KD) is a popular technique to transfer knowledge from a teacher model to a student model.
We propose a novel inheritance and exploration knowledge distillation framework (IE-KD)
Our IE-KD framework is generic and can be easily combined with existing distillation or mutual learning methods for training deep neural networks.
arXiv Detail & Related papers (2021-07-01T02:20:56Z) - LENAS: Learning-based Neural Architecture Search and Ensemble for 3D Radiotherapy Dose Prediction [42.38793195337463]
We propose a novel learning-based ensemble approach named LENAS, which integrates neural architecture search with knowledge distillation for 3D radiotherapy dose prediction.
Our approach starts by exhaustively searching each block from an enormous architecture space to identify multiple architectures that exhibit promising performance.
To mitigate the complexity introduced by the model ensemble, we adopt the teacher-student paradigm, leveraging the diverse outputs from multiple learned networks as supervisory signals.
arXiv Detail & Related papers (2021-06-12T10:08:52Z) - LANA: Towards Personalized Deep Knowledge Tracing Through
Distinguishable Interactive Sequences [21.67751919579854]
We propose Leveled Attentive KNowledge TrAcing (LANA) to predict students' responses to future questions.
It uses a novel student-related features extractor (SRFE) to distill students' unique inherent properties from their respective interactive sequences.
With pivot module reconstructed the decoder for individual students and leveled learning specialized encoders for groups, personalized DKT was achieved.
arXiv Detail & Related papers (2021-04-21T02:57:42Z) - Distilling Knowledge via Knowledge Review [69.15050871776552]
We study the factor of connection path cross levels between teacher and student networks, and reveal its great importance.
For the first time in knowledge distillation, cross-stage connection paths are proposed.
Our finally designed nested and compact framework requires negligible overhead, and outperforms other methods on a variety of tasks.
arXiv Detail & Related papers (2021-04-19T04:36:24Z) - BCFNet: A Balanced Collaborative Filtering Network with Attention
Mechanism [106.43103176833371]
Collaborative Filtering (CF) based recommendation methods have been widely studied.
We propose a novel recommendation model named Balanced Collaborative Filtering Network (BCFNet)
In addition, an attention mechanism is designed to better capture the hidden information within implicit feedback and strengthen the learning ability of the neural network.
arXiv Detail & Related papers (2021-03-10T14:59:23Z) - Efficient Crowd Counting via Structured Knowledge Transfer [122.30417437707759]
Crowd counting is an application-oriented task and its inference efficiency is crucial for real-world applications.
We propose a novel Structured Knowledge Transfer framework to generate a lightweight but still highly effective student network.
Our models obtain at least 6.5$times$ speed-up on an Nvidia 1080 GPU and even achieve state-of-the-art performance.
arXiv Detail & Related papers (2020-03-23T08:05:41Z) - Learning From Multiple Experts: Self-paced Knowledge Distillation for
Long-tailed Classification [106.08067870620218]
We propose a self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME)
We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model.
We conduct extensive experiments and demonstrate that our method is able to achieve superior performances compared to state-of-the-art methods.
arXiv Detail & Related papers (2020-01-06T12:57:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.