Mutual Learning for Finetuning Click-Through Rate Prediction Models
- URL: http://arxiv.org/abs/2406.12087v1
- Date: Mon, 17 Jun 2024 20:56:30 GMT
- Title: Mutual Learning for Finetuning Click-Through Rate Prediction Models
- Authors: Ibrahim Can Yilmaz, Said Aldemir,
- Abstract summary: In this paper, we show how useful the mutual learning algorithm could be when it is between equals.
In our experiments on the Criteo and Avazu datasets, the mutual learning algorithm improved the performance of the model by up to 0.66% relative improvement.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Click-Through Rate (CTR) prediction has become an essential task in digital industries, such as digital advertising or online shopping. Many deep learning-based methods have been implemented and have become state-of-the-art models in the domain. To further improve the performance of CTR models, Knowledge Distillation based approaches have been widely used. However, most of the current CTR prediction models do not have much complex architectures, so it's hard to call one of them 'cumbersome' and the other one 'tiny'. On the other hand, the performance gap is also not very large between complex and simple models. So, distilling knowledge from one model to the other could not be worth the effort. Under these considerations, Mutual Learning could be a better approach, since all the models could be improved mutually. In this paper, we showed how useful the mutual learning algorithm could be when it is between equals. In our experiments on the Criteo and Avazu datasets, the mutual learning algorithm improved the performance of the model by up to 0.66% relative improvement.
Related papers
- Leave No Knowledge Behind During Knowledge Distillation: Towards Practical and Effective Knowledge Distillation for Code-Switching ASR Using Realistic Data [42.65516074428803]
We present a framework for developing more efficient models for automatic speech recognition using realistic speech-only data.
Our proposed method, Leave No Knowledge Behind During Knowledge Distillation (K$2$D), leverages both the teacher model's knowledge and additional insights from a small auxiliary model.
We have successfully obtained a 2-time smaller model with 5-time faster generation speed while outperforming the baseline methods and the teacher model on all the testing sets.
arXiv Detail & Related papers (2024-07-15T10:25:14Z) - Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment [65.15914284008973]
We propose to leverage an Inverse Reinforcement Learning (IRL) technique to simultaneously build an reward model and a policy model.
We show that the proposed algorithms converge to the stationary solutions of the IRL problem.
Our results indicate that it is beneficial to leverage reward learning throughout the entire alignment process.
arXiv Detail & Related papers (2024-05-28T07:11:05Z) - Co-training and Co-distillation for Quality Improvement and Compression
of Language Models [88.94539115180919]
Knowledge Distillation (KD) compresses expensive pre-trained language models (PLMs) by transferring their knowledge to smaller models.
Most smaller models fail to surpass the performance of the original larger model, resulting in sacrificing performance to improve inference speed.
We propose Co-Training and Co-Distillation (CTCD), a novel framework that improves performance and inference speed together by co-training two models.
arXiv Detail & Related papers (2023-11-06T03:29:00Z) - Directed Acyclic Graph Factorization Machines for CTR Prediction via
Knowledge Distillation [65.62538699160085]
We propose a Directed Acyclic Graph Factorization Machine (KD-DAGFM) to learn the high-order feature interactions from existing complex interaction models for CTR prediction via Knowledge Distillation.
KD-DAGFM achieves the best performance with less than 21.5% FLOPs of the state-of-the-art method on both online and offline experiments.
arXiv Detail & Related papers (2022-11-21T03:09:42Z) - FedKD: Communication Efficient Federated Learning via Knowledge
Distillation [56.886414139084216]
Federated learning is widely used to learn intelligent models from decentralized data.
In federated learning, clients need to communicate their local model updates in each iteration of model learning.
We propose a communication efficient federated learning method based on knowledge distillation.
arXiv Detail & Related papers (2021-08-30T15:39:54Z) - Revisiting Knowledge Distillation: An Inheritance and Exploration
Framework [153.73692961660964]
Knowledge Distillation (KD) is a popular technique to transfer knowledge from a teacher model to a student model.
We propose a novel inheritance and exploration knowledge distillation framework (IE-KD)
Our IE-KD framework is generic and can be easily combined with existing distillation or mutual learning methods for training deep neural networks.
arXiv Detail & Related papers (2021-07-01T02:20:56Z) - Distill on the Go: Online knowledge distillation in self-supervised
learning [1.1470070927586016]
Recent works have shown that wider and deeper models benefit more from self-supervised learning than smaller models.
We propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation.
Our results show significant performance gain in the presence of noisy and limited labels.
arXiv Detail & Related papers (2021-04-20T09:59:23Z) - Interleaving Learning, with Application to Neural Architecture Search [12.317568257671427]
We propose a novel machine learning framework referred to as interleaving learning (IL)
In our framework, a set of models collaboratively learn a data encoder in an interleaving fashion.
We apply interleaving learning to search neural architectures for image classification on CIFAR-10, CIFAR-100, and ImageNet.
arXiv Detail & Related papers (2021-03-12T00:54:22Z) - Ensemble Knowledge Distillation for CTR Prediction [46.92149090885551]
We propose a new model training strategy based on knowledge distillation (KD)
KD is a teacher-student learning framework to transfer knowledge learned from a teacher model to a student model.
We propose some novel techniques to facilitate ensembled CTR prediction, including teacher gating and early stopping by distillation loss.
arXiv Detail & Related papers (2020-11-08T23:37:58Z) - Mutual Modality Learning for Video Action Classification [74.83718206963579]
We show how to embed multi-modality into a single model for video action classification.
We achieve state-of-the-art results in the Something-Something-v2 benchmark.
arXiv Detail & Related papers (2020-11-04T21:20:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.