Towards Understanding the Overfitting Phenomenon of Deep Click-Through
Rate Prediction Models
- URL: http://arxiv.org/abs/2209.06053v1
- Date: Sun, 4 Sep 2022 11:36:16 GMT
- Title: Towards Understanding the Overfitting Phenomenon of Deep Click-Through
Rate Prediction Models
- Authors: Zhao-Yu Zhang, Xiang-Rong Sheng, Yujing Zhang, Biye Jiang, Shuguang
Han, Hongbo Deng, Bo Zheng
- Abstract summary: We observe an interesting one-epoch overfitting problem in Click-Through Rate (CTR) prediction.
The model performance exhibits a dramatic degradation at the beginning of the second epoch.
Thus, the best performance is usually achieved by training for only one epoch.
- Score: 16.984947259260878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning techniques have been applied widely in industrial
recommendation systems. However, far less attention has been paid to the
overfitting problem of models in recommendation systems, even though
overfitting is recognized as a critical issue for deep neural networks. In the
context of Click-Through Rate (CTR) prediction, we observe an interesting
one-epoch overfitting problem: the model performance exhibits a dramatic
degradation at the beginning of the second epoch. Such a phenomenon has been
witnessed widely in real-world applications of CTR models. Consequently, the best
performance is usually achieved by training for only one epoch. To understand
the underlying factors behind the one-epoch phenomenon, we conduct extensive
experiments on a production dataset collected from the display advertising
system of Alibaba. The results show that the model structure, the optimization
algorithm with a fast convergence rate, and the feature sparsity are closely
related to the one-epoch phenomenon. We also propose a plausible hypothesis to
explain the phenomenon and conduct a set of proof-of-concept experiments.
We hope this work can shed light on future research on training CTR models for
more epochs to achieve better performance.
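The phenomenon is straightforward to probe empirically. Below is a minimal sketch, not the paper's production pipeline: the toy embedding-plus-MLP CTR model, the Adam optimizer, and all names are illustrative assumptions. It trains for several epochs and logs test AUC at each epoch boundary; under one-epoch overfitting, the AUC reported after epoch 2 drops sharply relative to epoch 1.

```python
# Minimal sketch: observe the one-epoch phenomenon by logging test AUC
# at every epoch boundary. Model, data, and hyperparameters are
# illustrative placeholders, not the paper's production setup.
import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score

class TinyCTR(nn.Module):
    def __init__(self, n_ids=100_000, dim=16):
        super().__init__()
        self.emb = nn.Embedding(n_ids, dim)          # sparse ID features
        self.mlp = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))
    def forward(self, ids):
        return self.mlp(self.emb(ids)).squeeze(-1)   # logit per example

def epoch_auc(model, loader):
    model.eval()
    ys, ps = [], []
    with torch.no_grad():
        for ids, y in loader:
            ys += y.tolist()
            ps += torch.sigmoid(model(ids)).tolist()
    return roc_auc_score(ys, ps)

def train(model, train_loader, test_loader, epochs=3):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)  # fast-converging optimizer
    bce = nn.BCEWithLogitsLoss()
    for epoch in range(epochs):
        model.train()
        for ids, y in train_loader:
            opt.zero_grad()
            bce(model(ids), y.float()).backward()
            opt.step()
        # If the one-epoch phenomenon occurs, the AUC printed for epoch 2
        # degrades sharply relative to epoch 1.
        print(f"epoch {epoch + 1}: test AUC = {epoch_auc(model, test_loader):.4f}")
```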
Related papers
- Data Attribution for Diffusion Models: Timestep-induced Bias in Influence Estimation [53.27596811146316]
Diffusion models operate over a sequence of timesteps, rather than the instantaneous input-output relationships assumed in earlier influence-estimation settings.
We present Diffusion-TracIn, which incorporates these temporal dynamics, and observe that samples' loss-gradient norms depend heavily on the timestep.
We introduce Diffusion-ReTrac as a re-normalized adaptation that enables the retrieval of training samples more targeted to the test sample of interest.
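A hypothetical sketch of the re-normalization idea, assuming a TracIn-style influence score (dot product of per-sample loss gradients) averaged over sampled diffusion timesteps; the function names and the unit-norm normalization are illustrative guesses, not the paper's published formulas:

```python
# Sketch of TracIn-style influence with re-normalization to remove the
# timestep-induced scale bias described in the abstract. Assumptions:
# loss_fn(model, batch, t) returns a scalar diffusion loss at timestep t.
import torch

def flat_grad(loss, params):
    grads = torch.autograd.grad(loss, params)
    return torch.cat([g.reshape(-1) for g in grads])

def retrac_influence(model, loss_fn, train_batch, test_sample, timesteps):
    params = [p for p in model.parameters() if p.requires_grad]
    score = 0.0
    for t in timesteps:                    # sampled diffusion timesteps
        g_test = flat_grad(loss_fn(model, test_sample, t), params)
        g_train = flat_grad(loss_fn(model, train_batch, t), params)
        # Normalize both gradients to unit norm before the dot product so
        # timesteps with large gradient norms do not dominate the score.
        score += torch.dot(g_test / g_test.norm(), g_train / g_train.norm())
    return score / len(timesteps)
```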
arXiv Detail & Related papers (2024-01-17T07:58:18Z) - Visual Prompting Upgrades Neural Network Sparsification: A Data-Model Perspective [64.04617968947697]
We introduce a novel data-model co-design perspective to promote superior weight sparsity.
Specifically, customized visual prompts are mounted to upgrade neural network sparsification in our proposed VPNs framework.
arXiv Detail & Related papers (2023-12-03T13:50:24Z) - A PAC-Bayesian Perspective on the Interpolating Information Criterion [54.548058449535155]
We show how a PAC-Bayes bound is obtained for a general class of models, characterizing factors which influence performance in the interpolating regime.
We quantify how the test error for overparameterized models achieving effectively zero training error depends on the quality of the implicit regularization imposed by, e.g., the combination of model and parameter-initialization scheme.
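For context, one standard McAllester-style PAC-Bayes bound (the generic form, not the paper's interpolating-regime result) reads:

```latex
% Generic McAllester-style PAC-Bayes bound. With probability at least
% 1 - \delta over an i.i.d. sample of size n, for every posterior Q and
% fixed prior P over parameters w:
\mathbb{E}_{w \sim Q}\big[L(w)\big]
  \;\le\;
\mathbb{E}_{w \sim Q}\big[\hat{L}(w)\big]
  + \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln(n/\delta)}{2(n-1)}}
```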
arXiv Detail & Related papers (2023-11-13T01:48:08Z) - Robustness and Generalization Performance of Deep Learning Models on
Cyber-Physical Systems: A Comparative Study [71.84852429039881]
The investigation focuses on the models' ability to handle a range of perturbations, such as sensor faults and noise.
We test the generalization and transfer learning capabilities of these models by exposing them to out-of-distribution (OOD) samples.
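An illustrative probe in the spirit of this study, assuming a classifier; the Gaussian noise model, stuck-at-zero fault, and magnitudes are assumptions rather than the paper's protocol:

```python
# Illustrative robustness probe: inject Gaussian sensor noise (and,
# optionally, stuck-at-zero faults) into test inputs and compare
# accuracy against the clean baseline.
import torch

def perturbed_accuracy(model, loader, sigma=0.1, fault_prob=0.0):
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in loader:
            x_noisy = x + sigma * torch.randn_like(x)       # sensor noise
            if fault_prob > 0:                              # stuck-at-zero fault
                x_noisy[torch.rand_like(x_noisy) < fault_prob] = 0.0
            pred = model(x_noisy).argmax(dim=-1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total
```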
arXiv Detail & Related papers (2023-06-13T12:43:59Z) - Multi-Epoch Learning for Deep Click-Through Rate Prediction Models [32.80864867251999]
The one-epoch overfitting phenomenon has been widely observed in industrial Click-Through Rate (CTR) applications.
We propose a novel Multi-Epoch learning with Data Augmentation (MEDA), which can be directly applied to most deep CTR models.
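The abstract does not spell out the augmentation mechanism; one reading of MEDA is that sparse embeddings are re-initialized at each epoch boundary so that repeated passes over the same IDs act as augmented data, while dense layers are kept. A sketch under that assumption, with all names illustrative:

```python
# Assumption-laden sketch of multi-epoch training with per-epoch
# embedding re-initialization, one reading of the "data augmentation"
# in MEDA. Treat this as an illustration, not the authors' code.
import torch.nn as nn

def train_meda(model, train_loader, loss_fn, opt_fn, epochs=3):
    for epoch in range(epochs):
        if epoch > 0:
            # Re-initialize sparse embeddings so the next pass over the
            # data does not overfit memorized ID representations; dense
            # MLP weights are carried over unchanged.
            for m in model.modules():
                if isinstance(m, nn.Embedding):
                    nn.init.normal_(m.weight, std=0.01)
        opt = opt_fn(model.parameters())   # e.g. lambda p: torch.optim.Adam(p)
        model.train()
        for ids, y in train_loader:
            opt.zero_grad()
            loss_fn(model(ids), y.float()).backward()
            opt.step()
```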
arXiv Detail & Related papers (2023-05-31T03:36:50Z) - Two Steps Forward and One Behind: Rethinking Time Series Forecasting
with Deep Learning [7.967995669387532]
The Transformer is a highly successful deep learning model that has revolutionised the world of artificial neural networks.
We investigate the effectiveness of Transformer-based models applied to the domain of time series forecasting.
We propose a set of alternative models that are better performing and significantly less complex.
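As an example of what a "significantly less complex" alternative can look like, a DLinear-style baseline maps the lookback window to the forecast horizon with a single linear layer; the paper's actual models are not specified in the abstract, so this is purely illustrative:

```python
# Minimal linear forecaster: one weight matrix from lookback to horizon.
import torch.nn as nn

class LinearForecaster(nn.Module):
    def __init__(self, lookback=96, horizon=24):
        super().__init__()
        self.proj = nn.Linear(lookback, horizon)  # single linear map

    def forward(self, x):        # x: (batch, lookback)
        return self.proj(x)      # (batch, horizon)
```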
arXiv Detail & Related papers (2023-04-10T12:47:42Z) - How Tempering Fixes Data Augmentation in Bayesian Neural Networks [22.188535244056016]
We show that tempering implicitly reduces the misspecification arising from modeling augmentations as i.i.d. data.
The temperature mimics the role of the effective sample size, reflecting the gain in information provided by the augmentations.
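One standard way to write the tempered posterior makes the effective-sample-size reading explicit: rescaling the log-likelihood by 1/T is equivalent to observing n/T i.i.d. data points. The paper's precise temperature-augmentation relationship is not reproduced here.

```latex
% Tempered posterior over weights w given n data points: the 1/T factor
% on the log-likelihood acts like an effective sample size of n/T,
% which is how the temperature can absorb the information gain from
% treating augmentations as i.i.d. data.
p_T(w \mid \mathcal{D}) \;\propto\; p(w)\,
  \exp\!\Big( \tfrac{1}{T} \sum_{i=1}^{n} \log p(y_i \mid x_i, w) \Big)
```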
arXiv Detail & Related papers (2022-05-27T11:06:56Z) - Hybrid Predictive Coding: Inferring, Fast and Slow [62.997667081978825]
We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner.
We demonstrate that our model is inherently sensitive to its uncertainty and adaptively balances iterative and amortized inference to obtain accurate beliefs at minimum computational expense.
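An illustrative sketch of such hybrid inference, assuming an amortized encoder that proposes an initial latent which iterative gradient steps on the prediction error then refine; the architecture and step count are assumptions:

```python
# Hybrid inference sketch: amortized proposal ("fast") followed by
# iterative refinement ("slow") of the latent via gradient descent on
# the reconstruction error. Not the authors' implementation.
import torch

def hybrid_infer(encoder, decoder, x, n_steps=10, lr=0.1):
    z = encoder(x).detach().requires_grad_(True)    # fast amortized guess
    for _ in range(n_steps):                        # slow iterative refinement
        err = ((decoder(z) - x) ** 2).sum()         # prediction error
        (g,) = torch.autograd.grad(err, z)
        z = (z - lr * g).detach().requires_grad_(True)
    return z
```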
arXiv Detail & Related papers (2022-04-05T12:52:45Z) - Back2Future: Leveraging Backfill Dynamics for Improving Real-time
Predictions in Future [73.03458424369657]
In real-time forecasting in public health, data collection is a non-trivial and demanding task.
The 'backfill' phenomenon and its effect on model performance have barely been studied in the prior literature.
We formulate a novel problem and propose a neural framework, Back2Future, that aims to refine a given model's predictions in real time.
arXiv Detail & Related papers (2021-06-08T14:48:20Z) - Self-Regression Learning for Blind Hyperspectral Image Fusion Without
Label [11.291055330647977]
We propose a self-regression learning method that reconstructs the hyperspectral image (HSI) and estimates the observation model.
In particular, we adopt an invertible neural network (INN) for restoring the HSI, and two fully-connected networks (FCN) for estimating the observation model.
Our model outperforms state-of-the-art methods in experiments on both synthetic and real-world datasets.
arXiv Detail & Related papers (2021-03-31T04:48:21Z)