Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian
Eigenvalue Regularization
- URL: http://arxiv.org/abs/2403.00798v1
- Date: Fri, 23 Feb 2024 15:00:46 GMT
- Title: Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian
Eigenvalue Regularization
- Authors: Zirui Zhu, Yong Liu, Zangwei Zheng, Huifeng Guo, Yang You
- Abstract summary: Click-Through Rate (CTR) prediction holds paramount significance in online advertising and recommendation scenarios.
Despite the proliferation of recent CTR prediction models, the improvements in performance have remained limited.
- Score: 22.964109377128523
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Click-Through Rate (CTR) prediction holds paramount significance in online
advertising and recommendation scenarios. Despite the proliferation of recent
CTR prediction models, the improvements in performance have remained limited,
as evidenced by open-source benchmark assessments. Current researchers tend to
focus on developing new models for various datasets and settings, often
neglecting a crucial question: What is the key challenge that truly makes CTR
prediction so demanding?
In this paper, we approach the problem of CTR prediction from an optimization
perspective. We explore the typical data characteristics and optimization
statistics of CTR prediction, revealing a strong positive correlation between
the top Hessian eigenvalue and feature frequency. This correlation implies that
frequently occurring features tend to converge towards sharp local minima,
ultimately leading to suboptimal performance. Motivated by the recent
advancements in sharpness-aware minimization (SAM), which considers the
geometric aspects of the loss landscape during optimization, we present a
dedicated optimizer crafted for CTR prediction, named Helen. Helen incorporates
frequency-wise Hessian eigenvalue regularization, achieved through adaptive
perturbations based on normalized feature frequencies.
Empirical results under the open-source benchmark framework underscore
Helen's effectiveness. It successfully constrains the top eigenvalue of the
Hessian matrix and demonstrates a clear advantage over widely used optimization
algorithms when applied to seven popular models across three public benchmark
datasets on BARS. Our code is available at github.com/NUS-HPC-AI-Lab/Helen.
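The abstract describes perturbations whose size depends on normalized feature frequency. A minimal NumPy sketch of that idea follows, written as a SAM-style update on an embedding table. It is an illustration under stated assumptions, not the authors' implementation: the function name `frequency_scaled_sam_step`, the max-based frequency normalization, and the per-row perturbation formula are all hypothetical choices made here, and the actual optimizer in the Helen repository may differ.

```python
import numpy as np


def frequency_scaled_sam_step(emb, grad_fn, freq, rho=0.05, lr=0.1):
    """One SAM-style update with a per-feature perturbation radius.

    emb     : (num_features, dim) embedding table.
    grad_fn : callable returning dLoss/d(emb) with the same shape as emb.
    freq    : (num_features,) occurrence count of each feature.
    rho     : base perturbation radius (assumed hyperparameter).
    lr      : learning rate for the final descent step.
    """
    freq = np.asarray(freq, dtype=float)
    # Assumption: normalize counts by the largest count so values lie in [0, 1].
    norm_freq = freq / max(freq.max(), 1e-12)

    grad = grad_fn(emb)
    row_norm = np.linalg.norm(grad, axis=1, keepdims=True)

    # Frequent features get a larger ascent step along their own gradient,
    # so sharpness is penalized more strongly for them.
    eps = (rho * norm_freq)[:, None] * grad / (row_norm + 1e-12)

    # Descend using the gradient evaluated at the perturbed point.
    perturbed_grad = grad_fn(emb + eps)
    return emb - lr * perturbed_grad


if __name__ == "__main__":
    table = np.array([[1.0, 0.0], [1.0, 0.0]])
    # Toy loss sum(emb ** 2), whose gradient is 2 * emb.
    updated = frequency_scaled_sam_step(
        table, lambda e: 2.0 * e, freq=[0, 10], rho=0.5, lr=0.1
    )
    print(updated)
```

In this sketch a feature that never occurs receives no perturbation, so its row follows plain gradient descent, while the most frequent feature is perturbed by the full radius `rho`.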
Related papers
- NeSHFS: Neighborhood Search with Heuristic-based Feature Selection for Click-Through Rate Prediction [1.3805049652130312]
Click-through-rate (CTR) prediction plays an important role in online advertising and ad recommender systems.
We propose a CTR algorithm named Neighborhood Search with Heuristic-based Feature Selection (NeSHFS) to enhance CTR prediction performance.
arXiv Detail & Related papers (2024-09-13T10:43:18Z)
- Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models.
This paper investigates the robustness of existing CLTR models in complex and diverse situations.
We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
- Click-Conversion Multi-Task Model with Position Bias Mitigation for Sponsored Search in eCommerce [51.211924408864355]
We propose two position-bias-free prediction models: Position-Aware Click-Conversion (PACC) and PACC via Position Embedding (PACC-PE).
Experiments on the E-commerce sponsored product search dataset show that our proposed models have better ranking effectiveness and can greatly alleviate position bias in both CTR and CVR prediction.
arXiv Detail & Related papers (2023-07-29T19:41:16Z)
- Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer.
The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
- Joint Optimization of Ranking and Calibration with Contextualized Hybrid Model [24.66016187602343]
We propose an approach that can Jointly optimize the Ranking and Calibration abilities (JRC for short).
JRC improves the ranking ability by contrasting the logit value for the sample with different labels and constrains the predicted probability to be a function of the logit subtraction.
JRC has been deployed on the display advertising platform of Alibaba and has obtained significant performance improvements.
arXiv Detail & Related papers (2022-08-12T08:32:13Z)
- Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems.
Recent deep learning models with the ability to automatically extract the user interest from his/her behaviors have achieved great success.
We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z)
- Rethinking Position Bias Modeling with Knowledge Distillation for CTR Prediction [8.414183573280779]
This work proposes a knowledge distillation framework to alleviate the impact of position bias and leverage position information to improve CTR prediction.
The proposed method has been deployed in real-world online ads systems, serving main traffic on one of the world's largest e-commerce platforms.
arXiv Detail & Related papers (2022-04-01T07:58:38Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Looking at CTR Prediction Again: Is Attention All You Need? [4.873362301533825]
Click-through rate (CTR) prediction is a critical problem in web search, recommendation systems and online advertisement displaying.
We use the discrete choice model in economics to redefine the CTR prediction problem, and propose a general neural network framework built on self-attention mechanism.
It is found that most existing CTR prediction models align with our proposed general framework.
arXiv Detail & Related papers (2021-05-12T10:27:14Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.