Related papers: Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization

URL: http://arxiv.org/abs/2403.00798v1
Date: Fri, 23 Feb 2024 15:00:46 GMT
Title: Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization
Authors: Zirui Zhu, Yong Liu, Zangwei Zheng, Huifeng Guo, Yang You
Abstract summary: Click-Through Rate (CTR) prediction holds paramount significance in online advertising and recommendation scenarios. Despite the proliferation of recent CTR prediction models, the improvements in performance have remained limited.
Score: 22.964109377128523
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Click-Through Rate (CTR) prediction holds paramount significance in online advertising and recommendation scenarios. Despite the proliferation of recent CTR prediction models, the improvements in performance have remained limited, as evidenced by open-source benchmark assessments. Current researchers tend to focus on developing new models for various datasets and settings, often neglecting a crucial question: What is the key challenge that truly makes CTR prediction so demanding? In this paper, we approach the problem of CTR prediction from an optimization perspective. We explore the typical data characteristics and optimization statistics of CTR prediction, revealing a strong positive correlation between the top Hessian eigenvalue and feature frequency. This correlation implies that frequently occurring features tend to converge towards sharp local minima, ultimately leading to suboptimal performance. Motivated by the recent advancements in sharpness-aware minimization (SAM), which considers the geometric aspects of the loss landscape during optimization, we present a dedicated optimizer crafted for CTR prediction, named Helen. Helen incorporates frequency-wise Hessian eigenvalue regularization, achieved through adaptive perturbations based on normalized feature frequencies. Empirical results under the open-source benchmark framework underscore Helen's effectiveness. It successfully constrains the top eigenvalue of the Hessian matrix and demonstrates a clear advantage over widely used optimization algorithms when applied to seven popular models across three public benchmark datasets on BARS. Our code is available at github.com/NUS-HPC-AI-Lab/Helen.
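
Illustrative sketch (not the authors' implementation): the abstract describes a SAM-style update whose perturbation is adapted to normalized feature frequencies. The NumPy example below shows one way such a step could look on a toy embedding table. The toy model, the function names, the choice to normalize by the maximum frequency, and the per-row perturbation are all assumptions made for illustration; the actual Helen optimizer is in the repository named in the abstract.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grad(E, ids, y):
    """Mean logistic loss and its gradient w.r.t. the embedding table E.

    Toy model (assumption): each sample has one feature id, and its logit
    is the sum of the coordinates of that feature's embedding row.
    """
    p = sigmoid(E[ids].sum(axis=1))
    tiny = 1e-12
    loss = -np.mean(y * np.log(p + tiny) + (1 - y) * np.log(1 - p + tiny))
    grad = np.zeros_like(E)
    # Each sample contributes (p - y) / N to every coordinate of its row.
    per_sample = ((p - y) / len(y))[:, None] * np.ones(E.shape[1])
    np.add.at(grad, ids, per_sample)  # unbuffered add handles repeated ids
    return loss, grad

def frequency_scaled_sam_step(E, ids, y, lr=0.5, rho=0.05):
    """One SAM-style step with a per-row perturbation radius.

    Assumption: the radius of row j is rho * freq_j / max(freq), so rows of
    frequent features are perturbed more than rows of rare features.
    """
    freq = np.bincount(ids, minlength=E.shape[0]).astype(float)
    rho_per_row = rho * freq / max(freq.max(), 1.0)

    # Ascent direction: normalized gradient of each embedding row.
    _, g = loss_and_grad(E, ids, y)
    row_norm = np.linalg.norm(g, axis=1, keepdims=True)
    eps = rho_per_row[:, None] * g / (row_norm + 1e-12)

    # Descent step uses the gradient evaluated at the perturbed point.
    _, g_perturbed = loss_and_grad(E + eps, ids, y)
    return E - lr * g_perturbed, eps

# Tiny demo: feature 0 is frequent, feature 3 never appears.
ids = np.array([0, 0, 0, 0, 1, 1, 2])
y = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0])
E = np.zeros((4, 2))
E, eps = frequency_scaled_sam_step(E, ids, y)
print(np.linalg.norm(eps, axis=1))  # per-row perturbation norms
```

In this sketch the unused row receives no perturbation at all, while the most frequent row is perturbed by the full radius rho, which mirrors the abstract's observation that frequent features are the ones most prone to sharp minima.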

Related papers

Scaled Supervision is an Implicit Lipschitz Regularizer [32.41225209639384]
In social media, recommender systems rely on the click-through rate (CTR) as the standard metric to evaluate user engagement. We show that scaling supervision bandwidth can act as an implicit Lipschitz regularizer, stably optimizing existing CTR models to achieve better generalizability.
arXiv Detail & Related papers (2025-03-19T01:01:28Z)
An accuracy improving method for advertising click through rate prediction based on enhanced xDeepFM model [0.0]
This paper proposes an improved CTR prediction model based on the xDeepFM architecture. By integrating a multi-head attention mechanism, the model can simultaneously focus on different aspects of feature interactions. Experimental results on the Criteo dataset demonstrate that the proposed model outperforms other state-of-the-art methods.
arXiv Detail & Related papers (2024-11-21T03:21:29Z)
NeSHFS: Neighborhood Search with Heuristic-based Feature Selection for Click-Through Rate Prediction [1.3805049652130312]
Click-through-rate (CTR) prediction plays an important role in online advertising and ad recommender systems. We propose a CTR algorithm named Neighborhood Search with Heuristic-based Feature Selection (NeSHFS) to enhance CTR prediction performance.
arXiv Detail & Related papers (2024-09-13T10:43:18Z)
Investigating the Robustness of Counterfactual Learning to Rank Models: A Reproducibility Study [61.64685376882383]
Counterfactual learning to rank (CLTR) has attracted extensive attention in the IR community for its ability to leverage massive logged user interaction data to train ranking models. This paper investigates the robustness of existing CLTR models in complex and diverse situations. We find that the DLA models and IPS-DCM show better robustness under various simulation settings than IPS-PBM and PRS with offline propensity estimation.
arXiv Detail & Related papers (2024-04-04T10:54:38Z)
Click-Conversion Multi-Task Model with Position Bias Mitigation for Sponsored Search in eCommerce [51.211924408864355]
We propose two position-bias-free prediction models: Position-Aware Click-Conversion (PACC) and PACC via Position Embedding (PACC-PE). Experiments on the E-commerce sponsored product search dataset show that our proposed models have better ranking effectiveness and can greatly alleviate position bias in both CTR and CVR prediction.
arXiv Detail & Related papers (2023-07-29T19:41:16Z)
Consensus-Adaptive RANSAC [104.87576373187426]
We propose a new RANSAC framework that learns to explore the parameter space by considering the residuals seen so far via a novel attention layer. The attention mechanism operates on a batch of point-to-model residuals, and updates a per-point estimation state to take into account the consensus found through a lightweight one-step transformer.
arXiv Detail & Related papers (2023-07-26T08:25:46Z)
Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability of the user to click on an item, has become increasingly significant in recommender systems. Recent deep learning models with the ability to automatically extract the user interest from his/her behaviors have achieved great success. We propose a novel approach under the framework of the wrapper method, which is named Meta-Wrapper.
arXiv Detail & Related papers (2022-06-28T03:28:15Z)
Rethinking Position Bias Modeling with Knowledge Distillation for CTR Prediction [8.414183573280779]
This work proposes a knowledge distillation framework to alleviate the impact of position bias and leverage position information to improve CTR prediction. The proposed method has been deployed in real-world online ads systems, serving main traffic on one of the world's largest e-commerce platforms.
arXiv Detail & Related papers (2022-04-01T07:58:38Z)
A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video. Recent studies have found that current benchmark datasets may have obvious moment annotation biases. We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflating evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
Looking at CTR Prediction Again: Is Attention All You Need? [4.873362301533825]
Click-through rate (CTR) prediction is a critical problem in web search, recommendation systems and online advertisement displaying. We use the discrete choice model in economics to redefine the CTR prediction problem, and propose a general neural network framework built on self-attention mechanism. It is found that most existing CTR prediction models align with our proposed general framework.
arXiv Detail & Related papers (2021-05-12T10:27:14Z)
Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
A method we call prediction-time batch normalization significantly improves model accuracy and calibration under covariate shift. We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness. The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.