Scaled Supervision is an Implicit Lipschitz Regularizer
- URL: http://arxiv.org/abs/2503.14813v1
- Date: Wed, 19 Mar 2025 01:01:28 GMT
- Title: Scaled Supervision is an Implicit Lipschitz Regularizer
- Authors: Zhongyu Ouyang, Chunhui Zhang, Yaning Jia, Soroush Vosoughi
- Abstract summary: In social media, recommender systems rely on the click-through rate (CTR) as the standard metric to evaluate user engagement. We show that scaling supervision bandwidth can act as an implicit Lipschitz regularizer, stably optimizing existing CTR models to achieve better generalizability.
- Score: 32.41225209639384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In modern social media, recommender systems (RecSys) rely on the click-through rate (CTR) as the standard metric to evaluate user engagement. CTR prediction is traditionally framed as a binary classification task: predicting whether a user will interact with a given item. However, this framing overlooks the complexity of real-world social modeling, where user, item, and interaction features change dynamically in fast-paced online environments. This dynamic nature often leads to model instability, reflected in overfitting to short-term fluctuations rather than higher-level interaction patterns. While mitigating such overfitting calls for richer and more refined supervision, current solutions rely on binary labels that oversimplify fine-grained user preferences through thresholding, which significantly reduces the richness of the supervision signal. We therefore aim to alleviate the overfitting problem by increasing the supervision bandwidth in CTR training. Specifically, (i) theoretically, we formulate the impact of fine-grained preferences on model stability as a Lipschitz constraint; (ii) empirically, we show that scaling the supervision bandwidth acts as an implicit Lipschitz regularizer, stably optimizing existing CTR models toward better generalizability. Extensive experiments show that this scaled supervision significantly and consistently improves both the optimization process and the performance of existing CTR models, even without additional hyperparameter tuning.
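To make the idea concrete, below is a minimal sketch of scaled supervision: instead of thresholding engagement into a 0/1 click label, a fine-grained preference signal is spread over K soft bins and the CTR model is trained against that distribution. The bin count, temperature, and toy model are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumptions: K bins, Gaussian binning, toy MLP).
import torch
import torch.nn as nn
import torch.nn.functional as F

K = 5  # number of preference levels (hypothetical choice)

def soft_labels(pref: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Map a fine-grained preference in [0, 1] to a soft distribution
    over K bins instead of thresholding it to a binary click label."""
    centers = torch.linspace(0.0, 1.0, K)                    # bin centers
    logits = -(pref.unsqueeze(-1) - centers) ** 2 / temperature
    return F.softmax(logits, dim=-1)                         # (batch, K)

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, K))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(32, 16)        # toy user/item features
pref = torch.rand(32)          # fine-grained engagement signal
target = soft_labels(pref)     # scaled (higher-bandwidth) supervision

loss = F.cross_entropy(model(x), target)   # cross-entropy with soft targets
loss.backward()
opt.step()

# At serving time, collapse the K-way distribution back to a scalar CTR.
ctr = (F.softmax(model(x), dim=-1) * torch.linspace(0.0, 1.0, K)).sum(-1)
```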
Related papers
- Towards Generalizable Trajectory Prediction Using Dual-Level Representation Learning And Adaptive Prompting [107.4034346788744]
Existing vehicle trajectory prediction models struggle with generalizability, prediction uncertainties, and handling complex interactions.
We propose Perceiver with Register queries (PerReg+), a novel trajectory prediction framework that introduces: (1) Dual-Level Representation Learning via Self-Distillation (SD) and Masked Reconstruction (MR), capturing global context and fine-grained details; (2) Enhanced Multimodality using register-based queries and pretraining, eliminating the need for clustering and suppression; and (3) Adaptive Prompt Tuning during fine-tuning, freezing the main architecture and optimizing a small number of prompts for efficient adaptation.
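Component (3) follows the general prompt-tuning recipe: freeze the pretrained backbone and optimize only a few prompt vectors prepended to the input tokens. A minimal sketch of that recipe with a toy backbone and assumed dimensions; it is not the PerReg+ architecture.

```python
# Minimal prompt-tuning sketch (toy backbone and dimensions are assumptions).
import torch
import torch.nn as nn

class ToyBackbone(nn.Module):
    """Stand-in for a pretrained trajectory encoder."""
    def __init__(self, dim: int = 32):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, 2)          # e.g. a 2-D offset per step

    def forward(self, tokens):
        return self.head(self.encoder(tokens))

class PromptTuned(nn.Module):
    def __init__(self, backbone: nn.Module, num_prompts: int = 4, dim: int = 32):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():   # freeze the main architecture
            p.requires_grad = False
        self.prompts = nn.Parameter(torch.randn(num_prompts, dim) * 0.02)

    def forward(self, tokens):                 # tokens: (batch, seq, dim)
        prompts = self.prompts.unsqueeze(0).expand(tokens.size(0), -1, -1)
        out = self.backbone(torch.cat([prompts, tokens], dim=1))
        return out[:, self.prompts.size(0):]   # drop the prompt positions

model = PromptTuned(ToyBackbone())
pred = model(torch.randn(2, 10, 32))
print(sum(p.numel() for p in model.parameters() if p.requires_grad))  # prompts only
```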
arXiv Detail & Related papers (2025-01-08T20:11:09Z)
- Scale-Invariant Learning-to-Rank [0.0]
At Expedia, learning-to-rank models play a key role in sorting and presenting information more relevant to users.
A major challenge in deploying these models is ensuring consistent feature scaling between training and production data.
We introduce a scale-invariant LTR framework which combines a deep and a wide neural network to mathematically guarantee scale invariance in the model at both training and prediction time.
We evaluate our framework in simulated real-world scenarios with injected feature-scale issues by perturbing the test set at prediction time, and show that even with inconsistent train-test scaling, using the framework achieves better performance than not using it.
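The summary does not spell out the deep-and-wide construction; the sketch below shows one standard way to obtain mathematical scale invariance, under the assumption of strictly positive features: score in log space and subtract the per-query mean, so a shared rescaling cancels exactly.

```python
# One way to guarantee scale invariance (an assumption-laden sketch, not
# the paper's deep-and-wide architecture): log-transform positive features
# and center them per query, so feats -> c * feats leaves scores unchanged.
import torch
import torch.nn as nn

class ScaleInvariantScorer(nn.Module):
    def __init__(self, num_features: int, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(num_features, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, feats):   # feats: (queries, items, features), > 0
        logf = feats.clamp_min(1e-8).log()
        logf = logf - logf.mean(dim=1, keepdim=True)  # shared scale cancels
        return self.mlp(logf).squeeze(-1)             # (queries, items)

scorer = ScaleInvariantScorer(num_features=8)
x = torch.rand(4, 10, 8) + 0.1
assert torch.allclose(scorer(x), scorer(x * 100.0), atol=1e-4)
```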
arXiv Detail & Related papers (2024-10-02T19:05:12Z)
- CTR-KAN: KAN for Adaptive High-Order Feature Interaction Modeling [37.80127625183842]
CTR-KAN is an adaptive framework for efficient high-order feature interaction modeling.
It builds upon the Kolmogorov-Arnold Network (KAN) paradigm, addressing its limitations in CTR prediction tasks.
CTR-KAN achieves state-of-the-art predictive accuracy with significantly lower computational costs.
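For intuition, a minimal KAN-style layer: every input-output edge carries its own learnable univariate function, here parameterized by a fixed Gaussian basis with learnable coefficients. This illustrates the paradigm only; it is not the CTR-KAN implementation.

```python
# Minimal KAN-style layer sketch (Gaussian basis is an assumption).
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    def __init__(self, d_in: int, d_out: int, num_basis: int = 8):
        super().__init__()
        self.register_buffer('centers', torch.linspace(-2.0, 2.0, num_basis))
        # one coefficient vector per (input, output) edge
        self.coef = nn.Parameter(torch.randn(d_in, d_out, num_basis) * 0.1)

    def forward(self, x):   # x: (batch, d_in)
        # evaluate the Gaussian basis per input feature: (batch, d_in, num_basis)
        basis = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)
        # output_j = sum_i phi_ij(x_i), with phi_ij a learned univariate function
        return torch.einsum('bik,iok->bo', basis, self.coef)

model = nn.Sequential(KANLayer(16, 8), KANLayer(8, 1))
logits = model(torch.randn(32, 16)).squeeze(-1)   # toy CTR logits
```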
arXiv Detail & Related papers (2024-08-16T12:51:52Z)
- Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Self-Regularization [77.62516752323207]
We introduce an orthogonal fine-tuning method for efficiently fine-tuning pretrained weights and enabling enhanced robustness and generalization.
A self-regularization strategy is further exploited to maintain stability in terms of the zero-shot generalization of VLMs; the resulting method is dubbed OrthSR.
For the first time, we revisit CLIP and CoOp with our method to effectively improve the models in few-shot image classification scenarios.
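For intuition, a minimal sketch of orthogonal fine-tuning: the pretrained weight stays frozen and only an orthogonal rotation of it is learned, parameterized here via the Cayley transform. This shows the general pattern, not the OrthSR method itself.

```python
# Orthogonal fine-tuning sketch: learn Q = (I + A)^-1 (I - A) with A
# skew-symmetric (Cayley transform), keeping the pretrained weight frozen.
# Illustrative pattern only, not the OrthSR implementation.
import torch
import torch.nn as nn

class OrthogonalFinetune(nn.Module):
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        d = pretrained.out_features
        self.register_buffer('w0', pretrained.weight.detach().clone())
        self.bias = nn.Parameter(pretrained.bias.detach().clone())
        self.a = nn.Parameter(torch.zeros(d, d))   # A = 0 -> Q = I at start

    def forward(self, x):
        A = self.a - self.a.t()                    # enforce skew-symmetry
        I = torch.eye(A.size(0), device=A.device)
        Q = torch.linalg.solve(I + A, I - A)       # orthogonal by construction
        return x @ (Q @ self.w0).t() + self.bias   # rotated frozen weight

layer = OrthogonalFinetune(nn.Linear(16, 16))
out = layer(torch.randn(4, 16))
```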
arXiv Detail & Related papers (2024-07-11T10:35:53Z)
- Expressive and Generalizable Low-rank Adaptation for Large Models via Slow Cascaded Learning [55.5715496559514]
LoRA Slow Cascade Learning (LoRASC) is an innovative technique designed to enhance LoRA's expressiveness and generalization capabilities.
Our approach augments expressiveness through a cascaded learning strategy that enables a mixture-of-low-rank adaptation, thereby increasing the model's ability to capture complex patterns.
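A minimal sketch of the cascaded idea: train a low-rank adapter, fold it into the frozen base weight, then restart a fresh adapter for the next stage. The stage schedule and rank below are assumptions, not LoRASC's.

```python
# Cascaded LoRA sketch (rank and stage schedule are assumptions).
import torch
import torch.nn as nn

class CascadedLoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4):
        super().__init__()
        self.weight = nn.Parameter(base.weight.detach().clone(), requires_grad=False)
        self.bias = nn.Parameter(base.bias.detach().clone(), requires_grad=False)
        self.rank = rank
        self.new_stage()

    def new_stage(self):
        d_out, d_in = self.weight.shape
        self.lora_a = nn.Parameter(torch.randn(self.rank, d_in) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(d_out, self.rank))  # starts as no-op

    def merge_stage(self):
        with torch.no_grad():
            self.weight += self.lora_b @ self.lora_a  # fold the adapter in
        self.new_stage()                              # fresh low-rank adapter

    def forward(self, x):
        return x @ (self.weight + self.lora_b @ self.lora_a).t() + self.bias

layer = CascadedLoRALinear(nn.Linear(16, 16))
for stage in range(3):
    # ... train layer.lora_a / layer.lora_b for this stage ...
    layer.merge_stage()
```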
arXiv Detail & Related papers (2024-07-01T17:28:59Z)
- Helen: Optimizing CTR Prediction Models with Frequency-wise Hessian Eigenvalue Regularization [22.964109377128523]
Click-Through Rate (CTR) prediction holds paramount significance in online advertising and recommendation scenarios.
Despite the proliferation of recent CTR prediction models, the improvements in performance have remained limited.
arXiv Detail & Related papers (2024-02-23T15:00:46Z)
- Continual Learners are Incremental Model Generalizers [70.34479702177988]
This paper extensively studies the impact of Continual Learning (CL) models as pre-trainers.
We find that the transfer quality of the representation often increases gradually without noticeable degradation in fine-tuning performance.
We propose a new fine-tuning scheme, GLobal Attention Discretization (GLAD), that preserves rich task-generic representations while solving downstream tasks.
arXiv Detail & Related papers (2023-06-21T05:26:28Z)
- Towards Long-Term Time-Series Forecasting: Feature, Pattern, and Distribution [57.71199089609161]
Long-term time-series forecasting (LTTF) has become a pressing demand in many applications, such as wind power supply planning.
Transformer models have been adopted to deliver high prediction capacity because of their high-capacity, yet computationally expensive, self-attention mechanism.
We propose an efficient Transformer-based model, named Conformer, which differentiates itself from existing methods for LTTF in three aspects.
arXiv Detail & Related papers (2023-01-05T13:59:29Z)
- Meta-Wrapper: Differentiable Wrapping Operator for User Interest Selection in CTR Prediction [97.99938802797377]
Click-through rate (CTR) prediction, whose goal is to predict the probability that a user will click on an item, has become increasingly significant in recommender systems.
Recent deep learning models that automatically extract user interests from behavior histories have achieved great success.
We propose a novel approach under the wrapper-method framework, named Meta-Wrapper.
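The summary does not detail the meta-learning formulation; as a stand-in, the sketch below shows a simple differentiable wrapper: sigmoid gates over behavior features trained jointly with the CTR objective under a sparsity penalty. All names here are hypothetical.

```python
# Differentiable feature-selection sketch (gates + sparsity penalty stand in
# for the wrapper; Meta-Wrapper's meta-learning formulation is not reproduced).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedSelector(nn.Module):
    def __init__(self, num_behaviors: int, dim: int):
        super().__init__()
        self.gate_logits = nn.Parameter(torch.zeros(num_behaviors))
        self.scorer = nn.Linear(dim, 1)

    def forward(self, behaviors):                 # (batch, behaviors, dim)
        gates = torch.sigmoid(self.gate_logits)   # soft selection in (0, 1)
        pooled = (behaviors * gates.view(1, -1, 1)).mean(dim=1)
        return self.scorer(pooled).squeeze(-1), gates

model = GatedSelector(num_behaviors=20, dim=16)
x = torch.randn(8, 20, 16)                 # toy user behavior embeddings
y = torch.randint(0, 2, (8,)).float()      # click labels
logits, gates = model(x)
loss = F.binary_cross_entropy_with_logits(logits, y) + 1e-3 * gates.sum()
loss.backward()
```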
arXiv Detail & Related papers (2022-06-28T03:28:15Z)
- Concept Drift Adaptation for CTR Prediction in Online Advertising Systems [6.900209851954917]
Click-through rate (CTR) prediction is a crucial task in web search, recommender systems, and online advertisement displaying.
In this paper, we propose adaptive mixture of experts (AdaMoE) to alleviate the concept drift problem by adaptive filtering in the data stream of CTR prediction.
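A minimal mixture-of-experts sketch for CTR: a gating network softly routes each example across experts, so the mixture can re-weight experts as the data distribution drifts. Expert count and sizes are assumptions; AdaMoE's adaptive filtering is not reproduced.

```python
# Mixture-of-experts sketch for CTR (expert count/sizes are assumptions).
import torch
import torch.nn as nn

class MoECTR(nn.Module):
    def __init__(self, num_features: int, num_experts: int = 4, hidden: int = 32):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(num_features, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(num_experts)
        )
        self.gate = nn.Linear(num_features, num_experts)

    def forward(self, x):
        weights = torch.softmax(self.gate(x), dim=-1)            # (batch, E)
        outs = torch.cat([e(x) for e in self.experts], dim=-1)   # (batch, E)
        return (weights * outs).sum(dim=-1)                      # CTR logits

model = MoECTR(num_features=16)
logits = model(torch.randn(32, 16))
```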
arXiv Detail & Related papers (2022-04-01T07:43:43Z)
- Click-through Rate Prediction with Auto-Quantized Contrastive Learning [46.585376453464114]
We consider whether user behaviors are rich enough to capture user interests for prediction, and propose an Auto-Quantized Contrastive Learning (AQCL) loss to regularize the model.
The proposed framework is agnostic to different model architectures and can be trained in an end-to-end fashion.
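A minimal sketch of contrastive regularization for CTR representations: an InfoNCE-style auxiliary loss pulls two views of the same user embedding together. AQCL's auto-quantization step is not reproduced, and the augmentation below is a toy assumption.

```python
# InfoNCE-style auxiliary loss sketch (toy augmentation; no quantization).
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """z1, z2: (batch, dim) embeddings of two views of the same users."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau             # pairwise similarities
    labels = torch.arange(z1.size(0))      # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

z = torch.randn(32, 64)                    # toy user-interest embeddings
aux = info_nce(z + 0.1 * torch.randn_like(z),
               z + 0.1 * torch.randn_like(z))
# total objective: ctr_loss + lambda * aux   (lambda: tuning hyperparameter)
```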
arXiv Detail & Related papers (2021-09-27T04:39:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.