Mitigating Divergence of Latent Factors via Dual Ascent for Low Latency Event Prediction Models
- URL: http://arxiv.org/abs/2111.07866v1
- Date: Mon, 15 Nov 2021 16:09:48 GMT
- Title: Mitigating Divergence of Latent Factors via Dual Ascent for Low Latency Event Prediction Models
- Authors: Alex Shtoff, Yair Koren
- Abstract summary: Real-world content recommendation marketplaces exhibit certain behaviors and are subject to constraints that are not always apparent in common static offline data sets.
We present a systematic method to prevent model parameters from diverging by imposing a carefully chosen set of constraints on the model's latent vectors.
We conduct an online experiment which shows a substantial reduction in the number of diverging instances, and a significant improvement to both user experience and revenue.
- Score: 0.739706777911384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world content recommendation marketplaces exhibit certain behaviors and
are subject to constraints that are not always apparent in common static
offline data sets. One example that is common in ad marketplaces is swift ad
turnover. New ads are introduced and old ads disappear at high rates every day.
Another example is ad discontinuity, where existing ads may appear and
disappear from the market for non-negligible amounts of time due to a variety
of reasons (e.g., depletion of budget, pausing by the advertiser, flagging by
the system, and more). These behaviors sometimes cause the model's loss surface
to change dramatically over short periods of time. To address these behaviors,
fresh models are highly important, and to achieve this (and for several other
reasons) incremental training on small chunks of past events is often employed.
These behaviors and algorithmic optimizations occasionally cause model
parameters to grow uncontrollably large, or \emph{diverge}. In this work we
present a systematic method to prevent model parameters from diverging by
imposing a carefully chosen set of constraints on the model's latent vectors.
We then devise a method inspired by primal-dual optimization algorithms to
fulfill these constraints in a manner which both aligns well with incremental
model training, and does not require any major modifications to the underlying
model training algorithm.
We analyze, demonstrate, and motivate our method on OFFSET, a collaborative
filtering algorithm which drives Yahoo native advertising, one of
VZM's largest and fastest-growing businesses, reaching a run-rate of many
hundreds of millions USD per year. Finally, we conduct an online experiment
which shows a substantial reduction in the number of diverging instances, and a
significant improvement to both user experience and revenue.
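The abstract leaves the exact constraint set and update rule to the paper body. As a minimal sketch, assume each latent vector $v$ is kept inside a norm ball $\|v\|^2 \le R^2$ via a Lagrangian term whose multiplier $\mu \ge 0$ is updated by projected dual ascent alongside the ordinary SGD step; the function and step-size names below are illustrative assumptions, not the OFFSET implementation.

```python
import numpy as np

def primal_dual_step(v, grad_loss, mu, radius, lr=0.01, lr_dual=0.01):
    """One illustrative primal-dual update for a single latent vector v.

    Lagrangian: L(v, mu) = loss(v) + mu * (||v||^2 - radius^2), mu >= 0.
    The primal step is plain SGD on L; the dual step ascends on the
    constraint violation, so mu grows while ||v|| exceeds the radius
    and decays back toward zero once the constraint holds.
    """
    v = v - lr * (grad_loss + 2.0 * mu * v)            # primal descent on L
    mu = max(0.0, mu + lr_dual * (v @ v - radius**2))  # projected dual ascent
    return v, mu
```

Because the dual update costs one scalar and one extra gradient term per constrained vector, it can run inside incremental training on each new chunk of events without modifying the underlying training algorithm, which matches the compatibility property the abstract highlights.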
Related papers
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition [99.7047087527422]
In this work, we demonstrate that competition can fundamentally alter the behavior of machine learning scaling trends.
We find many settings where improving data representation quality decreases the overall predictive accuracy across users.
At a conceptual level, our work suggests that favorable scaling trends for individual model-providers need not translate to downstream improvements in social welfare.
arXiv Detail & Related papers (2023-06-26T13:06:34Z)
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the \textit{PR-divergences}.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
- SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance [19.930169700686672]
This work aims to design a new, low-latency BERT via structured pruning to empower real-time online inference for cold-start ad relevance on a CPU platform.
In this paper, we propose SwiftPruner - an efficient framework that leverages evolution-based search to automatically find the best-performing layer-wise sparse BERT model.
arXiv Detail & Related papers (2022-08-30T03:05:56Z)
- Forget Less, Count Better: A Domain-Incremental Self-Distillation Learning Benchmark for Lifelong Crowd Counting [51.44987756859706]
Off-the-shelf methods have drawbacks when handling multiple domains.
Lifelong Crowd Counting aims at alleviating catastrophic forgetting and improving generalization ability.
arXiv Detail & Related papers (2022-05-06T15:37:56Z)
- Dynamic Dual-Output Diffusion Models [100.32273175423146]
Iterative denoising-based generation has been shown to be comparable in quality to other classes of generative models.
A major drawback of this method is that it requires hundreds of iterations to produce a competitive result.
Recent works have proposed solutions that allow for faster generation with fewer iterations, but the image quality gradually deteriorates.
arXiv Detail & Related papers (2022-03-08T11:20:40Z)
- Challenges and approaches to privacy preserving post-click conversion prediction [3.4071263815701336]
We provide an overview of the challenges and constraints when learning conversion models in this setting.
We introduce a novel approach for training these models that makes use of post-ranking signals.
We show using offline experiments on real world data that it outperforms a model relying on opt-in data alone.
arXiv Detail & Related papers (2022-01-29T21:36:01Z)
- Mitigating Temporal-Drift: A Simple Approach to Keep NER Models Crisp [16.960138447997007]
The performance of neural models for named entity recognition degrades over time as the models become stale.
We propose an intuitive approach to measure the potential trendiness of tweets and use this metric to select the most informative instances to use for training.
Our approach shows larger increases in prediction accuracy with less training data than the alternatives, making it an attractive, practical solution.
arXiv Detail & Related papers (2021-04-20T03:35:25Z)
- Generating multi-type sequences of temporal events to improve fraud detection in game advertising [0.0]
We propose using a variant of Time-LSTM cells in combination with a modified version of Sequence Generative Adversarial Networks (SeqGAN) to generate artificial sequences.
The GAN-generated sequences can be used to enhance the classification ability of event-based fraud detection.
arXiv Detail & Related papers (2021-04-07T23:19:13Z)
- Deep Bayesian Bandits: Exploring in Online Personalized Recommendations [4.845576821204241]
We formulate a display advertising recommender as a contextual bandit.
We implement exploration techniques that require sampling from the posterior distribution of click-through-rates.
We test our proposed deep Bayesian bandits algorithm in both offline simulation and an online A/B setting (a minimal posterior-sampling sketch follows this list).
arXiv Detail & Related papers (2020-08-03T08:58:18Z)
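The Deep Bayesian Bandits entry above describes posterior-sampling exploration only at a high level. As a minimal sketch, assuming the classic Beta-Bernoulli simplification of Thompson sampling over per-ad click-through rates (the paper itself works with deep Bayesian posteriors, and every name below is hypothetical):

```python
import numpy as np

def thompson_select(clicks, impressions, rng):
    """Choose an ad by sampling a CTR from each ad's Beta posterior.

    clicks, impressions: integer arrays with one entry per candidate ad.
    Beta(1 + clicks, 1 + misses) is the posterior of a Bernoulli CTR
    under a uniform prior; serving the argmax of the sampled CTRs
    explores uncertain ads while exploiting well-estimated ones.
    """
    sampled_ctrs = rng.beta(1 + clicks, 1 + (impressions - clicks))
    return int(np.argmax(sampled_ctrs))

# Example: three ads with different amounts of history.
rng = np.random.default_rng(0)
clicks = np.array([5, 1, 0])
impressions = np.array([100, 10, 2])
print(thompson_select(clicks, impressions, rng))
```

Ads with few impressions have wide posteriors and are therefore sampled optimistically often enough to be explored, which is what makes posterior sampling attractive for recommendation settings with rapid item turnover.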