Mitigating Divergence of Latent Factors via Dual Ascent for Low Latency Event Prediction Models
- URL: http://arxiv.org/abs/2111.07866v1
- Date: Mon, 15 Nov 2021 16:09:48 GMT
- Title: Mitigating Divergence of Latent Factors via Dual Ascent for Low Latency Event Prediction Models
- Authors: Alex Shtoff, Yair Koren
- Abstract summary: Real-world content recommendation marketplaces exhibit certain behaviors and are subject to constraints that are not always apparent in common static offline data sets.
We present a systematic method to prevent model parameters from diverging by imposing a carefully chosen set of constraints on the model's latent vectors.
We conduct an online experiment which shows a substantial reduction in the number of diverging instances, and a significant improvement to both user experience and revenue.
- Score: 0.739706777911384
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-world content recommendation marketplaces exhibit certain behaviors and
are subject to constraints that are not always apparent in common static
offline data sets. One example that is common in ad marketplaces is swift ad
turnover. New ads are introduced and old ads disappear at high rates every day.
Another example is ad discontinuity, where existing ads may appear and
disappear from the market for non-negligible amounts of time due to a variety
of reasons (e.g., depletion of budget, pausing by the advertiser, flagging by
the system, and more). These behaviors sometimes cause the model's loss surface
to change dramatically over short periods of time. To address these behaviors,
fresh models are highly important, and to achieve this (and for several other
reasons) incremental training on small chunks of past events is often employed.
These behaviors and algorithmic optimizations occasionally cause model
parameters to grow uncontrollably large, or \emph{diverge}. In this work we
present a systematic method to prevent model parameters from diverging by
imposing a carefully chosen set of constraints on the model's latent vectors.
We then devise a method inspired by primal-dual optimization algorithms to
fulfill these constraints in a manner which both aligns well with incremental
model training, and does not require any major modifications to the underlying
model training algorithm.
We analyze, demonstrate, and motivate our method on OFFSET, a collaborative
filtering algorithm which drives Yahoo native advertising, one of
VZM's largest and fastest-growing businesses, reaching a run-rate of many
hundreds of millions USD per year. Finally, we conduct an online experiment
which shows a substantial reduction in the number of diverging instances, and a
significant improvement to both user experience and revenue.
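The abstract leaves the exact constraint set and update rule to the paper body. As a minimal sketch, assume each latent vector $v$ is kept inside a norm ball $\|v\|^2 \le R^2$ via a Lagrangian term whose multiplier $\mu \ge 0$ is updated by projected dual ascent alongside the ordinary SGD step; the function and step-size names below are illustrative assumptions, not the OFFSET implementation.

```python
import numpy as np

def primal_dual_step(v, grad_loss, mu, radius, lr=0.01, lr_dual=0.01):
    """One illustrative primal-dual update for a single latent vector v.

    Lagrangian: L(v, mu) = loss(v) + mu * (||v||^2 - radius^2), mu >= 0.
    The primal step is plain SGD on L; the dual step ascends on the
    constraint violation, so mu grows while ||v|| exceeds the radius
    and decays back toward zero once the constraint holds.
    """
    v = v - lr * (grad_loss + 2.0 * mu * v)            # primal descent on L
    mu = max(0.0, mu + lr_dual * (v @ v - radius**2))  # projected dual ascent
    return v, mu
```

Because the dual update costs one scalar and one extra gradient term per constrained vector, it can run inside incremental training on each new chunk of events without modifying the underlying training algorithm, which matches the compatibility property the abstract highlights.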
Related papers
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- Improved Bayes Risk Can Yield Reduced Social Welfare Under Competition [99.7047087527422]
In this work, we demonstrate that competition can fundamentally alter the behavior of machine learning scaling trends.
We find many settings where improving data representation quality decreases the overall predictive accuracy across users.
At a conceptual level, our work suggests that favorable scaling trends for individual model-providers need not translate to downstream improvements in social welfare.
arXiv Detail & Related papers (2023-06-26T13:06:34Z)
- Precision-Recall Divergence Optimization for Generative Modeling with GANs and Normalizing Flows [54.050498411883495]
We develop a novel training method for generative models, such as Generative Adversarial Networks and Normalizing Flows.
We show that achieving a specified precision-recall trade-off corresponds to minimizing a unique $f$-divergence from a family we call the \textit{PR-divergences}.
Our approach improves the performance of existing state-of-the-art models like BigGAN in terms of either precision or recall when tested on datasets such as ImageNet.
arXiv Detail & Related papers (2023-05-30T10:07:17Z)
- Non-Invasive Fairness in Learning through the Lens of Data Drift [88.37640805363317]
We show how to improve the fairness of Machine Learning models without altering the data or the learning algorithm.
We use a simple but key insight: the divergence of trends between different populations, and, consequently, between a learned model and minority populations, is analogous to data drift.
We explore two strategies (model-splitting and reweighing) to resolve this drift, aiming to improve the overall conformance of models to the underlying data.
arXiv Detail & Related papers (2023-03-30T17:30:42Z)
- SwiftPruner: Reinforced Evolutionary Pruning for Efficient Ad Relevance [19.930169700686672]
This work aims to design a new, low-latency BERT via structured pruning to empower real-time online inference for cold-start ad relevance on a CPU platform.
In this paper, we propose SwiftPruner - an efficient framework that leverages evolution-based search to automatically find the best-performing layer-wise sparse BERT model.
arXiv Detail & Related papers (2022-08-30T03:05:56Z)
- Forget Less, Count Better: A Domain-Incremental Self-Distillation Learning Benchmark for Lifelong Crowd Counting [51.44987756859706]
Off-the-shelf methods have drawbacks when handling multiple domains.
Lifelong Crowd Counting aims at alleviating catastrophic forgetting and improving generalization ability.
arXiv Detail & Related papers (2022-05-06T15:37:56Z)
- Dynamic Dual-Output Diffusion Models [100.32273175423146]
Iterative denoising-based generation has been shown to be comparable in quality to other classes of generative models.
A major drawback of this method is that it requires hundreds of iterations to produce a competitive result.
Recent works have proposed solutions that allow for faster generation with fewer iterations, but the image quality gradually deteriorates.
arXiv Detail & Related papers (2022-03-08T11:20:40Z)
- Challenges and approaches to privacy preserving post-click conversion prediction [3.4071263815701336]
We provide an overview of the challenges and constraints when learning conversion models in this setting.
We introduce a novel approach for training these models that makes use of post-ranking signals.
We show using offline experiments on real world data that it outperforms a model relying on opt-in data alone.
arXiv Detail & Related papers (2022-01-29T21:36:01Z)
- Mitigating Temporal-Drift: A Simple Approach to Keep NER Models Crisp [16.960138447997007]
The performance of neural models for named entity recognition degrades over time as the models become stale.
We propose an intuitive approach to measure the potential trendiness of tweets and use this metric to select the most informative instances to use for training.
Our approach shows larger increases in prediction accuracy with less training data than the alternatives, making it an attractive, practical solution.
arXiv Detail & Related papers (2021-04-20T03:35:25Z)
- Generating multi-type sequences of temporal events to improve fraud detection in game advertising [0.0]
We propose using a variant of Time-LSTM cells in combination with a modified version of Sequence Generative Adversarial Networks (SeqGAN) to generate artificial sequences.
The GAN-generated sequences can be used to enhance the classification ability of event-based fraud detection.
arXiv Detail & Related papers (2021-04-07T23:19:13Z)
- Deep Bayesian Bandits: Exploring in Online Personalized Recommendations [4.845576821204241]
We formulate a display advertising recommender as a contextual bandit.
We implement exploration techniques that require sampling from the posterior distribution of click-through-rates.
We test our proposed deep Bayesian bandits algorithm in both offline simulation and an online A/B setting (a minimal posterior-sampling sketch follows this list).
arXiv Detail & Related papers (2020-08-03T08:58:18Z)
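The Deep Bayesian Bandits entry above describes posterior-sampling exploration only at a high level. As a minimal sketch, assuming the classic Beta-Bernoulli simplification of Thompson sampling over per-ad click-through rates (the paper itself works with deep Bayesian posteriors, and every name below is hypothetical):

```python
import numpy as np

def thompson_select(clicks, impressions, rng):
    """Choose an ad by sampling a CTR from each ad's Beta posterior.

    clicks, impressions: integer arrays with one entry per candidate ad.
    Beta(1 + clicks, 1 + misses) is the posterior of a Bernoulli CTR
    under a uniform prior; serving the argmax of the sampled CTRs
    explores uncertain ads while exploiting well-estimated ones.
    """
    sampled_ctrs = rng.beta(1 + clicks, 1 + (impressions - clicks))
    return int(np.argmax(sampled_ctrs))

# Example: three ads with different amounts of history.
rng = np.random.default_rng(0)
clicks = np.array([5, 1, 0])
impressions = np.array([100, 10, 2])
print(thompson_select(clicks, impressions, rng))
```

Ads with few impressions have wide posteriors and are therefore sampled optimistically often enough to be explored, which is what makes posterior sampling attractive for recommendation settings with rapid item turnover.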