Online Matching: A Real-time Bandit System for Large-scale
Recommendations
- URL: http://arxiv.org/abs/2307.15893v1
- Date: Sat, 29 Jul 2023 05:46:27 GMT
- Title: Online Matching: A Real-time Bandit System for Large-scale
Recommendations
- Authors: Xinyang Yi, Shao-Chuan Wang, Ruining He, Hariharan Chandrasekaran,
Charles Wu, Lukasz Heldt, Lichan Hong, Minmin Chen, Ed H. Chi
- Abstract summary: Online Matching is a scalable closed-loop bandit system learning from users' direct feedback on items in real time.
Diag-LinUCB is a novel extension of the LinUCB algorithm that enables distributed updates of bandit parameters in a scalable and timely manner.
- Score: 23.954049092470548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The last decade has witnessed many successes of deep learning-based models
for industry-scale recommender systems. These models are typically trained
offline in a batch manner. While being effective in capturing users' past
interactions with recommendation platforms, batch learning suffers from long
model-update latency and is vulnerable to system biases, making it hard to
adapt to distribution shift and explore new items or user interests. Although
online learning-based approaches (e.g., multi-armed bandits) have demonstrated
promising theoretical results in tackling these challenges, their practical
real-time implementation in large-scale recommender systems remains limited.
First, the scalability of online approaches in serving massive online traffic
while ensuring timely updates of bandit parameters poses a significant
challenge. Additionally, exploring uncertainty in recommender systems can
easily result in an unfavorable user experience, highlighting the need for
devising intricate strategies that effectively balance the trade-off between
exploitation and exploration. In this paper, we introduce Online Matching: a
scalable closed-loop bandit system learning from users' direct feedback on
items in real time. We present a hybrid "offline + online" approach for
constructing this system, accompanied by a comprehensive exposition of the
end-to-end system architecture. We propose Diag-LinUCB -- a novel extension of
the LinUCB algorithm -- to enable distributed updates of bandit parameters in
a scalable and timely manner. We conduct live experiments on YouTube and show
that Online Matching enhances fresh content discovery and item exploration on
the platform.
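To make the Diag-LinUCB idea concrete, below is a minimal single-machine sketch
of a diagonal LinUCB variant: keeping only the diagonal of the design matrix
reduces each arm's state to two vectors, so updates become elementwise
additions that are cheap to shard and merge across distributed workers. This is
an illustrative reconstruction under assumed details (the exploration weight
`alpha`, the ridge prior), not the authors' implementation.

```python
import numpy as np

class DiagLinUCBArm:
    """Per-arm state for a diagonal LinUCB variant (illustrative sketch).

    Keeping only diag(A) of the usual LinUCB design matrix makes each
    update an elementwise vector addition, so deltas from distributed
    workers can be accumulated and merged cheaply.
    """

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha                 # exploration strength (assumed)
        self.a_diag = np.full(dim, ridge)  # diag(A), initialized to ridge * I
        self.b = np.zeros(dim)             # reward-weighted feature sums

    def ucb_score(self, x):
        theta = self.b / self.a_diag       # coordinate-wise least squares
        # Confidence width under the diagonal approximation of A^{-1}.
        width = np.sqrt(np.sum(x * x / self.a_diag))
        return float(theta @ x + self.alpha * width)

    def update(self, x, reward):
        self.a_diag += x * x               # elementwise: trivially mergeable
        self.b += reward * x

# Usage: score candidate items for a context and learn from user feedback.
rng = np.random.default_rng(0)
arms = [DiagLinUCBArm(dim=8) for _ in range(3)]
x = rng.normal(size=8)
chosen = max(range(3), key=lambda i: arms[i].ucb_score(x))
arms[chosen].update(x, reward=1.0)  # e.g., a positive user interaction
```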
Related papers
- BayesCNS: A Unified Bayesian Approach to Address Cold Start and Non-Stationarity in Search Systems at Scale [1.1634177851893535]
BayesCNS is designed to handle cold start and non-stationary distribution shifts in search systems at scale.
BayesCNS achieves this by estimating prior distributions for user-item interactions, which are continuously updated with new user interactions gathered online.
This online learning procedure is guided by a ranker model, enabling efficient exploration of relevant items using contextual information.
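As a rough sketch of this continual prior-update loop (not the BayesCNS model
itself, which couples the update with a ranker and contextual information), a
conjugate Beta-Bernoulli prior over an item's interaction rate can be refreshed
with every online observation; all names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BetaPrior:
    """Hypothetical Beta prior over a cold-start item's interaction rate."""
    alpha: float = 1.0  # pseudo-count of positive interactions
    beta: float = 1.0   # pseudo-count of negative interactions

    def update(self, clicked: bool) -> None:
        # Conjugate update: each online interaction shifts the posterior,
        # which then serves as the prior for the next request.
        if clicked:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

# A new item starts at the uninformative prior and adapts as feedback arrives.
item_prior = BetaPrior()
for clicked in [True, False, True, True]:
    item_prior.update(clicked)
print(f"estimated interaction rate: {item_prior.mean():.2f}")  # 0.67
```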
arXiv Detail & Related papers (2024-10-03T01:14:30Z)
- Interactive Graph Convolutional Filtering [79.34979767405979]
Interactive Recommender Systems (IRS) have been increasingly used in various domains, including personalized article recommendation, social media, and online advertising.
These problems are exacerbated by cold start and data sparsity.
Existing Multi-Armed Bandit methods, despite their carefully designed exploration strategies, often struggle to provide satisfactory results in the early stages.
Our proposed method extends interactive collaborative filtering into the graph model to enhance the performance of collaborative filtering between users and items.
arXiv Detail & Related papers (2023-09-04T09:02:31Z)
- On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems [36.608400817940236]
Reinforcement learning serves as a potent tool for modeling dynamic user interests within recommender systems.
Recent strides in offline reinforcement learning present a new perspective.
Despite being a burgeoning field, works centered on recommender systems utilizing offline reinforcement learning remain limited.
arXiv Detail & Related papers (2023-08-22T10:28:02Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
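One of the simple design choices behind such results is mixing offline and
online data in every update. The sketch below draws half of each training batch
from the offline dataset and half from the online replay buffer; the 50/50
ratio and the function names are assumptions, not the paper's exact recipe.

```python
import random

def mixed_batch(offline_data, online_buffer, batch_size=256):
    """Symmetric sampling: half of each batch from offline data, half
    from online experience (the 50/50 split is an assumption here)."""
    half = batch_size // 2
    batch = random.choices(offline_data, k=half)   # sampled with replacement
    batch += random.choices(online_buffer, k=half)
    random.shuffle(batch)
    return batch  # feed to any off-policy update, e.g., a Q-learning step

# Usage: transitions are (state, action, reward, next_state) tuples.
offline = [(0, 1, 0.5, 1), (1, 0, 0.0, 2)]
online = [(2, 1, 1.0, 3)]
print(len(mixed_batch(offline, online, batch_size=8)))  # 8
```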
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- Augmented Bilinear Network for Incremental Multi-Stock Time-Series Classification [83.23129279407271]
We propose a method to efficiently retain the knowledge available in a neural network pre-trained on a set of securities.
In our method, the prior knowledge encoded in a pre-trained neural network is maintained by keeping existing connections fixed.
This knowledge is adjusted for the new securities by a set of augmented connections, which are optimized using the new data.
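The freeze-then-augment scheme reads like a residual adapter: pretrained
weights stay fixed while a small set of new trainable connections fits the new
securities' data. A hedged PyTorch sketch, with the additive combination and
layer shapes as assumptions rather than the paper's bilinear architecture:

```python
import torch
import torch.nn as nn

class AugmentedLinear(nn.Module):
    """A pretrained linear map kept frozen, plus trainable augmented
    connections added to its output (an adapter-style sketch)."""

    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.frozen = pretrained
        for p in self.frozen.parameters():
            p.requires_grad = False  # retain the prior knowledge unchanged
        self.augment = nn.Linear(pretrained.in_features,
                                 pretrained.out_features)
        nn.init.zeros_(self.augment.weight)  # zero init: initially the
        nn.init.zeros_(self.augment.bias)    # pretrained behavior is kept

    def forward(self, x):
        return self.frozen(x) + self.augment(x)

# Only the augmented connections receive gradients from new-security data.
layer = AugmentedLinear(nn.Linear(16, 4))
opt = torch.optim.Adam(layer.augment.parameters(), lr=1e-3)
loss = layer(torch.randn(8, 16)).pow(2).mean()
loss.backward()
opt.step()
```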
arXiv Detail & Related papers (2022-07-23T18:54:10Z)
- Scalable and Robust Self-Learning for Skill Routing in Large-Scale Conversational AI Systems [13.705147776518421]
State-of-the-art systems use a model-based approach to enable natural conversations.
We propose a scalable self-learning approach to explore routing alternatives.
arXiv Detail & Related papers (2022-04-14T17:46:14Z)
- Offline Reinforcement Learning for Mobile Notifications [1.965345368500676]
Mobile notification systems have taken a major role in driving and maintaining user engagement for online platforms.
Most machine learning applications in notification systems are built around response-prediction models.
We argue that reinforcement learning is a better framework for notification systems in terms of performance and iteration speed.
arXiv Detail & Related papers (2022-02-04T22:22:22Z)
- Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking [58.14267480293575]
We propose a simple yet effective online learning approach for few-shot online adaptation without requiring offline training.
It allows an in-built memory retention mechanism for the model to remember the knowledge about the object seen before.
We evaluate our approach based on two networks in the online learning families for tracking, i.e., multi-layer perceptrons in RT-MDNet and convolutional neural networks in DiMP.
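For reference, the textbook recursive least-squares update with a forgetting
factor is the estimator this approach builds on; the tracking-specific variant
in the paper may differ in details.

```python
import numpy as np

class RecursiveLeastSquares:
    """Generic RLS with forgetting factor `lam` (a textbook sketch)."""

    def __init__(self, dim, lam=0.99, delta=1e3):
        self.lam = lam
        self.w = np.zeros(dim)        # weight estimate
        self.P = delta * np.eye(dim)  # inverse correlation matrix

    def update(self, x, y):
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)    # Kalman-style gain vector
        self.w += gain * (y - self.w @ x)  # correct by the prediction error
        self.P = (self.P - np.outer(gain, Px)) / self.lam

# Usage: track a linear target online, one sample at a time.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
rls = RecursiveLeastSquares(dim=3)
for _ in range(200):
    x = rng.normal(size=3)
    rls.update(x, true_w @ x + 0.01 * rng.normal())
print(np.round(rls.w, 2))  # close to [ 2. -1.  0.5]
```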
arXiv Detail & Related papers (2021-12-28T06:51:18Z)
- Non-Stationary Latent Bandits [68.21614490603758]
We propose a practical approach for fast personalization to non-stationary users.
The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online.
We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset.
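A minimal sketch of this offline/online split: prototypical user models are
reduced to per-state Bernoulli reward tables assumed to come from offline
learning, Thompson sampling draws a latent state from the online posterior
before acting, and the non-stationarity machinery is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Offline part: prototypical user-behavior models, one row per latent
# state, giving each arm's reward probability (assumed learned offline).
proto = np.array([[0.9, 0.1, 0.2],   # latent state 0
                  [0.1, 0.8, 0.3]])  # latent state 1

def ts_step(belief, true_state=1):
    s = rng.choice(len(belief), p=belief)  # Thompson: sample a latent state
    arm = int(np.argmax(proto[s]))         # act greedily under the sample
    reward = rng.random() < proto[true_state, arm]
    # Online part: Bayes update of the posterior over latent states.
    lik = proto[:, arm] if reward else 1.0 - proto[:, arm]
    belief = belief * lik
    return belief / belief.sum(), reward

belief = np.array([0.5, 0.5])  # uniform prior over the latent state
for _ in range(20):
    belief, _ = ts_step(belief)
print(np.round(belief, 2))  # concentrates on the true latent state
```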
arXiv Detail & Related papers (2020-12-01T10:31:57Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
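At the heart of AWAC is an advantage-weighted policy objective that up-weights
buffer actions (demonstrations or online experience) by their exponentiated
advantage. A schematic PyTorch form under assumed policy and critic interfaces,
with `lam` a hypothetical temperature:

```python
import torch

def awac_actor_loss(log_probs, advantages, lam=1.0):
    """Schematic advantage-weighted actor loss.

    log_probs:  log pi(a|s) for buffer actions, shape (batch,)
    advantages: Q(s, a) - V(s) from the critic, shape (batch,)
    """
    # Weights are constants w.r.t. the actor (no gradient via the critic).
    weights = torch.exp(advantages.detach() / lam)
    return -(log_probs * weights).mean()

# Usage with dummy tensors standing in for policy/critic outputs.
lp = torch.randn(32, requires_grad=True)
adv = torch.randn(32)
awac_actor_loss(lp, adv).backward()
```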
arXiv Detail & Related papers (2020-06-16T17:54:41Z)