Online Matching: A Real-time Bandit System for Large-scale
Recommendations
- URL: http://arxiv.org/abs/2307.15893v1
- Date: Sat, 29 Jul 2023 05:46:27 GMT
- Title: Online Matching: A Real-time Bandit System for Large-scale
Recommendations
- Authors: Xinyang Yi, Shao-Chuan Wang, Ruining He, Hariharan Chandrasekaran,
Charles Wu, Lukasz Heldt, Lichan Hong, Minmin Chen, Ed H. Chi
- Abstract summary: Online Matching is a scalable closed-loop bandit system learning from users' direct feedback on items in real time.
Diag-LinUCB is a novel extension of the LinUCB algorithm that enables distributed updates of bandit parameters in a scalable and timely manner.
- Score: 23.954049092470548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The last decade has witnessed many successes of deep learning-based models
for industry-scale recommender systems. These models are typically trained
offline in a batch manner. While being effective in capturing users' past
interactions with recommendation platforms, batch learning suffers from long
model-update latency and is vulnerable to system biases, making it hard to
adapt to distribution shift and explore new items or user interests. Although
online learning-based approaches (e.g., multi-armed bandits) have demonstrated
promising theoretical results in tackling these challenges, their practical
real-time implementation in large-scale recommender systems remains limited.
First, the scalability of online approaches in serving massive online traffic
while ensuring timely updates of bandit parameters poses a significant
challenge. Additionally, exploring uncertainty in recommender systems can
easily result in an unfavorable user experience, highlighting the need for
devising intricate strategies that effectively balance the trade-off between
exploitation and exploration. In this paper, we introduce Online Matching: a
scalable closed-loop bandit system learning from users' direct feedback on
items in real time. We present a hybrid "offline + online" approach for
constructing this system, accompanied by a comprehensive exposition of the
end-to-end system architecture. We propose Diag-LinUCB -- a novel extension of
the LinUCB algorithm -- to enable distributed updates of bandit parameters in
a scalable and timely manner. We conduct live experiments on YouTube and show
that Online Matching enhances fresh content discovery and item exploration on
the platform.
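To make the Diag-LinUCB idea concrete, below is a minimal single-machine sketch
of a diagonal LinUCB variant: keeping only the diagonal of the design matrix
reduces each arm's state to two vectors, so updates become elementwise
additions that are cheap to shard and merge across distributed workers. This is
an illustrative reconstruction under assumed details (the exploration weight
`alpha`, the ridge prior), not the authors' implementation.

```python
import numpy as np

class DiagLinUCBArm:
    """Per-arm state for a diagonal LinUCB variant (illustrative sketch).

    Keeping only diag(A) of the usual LinUCB design matrix makes each
    update an elementwise vector addition, so deltas from distributed
    workers can be accumulated and merged cheaply.
    """

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha                 # exploration strength (assumed)
        self.a_diag = np.full(dim, ridge)  # diag(A), initialized to ridge * I
        self.b = np.zeros(dim)             # reward-weighted feature sums

    def ucb_score(self, x):
        theta = self.b / self.a_diag       # coordinate-wise least squares
        # Confidence width under the diagonal approximation of A^{-1}.
        width = np.sqrt(np.sum(x * x / self.a_diag))
        return float(theta @ x + self.alpha * width)

    def update(self, x, reward):
        self.a_diag += x * x               # elementwise: trivially mergeable
        self.b += reward * x

# Usage: score candidate items for a context and learn from user feedback.
rng = np.random.default_rng(0)
arms = [DiagLinUCBArm(dim=8) for _ in range(3)]
x = rng.normal(size=8)
chosen = max(range(3), key=lambda i: arms[i].ucb_score(x))
arms[chosen].update(x, reward=1.0)  # e.g., a positive user interaction
```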
Related papers
- BayesCNS: A Unified Bayesian Approach to Address Cold Start and Non-Stationarity in Search Systems at Scale [1.1634177851893535]
BayesCNS is designed to handle cold start and non-stationary distribution shifts in search systems at scale.
BayesCNS achieves this by estimating prior distributions for user-item interactions, which are continuously updated with new user interactions gathered online.
This online learning procedure is guided by a ranker model, enabling efficient exploration of relevant items using contextual information.
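As a rough sketch of this continual prior-update loop (not the BayesCNS model
itself, which couples the update with a ranker and contextual information), a
conjugate Beta-Bernoulli prior over an item's interaction rate can be refreshed
with every online observation; all names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BetaPrior:
    """Hypothetical Beta prior over a cold-start item's interaction rate."""
    alpha: float = 1.0  # pseudo-count of positive interactions
    beta: float = 1.0   # pseudo-count of negative interactions

    def update(self, clicked: bool) -> None:
        # Conjugate update: each online interaction shifts the posterior,
        # which then serves as the prior for the next request.
        if clicked:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

# A new item starts at the uninformative prior and adapts as feedback arrives.
item_prior = BetaPrior()
for clicked in [True, False, True, True]:
    item_prior.update(clicked)
print(f"estimated interaction rate: {item_prior.mean():.2f}")  # 0.67
```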
arXiv Detail & Related papers (2024-10-03T01:14:30Z)
- Interactive Graph Convolutional Filtering [79.34979767405979]
Interactive Recommender Systems (IRS) have been increasingly used in various domains, including personalized article recommendation, social media, and online advertising.
These problems are exacerbated by cold start and data sparsity.
Existing Multi-Armed Bandit methods, despite their carefully designed exploration strategies, often struggle to provide satisfactory results in the early stages.
Our proposed method extends interactive collaborative filtering into the graph model to enhance the performance of collaborative filtering between users and items.
arXiv Detail & Related papers (2023-09-04T09:02:31Z)
- On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems [36.608400817940236]
Reinforcement learning serves as a potent tool for modeling dynamic user interests within recommender systems.
Recent strides in offline reinforcement learning present a new perspective.
Despite being a burgeoning field, works centered on recommender systems utilizing offline reinforcement learning remain limited.
arXiv Detail & Related papers (2023-08-22T10:28:02Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
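One of the simple design choices behind such results is mixing offline and
online data in every update. The sketch below draws half of each training batch
from the offline dataset and half from the online replay buffer; the 50/50
ratio and the function names are assumptions, not the paper's exact recipe.

```python
import random

def mixed_batch(offline_data, online_buffer, batch_size=256):
    """Symmetric sampling: half of each batch from offline data, half
    from online experience (the 50/50 split is an assumption here)."""
    half = batch_size // 2
    batch = random.choices(offline_data, k=half)   # sampled with replacement
    batch += random.choices(online_buffer, k=half)
    random.shuffle(batch)
    return batch  # feed to any off-policy update, e.g., a Q-learning step

# Usage: transitions are (state, action, reward, next_state) tuples.
offline = [(0, 1, 0.5, 1), (1, 0, 0.0, 2)]
online = [(2, 1, 1.0, 3)]
print(len(mixed_batch(offline, online, batch_size=8)))  # 8
```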
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- Augmented Bilinear Network for Incremental Multi-Stock Time-Series Classification [83.23129279407271]
We propose a method to efficiently retain the knowledge available in a neural network pre-trained on a set of securities.
In our method, the prior knowledge encoded in a pre-trained neural network is maintained by keeping existing connections fixed.
This knowledge is adjusted for the new securities by a set of augmented connections, which are optimized using the new data.
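The freeze-then-augment scheme reads like a residual adapter: pretrained
weights stay fixed while a small set of new trainable connections fits the new
securities' data. A hedged PyTorch sketch, with the additive combination and
layer shapes as assumptions rather than the paper's bilinear architecture:

```python
import torch
import torch.nn as nn

class AugmentedLinear(nn.Module):
    """A pretrained linear map kept frozen, plus trainable augmented
    connections added to its output (an adapter-style sketch)."""

    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.frozen = pretrained
        for p in self.frozen.parameters():
            p.requires_grad = False  # retain the prior knowledge unchanged
        self.augment = nn.Linear(pretrained.in_features,
                                 pretrained.out_features)
        nn.init.zeros_(self.augment.weight)  # zero init: initially the
        nn.init.zeros_(self.augment.bias)    # pretrained behavior is kept

    def forward(self, x):
        return self.frozen(x) + self.augment(x)

# Only the augmented connections receive gradients from new-security data.
layer = AugmentedLinear(nn.Linear(16, 4))
opt = torch.optim.Adam(layer.augment.parameters(), lr=1e-3)
loss = layer(torch.randn(8, 16)).pow(2).mean()
loss.backward()
opt.step()
```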
arXiv Detail & Related papers (2022-07-23T18:54:10Z)
- Scalable and Robust Self-Learning for Skill Routing in Large-Scale Conversational AI Systems [13.705147776518421]
State-of-the-art systems use a model-based approach to enable natural conversations.
We propose a scalable self-learning approach to explore routing alternatives.
arXiv Detail & Related papers (2022-04-14T17:46:14Z)
- Offline Reinforcement Learning for Mobile Notifications [1.965345368500676]
Mobile notification systems have taken a major role in driving and maintaining user engagement for online platforms.
Most machine learning applications in notification systems are built around response-prediction models.
We argue that reinforcement learning is a better framework for notification systems in terms of performance and iteration speed.
arXiv Detail & Related papers (2022-02-04T22:22:22Z)
- Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking [58.14267480293575]
We propose a simple yet effective online learning approach for few-shot online adaptation without requiring offline training.
It allows an in-built memory retention mechanism for the model to remember the knowledge about the object seen before.
We evaluate our approach based on two networks in the online learning families for tracking, i.e., multi-layer perceptrons in RT-MDNet and convolutional neural networks in DiMP.
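For reference, the textbook recursive least-squares update with a forgetting
factor is the estimator this approach builds on; the tracking-specific variant
in the paper may differ in details.

```python
import numpy as np

class RecursiveLeastSquares:
    """Generic RLS with forgetting factor `lam` (a textbook sketch)."""

    def __init__(self, dim, lam=0.99, delta=1e3):
        self.lam = lam
        self.w = np.zeros(dim)        # weight estimate
        self.P = delta * np.eye(dim)  # inverse correlation matrix

    def update(self, x, y):
        Px = self.P @ x
        gain = Px / (self.lam + x @ Px)    # Kalman-style gain vector
        self.w += gain * (y - self.w @ x)  # correct by the prediction error
        self.P = (self.P - np.outer(gain, Px)) / self.lam

# Usage: track a linear target online, one sample at a time.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])
rls = RecursiveLeastSquares(dim=3)
for _ in range(200):
    x = rng.normal(size=3)
    rls.update(x, true_w @ x + 0.01 * rng.normal())
print(np.round(rls.w, 2))  # close to [ 2. -1.  0.5]
```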
arXiv Detail & Related papers (2021-12-28T06:51:18Z)
- Non-Stationary Latent Bandits [68.21614490603758]
We propose a practical approach for fast personalization to non-stationary users.
The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online.
We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset.
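A minimal sketch of this offline/online split: prototypical user models are
reduced to per-state Bernoulli reward tables assumed to come from offline
learning, Thompson sampling draws a latent state from the online posterior
before acting, and the non-stationarity machinery is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Offline part: prototypical user-behavior models, one row per latent
# state, giving each arm's reward probability (assumed learned offline).
proto = np.array([[0.9, 0.1, 0.2],   # latent state 0
                  [0.1, 0.8, 0.3]])  # latent state 1

def ts_step(belief, true_state=1):
    s = rng.choice(len(belief), p=belief)  # Thompson: sample a latent state
    arm = int(np.argmax(proto[s]))         # act greedily under the sample
    reward = rng.random() < proto[true_state, arm]
    # Online part: Bayes update of the posterior over latent states.
    lik = proto[:, arm] if reward else 1.0 - proto[:, arm]
    belief = belief * lik
    return belief / belief.sum(), reward

belief = np.array([0.5, 0.5])  # uniform prior over the latent state
for _ in range(20):
    belief, _ = ts_step(belief)
print(np.round(belief, 2))  # concentrates on the true latent state
```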
arXiv Detail & Related papers (2020-12-01T10:31:57Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
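At the heart of AWAC is an advantage-weighted policy objective that up-weights
buffer actions (demonstrations or online experience) by their exponentiated
advantage. A schematic PyTorch form under assumed policy and critic interfaces,
with `lam` a hypothetical temperature:

```python
import torch

def awac_actor_loss(log_probs, advantages, lam=1.0):
    """Schematic advantage-weighted actor loss.

    log_probs:  log pi(a|s) for buffer actions, shape (batch,)
    advantages: Q(s, a) - V(s) from the critic, shape (batch,)
    """
    # Weights are constants w.r.t. the actor (no gradient via the critic).
    weights = torch.exp(advantages.detach() / lam)
    return -(log_probs * weights).mean()

# Usage with dummy tensors standing in for policy/critic outputs.
lp = torch.randn(32, requires_grad=True)
adv = torch.randn(32)
awac_actor_loss(lp, adv).backward()
```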
arXiv Detail & Related papers (2020-06-16T17:54:41Z)