Online Matching: A Real-time Bandit System for Large-scale
Recommendations
- URL: http://arxiv.org/abs/2307.15893v1
- Date: Sat, 29 Jul 2023 05:46:27 GMT
- Authors: Xinyang Yi, Shao-Chuan Wang, Ruining He, Hariharan Chandrasekaran,
Charles Wu, Lukasz Heldt, Lichan Hong, Minmin Chen, Ed H. Chi
- Abstract summary: Online Matching is a scalable closed-loop bandit system learning from users' direct feedback on items in real time.
Diag-LinUCB is a novel extension of the LinUCB algorithm that enables distributed updates of bandit parameters in a scalable and timely manner.
- Score: 23.954049092470548
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The last decade has witnessed many successes of deep learning-based models
for industry-scale recommender systems. These models are typically trained
offline in a batch manner. While being effective in capturing users' past
interactions with recommendation platforms, batch learning suffers from long
model-update latency and is vulnerable to system biases, making it hard to
adapt to distribution shift and explore new items or user interests. Although
online learning-based approaches (e.g., multi-armed bandits) have demonstrated
promising theoretical results in tackling these challenges, their practical
real-time implementation in large-scale recommender systems remains limited.
First, the scalability of online approaches in serving massive online
traffic while ensuring timely updates of bandit parameters poses a significant
challenge. Additionally, exploring uncertainty in recommender systems can
easily result in unfavorable user experience, highlighting the need for
devising intricate strategies that effectively balance the trade-off between
exploitation and exploration. In this paper, we introduce Online Matching: a
scalable closed-loop bandit system learning from users' direct feedback on
items in real time. We present a hybrid "offline + online" approach for
constructing this system, accompanied by a comprehensive exposition of the
end-to-end system architecture. We propose Diag-LinUCB -- a novel extension of
the LinUCB algorithm -- to enable distributed updates of bandit parameters in a
scalable and timely manner. We conduct live experiments in YouTube and show
that Online Matching is able to enhance the capabilities of fresh content
discovery and item exploration in the present platform.
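
The abstract does not spell out how Diag-LinUCB works, so the following is only a rough sketch of the general idea the name suggests: standard LinUCB scoring, but keeping just the diagonal of the per-arm covariance matrix. That reading of "Diag" is an assumption, and the names `DiagLinUCBArm`, `observation_delta`, `apply_delta`, and `select_arm`, as well as the `alpha` and `ridge` parameters, are hypothetical and not taken from the paper. Because the per-arm statistics are plain sums, deltas computed on separate workers can be merged by addition, which is one plausible way to support distributed updates.

```python
import math


def observation_delta(x, reward):
    """Return the additive statistics contributed by one (context, reward) pair."""
    a_delta = [x_i * x_i for x_i in x]      # diagonal of x x^T
    b_delta = [reward * x_i for x_i in x]   # reward-weighted context
    return a_delta, b_delta


class DiagLinUCBArm:
    """Per-arm LinUCB state with a diagonal covariance approximation (assumed)."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha                  # exploration strength
        self.a_diag = [ridge] * dim         # diagonal of ridge * I + sum of x x^T
        self.b = [0.0] * dim                # sum of reward * x

    def ucb(self, x):
        """Upper confidence bound: estimated reward plus an uncertainty bonus."""
        mean = sum(b_i * x_i / a_i for a_i, b_i, x_i in zip(self.a_diag, self.b, x))
        var = sum(x_i * x_i / a_i for a_i, x_i in zip(self.a_diag, x))
        return mean + self.alpha * math.sqrt(var)

    def apply_delta(self, a_delta, b_delta):
        """Merge statistics accumulated elsewhere (for example, on another worker)."""
        for i in range(len(self.a_diag)):
            self.a_diag[i] += a_delta[i]
            self.b[i] += b_delta[i]

    def update(self, x, reward):
        """Fold a single observation into the arm's statistics."""
        self.apply_delta(*observation_delta(x, reward))


def select_arm(arms, x):
    """Pick the arm name with the highest upper confidence bound for context x."""
    return max(arms, key=lambda name: arms[name].ucb(x))


if __name__ == "__main__":
    arms = {"item_a": DiagLinUCBArm(dim=2), "item_b": DiagLinUCBArm(dim=2)}
    context = [1.0, 0.0]
    arms["item_a"].update(context, reward=1.0)   # positive feedback
    arms["item_b"].update(context, reward=0.0)   # no engagement
    print(select_arm(arms, context))
```

With a diagonal matrix, both scoring and updating cost time linear in the feature dimension and need no matrix inversion, at the price of ignoring correlations between features.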
Related papers
- A Closer Look at System Prompt Robustness [2.5525497052179995]
Developers depend on system prompts to specify important context, output format, personalities, guardrails, content policies, and safety countermeasures.
In practice, models often forget to consider relevant guardrails or fail to resolve conflicting demands between the system and the user.
We create realistic new evaluation and fine-tuning datasets based on prompts collected from OpenAI's GPT Store and HuggingFace's HuggingChat.
arXiv Detail & Related papers (2025-02-15T18:10:45Z)
- Online-BLS: An Accurate and Efficient Online Broad Learning System for Data Stream Classification [52.251569042852815]
We introduce an online broad learning system framework with closed-form solutions for each online update.
We design an effective weight estimation algorithm and an efficient online updating strategy.
Our framework is naturally extended to data stream scenarios with concept drift and exceeds state-of-the-art baselines.
arXiv Detail & Related papers (2025-01-28T13:21:59Z)
- Epinet for Content Cold Start [14.018820788546535]
Epinets enable efficient approximations of Thompson sampling even when the learning model is a complex neural network.
Our experiments demonstrate improvements in both user traffic and engagement efficiency on the Facebook Reels online video platform.
arXiv Detail & Related papers (2024-11-20T19:43:27Z)
- BayesCNS: A Unified Bayesian Approach to Address Cold Start and Non-Stationarity in Search Systems at Scale [1.1634177851893535]
BayesCNS is designed to handle cold start and non-stationary distribution shifts in search systems at scale.
BayesCNS achieves this by estimating prior distributions for user-item interactions, which are continuously updated with new user interactions gathered online.
This online learning procedure is guided by a ranker model, enabling efficient exploration of relevant items using contextual information.
arXiv Detail & Related papers (2024-10-03T01:14:30Z)
- Interactive Graph Convolutional Filtering [79.34979767405979]
Interactive Recommender Systems (IRS) have been increasingly used in various domains, including personalized article recommendation, social media, and online advertising.
These problems are exacerbated by the cold start problem and data sparsity problem.
Existing Multi-Armed Bandit methods, despite their carefully designed exploration strategies, often struggle to provide satisfactory results in the early stages.
Our proposed method extends interactive collaborative filtering into the graph model to enhance the performance of collaborative filtering between users and items.
arXiv Detail & Related papers (2023-09-04T09:02:31Z)
- On the Opportunities and Challenges of Offline Reinforcement Learning for Recommender Systems [36.608400817940236]
Reinforcement learning serves as a potent tool for modeling dynamic user interests within recommender systems.
Recent strides in offline reinforcement learning present a new perspective.
Despite being a burgeoning field, works centered on recommender systems utilizing offline reinforcement learning remain limited.
arXiv Detail & Related papers (2023-08-22T10:28:02Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- Augmented Bilinear Network for Incremental Multi-Stock Time-Series Classification [83.23129279407271]
We propose a method to efficiently retain the knowledge available in a neural network pre-trained on a set of securities.
In our method, the prior knowledge encoded in a pre-trained neural network is maintained by keeping existing connections fixed.
This knowledge is adjusted for the new securities by a set of augmented connections, which are optimized using the new data.
arXiv Detail & Related papers (2022-07-23T18:54:10Z)
- Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking [58.14267480293575]
We propose a simple yet effective online learning approach for few-shot online adaptation without requiring offline training.
It allows an in-built memory retention mechanism for the model to remember the knowledge about the object seen before.
We evaluate our approach based on two networks in the online learning families for tracking, i.e., multi-layer perceptrons in RT-MDNet and convolutional neural networks in DiMP.
arXiv Detail & Related papers (2021-12-28T06:51:18Z)
- Non-Stationary Latent Bandits [68.21614490603758]
We propose a practical approach for fast personalization to non-stationary users.
The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online.
We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset.
arXiv Detail & Related papers (2020-12-01T10:31:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.