When and Whom to Collaborate with in a Changing Environment: A
Collaborative Dynamic Bandit Solution
- URL: http://arxiv.org/abs/2104.07150v1
- Date: Wed, 14 Apr 2021 22:15:58 GMT
- Title: When and Whom to Collaborate with in a Changing Environment: A
Collaborative Dynamic Bandit Solution
- Authors: Chuanhao Li, Qingyun Wu, Hongning Wang
- Abstract summary: Collaborative bandit algorithms utilize collaborative filtering techniques to improve sample efficiency in online interactive recommendation.
All existing collaborative bandit learning solutions impose a stationary assumption about the environment.
We develop a collaborative dynamic bandit solution to handle a changing environment for recommendation.
- Score: 36.76450390135742
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Collaborative bandit learning, i.e., bandit algorithms that utilize
collaborative filtering techniques to improve sample efficiency in online
interactive recommendation, has attracted much research attention as it enjoys
the best of both worlds. However, all existing collaborative bandit learning
solutions impose a stationary assumption about the environment, i.e., both user
preferences and the dependency among users are assumed static over time.
Unfortunately, this assumption hardly holds in practice due to users'
ever-changing interests and dependence relations, which inevitably leads to
sub-optimal recommendation performance.
In this work, we develop a collaborative dynamic bandit solution to handle a
changing environment for recommendation. We explicitly model the underlying
changes in both user preferences and their dependency relation as a stochastic
process. Individual user's preference is modeled by a mixture of globally
shared contextual bandit models with a Dirichlet Process prior. Collaboration
among users is thus achieved via Bayesian inference over the global bandit
models. Model selection and arm selection for each user are done via Thompson
sampling to balance exploitation and exploration. Our solution is proved to
maintain a standard $\tilde O(\sqrt{T})$ sublinear regret even in such a
challenging environment. Extensive empirical evaluations on both synthetic
and real-world datasets further confirm the necessity of modeling a changing
environment and our algorithm's practical advantages over several
state-of-the-art online learning solutions.
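To make the mechanism described in the abstract concrete, below is a minimal Python sketch of Thompson sampling over a Dirichlet Process (Chinese restaurant process) mixture of globally shared linear bandit models. All class names, priors, the sliding-window likelihood, and the Gaussian reward assumption are illustrative assumptions for this sketch, not the authors' released implementation; the stochastic-process modeling of preference changes in the actual algorithm is omitted.

```python
# A minimal sketch (not the authors' implementation): Thompson sampling over a
# Chinese-restaurant-process (Dirichlet Process) mixture of globally shared
# linear bandit models. Gaussian rewards, the sliding-window likelihood, and
# all hyper-parameters below are illustrative assumptions; the change-point
# machinery of the actual algorithm is omitted.
import numpy as np


class GlobalLinearModel:
    """Bayesian linear regression posterior for one globally shared model."""

    def __init__(self, dim, sigma2=0.25, lam=1.0):
        self.A = lam * np.eye(dim)      # posterior precision matrix
        self.b = np.zeros(dim)          # reward-weighted feature sum
        self.sigma2 = sigma2            # assumed reward noise variance
        self.n = 0                      # observations assigned to this model

    def posterior(self):
        cov = self.sigma2 * np.linalg.inv(self.A)
        mean = np.linalg.solve(self.A, self.b)
        return mean, 0.5 * (cov + cov.T)            # symmetrize for sampling

    def log_likelihood(self, X, r):
        """Predictive log-likelihood of a user's recent (feature, reward) pairs."""
        if len(r) == 0:
            return 0.0
        mean, cov = self.posterior()
        pred_var = np.einsum('ij,jk,ik->i', X, cov, X) + self.sigma2
        return float(np.sum(-0.5 * (np.log(2 * np.pi * pred_var)
                                    + (r - X @ mean) ** 2 / pred_var)))

    def update(self, x, r):
        self.A += np.outer(x, x) / self.sigma2
        self.b += r * x / self.sigma2
        self.n += 1


class CollaborativeDynamicTS:
    """Thompson sampling for both model selection and arm selection."""

    def __init__(self, dim, alpha=1.0, window=20):
        self.dim, self.alpha, self.window = dim, alpha, window
        self.models = []                # globally shared bandit models
        self.history = {}               # user id -> ([features], [rewards])

    def _assign_model(self, user):
        """Sample which global model currently explains this user (CRP prior)."""
        xs, rs = self.history.get(user, ([], []))
        X = np.asarray(xs[-self.window:], dtype=float).reshape(-1, self.dim)
        r = np.asarray(rs[-self.window:], dtype=float)
        counts = np.array([m.n for m in self.models] + [self.alpha], dtype=float)
        log_lik = [m.log_likelihood(X, r) for m in self.models]
        log_lik.append(GlobalLinearModel(self.dim).log_likelihood(X, r))
        log_post = np.log(counts + 1e-12) + np.array(log_lik)
        probs = np.exp(log_post - log_post.max())
        probs /= probs.sum()
        k = int(np.random.choice(len(probs), p=probs))
        if k == len(self.models):       # CRP opens a brand-new global model
            self.models.append(GlobalLinearModel(self.dim))
        return k

    def select_arm(self, user, arm_features):
        k = self._assign_model(user)                        # model selection
        mean, cov = self.models[k].posterior()
        theta = np.random.multivariate_normal(mean, cov)    # Thompson draw
        return k, int(np.argmax(np.asarray(arm_features) @ theta))

    def update(self, user, k, x, reward):
        xs, rs = self.history.setdefault(user, ([], []))
        xs.append(np.asarray(x, dtype=float))
        rs.append(float(reward))
        self.models[k].update(np.asarray(x, dtype=float), float(reward))


if __name__ == "__main__":
    # Toy usage with synthetic arm features (all numbers are made up).
    rng = np.random.default_rng(0)
    bandit = CollaborativeDynamicTS(dim=5)
    arms = rng.normal(size=(10, 5))
    k, a = bandit.select_arm(user=0, arm_features=arms)
    bandit.update(user=0, k=k, x=arms[a], reward=1.0)
```

Collaboration here comes from every user assigned to the same global model updating, and sampling from, one shared posterior. The full solution in the paper additionally models abrupt changes in user preferences and their dependency structure as a stochastic process and proves the $\tilde O(\sqrt{T})$ regret bound for that complete algorithm, neither of which this sketch attempts.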
Related papers
- Efficient and Robust Regularized Federated Recommendation [52.24782464815489]
The proposed federated recommender system addresses both user preference and privacy concerns.
We propose a novel method that incorporates non-uniform gradient descent to improve communication efficiency.
Experiments demonstrate RFRecF's superior robustness compared to diverse baselines.
arXiv Detail & Related papers (2024-11-03T12:10:20Z) - Preference Elicitation for Offline Reinforcement Learning [59.136381500967744]
We propose Sim-OPRL, an offline preference-based reinforcement learning algorithm.
Our algorithm employs a pessimistic approach for out-of-distribution data, and an optimistic approach for acquiring informative preferences about the optimal policy.
arXiv Detail & Related papers (2024-06-26T15:59:13Z) - Interactive Graph Convolutional Filtering [79.34979767405979]
Interactive Recommender Systems (IRS) have been increasingly used in various domains, including personalized article recommendation, social media, and online advertising.
These challenges are exacerbated by the cold-start and data-sparsity problems.
Existing Multi-Armed Bandit methods, despite their carefully designed exploration strategies, often struggle to provide satisfactory results in the early stages.
Our proposed method extends interactive collaborative filtering into the graph model to enhance the performance of collaborative filtering between users and items.
arXiv Detail & Related papers (2023-09-04T09:02:31Z) - An Empirical Evaluation of Federated Contextual Bandit Algorithms [27.275089644378376]
Federated learning can be done using implicit signals generated as users interact with applications of interest.
We develop variants of prominent contextual bandit algorithms from the centralized setting for the federated setting.
Our experiments reveal the surprising effectiveness of the simple and commonly used softmax in balancing the well-known exploration-exploitation tradeoff.
arXiv Detail & Related papers (2023-03-17T19:22:30Z) - Federated Online Sparse Decision Making [24.856596181768364]
Fedego Lasso relies on a novel multi-client-selfish bandit policy design.
Experiments demonstrate the effectiveness of the proposed algorithms on both synthetic and real-world datasets.
arXiv Detail & Related papers (2022-02-27T20:34:41Z) - Scalable Bayesian Inverse Reinforcement Learning [93.27920030279586]
We introduce Approximate Variational Reward Imitation Learning (AVRIL)
Our method addresses the ill-posed nature of the inverse reinforcement learning problem.
Applying our method to real medical data alongside classic control simulations, we demonstrate Bayesian reward inference in environments beyond the scope of current methods.
arXiv Detail & Related papers (2021-02-12T12:32:02Z) - Learning User Preferences in Non-Stationary Environments [42.785926822853746]
We introduce a novel model for online non-stationary recommendation systems.
We show that our algorithm outperforms other static algorithms even when preferences do not change over time.
arXiv Detail & Related papers (2021-01-29T10:26:16Z) - Non-Stationary Latent Bandits [68.21614490603758]
We propose a practical approach for fast personalization to non-stationary users.
The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online.
We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset; a minimal sketch of the latent-bandit idea appears after this list.
arXiv Detail & Related papers (2020-12-01T10:31:57Z) - Optimizing Long-term Social Welfare in Recommender Systems: A
Constrained Matching Approach [36.54379845220444]
We study settings in which content providers cannot remain viable unless they receive a certain level of user engagement.
Our model ensures the system reaches an equilibrium with maximal social welfare supported by a sufficiently diverse set of viable providers.
We draw connections to various notions of user regret and fairness, arguing that these outcomes are fairer in a utilitarian sense.
arXiv Detail & Related papers (2020-07-31T22:40:47Z)
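As a point of comparison for the non-stationary setting, the following is a minimal, hedged sketch of the latent-bandit idea summarized in the Non-Stationary Latent Bandits entry above: prototypical reward models are assumed to have been learned offline, and online a posterior over the user's latent state is updated from observed rewards and sampled from at each step (Thompson sampling). The Gaussian reward assumption and every name below are illustrative, not that paper's exact algorithm.

```python
# Hedged sketch of a latent bandit with Thompson sampling; `mu` stands in for
# offline-learned prototypical mean rewards and is an assumption of this sketch.
import numpy as np


class LatentBanditTS:
    def __init__(self, mu, sigma2=0.25):
        # mu[b, a]: offline-estimated mean reward of arm a under latent state b
        self.mu = np.asarray(mu, dtype=float)
        self.sigma2 = sigma2
        self.log_post = np.zeros(self.mu.shape[0])   # uniform prior over states

    def select_arm(self):
        probs = np.exp(self.log_post - self.log_post.max())
        probs /= probs.sum()
        b = np.random.choice(len(probs), p=probs)    # sample a latent state
        return int(np.argmax(self.mu[b]))            # act greedily under it

    def update(self, arm, reward):
        # Gaussian-likelihood update of the latent-state posterior.
        self.log_post += -0.5 * (reward - self.mu[:, arm]) ** 2 / self.sigma2
        self.log_post -= self.log_post.max()         # numerical stabilization
```

The non-stationary variants in that paper further allow the latent state to change over time; this sketch keeps it fixed and is only meant to contrast the offline-prototype approach with the online mixture of global models learned by the collaborative dynamic bandit above.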