Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems
- URL: http://arxiv.org/abs/2202.03167v2
- Date: Mon, 24 Jul 2023 22:56:21 GMT
- Title: Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems
- Authors: Saeed Ghoorchian, Evgenii Kortukov, Setareh Maghsudi
- Abstract summary: We build upon the linear contextual multi-armed bandit framework to address this problem.
We develop a decision-making policy for a linear bandit problem with high-dimensional feature vectors.
Our proposed recommender system employs this policy to learn the users' item preferences online while minimizing runtime.
- Score: 6.009759445555003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Taking advantage of contextual information can potentially boost the
performance of recommender systems. In the era of big data, such side
information often has several dimensions. Thus, developing decision-making
algorithms to cope with such a high-dimensional context in real time is
essential. That is specifically challenging when the decision-maker has a
variety of items to recommend. In addition, changes in items' popularity or
users' preferences can hinder the performance of the deployed recommender
system due to a lack of robustness to distribution shifts in the environment.
In this paper, we build upon the linear contextual multi-armed bandit framework
to address this problem. We develop a decision-making policy for a linear
bandit problem with high-dimensional feature vectors, a large set of arms, and
non-stationary reward-generating processes. Our Thompson sampling-based policy
reduces the dimension of feature vectors using random projection and uses
exponentially increasing weights to decrease the influence of past observations
with time. Our proposed recommender system employs this policy to learn the
users' item preferences online while minimizing runtime. We prove a regret
bound that scales as a factor of the reduced dimension instead of the original
one. To evaluate our proposed recommender system numerically, we apply it to
three real-world datasets. The theoretical and numerical results demonstrate
the effectiveness of our proposed algorithm in making a trade-off between
computational complexity and regret performance compared to the
state-of-the-art.
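The abstract's two key mechanisms, random projection of high-dimensional feature vectors and exponential down-weighting of past observations inside a Thompson sampling loop, can be sketched in Python. This is an illustrative reconstruction, not the authors' exact algorithm: the discount factor `gamma`, posterior scale `v`, and the precision-matrix discounting rule below are assumptions made for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 100, 10        # original and reduced feature dimensions (illustrative)
gamma = 0.95          # discount factor: exponentially down-weights old data
lam = 1.0             # ridge regularizer / prior precision
v = 0.5               # posterior-scaling parameter for Thompson sampling

# Gaussian random-projection matrix (Johnson-Lindenstrauss style)
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, d))

B = lam * np.eye(m)   # precision matrix of the Gaussian posterior
f = np.zeros(m)       # discounted sum of reward-weighted projected contexts

def select_arm(contexts):
    """Thompson sampling over projected contexts (one row per arm)."""
    z = contexts @ A.T                        # project each arm's features to R^m
    mu = np.linalg.solve(B, f)                # posterior mean
    cov = v ** 2 * np.linalg.inv(B)           # posterior covariance
    theta = rng.multivariate_normal(mu, cov)  # one posterior sample
    return int(np.argmax(z @ theta)), z

def update(z_chosen, reward):
    """Discount past sufficient statistics, then add the new observation."""
    global B, f
    B = gamma * B + (1 - gamma) * lam * np.eye(m) + np.outer(z_chosen, z_chosen)
    f = gamma * f + reward * z_chosen

# Toy interaction loop with a synthetic linear reward model
theta_star = rng.normal(size=d)
for t in range(200):
    X = rng.normal(size=(20, d))              # 20 candidate items per round
    arm, z = select_arm(X)
    r = X[arm] @ theta_star + 0.1 * rng.normal()
    update(z[arm], r)
```

Because all posterior algebra happens in the m-dimensional projected space, the per-round cost depends on m rather than d, mirroring the paper's claim that the regret bound scales with the reduced dimension.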
Related papers
- A Recommendation Model Utilizing Separation Embedding and Self-Attention for Feature Mining [7.523158123940574]
Recommendation systems provide users with content that meets their needs.
Traditional click-through-rate prediction and top-K recommendation mechanisms cannot meet these recommendation needs.
This paper proposes a recommendation system model based on a separation-embedding cross-network.
arXiv Detail & Related papers (2024-10-19T07:49:21Z)
- Scalable Dynamic Embedding Size Search for Streaming Recommendation [54.28404337601801]
Real-world recommender systems often operate in streaming recommendation scenarios.
The number of users and items continues to grow, leading to substantial storage resource consumption.
We propose SCALL, a method for learning lightweight embeddings for streaming recommendation that can adaptively adjust the embedding sizes of users and items.
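As a rough illustration of adaptive embedding sizing (a toy quantile heuristic, not SCALL's actual allocation policy; the function name and size buckets are invented for this sketch), more-frequent users or items can be assigned larger embedding dimensions:

```python
import numpy as np

def assign_embedding_sizes(frequencies, sizes=(8, 16, 32, 64)):
    """Toy heuristic: bucket entities by interaction-frequency quantile and
    give higher-frequency entities larger embedding dimensions."""
    freq = np.asarray(frequencies, dtype=float)
    # Quantile cut points separating len(sizes) buckets
    cuts = np.quantile(freq, np.linspace(0.0, 1.0, len(sizes) + 1)[1:-1])
    buckets = np.searchsorted(cuts, freq, side="right")
    return np.asarray(sizes)[buckets]

dims = assign_embedding_sizes([1, 2, 3, 100])  # rare items small, popular large
```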
arXiv Detail & Related papers (2024-07-22T06:37:24Z)
- Dynamic Embedding Size Search with Minimum Regret for Streaming Recommender System [39.78277554870799]
We show that setting an identical and static embedding size is sub-optimal in terms of recommendation performance and memory cost.
We propose a method to minimize the embedding size selection regret on both user and item sides in a non-stationary manner.
arXiv Detail & Related papers (2023-08-15T13:27:18Z)
- Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
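The select-in-latent-space-then-decode pattern can be illustrated with stand-in components. Random linear maps replace the learned VAE decoder and value model here; all names, shapes, and the nearest-neighbor decoding step are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

n_items, k, latent_dim, slate_size = 50, 8, 4, 3

item_emb = rng.normal(size=(n_items, k))               # stand-in item embeddings
W_dec = rng.normal(size=(slate_size * k, latent_dim))  # stand-in "decoder"
w_val = rng.normal(size=latent_dim)                    # stand-in value model

def decode_slate(z):
    """Decode a latent code into slate_size pseudo-embeddings, then snap each
    slot to its nearest item embedding to obtain a concrete slate."""
    slots = (W_dec @ z).reshape(slate_size, k)
    dists = ((slots[:, None, :] - item_emb[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)                        # one item index per slot

# Action selection: score a batch of latent candidates, decode the best one
Z = rng.normal(size=(128, latent_dim))
slate = decode_slate(Z[np.argmax(Z @ w_val)])
```

The point of the pattern is that the search happens over a low-dimensional continuous code rather than over the combinatorial space of slates.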
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
- Fast Offline Policy Optimization for Large Scale Recommendation [74.78213147859236]
We derive an approximation of these policy-learning algorithms that scales logarithmically with the catalogue size.
Our contribution is based upon combining three novel ideas.
Our estimator is an order of magnitude faster than naive approaches yet produces equally good policies.
arXiv Detail & Related papers (2022-08-08T11:54:11Z)
- Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces [14.366265951396587]
We design efficient general-purpose contextual bandit algorithms for large, or even continuous, action spaces.
We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives.
Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret.
arXiv Detail & Related papers (2022-07-12T21:27:09Z)
- Control Variates for Slate Off-Policy Evaluation [112.35528337130118]
We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions.
We obtain new estimators with risk improvement guarantees over both the PI and self-normalized PI estimators.
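The PI estimator and its self-normalized variant referenced above build on standard inverse-propensity scoring. A minimal single-action sketch of plain IPS and self-normalized IPS (SNIPS), not the slate estimators themselves:

```python
import numpy as np

def ips(rewards, target_probs, logging_probs):
    """Inverse-propensity-scoring estimate of the target policy's value."""
    w = target_probs / logging_probs        # importance weights
    return float(np.mean(w * rewards))

def snips(rewards, target_probs, logging_probs):
    """Self-normalized IPS: dividing by the total weight reduces variance."""
    w = target_probs / logging_probs
    return float(np.sum(w * rewards) / np.sum(w))

# Toy log: uniform logging policy over two actions, target prefers action 0
rewards = np.array([1.0, 0.0, 1.0, 1.0])
logging = np.array([0.5, 0.5, 0.5, 0.5])
target  = np.array([0.8, 0.2, 0.8, 0.8])

est_ips = ips(rewards, target, logging)
est_snips = snips(rewards, target, logging)
```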
arXiv Detail & Related papers (2021-06-15T06:59:53Z)
- Learning User Preferences in Non-Stationary Environments [42.785926822853746]
We introduce a novel model for online non-stationary recommendation systems.
We show that our algorithm outperforms other static algorithms even when preferences do not change over time.
arXiv Detail & Related papers (2021-01-29T10:26:16Z)
- Non-Stationary Latent Bandits [68.21614490603758]
We propose a practical approach for fast personalization to non-stationary users.
The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online.
We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset.
arXiv Detail & Related papers (2020-12-01T10:31:57Z)
- Learning from eXtreme Bandit Feedback [105.0383130431503]
We study the problem of batch learning from bandit feedback in the setting of extremely large action spaces.
In this paper, we introduce a selective importance sampling estimator (sIS) that operates in a significantly more favorable bias-variance regime.
We employ this estimator in a novel algorithmic procedure, Policy Optimization for eXtreme Models (POXM), for learning from bandit feedback on XMC tasks.
arXiv Detail & Related papers (2020-09-27T20:47:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.