Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems
- URL: http://arxiv.org/abs/2202.03167v2
- Date: Mon, 24 Jul 2023 22:56:21 GMT
- Title: Bayesian Non-stationary Linear Bandits for Large-Scale Recommender Systems
- Authors: Saeed Ghoorchian, Evgenii Kortukov, Setareh Maghsudi
- Abstract summary: We build upon the linear contextual multi-armed bandit framework to address this problem.
We develop a decision-making policy for a linear bandit problem with high-dimensional feature vectors.
Our proposed recommender system employs this policy to learn the users' item preferences online while minimizing runtime.
- Score: 6.009759445555003
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Taking advantage of contextual information can potentially boost the
performance of recommender systems. In the era of big data, such side
information often has several dimensions. Thus, developing decision-making
algorithms to cope with such a high-dimensional context in real time is
essential. That is specifically challenging when the decision-maker has a
variety of items to recommend. In addition, changes in items' popularity or
users' preferences can hinder the performance of the deployed recommender
system due to a lack of robustness to distribution shifts in the environment.
In this paper, we build upon the linear contextual multi-armed bandit framework
to address this problem. We develop a decision-making policy for a linear
bandit problem with high-dimensional feature vectors, a large set of arms, and
non-stationary reward-generating processes. Our Thompson sampling-based policy
reduces the dimension of feature vectors using random projection and uses
exponentially increasing weights to decrease the influence of past observations
with time. Our proposed recommender system employs this policy to learn the
users' item preferences online while minimizing runtime. We prove a regret
bound that scales as a factor of the reduced dimension instead of the original
one. To evaluate our proposed recommender system numerically, we apply it to
three real-world datasets. The theoretical and numerical results demonstrate
the effectiveness of our proposed algorithm in making a trade-off between
computational complexity and regret performance compared to the
state-of-the-art.
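The abstract's two key mechanisms, random projection of high-dimensional feature vectors and exponential down-weighting of past observations inside a Thompson sampling loop, can be sketched in Python. This is an illustrative reconstruction, not the authors' exact algorithm: the discount factor `gamma`, posterior scale `v`, and the precision-matrix discounting rule below are assumptions made for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d, m = 100, 10        # original and reduced feature dimensions (illustrative)
gamma = 0.95          # discount factor: exponentially down-weights old data
lam = 1.0             # ridge regularizer / prior precision
v = 0.5               # posterior-scaling parameter for Thompson sampling

# Gaussian random-projection matrix (Johnson-Lindenstrauss style)
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, d))

B = lam * np.eye(m)   # precision matrix of the Gaussian posterior
f = np.zeros(m)       # discounted sum of reward-weighted projected contexts

def select_arm(contexts):
    """Thompson sampling over projected contexts (one row per arm)."""
    z = contexts @ A.T                        # project each arm's features to R^m
    mu = np.linalg.solve(B, f)                # posterior mean
    cov = v ** 2 * np.linalg.inv(B)           # posterior covariance
    theta = rng.multivariate_normal(mu, cov)  # one posterior sample
    return int(np.argmax(z @ theta)), z

def update(z_chosen, reward):
    """Discount past sufficient statistics, then add the new observation."""
    global B, f
    B = gamma * B + (1 - gamma) * lam * np.eye(m) + np.outer(z_chosen, z_chosen)
    f = gamma * f + reward * z_chosen

# Toy interaction loop with a synthetic linear reward model
theta_star = rng.normal(size=d)
for t in range(200):
    X = rng.normal(size=(20, d))              # 20 candidate items per round
    arm, z = select_arm(X)
    r = X[arm] @ theta_star + 0.1 * rng.normal()
    update(z[arm], r)
```

Because all posterior algebra happens in the m-dimensional projected space, the per-round cost depends on m rather than d, mirroring the paper's claim that the regret bound scales with the reduced dimension.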
Related papers
- A Recommendation Model Utilizing Separation Embedding and Self-Attention for Feature Mining [7.523158123940574]
Recommendation systems provide users with content that meets their needs.
Traditional click-through-rate prediction and top-K recommendation mechanisms cannot meet these recommendation needs.
This paper proposes a recommendation system model based on a separation-embedding cross-network.
arXiv Detail & Related papers (2024-10-19T07:49:21Z)
- Scalable Dynamic Embedding Size Search for Streaming Recommendation [54.28404337601801]
Real-world recommender systems often operate in streaming recommendation scenarios.
The number of users and items continues to grow, leading to substantial storage resource consumption.
We propose SCALL, a method for learning lightweight embeddings for streaming recommendation that can adaptively adjust the embedding sizes of users and items.
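As a rough illustration of adaptive embedding sizing (a toy quantile heuristic, not SCALL's actual allocation policy; the function name and size buckets are invented for this sketch), more-frequent users or items can be assigned larger embedding dimensions:

```python
import numpy as np

def assign_embedding_sizes(frequencies, sizes=(8, 16, 32, 64)):
    """Toy heuristic: bucket entities by interaction-frequency quantile and
    give higher-frequency entities larger embedding dimensions."""
    freq = np.asarray(frequencies, dtype=float)
    # Quantile cut points separating len(sizes) buckets
    cuts = np.quantile(freq, np.linspace(0.0, 1.0, len(sizes) + 1)[1:-1])
    buckets = np.searchsorted(cuts, freq, side="right")
    return np.asarray(sizes)[buckets]

dims = assign_embedding_sizes([1, 2, 3, 100])  # rare items small, popular large
```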
arXiv Detail & Related papers (2024-07-22T06:37:24Z)
- Dynamic Embedding Size Search with Minimum Regret for Streaming Recommender System [39.78277554870799]
We show that setting an identical and static embedding size is sub-optimal in terms of recommendation performance and memory cost.
We propose a method to minimize the embedding size selection regret on both user and item sides in a non-stationary manner.
arXiv Detail & Related papers (2023-08-15T13:27:18Z)
- Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
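The select-in-latent-space-then-decode pattern can be illustrated with stand-in components. Random linear maps replace the learned VAE decoder and value model here; all names, shapes, and the nearest-neighbor decoding step are invented for this sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

n_items, k, latent_dim, slate_size = 50, 8, 4, 3

item_emb = rng.normal(size=(n_items, k))               # stand-in item embeddings
W_dec = rng.normal(size=(slate_size * k, latent_dim))  # stand-in "decoder"
w_val = rng.normal(size=latent_dim)                    # stand-in value model

def decode_slate(z):
    """Decode a latent code into slate_size pseudo-embeddings, then snap each
    slot to its nearest item embedding to obtain a concrete slate."""
    slots = (W_dec @ z).reshape(slate_size, k)
    dists = ((slots[:, None, :] - item_emb[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)                        # one item index per slot

# Action selection: score a batch of latent candidates, decode the best one
Z = rng.normal(size=(128, latent_dim))
slate = decode_slate(Z[np.argmax(Z @ w_val)])
```

The point of the pattern is that the search happens over a low-dimensional continuous code rather than over the combinatorial space of slates.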
arXiv Detail & Related papers (2023-01-20T15:28:09Z)
- Fast Offline Policy Optimization for Large Scale Recommendation [74.78213147859236]
We derive an approximation of these policy-learning algorithms that scales logarithmically with the catalogue size.
Our contribution is based upon combining three novel ideas.
Our estimator is an order of magnitude faster than naive approaches yet produces equally good policies.
arXiv Detail & Related papers (2022-08-08T11:54:11Z)
- Contextual Bandits with Smooth Regret: Efficient Learning in Continuous Action Spaces [14.366265951396587]
We design efficient general-purpose contextual bandit algorithms for large, or even continuous, action spaces.
We propose a smooth regret notion for contextual bandits, which dominates previously proposed alternatives.
Our algorithms can be used to recover the previous minimax/Pareto optimal guarantees under the standard regret.
arXiv Detail & Related papers (2022-07-12T21:27:09Z)
- Control Variates for Slate Off-Policy Evaluation [112.35528337130118]
We study the problem of off-policy evaluation from batched contextual bandit data with multidimensional actions.
We obtain new estimators with risk improvement guarantees over both the PI and self-normalized PI estimators.
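The PI estimator and its self-normalized variant referenced above build on standard inverse-propensity scoring. A minimal single-action sketch of plain IPS and self-normalized IPS (SNIPS), not the slate estimators themselves:

```python
import numpy as np

def ips(rewards, target_probs, logging_probs):
    """Inverse-propensity-scoring estimate of the target policy's value."""
    w = target_probs / logging_probs        # importance weights
    return float(np.mean(w * rewards))

def snips(rewards, target_probs, logging_probs):
    """Self-normalized IPS: dividing by the total weight reduces variance."""
    w = target_probs / logging_probs
    return float(np.sum(w * rewards) / np.sum(w))

# Toy log: uniform logging policy over two actions, target prefers action 0
rewards = np.array([1.0, 0.0, 1.0, 1.0])
logging = np.array([0.5, 0.5, 0.5, 0.5])
target  = np.array([0.8, 0.2, 0.8, 0.8])

est_ips = ips(rewards, target, logging)
est_snips = snips(rewards, target, logging)
```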
arXiv Detail & Related papers (2021-06-15T06:59:53Z)
- Learning User Preferences in Non-Stationary Environments [42.785926822853746]
We introduce a novel model for online non-stationary recommendation systems.
We show that our algorithm outperforms other static algorithms even when preferences do not change over time.
arXiv Detail & Related papers (2021-01-29T10:26:16Z)
- Non-Stationary Latent Bandits [68.21614490603758]
We propose a practical approach for fast personalization to non-stationary users.
The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online.
We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset.
arXiv Detail & Related papers (2020-12-01T10:31:57Z)
- Learning from eXtreme Bandit Feedback [105.0383130431503]
We study the problem of batch learning from bandit feedback in the setting of extremely large action spaces.
In this paper, we introduce a selective importance sampling estimator (sIS) that operates in a significantly more favorable bias-variance regime.
We employ this estimator in a novel algorithmic procedure, Policy Optimization for eXtreme Models (POXM), for learning from bandit feedback on XMC tasks.
arXiv Detail & Related papers (2020-09-27T20:47:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.