Deep Bayesian Bandits: Exploring in Online Personalized Recommendations
- URL: http://arxiv.org/abs/2008.00727v1
- Date: Mon, 3 Aug 2020 08:58:18 GMT
- Title: Deep Bayesian Bandits: Exploring in Online Personalized Recommendations
- Authors: Dalin Guo, Sofia Ira Ktena, Ferenc Huszar, Pranay Kumar Myana, Wenzhe Shi, Alykhan Tejani
- Abstract summary: We formulate a display advertising recommender as a contextual bandit.
We implement exploration techniques that require sampling from the posterior distribution of click-through-rates.
We test our proposed deep Bayesian bandits algorithm in the offline simulation and online AB setting.
- Score: 4.845576821204241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recommender systems trained in a continuous learning fashion are plagued by
the feedback loop problem, also known as algorithmic bias. This causes a newly
trained model to act greedily and favor items that have already been engaged by
users. This behavior is particularly harmful in personalised ad
recommendations, as it can also cause new campaigns to remain unexplored.
Exploration aims to address this limitation by providing new information about
the environment, which encompasses user preference, and can lead to higher
long-term reward. In this work, we formulate a display advertising recommender
as a contextual bandit and implement exploration techniques that require
sampling from the posterior distribution of click-through-rates in a
computationally tractable manner. Traditional large-scale deep learning models
do not provide uncertainty estimates by default. We approximate these
uncertainty measurements of the predictions by employing a bootstrapped model
with multiple heads and dropout units. We benchmark a number of different
models in an offline simulation environment using a publicly available dataset
of user-ads engagements. We test our proposed deep Bayesian bandits algorithm
in the offline simulation and in an online A/B setting with large-scale
production traffic, where we demonstrate a positive gain from our exploration model.
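The exploration mechanism described in the abstract can be sketched as follows: a shared trunk with dropout units feeds multiple bootstrapped heads, and Thompson sampling shows the ad with the highest sampled click-through-rate. This is a minimal illustration with assumed layer sizes, names, and hyperparameters, not the authors' production implementation:

```python
import torch
import torch.nn as nn

class MultiHeadCTRModel(nn.Module):
    def __init__(self, n_features: int, n_heads: int = 10, hidden: int = 64):
        super().__init__()
        # Shared trunk with dropout units, kept active at inference time
        # so each forward pass is a draw from an approximate posterior.
        self.trunk = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(p=0.1),
        )
        # Bootstrapped heads: each head is trained on a resampled subset
        # of the logged data, giving an ensemble of CTR estimates.
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_heads)])

    def sample_ctr(self, x: torch.Tensor) -> torch.Tensor:
        """One posterior sample: random head + an active dropout mask."""
        self.train()  # keep dropout on even at decision time
        h = self.trunk(x)
        head = self.heads[torch.randint(len(self.heads), (1,)).item()]
        return torch.sigmoid(head(h)).squeeze(-1)

def select_ad(model: MultiHeadCTRModel, candidate_features: torch.Tensor) -> int:
    """Thompson sampling: score every candidate ad with one posterior
    draw of its CTR and show the argmax."""
    with torch.no_grad():
        sampled = model.sample_ctr(candidate_features)
    return int(sampled.argmax())
```

Exploration here comes entirely from the randomness of the head choice and the dropout mask, so no explicit epsilon-greedy schedule is needed.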
Related papers
- Generative Edge Detection with Stable Diffusion [52.870631376660924]
Edge detection is typically viewed as a pixel-level classification problem mainly addressed by discriminative methods.
We propose a novel approach, named Generative Edge Detector (GED), by fully utilizing the potential of the pre-trained stable diffusion model.
We conduct extensive experiments on multiple datasets and achieve competitive performance.
arXiv Detail & Related papers (2024-10-04T01:52:23Z) - Mitigating Exposure Bias in Online Learning to Rank Recommendation: A Novel Reward Model for Cascading Bandits [23.15042648884445]
We study exposure bias in a class of well-known contextual bandit algorithms known as Linear Cascading Bandits.
We propose an Exposure-Aware reward model that updates the model parameters based on two factors: 1) implicit user feedback and 2) the position of the item in the recommendation list.
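A minimal sketch of how such a position-aware update could look, assuming a LinUCB-style ridge-regression parameterization; the logarithmic position discount is our illustrative choice, not necessarily the paper's exact weighting:

```python
import numpy as np

def exposure_aware_update(A, b, x, clicked, position):
    """Update a linear CTR model for one shown item.
    A: d x d Gram matrix (initialize to the identity), b: length-d response
    vector, x: length-d item/context features, clicked: 0/1 implicit
    feedback, position: 1-based rank of the item in the list."""
    weight = 1.0 / np.log2(position + 1)   # down-weight highly exposed slots
    A += weight * np.outer(x, x)
    b += weight * clicked * x
    theta = np.linalg.solve(A, b)          # refreshed parameter estimate
    return A, b, theta
```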
arXiv Detail & Related papers (2024-08-08T09:35:01Z) - Deep Bayesian Active Learning for Preference Modeling in Large Language Models [84.817400962262]
We propose the Bayesian Active Learner for Preference Modeling (BAL-PM).
Our experiments demonstrate that BAL-PM requires 33% to 68% fewer preference labels on two popular human preference datasets and exceeds previous Bayesian acquisition policies.
arXiv Detail & Related papers (2024-06-14T13:32:43Z) - Posterior Sampling via Autoregressive Generation [11.713451719120707]
We propose a new framework for learning bandit algorithms from massive historical data.
We use historical data to pretrain an autoregressive model to predict a sequence of repeated feedback/rewards.
At decision-time, we autoregressively sample (impute) an imagined sequence of rewards for each action, and choose the action with the largest average imputed reward.
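A compact sketch of this decision rule, assuming a pretrained autoregressive model exposed as `model.sample_next(action, history)` that returns one sampled reward given the feedback observed so far (this interface is our assumption, not the paper's API):

```python
import numpy as np

def choose_action(model, observed_rewards, n_actions, horizon=100):
    """Pick the action whose imputed reward sequence looks best.
    observed_rewards: list of per-action reward histories seen so far."""
    imputed_means = []
    for a in range(n_actions):
        history = list(observed_rewards[a])        # real feedback first
        while len(history) < horizon:              # then imagined feedback
            history.append(model.sample_next(a, history))
        imputed_means.append(np.mean(history))
    return int(np.argmax(imputed_means))
```

Because each call re-samples the imagined continuation, repeated calls randomize over plausible futures, which is what makes this a posterior-sampling (Thompson-style) rule.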
arXiv Detail & Related papers (2024-05-29T19:24:44Z) - Personalized Negative Reservoir for Incremental Learning in Recommender Systems [22.227137206517142]
Recommender systems have become an integral part of online platforms.
Every day the volume of training data is expanding and the number of user interactions is constantly increasing.
The exploration of larger and more expressive models has become a necessary pursuit to improve user experience.
arXiv Detail & Related papers (2024-03-06T19:08:28Z) - Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning (RL) algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
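A rough sketch of the decoding half of this idea, assuming a pre-trained variational auto-encoder whose decoder maps a low-dimensional latent action to per-slot item logits; all shapes and names are illustrative:

```python
import torch
import torch.nn as nn

class SlateDecoder(nn.Module):
    """Maps a continuous latent action to logits over items for each slot."""
    def __init__(self, latent_dim=16, slate_size=5, n_items=10_000):
        super().__init__()
        self.slate_size, self.n_items = slate_size, n_items
        self.net = nn.Linear(latent_dim, slate_size * n_items)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # (batch, latent_dim) -> (batch, slate_size, n_items) logits
        return self.net(z).view(-1, self.slate_size, self.n_items)

def decode_slate(decoder: SlateDecoder, z: torch.Tensor) -> torch.Tensor:
    """Turn a latent action into a concrete slate: most likely item per slot."""
    with torch.no_grad():
        return decoder(z).argmax(dim=-1)   # (batch, slate_size) item ids
```

The point of the latent space is that the RL policy only has to output a small continuous vector `z` instead of searching over every combination of items.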
arXiv Detail & Related papers (2023-01-20T15:28:09Z) - Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation [59.500347564280204]
We propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework.
AUR consists of a new uncertainty estimator along with a normal recommender model.
As the chance of mislabeling reflects the potential of a user-item pair, AUR makes recommendations according to the estimated uncertainty.
arXiv Detail & Related papers (2022-09-22T04:32:51Z) - PURS: Personalized Unexpected Recommender System for Improving User Satisfaction [76.98616102965023]
We describe a novel Personalized Unexpected Recommender System (PURS) model that incorporates unexpectedness into the recommendation process.
Extensive offline experiments on three real-world datasets illustrate that the proposed PURS model significantly outperforms the state-of-the-art baseline approaches.
arXiv Detail & Related papers (2021-06-05T01:33:21Z) - Non-Stationary Latent Bandits [68.21614490603758]
We propose a practical approach for fast personalization to non-stationary users.
The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online.
We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset.
arXiv Detail & Related papers (2020-12-01T10:31:57Z)
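A compact sketch of Thompson sampling in such a latent bandit, assuming K prototypical user models were learned offline as a matrix `P` of shape `(K, n_actions)` holding mean Bernoulli rewards; the belief update and the switching term below are simplified illustrations, not the paper's exact algorithm:

```python
import numpy as np

def latent_ts_step(P, belief, rng=None):
    """One round: sample a latent state from the belief, act greedily for it."""
    rng = rng or np.random.default_rng()
    s = rng.choice(len(belief), p=belief)
    return int(np.argmax(P[s]))

def update_belief(P, belief, action, reward, switch_prob=0.01):
    """Bayes-filter the belief over latent states after observing a reward."""
    likelihood = P[:, action] if reward == 1 else 1.0 - P[:, action]
    posterior = belief * likelihood
    posterior /= posterior.sum()
    # Non-stationarity: allow a small chance of switching to any state.
    return (1 - switch_prob) * posterior + switch_prob / len(posterior)
```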
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.