Deep Bayesian Bandits: Exploring in Online Personalized Recommendations
- URL: http://arxiv.org/abs/2008.00727v1
- Date: Mon, 3 Aug 2020 08:58:18 GMT
- Title: Deep Bayesian Bandits: Exploring in Online Personalized Recommendations
- Authors: Dalin Guo, Sofia Ira Ktena, Ferenc Huszar, Pranay Kumar Myana, Wenzhe Shi, Alykhan Tejani
- Abstract summary: We formulate a display advertising recommender as a contextual bandit.
We implement exploration techniques that require sampling from the posterior distribution of click-through-rates.
We test our proposed deep Bayesian bandits algorithm in the offline simulation and online AB setting.
- Score: 4.845576821204241
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recommender systems trained in a continuous learning fashion are plagued by
the feedback loop problem, also known as algorithmic bias. This causes a newly
trained model to act greedily and favor items that have already been engaged by
users. This behavior is particularly harmful in personalised ad
recommendations, as it can also cause new campaigns to remain unexplored.
Exploration aims to address this limitation by providing new information about
the environment, which encompasses user preference, and can lead to higher
long-term reward. In this work, we formulate a display advertising recommender
as a contextual bandit and implement exploration techniques that require
sampling from the posterior distribution of click-through-rates in a
computationally tractable manner. Traditional large-scale deep learning models
do not provide uncertainty estimates by default. We approximate these
uncertainty measurements of the predictions by employing a bootstrapped model
with multiple heads and dropout units. We benchmark a number of different
models in an offline simulation environment using a publicly available dataset
of user-ads engagements. We test our proposed deep Bayesian bandits algorithm
in the offline simulation and in an online A/B setting with large-scale
production traffic, where we demonstrate a positive gain from our exploration model.
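The exploration mechanism described in the abstract can be sketched as follows: a shared trunk with dropout units feeds multiple bootstrapped heads, and Thompson sampling shows the ad with the highest sampled click-through-rate. This is a minimal illustration with assumed layer sizes, names, and hyperparameters, not the authors' production implementation:

```python
import torch
import torch.nn as nn

class MultiHeadCTRModel(nn.Module):
    def __init__(self, n_features: int, n_heads: int = 10, hidden: int = 64):
        super().__init__()
        # Shared trunk with dropout units, kept active at inference time
        # so each forward pass is a draw from an approximate posterior.
        self.trunk = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Dropout(p=0.1),
        )
        # Bootstrapped heads: each head is trained on a resampled subset
        # of the logged data, giving an ensemble of CTR estimates.
        self.heads = nn.ModuleList([nn.Linear(hidden, 1) for _ in range(n_heads)])

    def sample_ctr(self, x: torch.Tensor) -> torch.Tensor:
        """One posterior sample: random head + an active dropout mask."""
        self.train()  # keep dropout on even at decision time
        h = self.trunk(x)
        head = self.heads[torch.randint(len(self.heads), (1,)).item()]
        return torch.sigmoid(head(h)).squeeze(-1)

def select_ad(model: MultiHeadCTRModel, candidate_features: torch.Tensor) -> int:
    """Thompson sampling: score every candidate ad with one posterior
    draw of its CTR and show the argmax."""
    with torch.no_grad():
        sampled = model.sample_ctr(candidate_features)
    return int(sampled.argmax())
```

Exploration here comes entirely from the randomness of the head choice and the dropout mask, so no explicit epsilon-greedy schedule is needed.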
Related papers
- Generative Edge Detection with Stable Diffusion [52.870631376660924]
Edge detection is typically viewed as a pixel-level classification problem mainly addressed by discriminative methods.
We propose a novel approach, named Generative Edge Detector (GED), by fully utilizing the potential of the pre-trained stable diffusion model.
We conduct extensive experiments on multiple datasets and achieve competitive performance.
arXiv Detail & Related papers (2024-10-04T01:52:23Z) - Mitigating Exposure Bias in Online Learning to Rank Recommendation: A Novel Reward Model for Cascading Bandits [23.15042648884445]
We study exposure bias in a class of well-known contextual bandit algorithms known as Linear Cascading Bandits.
We propose an Exposure-Aware reward model that updates the model parameters based on two factors: 1) implicit user feedback and 2) the position of the item in the recommendation list.
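A minimal sketch of how such a position-aware update could look, assuming a LinUCB-style ridge-regression parameterization; the logarithmic position discount is our illustrative choice, not necessarily the paper's exact weighting:

```python
import numpy as np

def exposure_aware_update(A, b, x, clicked, position):
    """Update a linear CTR model for one shown item.
    A: d x d Gram matrix (initialize to the identity), b: length-d response
    vector, x: length-d item/context features, clicked: 0/1 implicit
    feedback, position: 1-based rank of the item in the list."""
    weight = 1.0 / np.log2(position + 1)   # down-weight highly exposed slots
    A += weight * np.outer(x, x)
    b += weight * clicked * x
    theta = np.linalg.solve(A, b)          # refreshed parameter estimate
    return A, b, theta
```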
arXiv Detail & Related papers (2024-08-08T09:35:01Z) - Deep Bayesian Active Learning for Preference Modeling in Large Language Models [84.817400962262]
We propose the Bayesian Active Learner for Preference Modeling (BAL-PM).
Our experiments demonstrate that BAL-PM requires 33% to 68% fewer preference labels on two popular human preference datasets and exceeds previous Bayesian acquisition policies.
arXiv Detail & Related papers (2024-06-14T13:32:43Z) - Posterior Sampling via Autoregressive Generation [11.713451719120707]
We propose a new framework for learning bandit algorithms from massive historical data.
We use historical data to pretrain an autoregressive model to predict a sequence of repeated feedback/rewards.
At decision-time, we autoregressively sample (impute) an imagined sequence of rewards for each action, and choose the action with the largest average imputed reward.
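A compact sketch of this decision rule, assuming a pretrained autoregressive model exposed as `model.sample_next(action, history)` that returns one sampled reward given the feedback observed so far (this interface is our assumption, not the paper's API):

```python
import numpy as np

def choose_action(model, observed_rewards, n_actions, horizon=100):
    """Pick the action whose imputed reward sequence looks best.
    observed_rewards: list of per-action reward histories seen so far."""
    imputed_means = []
    for a in range(n_actions):
        history = list(observed_rewards[a])        # real feedback first
        while len(history) < horizon:              # then imagined feedback
            history.append(model.sample_next(a, history))
        imputed_means.append(np.mean(history))
    return int(np.argmax(imputed_means))
```

Because each call re-samples the imagined continuation, repeated calls randomize over plausible futures, which is what makes this a posterior-sampling (Thompson-style) rule.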
arXiv Detail & Related papers (2024-05-29T19:24:44Z) - Personalized Negative Reservoir for Incremental Learning in Recommender Systems [22.227137206517142]
Recommender systems have become an integral part of online platforms.
Every day the volume of training data is expanding and the number of user interactions is constantly increasing.
The exploration of larger and more expressive models has become a necessary pursuit to improve user experience.
arXiv Detail & Related papers (2024-03-06T19:08:28Z) - Generative Slate Recommendation with Reinforcement Learning [49.75985313698214]
Reinforcement learning (RL) algorithms can be used to optimize user engagement in recommender systems.
However, RL approaches are intractable in the slate recommendation scenario.
In that setting, an action corresponds to a slate that may contain any combination of items.
In this work we propose to encode slates in a continuous, low-dimensional latent space learned by a variational auto-encoder.
We are able to (i) relax assumptions required by previous work, and (ii) improve the quality of the action selection by modeling full slates.
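A rough sketch of the decoding half of this idea, assuming a pre-trained variational auto-encoder whose decoder maps a low-dimensional latent action to per-slot item logits; all shapes and names are illustrative:

```python
import torch
import torch.nn as nn

class SlateDecoder(nn.Module):
    """Maps a continuous latent action to logits over items for each slot."""
    def __init__(self, latent_dim=16, slate_size=5, n_items=10_000):
        super().__init__()
        self.slate_size, self.n_items = slate_size, n_items
        self.net = nn.Linear(latent_dim, slate_size * n_items)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # (batch, latent_dim) -> (batch, slate_size, n_items) logits
        return self.net(z).view(-1, self.slate_size, self.n_items)

def decode_slate(decoder: SlateDecoder, z: torch.Tensor) -> torch.Tensor:
    """Turn a latent action into a concrete slate: most likely item per slot."""
    with torch.no_grad():
        return decoder(z).argmax(dim=-1)   # (batch, slate_size) item ids
```

The point of the latent space is that the RL policy only has to output a small continuous vector `z` instead of searching over every combination of items.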
arXiv Detail & Related papers (2023-01-20T15:28:09Z) - Rethinking Missing Data: Aleatoric Uncertainty-Aware Recommendation [59.500347564280204]
We propose a new Aleatoric Uncertainty-aware Recommendation (AUR) framework.
AUR consists of a new uncertainty estimator along with a normal recommender model.
As the chance of mislabeling reflects the potential of a user-item pair, AUR makes recommendations according to the estimated uncertainty.
arXiv Detail & Related papers (2022-09-22T04:32:51Z) - PURS: Personalized Unexpected Recommender System for Improving User Satisfaction [76.98616102965023]
We describe a novel Personalized Unexpected Recommender System (PURS) model that incorporates unexpectedness into the recommendation process.
Extensive offline experiments on three real-world datasets illustrate that the proposed PURS model significantly outperforms the state-of-the-art baseline approaches.
arXiv Detail & Related papers (2021-06-05T01:33:21Z) - Non-Stationary Latent Bandits [68.21614490603758]
We propose a practical approach for fast personalization to non-stationary users.
The key idea is to frame this problem as a latent bandit, where prototypical models of user behavior are learned offline and the latent state of the user is inferred online.
We propose Thompson sampling algorithms for regret minimization in non-stationary latent bandits, analyze them, and evaluate them on a real-world dataset.
arXiv Detail & Related papers (2020-12-01T10:31:57Z)
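A compact sketch of Thompson sampling in such a latent bandit, assuming K prototypical user models were learned offline as a matrix `P` of shape `(K, n_actions)` holding mean Bernoulli rewards; the belief update and the switching term below are simplified illustrations, not the paper's exact algorithm:

```python
import numpy as np

def latent_ts_step(P, belief, rng=None):
    """One round: sample a latent state from the belief, act greedily for it."""
    rng = rng or np.random.default_rng()
    s = rng.choice(len(belief), p=belief)
    return int(np.argmax(P[s]))

def update_belief(P, belief, action, reward, switch_prob=0.01):
    """Bayes-filter the belief over latent states after observing a reward."""
    likelihood = P[:, action] if reward == 1 else 1.0 - P[:, action]
    posterior = belief * likelihood
    posterior /= posterior.sum()
    # Non-stationarity: allow a small chance of switching to any state.
    return (1 - switch_prob) * posterior + switch_prob / len(posterior)
```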
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.