Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications
- URL: http://arxiv.org/abs/2408.14432v2
- Date: Wed, 28 Aug 2024 12:39:57 GMT
- Title: Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications
- Authors: Luyue Xu, Liming Wang, Hong Xie, Mingqiang Zhou
- Abstract summary: "Herding effects" bias user feedback toward historical ratings, breaking down the assumption of unbiased feedback inherent in contextual bandits.
This paper develops a novel variant of the contextual bandit that is tailored to address the feedback bias caused by the herding effects.
We show that TS-Conf effectively mitigates the negative impact of herding effects, resulting in faster learning and improved recommendation accuracy.
- Score: 17.865143559133994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual bandits serve as a fundamental algorithmic framework for optimizing recommendation decisions online. Though extensive attention has been paid to tailoring contextual bandits for recommendation applications, the "herding effects" in user feedback have been ignored. These herding effects bias user feedback toward historical ratings, breaking down the assumption of unbiased feedback inherent in contextual bandits. This paper develops a novel variant of the contextual bandit that is tailored to address the feedback bias caused by the herding effects. A user feedback model is formulated to capture this feedback bias. We design the TS-Conf (Thompson Sampling under Conformity) algorithm, which employs posterior sampling to balance the exploration and exploitation tradeoff. We prove an upper bound for the regret of the algorithm, revealing the impact of herding effects on learning speed. Extensive experiments on datasets demonstrate that TS-Conf outperforms four benchmark algorithms. Analysis reveals that TS-Conf effectively mitigates the negative impact of herding effects, resulting in faster learning and improved recommendation accuracy.
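The feedback model described in the abstract, where observed ratings are pulled toward historical ratings, can be illustrated with a minimal sketch. This is not the paper's implementation: the linear reward model, the fixed conformity weight, and the debiasing step are all assumptions made for illustration. It shows Thompson Sampling on a linear contextual bandit where each observed rating is a convex combination of the true rating and the arm's running historical average, and the learner corrects for that bias before updating its posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical setup (not from the paper): linear contextual bandit ---
d, n_arms, horizon = 5, 4, 2000
theta_true = rng.normal(size=d)   # unknown user preference vector
conformity = 0.4                  # assumed herding weight in [0, 1)

# Bayesian linear regression state for Thompson Sampling (prior N(0, I))
A = np.eye(d)                     # posterior precision matrix
b = np.zeros(d)

hist_mean = np.zeros(n_arms)      # running historical rating per arm
hist_count = np.zeros(n_arms)

for t in range(horizon):
    contexts = rng.normal(size=(n_arms, d))

    # Thompson Sampling: draw theta from the posterior, pick the best arm
    cov = np.linalg.inv(A)
    theta_sample = rng.multivariate_normal(cov @ b, cov)
    arm = int(np.argmax(contexts @ theta_sample))
    x = contexts[arm]

    # Herding: the observed rating is biased toward the historical average
    true_reward = x @ theta_true + rng.normal(scale=0.1)
    observed = (1 - conformity) * true_reward + conformity * hist_mean[arm]

    # Debias using the (assumed known) conformity level before updating
    debiased = (observed - conformity * hist_mean[arm]) / (1 - conformity)

    hist_count[arm] += 1
    hist_mean[arm] += (observed - hist_mean[arm]) / hist_count[arm]

    A += np.outer(x, x)
    b += debiased * x

# Posterior mean recovers theta_true despite the biased raw feedback
print(np.round(np.linalg.inv(A) @ b, 2))
```

Without the debiasing step, the regression targets drift toward each arm's historical average and the posterior mean converges to the wrong vector, which is the failure mode the paper attributes to herding effects.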
Related papers
- Algorithmic Drift: A Simulation Framework to Study the Effects of Recommender Systems on User Preferences [7.552217586057245]
We propose a simulation framework that mimics user-recommender system interactions in a long-term scenario.
We introduce two novel metrics for quantifying the algorithm's impact on user preferences, specifically in terms of drift over time.
arXiv Detail & Related papers (2024-09-24T21:54:22Z) - Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - DPR: An Algorithm Mitigate Bias Accumulation in Recommendation feedback loops [41.21024436158042]
We study the negative impact of feedback loops and unknown exposure mechanisms on recommendation quality and user experience.
We propose Dynamic Personalized Ranking (DPR), an unbiased algorithm that uses dynamic re-weighting to mitigate the cross-effects.
We show theoretically that our approach mitigates the negative effects of feedback loops and unknown exposure mechanisms.
arXiv Detail & Related papers (2023-11-10T04:36:00Z) - Breaking Feedback Loops in Recommender Systems with Causal Inference [99.22185950608838]
Recent work has shown that feedback loops may compromise recommendation quality and homogenize user behavior.
We propose the Causal Adjustment for Feedback Loops (CAFL), an algorithm that provably breaks feedback loops using causal inference.
We show that CAFL improves recommendation quality when compared to prior correction methods.
arXiv Detail & Related papers (2022-07-04T17:58:39Z) - Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR).
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove in theory that this way offsets the influence of user/item propensity on the learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z) - Existence conditions for hidden feedback loops in online recommender systems [0.0]
We study how uncertainty and noise in user interests influence the existence of feedback loops.
A non-zero probability of resetting user interests is sufficient to limit the feedback loop and estimate the size of the effect.
arXiv Detail & Related papers (2021-09-11T13:30:08Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z) - Probabilistic and Variational Recommendation Denoising [56.879165033014026]
Learning from implicit feedback is one of the most common cases in the application of recommender systems.
We propose probabilistic and variational recommendation denoising for implicit feedback.
We employ the proposed DPI and DVAE on four state-of-the-art recommendation models and conduct experiments on three datasets.
arXiv Detail & Related papers (2021-05-20T08:59:44Z) - Learning Multiclass Classifier Under Noisy Bandit Feedback [6.624726878647541]
We propose a novel approach to deal with noisy bandit feedback based on the unbiased estimator technique.
We show our approach's effectiveness using extensive experiments on several benchmark datasets.
arXiv Detail & Related papers (2020-06-05T16:31:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.