Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications
- URL: http://arxiv.org/abs/2408.14432v2
- Date: Wed, 28 Aug 2024 12:39:57 GMT
- Title: Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications
- Authors: Luyue Xu, Liming Wang, Hong Xie, Mingqiang Zhou
- Abstract summary: "Herding effects" bias user feedback toward historical ratings, breaking down the assumption of unbiased feedback inherent in contextual bandits.
This paper develops a novel variant of the contextual bandit that is tailored to address the feedback bias caused by the herding effects.
We show that TS-Conf effectively mitigates the negative impact of herding effects, resulting in faster learning and improved recommendation accuracy.
- Score: 17.865143559133994
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contextual bandits serve as a fundamental algorithmic framework for optimizing recommendation decisions online. Though extensive attention has been paid to tailoring contextual bandits for recommendation applications, the "herding effects" in user feedback have been ignored. These herding effects bias user feedback toward historical ratings, breaking down the assumption of unbiased feedback inherent in contextual bandits. This paper develops a novel variant of the contextual bandit that is tailored to address the feedback bias caused by the herding effects. A user feedback model is formulated to capture this feedback bias. We design the TS-Conf (Thompson Sampling under Conformity) algorithm, which employs posterior sampling to balance the exploration and exploitation tradeoff. We prove an upper bound for the regret of the algorithm, revealing the impact of herding effects on learning speed. Extensive experiments on datasets demonstrate that TS-Conf outperforms four benchmark algorithms. Analysis reveals that TS-Conf effectively mitigates the negative impact of herding effects, resulting in faster learning and improved recommendation accuracy.
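The feedback model described in the abstract, where observed ratings are pulled toward historical ratings, can be illustrated with a minimal sketch. This is not the paper's implementation: the linear reward model, the fixed conformity weight, and the debiasing step are all assumptions made for illustration. It shows Thompson Sampling on a linear contextual bandit where each observed rating is a convex combination of the true rating and the arm's running historical average, and the learner corrects for that bias before updating its posterior.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical setup (not from the paper): linear contextual bandit ---
d, n_arms, horizon = 5, 4, 2000
theta_true = rng.normal(size=d)   # unknown user preference vector
conformity = 0.4                  # assumed herding weight in [0, 1)

# Bayesian linear regression state for Thompson Sampling (prior N(0, I))
A = np.eye(d)                     # posterior precision matrix
b = np.zeros(d)

hist_mean = np.zeros(n_arms)      # running historical rating per arm
hist_count = np.zeros(n_arms)

for t in range(horizon):
    contexts = rng.normal(size=(n_arms, d))

    # Thompson Sampling: draw theta from the posterior, pick the best arm
    cov = np.linalg.inv(A)
    theta_sample = rng.multivariate_normal(cov @ b, cov)
    arm = int(np.argmax(contexts @ theta_sample))
    x = contexts[arm]

    # Herding: the observed rating is biased toward the historical average
    true_reward = x @ theta_true + rng.normal(scale=0.1)
    observed = (1 - conformity) * true_reward + conformity * hist_mean[arm]

    # Debias using the (assumed known) conformity level before updating
    debiased = (observed - conformity * hist_mean[arm]) / (1 - conformity)

    hist_count[arm] += 1
    hist_mean[arm] += (observed - hist_mean[arm]) / hist_count[arm]

    A += np.outer(x, x)
    b += debiased * x

# Posterior mean recovers theta_true despite the biased raw feedback
print(np.round(np.linalg.inv(A) @ b, 2))
```

Without the debiasing step, the regression targets drift toward each arm's historical average and the posterior mean converges to the wrong vector, which is the failure mode the paper attributes to herding effects.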
Related papers
- Algorithmic Drift: A Simulation Framework to Study the Effects of Recommender Systems on User Preferences [7.552217586057245]
We propose a simulation framework that mimics user-recommender system interactions in a long-term scenario.
We introduce two novel metrics for quantifying the algorithm's impact on user preferences, specifically in terms of drift over time.
arXiv Detail & Related papers (2024-09-24T21:54:22Z) - Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - DPR: An Algorithm Mitigate Bias Accumulation in Recommendation feedback loops [41.21024436158042]
We study the negative impact of feedback loops and unknown exposure mechanisms on recommendation quality and user experience.
We propose Dynamic Personalized Ranking (DPR), an unbiased algorithm that uses dynamic re-weighting to mitigate the cross-effects.
We show theoretically that our approach mitigates the negative effects of feedback loops and unknown exposure mechanisms.
arXiv Detail & Related papers (2023-11-10T04:36:00Z) - Breaking Feedback Loops in Recommender Systems with Causal Inference [99.22185950608838]
Recent work has shown that feedback loops may compromise recommendation quality and homogenize user behavior.
We propose the Causal Adjustment for Feedback Loops (CAFL), an algorithm that provably breaks feedback loops using causal inference.
We show that CAFL improves recommendation quality when compared to prior correction methods.
arXiv Detail & Related papers (2022-07-04T17:58:39Z) - Cross Pairwise Ranking for Unbiased Item Recommendation [57.71258289870123]
We develop a new learning paradigm named Cross Pairwise Ranking (CPR).
CPR achieves unbiased recommendation without knowing the exposure mechanism.
We prove in theory that this way offsets the influence of user/item propensity on the learning.
arXiv Detail & Related papers (2022-04-26T09:20:27Z) - Existence conditions for hidden feedback loops in online recommender systems [0.0]
We study how uncertainty and noise in user interests influence the existence of feedback loops.
A non-zero probability of resetting user interests is sufficient to limit the feedback loop and estimate the size of the effect.
arXiv Detail & Related papers (2021-09-11T13:30:08Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z) - Probabilistic and Variational Recommendation Denoising [56.879165033014026]
Learning from implicit feedback is one of the most common cases in the application of recommender systems.
We propose probabilistic and variational recommendation denoising for implicit feedback.
We employ the proposed DPI and DVAE on four state-of-the-art recommendation models and conduct experiments on three datasets.
arXiv Detail & Related papers (2021-05-20T08:59:44Z) - Learning Multiclass Classifier Under Noisy Bandit Feedback [6.624726878647541]
We propose a novel approach to deal with noisy bandit feedback based on the unbiased estimator technique.
We show our approach's effectiveness using extensive experiments on several benchmark datasets.
arXiv Detail & Related papers (2020-06-05T16:31:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.