Exploiting Correlated Auxiliary Feedback in Parameterized Bandits
- URL: http://arxiv.org/abs/2311.02715v1
- Date: Sun, 5 Nov 2023 17:27:06 GMT
- Title: Exploiting Correlated Auxiliary Feedback in Parameterized Bandits
- Authors: Arun Verma, Zhongxiang Dai, Yao Shu, Bryan Kian Hsiang Low
- Abstract summary: We study a novel variant of the parameterized bandits problem in which the learner can observe additional auxiliary feedback that is correlated with the observed reward.
The auxiliary feedback is readily available in many real-life applications, e.g., an online platform that wants to recommend the best-rated services to its users can observe the user's rating of service (rewards) and collect additional information like service delivery time (auxiliary feedback).
- Score: 56.84649080789685
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a novel variant of the parameterized bandits problem in which the
learner can observe additional auxiliary feedback that is correlated with the
observed reward. The auxiliary feedback is readily available in many real-life
applications, e.g., an online platform that wants to recommend the best-rated
services to its users can observe the user's rating of service (rewards) and
collect additional information like service delivery time (auxiliary feedback).
In this paper, we first develop a method that exploits auxiliary feedback to
build a reward estimator with tight confidence bounds, leading to a smaller
regret. We then characterize the regret reduction in terms of the correlation
coefficient between reward and its auxiliary feedback. Experimental results in
different settings also verify the performance gain achieved by our proposed
method.
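A standard way to exploit such correlated side observations is the control variate technique. Below is a minimal, hypothetical sketch of that idea, assuming a single scalar auxiliary feedback with a known mean and a coefficient estimated from the same samples; the names and setup are illustrative, not the paper's exact estimator.

```python
import numpy as np

def control_variate_mean(rewards, aux, aux_mean):
    """Mean-reward estimate using correlated auxiliary feedback as a
    control variate. Assumes E[aux] = aux_mean is known; the coefficient
    is estimated from the same samples (illustrative simplification).
    """
    rewards = np.asarray(rewards, dtype=float)
    aux = np.asarray(aux, dtype=float)
    # Optimal coefficient: beta* = Cov(reward, aux) / Var(aux).
    cov = np.cov(rewards, aux, ddof=1)
    beta = cov[0, 1] / cov[1, 1]
    # Corrected samples keep the same mean but have lower variance
    # whenever reward and aux are correlated.
    corrected = rewards - beta * (aux - aux_mean)
    return corrected.mean(), corrected.std(ddof=1)

# Toy data: ratings (reward) correlated with delivery time (aux feedback).
rng = np.random.default_rng(0)
aux = rng.normal(5.0, 1.0, size=2000)                         # delivery time
rewards = 10.0 - 0.8 * aux + rng.normal(0.0, 0.5, size=2000)  # rating
mean_cv, std_cv = control_variate_mean(rewards, aux, aux_mean=5.0)
print(mean_cv, std_cv, rewards.std(ddof=1))  # std_cv < raw reward std
```

With the optimal coefficient, the per-sample variance shrinks by a factor of (1 - rho^2), where rho is the reward-auxiliary correlation coefficient; feeding the lower-variance samples into a standard confidence bound is what yields the tighter bounds and smaller regret described above.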
Related papers
- Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation [67.88747330066049]
Fine-grained feedback captures nuanced distinctions in image quality and prompt-alignment.
We show that demonstrating its superiority over coarse-grained feedback is not automatic.
We identify key challenges in eliciting and utilizing fine-grained feedback.
arXiv Detail & Related papers (2024-06-24T17:19:34Z)
- Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs [57.16442740983528]
In ad-hoc retrieval, evaluation relies heavily on user actions, including implicit feedback.
The role of user feedback in annotators' assessment of turns in a conversation has been little studied.
We focus on how the evaluation of task-oriented dialogue systems (TDSs) is affected by considering user feedback, explicit or implicit, as provided through the follow-up utterance of a turn being evaluated.
arXiv Detail & Related papers (2024-04-19T16:45:50Z)
- Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive Recommendation [23.048841953423846]
We focus on the problem of learning to reward, which is fundamental to reinforcement learning.
Previous approaches introduce additional procedures for learning to reward, thereby increasing the complexity of optimization.
We propose a novel batch inverse reinforcement learning paradigm that achieves the desired properties.
arXiv Detail & Related papers (2023-10-30T13:43:20Z)
- Continually Improving Extractive QA via Human Feedback [59.49549491725224]
We study continually improving an extractive question answering (QA) system via human user feedback.
We conduct experiments involving thousands of user interactions under diverse setups to broaden the understanding of learning from feedback over time.
arXiv Detail & Related papers (2023-05-21T14:35:32Z)
- Breaking Feedback Loops in Recommender Systems with Causal Inference [99.22185950608838]
Recent work has shown that feedback loops may compromise recommendation quality and homogenize user behavior.
We propose the Causal Adjustment for Feedback Loops (CAFL), an algorithm that provably breaks feedback loops using causal inference.
We show that CAFL improves recommendation quality when compared to prior correction methods.
arXiv Detail & Related papers (2022-07-04T17:58:39Z)
- Simulating Bandit Learning from User Feedback for Extractive Question Answering [51.97943858898579]
We study learning from user feedback for extractive question answering by simulating feedback using supervised data.
We show that systems initially trained on a small number of examples can dramatically improve given feedback from users on model-predicted answers.
arXiv Detail & Related papers (2022-03-18T17:47:58Z)
- User and Item-aware Estimation of Review Helpfulness [4.640835690336653]
We investigate the role of deviations in the properties of reviews as helpfulness determinants.
We propose a novel helpfulness estimation model that extends previous ones.
Our model is thus an effective tool to select relevant user feedback for decision-making.
arXiv Detail & Related papers (2020-11-20T15:35:56Z)
- Counterfactual Evaluation of Slate Recommendations with Sequential Reward Interactions [18.90946044396516]
Users of music streaming, video streaming, news recommendation, and e-commerce services often engage with content in a sequential manner.
Providing and evaluating good sequences of recommendations is therefore a central problem for these services.
We propose a new counterfactual estimator that allows for sequential interactions in the rewards with lower variance in an asymptotically unbiased manner (a sketch of the general idea follows this list).
arXiv Detail & Related papers (2020-07-25T17:58:01Z)
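On the last entry above: the following is a hedged, hypothetical sketch of per-slot counterfactual (inverse-propensity) evaluation with prefix-conditioned weights, the general idea behind sequential-reward slate estimators. The factorized policies and the interface names are assumptions for illustration, not that paper's exact estimator.

```python
import numpy as np

def sequential_slate_ips(logged_episodes, target_prob):
    """Off-policy value estimate for a slate policy (illustrative sketch).

    logged_episodes: list of episodes, each a list of
        (action, logging_prob, reward) tuples, one per slate position.
    target_prob: callable(prefix, position, action) -> probability of the
        action under the target policy, given the slate prefix (assumed
        to factorize over positions).
    """
    total = 0.0
    for episode in logged_episodes:
        weight, prefix = 1.0, []
        for k, (action, log_p, reward) in enumerate(episode):
            # The slot-k reward is reweighted only by action choices up
            # to position k, not by the whole slate, which lowers variance.
            weight *= target_prob(prefix, k, action) / log_p
            total += weight * reward
            prefix.append(action)
    return total / len(logged_episodes)
```

Weighting the slot-k reward by the cumulative ratio over the first k positions, rather than over the entire slate, is what keeps the variance below full-slate importance sampling while remaining unbiased under the assumption that the reward at position k depends only on the first k items.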