Leveraging User-Triggered Supervision in Contextual Bandits
- URL: http://arxiv.org/abs/2302.03784v1
- Date: Tue, 7 Feb 2023 22:42:27 GMT
- Title: Leveraging User-Triggered Supervision in Contextual Bandits
- Authors: Alekh Agarwal, Claudio Gentile, Teodor V. Marinov
- Abstract summary: We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context.
We develop a new framework to leverage such signals, while being robust to their biased nature.
- Score: 34.58466163463977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study contextual bandit (CB) problems, where the user can sometimes
respond with the best action in a given context. Such an interaction arises,
for example, in text prediction or autocompletion settings, where a poor
suggestion is simply ignored and the user enters the desired text instead.
Crucially, this extra feedback is user-triggered on only a subset of the
contexts. We develop a new framework to leverage such signals, while being
robust to their biased nature. We also augment standard CB algorithms to
leverage the signal, and show improved regret guarantees for the resulting
algorithms under a variety of conditions on the helpfulness of and bias
inherent in this feedback.
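To make the interaction protocol concrete, here is a minimal sketch (not the paper's algorithm) of a contextual bandit loop in which the learner also receives the best action on a subset of rounds. All names and constants are illustrative, and the trigger is modeled as an unbiased coin flip, whereas the paper's central difficulty is precisely that the real trigger is user-driven and biased.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 2000                    # context dim, arms, rounds (illustrative)
theta_true = rng.normal(size=(K, d))    # hypothetical ground-truth arm parameters

# Per-arm ridge-regression statistics.
A = np.stack([np.eye(d) for _ in range(K)])
b = np.zeros((K, d))
eps, p_reveal = 0.1, 0.3                # exploration rate; chance the user reveals a*

for t in range(T):
    x = rng.normal(size=d)
    theta_hat = np.array([np.linalg.solve(A[a], b[a]) for a in range(K)])
    a = rng.integers(K) if rng.random() < eps else int(np.argmax(theta_hat @ x))
    r = theta_true[a] @ x + 0.1 * rng.normal()   # ordinary bandit reward feedback
    A[a] += np.outer(x, x)
    b[a] += r * x

    # User-triggered supervision: on some rounds the user also supplies the
    # best action for this context (e.g., types the desired text themselves).
    # Here the trigger is a coin flip; in the paper it is user-driven and biased.
    if rng.random() < p_reveal:
        a_star = int(np.argmax(theta_true @ x))
        A[a_star] += np.outer(x, x)
        b[a_star] += (theta_true[a_star] @ x) * x  # fold in as an extra observation
```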
Related papers
- Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications [17.865143559133994]
"Herding effects" bias user feedback toward historical ratings, breaking down the assumption of unbiased feedback inherent in contextual bandits.
This paper develops a novel variant of the contextual bandit that is tailored to address the feedback bias caused by the herding effects.
We show that TS-Conf effectively mitigates the negative impact of herding effects, resulting in faster learning and improved recommendation accuracy.
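As a rough illustration of the herding mechanism, the sketch below runs Thompson sampling while feedback is pulled toward the arm's historical average, then debiases each update by inverting that assumed mixture. The mixture model and the inversion are assumptions for illustration, not TS-Conf itself.

```python
import numpy as np

rng = np.random.default_rng(5)
K, T, w = 4, 3000, 0.4                  # w: assumed herding strength
mu = rng.uniform(size=K)                # latent true arm quality

counts, sums = np.ones(K), np.zeros(K)
for t in range(T):
    # Thompson sampling with a simple Gaussian posterior per arm.
    draw = rng.normal(sums / counts, 1.0 / np.sqrt(counts))
    a = int(np.argmax(draw))
    hist = sums[a] / counts[a]          # the historical rating the user sees
    # Herding: observed feedback is pulled toward the historical rating.
    obs = (1 - w) * (mu[a] + 0.1 * rng.normal()) + w * hist
    # Debias by inverting the assumed mixture before updating the posterior.
    counts[a] += 1
    sums[a] += (obs - w * hist) / (1 - w)
```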
arXiv Detail & Related papers (2024-08-26T17:20:34Z)
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function from preference feedback on the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
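The estimation step can be illustrated with the standard Bradley-Terry model for preference feedback: fit a scorer so that sigma(f(winner) - f(loser)) matches observed duel outcomes. The sketch below uses a linear scorer as a stand-in for the paper's neural network; all names and constants are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
d, n = 8, 500
w_true = rng.normal(size=d)             # hypothetical ground-truth scorer
phi_a = rng.normal(size=(n, d))         # features of the two dueling arm-context pairs
phi_b = rng.normal(size=(n, d))

# Sample duel outcomes from the Bradley-Terry model; order each pair as (winner, loser).
p_a_wins = sigmoid((phi_a - phi_b) @ w_true)
a_won = rng.random(n) < p_a_wins
phi_win = np.where(a_won[:, None], phi_a, phi_b)
phi_lose = np.where(a_won[:, None], phi_b, phi_a)

# Fit the scorer by gradient descent on the negative log-likelihood
# -log sigma(f(win) - f(lose)); a neural network would replace the linear map.
w, lr = np.zeros(d), 0.5
for _ in range(200):
    p = sigmoid((phi_win - phi_lose) @ w)          # P(winner beats loser)
    grad = -((1 - p)[:, None] * (phi_win - phi_lose)).mean(axis=0)
    w -= lr * grad
```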
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models such as large language models (LLMs).
We study a model within this problem domain: contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
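A schematic reading of uncertainty-weighted maximum likelihood estimation: each preference sample is down-weighted by its elliptical-norm uncertainty, so the observations an adversarial flip would damage most contribute less to the fit. The weighting rule below is an assumption for illustration, not the paper's exact estimator.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_mle_step(w, A_inv, phi_diff, y, lr=0.1, alpha=1.0):
    """One gradient step of uncertainty-weighted logistic MLE.

    phi_diff: (n, d) duel feature differences; y: (n,) 1.0 if the first arm won.
    Each sample's weight shrinks with its elliptical norm ||phi||_{A^{-1}},
    so high-uncertainty samples (where label flips hurt most) count less.
    """
    unc = np.sqrt(np.einsum("nd,de,ne->n", phi_diff, A_inv, phi_diff))
    wts = 1.0 / np.maximum(alpha, unc)           # assumed weighting rule
    p = sigmoid(phi_diff @ w)
    grad = ((p - y) * wts)[:, None] * phi_diff   # weighted logistic gradient
    return w - lr * grad.mean(axis=0)

# Example call with random placeholder data:
rng = np.random.default_rng(0)
n, d = 100, 6
w = weighted_mle_step(np.zeros(d), np.eye(d),
                      rng.normal(size=(n, d)),
                      rng.integers(0, 2, size=n).astype(float))
```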
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
- Few-Shot Adversarial Prompt Learning on Vision-Language Models [62.50622628004134]
The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention.
Previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision.
We propose a few-shot adversarial prompt framework in which adapting input sequences with limited data yields significant adversarial robustness improvements.
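For readers new to the underlying vulnerability, a tiny FGSM-style example on a toy linear classifier shows how a small, sign-aligned perturbation reduces the classification margin. This illustrates the adversarial perturbations being defended against, not the paper's prompt-learning method; every name here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 32
w, bias = rng.normal(size=d), 0.0       # a toy linear classifier f(x) = w.x + b
x, y = rng.normal(size=d), 1.0          # one sample with label y in {-1, +1}

def margin(v):
    return y * (w @ v + bias)           # positive margin = correctly classified

# FGSM-style step: move against the sign of the margin's gradient in x.
eps = 0.05
x_adv = x - eps * np.sign(y * w)
print(margin(x), margin(x_adv))         # the margin provably drops by eps*||w||_1
```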
arXiv Detail & Related papers (2024-03-21T18:28:43Z)
- Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts [31.33919659549256]
We present a novel contextual bandit problem with post-serving contexts.
Our algorithm, poLinUCB, achieves tight regret under standard assumptions.
Extensive empirical tests on both synthetic and real-world datasets demonstrate the significant benefit of utilizing post-serving contexts.
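The post-serving protocol can be sketched as follows: the arm is selected using only the pre-serving features, and the regression update uses the full context once the follow-up features arrive. This is a simplified stand-in, not poLinUCB; the UCB width and the zero-padding scheme are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d_pre, d_post, K, T = 4, 3, 3, 1000     # illustrative sizes
d = d_pre + d_post
theta = rng.normal(size=(K, d))         # hypothetical parameters on the FULL context

A = np.stack([np.eye(d) for _ in range(K)])
b = np.zeros((K, d))
beta = 1.0                              # UCB width (assumed constant)

for t in range(T):
    x_pre = rng.normal(size=d_pre)
    # Selection sees only the pre-serving context; the unseen part is zero-padded.
    x_sel = np.concatenate([x_pre, np.zeros(d_post)])
    ucb = np.empty(K)
    for k in range(K):
        A_inv = np.linalg.inv(A[k])
        ucb[k] = (A_inv @ b[k]) @ x_sel + beta * np.sqrt(x_sel @ A_inv @ x_sel)
    a = int(np.argmax(ucb))

    # The post-serving context arrives only after the arm is pulled; the
    # update then uses the full feature vector.
    x_post = rng.normal(size=d_post)
    x_full = np.concatenate([x_pre, x_post])
    r = theta[a] @ x_full + 0.1 * rng.normal()
    A[a] += np.outer(x_full, x_full)
    b[a] += r * x_full
```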
arXiv Detail & Related papers (2023-09-25T06:22:28Z)
- Kernelized Offline Contextual Dueling Bandits [15.646879026749168]
In this work, we take advantage of the fact that the agent can often choose the contexts at which to obtain human feedback.
We give an upper-confidence-bound style algorithm for this setting and prove a regret bound.
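Choosing informative contexts can be sketched with a Gaussian-process-style posterior variance: among candidate contexts, query human feedback where the kernel model is most uncertain. The RBF kernel and variance-maximizing rule here are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def rbf(X, Y, ls=1.0):
    """Squared-exponential kernel matrix between row sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

rng = np.random.default_rng(7)
X_obs = rng.uniform(size=(10, 2))        # contexts where feedback was already collected
X_cand = rng.uniform(size=(200, 2))      # candidate contexts the agent may choose
K_oo = rbf(X_obs, X_obs) + 1e-3 * np.eye(len(X_obs))
K_co = rbf(X_cand, X_obs)

# GP posterior variance at each candidate: k(x,x) - k_x^T K_oo^{-1} k_x.
var = 1.0 - np.einsum("ij,jk,ik->i", K_co, np.linalg.inv(K_oo), K_co)
next_ctx = X_cand[int(np.argmax(var))]   # query human feedback where most uncertain
```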
arXiv Detail & Related papers (2023-07-21T01:17:31Z)
- Towards Robust Speech-to-Text Adversarial Attack [78.5097679815944]
This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo.
Our approach extends the conventional distortion condition of the adversarial optimization formulation.
Minimizing this metric, which measures the discrepancy between the distributions of original and adversarial samples, helps craft signals very close to the subspace of legitimate speech recordings.
arXiv Detail & Related papers (2021-03-15T01:51:41Z)
- Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with a querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that CBM-based algorithms perform well in the presence of adversity.
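A minimal sketch of the budgeted-querying protocol: rewards are observed only when a query is spent, and a simple heuristic (an assumption, not CBM's actual matching rule) queries while the chosen arm's confidence interval is still wide and budget remains.

```python
import numpy as np

rng = np.random.default_rng(3)
K, T, budget = 5, 3000, 300
mu = rng.uniform(0.2, 0.8, size=K)       # hypothetical Bernoulli arm means

counts = np.ones(K)                      # one pseudo-pull per arm avoids div-by-zero
sums = np.zeros(K)
for t in range(1, T + 1):
    width = np.sqrt(2 * np.log(t) / counts)
    a = int(np.argmax(sums / counts + width))
    r = float(rng.random() < mu[a])      # reward exists, but is not yet observed

    # Feedback must be bought: only a spent query reveals the reward.  The
    # rule below is a simple heuristic, not CBM's actual matching condition.
    if budget > 0 and width[a] > 0.05:
        budget -= 1
        counts[a] += 1
        sums[a] += r
```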
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
- Greedy Bandits with Sampled Context [0.0]
Greedy Bandits with Sampled Context (GB-SC) is a method for contextual multi-armed bandits that develops the prior from context information.
Our results show competitive performance on the Mushroom environment in terms of expected regret and expected cumulative regret.
arXiv Detail & Related papers (2020-07-27T17:17:45Z)
- Online learning with Corrupted context: Corrupted Contextual Bandits [19.675277307158435]
We consider a novel variant of the contextual bandit problem in which the observed context can be corrupted.
This problem is motivated by certain online settings, including clinical trials and ad recommendation applications.
We propose to combine the standard contextual bandit approach with a classical multi-armed bandit mechanism.
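The proposed combination can be sketched as a switch: act with a contextual linear policy when the context arrives intact, and fall back to a classical UCB over arms when it is corrupted. The corruption model and the fallback rule below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)
d, K, T, p_corrupt = 5, 4, 2000, 0.3     # illustrative sizes and corruption rate
theta = rng.normal(size=(K, d))          # hypothetical true arm parameters

A = np.stack([np.eye(d) for _ in range(K)])   # contextual (ridge) statistics
b = np.zeros((K, d))
counts, sums = np.ones(K), np.zeros(K)        # context-free UCB statistics

for t in range(1, T + 1):
    x = rng.normal(size=d)
    corrupted = rng.random() < p_corrupt      # e.g., missing or garbled features
    if corrupted:
        # Context unusable: fall back to classical UCB over raw arm averages.
        a = int(np.argmax(sums / counts + np.sqrt(2 * np.log(t) / counts)))
    else:
        theta_hat = np.array([np.linalg.solve(A[k], b[k]) for k in range(K)])
        a = int(np.argmax(theta_hat @ x))
    r = theta[a] @ x + 0.1 * rng.normal()     # reward depends on the true context
    counts[a] += 1
    sums[a] += r
    if not corrupted:                         # contextual model learns only from clean rounds
        A[a] += np.outer(x, x)
        b[a] += r * x
```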
arXiv Detail & Related papers (2020-06-26T19:53:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.