Leveraging User-Triggered Supervision in Contextual Bandits
- URL: http://arxiv.org/abs/2302.03784v1
- Date: Tue, 7 Feb 2023 22:42:27 GMT
- Title: Leveraging User-Triggered Supervision in Contextual Bandits
- Authors: Alekh Agarwal, Claudio Gentile, Teodor V. Marinov
- Abstract summary: We study contextual bandit (CB) problems, where the user can sometimes respond with the best action in a given context.
We develop a new framework to leverage such signals, while being robust to their biased nature.
- Score: 34.58466163463977
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study contextual bandit (CB) problems, where the user can sometimes
respond with the best action in a given context. Such an interaction arises,
for example, in text prediction or autocompletion settings, where a poor
suggestion is simply ignored and the user enters the desired text instead.
Crucially, this extra feedback is user-triggered on only a subset of the
contexts. We develop a new framework to leverage such signals, while being
robust to their biased nature. We also augment standard CB algorithms to
leverage the signal, and show improved regret guarantees for the resulting
algorithms under a variety of conditions on the helpfulness of and bias
inherent in this feedback.
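To make the interaction protocol concrete, here is a minimal sketch (not the paper's algorithm) of a contextual bandit loop in which the learner also receives the best action on a subset of rounds. All names and constants are illustrative, and the trigger is modeled as an unbiased coin flip, whereas the paper's central difficulty is precisely that the real trigger is user-driven and biased.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T = 5, 4, 2000                    # context dim, arms, rounds (illustrative)
theta_true = rng.normal(size=(K, d))    # hypothetical ground-truth arm parameters

# Per-arm ridge-regression statistics.
A = np.stack([np.eye(d) for _ in range(K)])
b = np.zeros((K, d))
eps, p_reveal = 0.1, 0.3                # exploration rate; chance the user reveals a*

for t in range(T):
    x = rng.normal(size=d)
    theta_hat = np.array([np.linalg.solve(A[a], b[a]) for a in range(K)])
    a = rng.integers(K) if rng.random() < eps else int(np.argmax(theta_hat @ x))
    r = theta_true[a] @ x + 0.1 * rng.normal()   # ordinary bandit reward feedback
    A[a] += np.outer(x, x)
    b[a] += r * x

    # User-triggered supervision: on some rounds the user also supplies the
    # best action for this context (e.g., types the desired text themselves).
    # Here the trigger is a coin flip; in the paper it is user-driven and biased.
    if rng.random() < p_reveal:
        a_star = int(np.argmax(theta_true @ x))
        A[a_star] += np.outer(x, x)
        b[a_star] += (theta_true[a_star] @ x) * x  # fold in as an extra observation
```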
Related papers
- Contextual Bandit with Herding Effects: Algorithms and Recommendation Applications [17.865143559133994]
"Herding effects" bias user feedback toward historical ratings, breaking down the assumption of unbiased feedback inherent in contextual bandits.
This paper develops a novel variant of the contextual bandit that is tailored to address the feedback bias caused by the herding effects.
We show that TS-Conf effectively mitigates the negative impact of herding effects, resulting in faster learning and improved recommendation accuracy.
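As a rough illustration of the herding mechanism, the sketch below runs Thompson sampling while feedback is pulled toward the arm's historical average, then debiases each update by inverting that assumed mixture. The mixture model and the inversion are assumptions for illustration, not TS-Conf itself.

```python
import numpy as np

rng = np.random.default_rng(5)
K, T, w = 4, 3000, 0.4                  # w: assumed herding strength
mu = rng.uniform(size=K)                # latent true arm quality

counts, sums = np.ones(K), np.zeros(K)
for t in range(T):
    # Thompson sampling with a simple Gaussian posterior per arm.
    draw = rng.normal(sums / counts, 1.0 / np.sqrt(counts))
    a = int(np.argmax(draw))
    hist = sums[a] / counts[a]          # the historical rating the user sees
    # Herding: observed feedback is pulled toward the historical rating.
    obs = (1 - w) * (mu[a] + 0.1 * rng.normal()) + w * hist
    # Debias by inverting the assumed mixture before updating the posterior.
    counts[a] += 1
    sums[a] += (obs - w * hist) / (1 - w)
```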
arXiv Detail & Related papers (2024-08-26T17:20:34Z)
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function from preference feedback on the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
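The estimation step can be illustrated with the standard Bradley-Terry model for preference feedback: fit a scorer so that sigma(f(winner) - f(loser)) matches observed duel outcomes. The sketch below uses a linear scorer as a stand-in for the paper's neural network; all names and constants are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
d, n = 8, 500
w_true = rng.normal(size=d)             # hypothetical ground-truth scorer
phi_a = rng.normal(size=(n, d))         # features of the two dueling arm-context pairs
phi_b = rng.normal(size=(n, d))

# Sample duel outcomes from the Bradley-Terry model; order each pair as (winner, loser).
p_a_wins = sigmoid((phi_a - phi_b) @ w_true)
a_won = rng.random(n) < p_a_wins
phi_win = np.where(a_won[:, None], phi_a, phi_b)
phi_lose = np.where(a_won[:, None], phi_b, phi_a)

# Fit the scorer by gradient descent on the negative log-likelihood
# -log sigma(f(win) - f(lose)); a neural network would replace the linear map.
w, lr = np.zeros(d), 0.5
for _ in range(200):
    p = sigmoid((phi_win - phi_lose) @ w)          # P(winner beats loser)
    grad = -((1 - p)[:, None] * (phi_win - phi_lose)).mean(axis=0)
    w -= lr * grad
```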
arXiv Detail & Related papers (2024-07-24T09:23:22Z)
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models such as large language models (LLMs).
We study a model within this problem domain: contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
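A schematic reading of uncertainty-weighted maximum likelihood estimation: each preference sample is down-weighted by its elliptical-norm uncertainty, so the observations an adversarial flip would damage most contribute less to the fit. The weighting rule below is an assumption for illustration, not the paper's exact estimator.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def weighted_mle_step(w, A_inv, phi_diff, y, lr=0.1, alpha=1.0):
    """One gradient step of uncertainty-weighted logistic MLE.

    phi_diff: (n, d) duel feature differences; y: (n,) 1.0 if the first arm won.
    Each sample's weight shrinks with its elliptical norm ||phi||_{A^{-1}},
    so high-uncertainty samples (where label flips hurt most) count less.
    """
    unc = np.sqrt(np.einsum("nd,de,ne->n", phi_diff, A_inv, phi_diff))
    wts = 1.0 / np.maximum(alpha, unc)           # assumed weighting rule
    p = sigmoid(phi_diff @ w)
    grad = ((p - y) * wts)[:, None] * phi_diff   # weighted logistic gradient
    return w - lr * grad.mean(axis=0)

# Example call with random placeholder data:
rng = np.random.default_rng(0)
n, d = 100, 6
w = weighted_mle_step(np.zeros(d), np.eye(d),
                      rng.normal(size=(n, d)),
                      rng.integers(0, 2, size=n).astype(float))
```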
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
- Few-Shot Adversarial Prompt Learning on Vision-Language Models [62.50622628004134]
The vulnerability of deep neural networks to imperceptible adversarial perturbations has attracted widespread attention.
Previous efforts achieved zero-shot adversarial robustness by aligning adversarial visual features with text supervision.
We propose a few-shot adversarial prompt framework in which adapting input sequences with limited data yields significant adversarial robustness improvements.
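For readers new to the underlying vulnerability, a tiny FGSM-style example on a toy linear classifier shows how a small, sign-aligned perturbation reduces the classification margin. This illustrates the adversarial perturbations being defended against, not the paper's prompt-learning method; every name here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
d = 32
w, bias = rng.normal(size=d), 0.0       # a toy linear classifier f(x) = w.x + b
x, y = rng.normal(size=d), 1.0          # one sample with label y in {-1, +1}

def margin(v):
    return y * (w @ v + bias)           # positive margin = correctly classified

# FGSM-style step: move against the sign of the margin's gradient in x.
eps = 0.05
x_adv = x - eps * np.sign(y * w)
print(margin(x), margin(x_adv))         # the margin provably drops by eps*||w||_1
```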
arXiv Detail & Related papers (2024-03-21T18:28:43Z)
- Follow-ups Also Matter: Improving Contextual Bandits via Post-serving Contexts [31.33919659549256]
We present a novel contextual bandit problem with post-serving contexts.
Our algorithm, poLinUCB, achieves tight regret under standard assumptions.
Extensive empirical tests on both synthetic and real-world datasets demonstrate the significant benefit of utilizing post-serving contexts.
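The post-serving protocol can be sketched as follows: the arm is selected using only the pre-serving features, and the regression update uses the full context once the follow-up features arrive. This is a simplified stand-in, not poLinUCB; the UCB width and the zero-padding scheme are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
d_pre, d_post, K, T = 4, 3, 3, 1000     # illustrative sizes
d = d_pre + d_post
theta = rng.normal(size=(K, d))         # hypothetical parameters on the FULL context

A = np.stack([np.eye(d) for _ in range(K)])
b = np.zeros((K, d))
beta = 1.0                              # UCB width (assumed constant)

for t in range(T):
    x_pre = rng.normal(size=d_pre)
    # Selection sees only the pre-serving context; the unseen part is zero-padded.
    x_sel = np.concatenate([x_pre, np.zeros(d_post)])
    ucb = np.empty(K)
    for k in range(K):
        A_inv = np.linalg.inv(A[k])
        ucb[k] = (A_inv @ b[k]) @ x_sel + beta * np.sqrt(x_sel @ A_inv @ x_sel)
    a = int(np.argmax(ucb))

    # The post-serving context arrives only after the arm is pulled; the
    # update then uses the full feature vector.
    x_post = rng.normal(size=d_post)
    x_full = np.concatenate([x_pre, x_post])
    r = theta[a] @ x_full + 0.1 * rng.normal()
    A[a] += np.outer(x_full, x_full)
    b[a] += r * x_full
```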
arXiv Detail & Related papers (2023-09-25T06:22:28Z)
- Kernelized Offline Contextual Dueling Bandits [15.646879026749168]
In this work, we take advantage of the fact that the agent can often choose the contexts at which to obtain human feedback.
We give an upper-confidence-bound style algorithm for this setting and prove a regret bound.
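Choosing informative contexts can be sketched with a Gaussian-process-style posterior variance: among candidate contexts, query human feedback where the kernel model is most uncertain. The RBF kernel and variance-maximizing rule here are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def rbf(X, Y, ls=1.0):
    """Squared-exponential kernel matrix between row sets X and Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ls ** 2))

rng = np.random.default_rng(7)
X_obs = rng.uniform(size=(10, 2))        # contexts where feedback was already collected
X_cand = rng.uniform(size=(200, 2))      # candidate contexts the agent may choose
K_oo = rbf(X_obs, X_obs) + 1e-3 * np.eye(len(X_obs))
K_co = rbf(X_cand, X_obs)

# GP posterior variance at each candidate: k(x,x) - k_x^T K_oo^{-1} k_x.
var = 1.0 - np.einsum("ij,jk,ik->i", K_co, np.linalg.inv(K_oo), K_co)
next_ctx = X_cand[int(np.argmax(var))]   # query human feedback where most uncertain
```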
arXiv Detail & Related papers (2023-07-21T01:17:31Z)
- Towards Robust Speech-to-Text Adversarial Attack [78.5097679815944]
This paper introduces a novel adversarial algorithm for attacking the state-of-the-art speech-to-text systems, namely DeepSpeech, Kaldi, and Lingvo.
Our approach extends the conventional distortion condition of the adversarial optimization formulation.
Minimizing this metric, which measures the discrepancy between the distributions of original and adversarial samples, helps craft signals very close to the subspace of legitimate speech recordings.
arXiv Detail & Related papers (2021-03-15T01:51:41Z)
- Confidence-Budget Matching for Sequential Budgeted Learning [69.77435313099366]
We formalize decision-making problems with a querying budget.
We consider multi-armed bandits, linear bandits, and reinforcement learning problems.
We show that CBM-based algorithms perform well in the presence of adversity.
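A minimal sketch of the budgeted-querying protocol: rewards are observed only when a query is spent, and a simple heuristic (an assumption, not CBM's actual matching rule) queries while the chosen arm's confidence interval is still wide and budget remains.

```python
import numpy as np

rng = np.random.default_rng(3)
K, T, budget = 5, 3000, 300
mu = rng.uniform(0.2, 0.8, size=K)       # hypothetical Bernoulli arm means

counts = np.ones(K)                      # one pseudo-pull per arm avoids div-by-zero
sums = np.zeros(K)
for t in range(1, T + 1):
    width = np.sqrt(2 * np.log(t) / counts)
    a = int(np.argmax(sums / counts + width))
    r = float(rng.random() < mu[a])      # reward exists, but is not yet observed

    # Feedback must be bought: only a spent query reveals the reward.  The
    # rule below is a simple heuristic, not CBM's actual matching condition.
    if budget > 0 and width[a] > 0.05:
        budget -= 1
        counts[a] += 1
        sums[a] += r
```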
arXiv Detail & Related papers (2021-02-05T19:56:31Z)
- Greedy Bandits with Sampled Context [0.0]
Greedy Bandits with Sampled Context (GB-SC) is a method for contextual multi-armed bandits that develops the prior from context information.
Our results show competitive performance on the Mushroom environment in terms of expected regret and expected cumulative regret.
arXiv Detail & Related papers (2020-07-27T17:17:45Z)
- Online learning with Corrupted context: Corrupted Contextual Bandits [19.675277307158435]
We consider a novel variant of the contextual bandit problem in which the observed context can be corrupted.
This problem is motivated by certain online settings, including clinical trials and ad recommendation applications.
We propose to combine the standard contextual bandit approach with a classical multi-armed bandit mechanism.
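The proposed combination can be sketched as a switch: act with a contextual linear policy when the context arrives intact, and fall back to a classical UCB over arms when it is corrupted. The corruption model and the fallback rule below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)
d, K, T, p_corrupt = 5, 4, 2000, 0.3     # illustrative sizes and corruption rate
theta = rng.normal(size=(K, d))          # hypothetical true arm parameters

A = np.stack([np.eye(d) for _ in range(K)])   # contextual (ridge) statistics
b = np.zeros((K, d))
counts, sums = np.ones(K), np.zeros(K)        # context-free UCB statistics

for t in range(1, T + 1):
    x = rng.normal(size=d)
    corrupted = rng.random() < p_corrupt      # e.g., missing or garbled features
    if corrupted:
        # Context unusable: fall back to classical UCB over raw arm averages.
        a = int(np.argmax(sums / counts + np.sqrt(2 * np.log(t) / counts)))
    else:
        theta_hat = np.array([np.linalg.solve(A[k], b[k]) for k in range(K)])
        a = int(np.argmax(theta_hat @ x))
    r = theta[a] @ x + 0.1 * rng.normal()     # reward depends on the true context
    counts[a] += 1
    sums[a] += r
    if not corrupted:                         # contextual model learns only from clean rounds
        A[a] += np.outer(x, x)
        b[a] += r * x
```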
arXiv Detail & Related papers (2020-06-26T19:53:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.