Experts with Lower-Bounded Loss Feedback: A Unifying Framework
- URL: http://arxiv.org/abs/2012.09537v1
- Date: Thu, 17 Dec 2020 12:18:52 GMT
- Title: Experts with Lower-Bounded Loss Feedback: A Unifying Framework
- Authors: Eyal Gofer and Guy Gilboa
- Abstract summary: We prove optimal regret bounds for modified versions of Exp3, generalizing algorithms and bounds both for the bandit and the full-information settings.
Our results intersect with those for bandits with graph-structured feedback, in that both settings can accommodate feedback from an arbitrary subset of experts on each round.
Our model also accommodates partial feedback at the single-expert level, by allowing non-trivial lower bounds on each loss.
- Score: 8.947188600472256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The most prominent feedback models for the best expert problem are the full
information and bandit models. In this work we consider a simple feedback model
that generalizes both, where on every round, in addition to a bandit feedback,
the adversary provides a lower bound on the loss of each expert. Such lower
bounds may be obtained in various scenarios, for instance, in stock trading or
in assessing errors of certain measurement devices. For this model we prove
optimal regret bounds (up to logarithmic factors) for modified versions of
Exp3, generalizing algorithms and bounds both for the bandit and the
full-information settings. Our second-order unified regret analysis simulates a
two-step loss update and highlights three Hessian or Hessian-like expressions,
which map to the full-information regret, bandit regret, and a hybrid of both.
Our results intersect with those for bandits with graph-structured feedback, in
that both settings can accommodate feedback from an arbitrary subset of experts
on each round. However, our model also accommodates partial feedback at the
single-expert level, by allowing non-trivial lower bounds on each loss.
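The feedback model above can be illustrated with a short sketch: an Exp3-style update where, besides the played expert's loss, a lower bound on every expert's loss is revealed each round. The estimator used here (charge every expert its revealed lower bound, and importance-weight only the played expert's excess over its bound) is an illustrative guess at how such bounds could be folded into Exp3, not the paper's actual algorithm; all names are hypothetical.

```python
import math
import random

def exp3_with_lower_bounds(n_experts, losses, lower_bounds, eta):
    """Hypothetical Exp3-style learner with lower-bounded loss feedback.

    losses[t][j]       -- true loss of expert j on round t (only the
                          played expert's loss is actually observed)
    lower_bounds[t][j] -- revealed lower bound on expert j's loss
    eta                -- learning rate

    Returns the learner's total realized loss.
    """
    weights = [1.0] * n_experts
    total_loss = 0.0
    for t in range(len(losses)):
        z = sum(weights)
        probs = [w / z for w in weights]
        # Sample one expert according to the current distribution.
        i = random.choices(range(n_experts), weights=probs)[0]
        total_loss += losses[t][i]
        lb = lower_bounds[t]
        for j in range(n_experts):
            # Every expert is charged its revealed lower bound; only the
            # played expert's excess loss is importance-weighted, so the
            # estimate stays unbiased for each true loss.
            est = lb[j]
            if j == i:
                est += (losses[t][i] - lb[i]) / probs[i]
            weights[j] *= math.exp(-eta * est)
    return total_loss
```

With trivial lower bounds (all zeros) this reduces to plain Exp3 with importance-weighted estimates, and with tight bounds (lower bound equals the loss) the excess term vanishes and the update behaves like a full-information exponential-weights step, which matches the abstract's claim that the model interpolates between the two settings.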
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Non-stochastic Bandits With Evolving Observations [47.61533665679308]
We introduce a novel online learning framework that unifies and generalizes pre-established models.
We propose regret minimization algorithms for both the full-information and bandit settings.
Our algorithms match the known regret bounds across many special cases, while also introducing previously unknown bounds.
arXiv Detail & Related papers (2024-05-27T05:32:46Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - Doubly Adversarial Federated Bandits [7.23389716633927]
We study a new non-stochastic federated multi-armed bandit problem with multiple agents collaborating via a communication network.
Our algorithm gives a positive answer to an open question posed by Cesa-Bianchi et al.
arXiv Detail & Related papers (2023-01-22T22:36:43Z) - Offline congestion games: How feedback type affects data coverage requirement [53.83345471268163]
We consider three different types of feedback with decreasing revealed information.
We show the coverage assumption for the agent-level feedback setting is insufficient in the game-level feedback setting.
Our results constitute the first study of offline congestion games.
arXiv Detail & Related papers (2022-10-24T16:49:16Z) - Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback [0.0]
We study the problem of expert advice under partial bandit feedback setting and create a sequential minimax optimal algorithm.
Our algorithm works with a more general partial monitoring setting, where, in contrast to the classical bandit feedback, the losses can be revealed in an adversarial manner.
arXiv Detail & Related papers (2022-04-13T22:48:12Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.