Experts with Lower-Bounded Loss Feedback: A Unifying Framework
- URL: http://arxiv.org/abs/2012.09537v1
- Date: Thu, 17 Dec 2020 12:18:52 GMT
- Title: Experts with Lower-Bounded Loss Feedback: A Unifying Framework
- Authors: Eyal Gofer and Guy Gilboa
- Abstract summary: We prove optimal regret bounds for modified versions of Exp3, generalizing algorithms and bounds both for the bandit and the full-information settings.
Our results intersect with those for bandits with graph-structured feedback, in that both settings can accommodate feedback from an arbitrary subset of experts on each round.
Our model also accommodates partial feedback at the single-expert level, by allowing non-trivial lower bounds on each loss.
- Score: 8.947188600472256
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The most prominent feedback models for the best expert problem are the full
information and bandit models. In this work we consider a simple feedback model
that generalizes both, where on every round, in addition to a bandit feedback,
the adversary provides a lower bound on the loss of each expert. Such lower
bounds may be obtained in various scenarios, for instance, in stock trading or
in assessing errors of certain measurement devices. For this model we prove
optimal regret bounds (up to logarithmic factors) for modified versions of
Exp3, generalizing algorithms and bounds both for the bandit and the
full-information settings. Our second-order unified regret analysis simulates a
two-step loss update and highlights three Hessian or Hessian-like expressions,
which map to the full-information regret, bandit regret, and a hybrid of both.
Our results intersect with those for bandits with graph-structured feedback, in
that both settings can accommodate feedback from an arbitrary subset of experts
on each round. However, our model also accommodates partial feedback at the
single-expert level, by allowing non-trivial lower bounds on each loss.
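The feedback model above can be illustrated with a short sketch: an Exp3-style update where, besides the played expert's loss, a lower bound on every expert's loss is revealed each round. The estimator used here (charge every expert its revealed lower bound, and importance-weight only the played expert's excess over its bound) is an illustrative guess at how such bounds could be folded into Exp3, not the paper's actual algorithm; all names are hypothetical.

```python
import math
import random

def exp3_with_lower_bounds(n_experts, losses, lower_bounds, eta):
    """Hypothetical Exp3-style learner with lower-bounded loss feedback.

    losses[t][j]       -- true loss of expert j on round t (only the
                          played expert's loss is actually observed)
    lower_bounds[t][j] -- revealed lower bound on expert j's loss
    eta                -- learning rate

    Returns the learner's total realized loss.
    """
    weights = [1.0] * n_experts
    total_loss = 0.0
    for t in range(len(losses)):
        z = sum(weights)
        probs = [w / z for w in weights]
        # Sample one expert according to the current distribution.
        i = random.choices(range(n_experts), weights=probs)[0]
        total_loss += losses[t][i]
        lb = lower_bounds[t]
        for j in range(n_experts):
            # Every expert is charged its revealed lower bound; only the
            # played expert's excess loss is importance-weighted, so the
            # estimate stays unbiased for each true loss.
            est = lb[j]
            if j == i:
                est += (losses[t][i] - lb[i]) / probs[i]
            weights[j] *= math.exp(-eta * est)
    return total_loss
```

With trivial lower bounds (all zeros) this reduces to plain Exp3 with importance-weighted estimates, and with tight bounds (lower bound equals the loss) the excess term vanishes and the update behaves like a full-information exponential-weights step, which matches the abstract's claim that the model interpolates between the two settings.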
Related papers
- Neural Dueling Bandits [58.90189511247936]
We use a neural network to estimate the reward function using preference feedback for the previously selected arms.
We then extend our theoretical results to contextual bandit problems with binary feedback, which is in itself a non-trivial contribution.
arXiv Detail & Related papers (2024-07-24T09:23:22Z) - Non-stochastic Bandits With Evolving Observations [47.61533665679308]
We introduce a novel online learning framework that unifies and generalizes pre-established models.
We propose regret minimization algorithms for both the full-information and bandit settings.
Our algorithms match the known regret bounds across many special cases, while also introducing previously unknown bounds.
arXiv Detail & Related papers (2024-05-27T05:32:46Z) - Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs).
We study a model within this problem domain--contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z) - Borda Regret Minimization for Generalized Linear Dueling Bandits [65.09919504862496]
We study the Borda regret minimization problem for dueling bandits, which aims to identify the item with the highest Borda score.
We propose a rich class of generalized linear dueling bandit models, which cover many existing models.
Our algorithm achieves an $\tilde{O}(d^{2/3} T^{2/3})$ regret, which is also optimal.
arXiv Detail & Related papers (2023-03-15T17:59:27Z) - Doubly Adversarial Federated Bandits [7.23389716633927]
We study a new non-stochastic federated multi-armed bandit problem with multiple agents collaborating via a communication network.
Our algorithm gives a positive answer to an open question posed by Cesa-Bianchi et al.
arXiv Detail & Related papers (2023-01-22T22:36:43Z) - Offline congestion games: How feedback type affects data coverage requirement [53.83345471268163]
We consider three different types of feedback with decreasing revealed information.
We show the coverage assumption for the agent-level feedback setting is insufficient in the game-level feedback setting.
Our results constitute the first study of offline congestion games.
arXiv Detail & Related papers (2022-10-24T16:49:16Z) - Second Order Regret Bounds Against Generalized Expert Sequences under Partial Bandit Feedback [0.0]
We study the problem of expert advice under partial bandit feedback setting and create a sequential minimax optimal algorithm.
Our algorithm works with a more general partial monitoring setting, where, in contrast to the classical bandit feedback, the losses can be revealed in an adversarial manner.
arXiv Detail & Related papers (2022-04-13T22:48:12Z) - Bias-Robust Bayesian Optimization via Dueling Bandit [57.82422045437126]
We consider Bayesian optimization in settings where observations can be adversarially biased.
We propose a novel approach for dueling bandits based on information-directed sampling (IDS).
Thereby, we obtain the first efficient kernelized algorithm for dueling bandits that comes with cumulative regret guarantees.
arXiv Detail & Related papers (2021-05-25T10:08:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.