Bandit-Feedback Online Multiclass Classification: Variants and Tradeoffs
- URL: http://arxiv.org/abs/2402.07453v1
- Date: Mon, 12 Feb 2024 07:20:05 GMT
- Title: Bandit-Feedback Online Multiclass Classification: Variants and Tradeoffs
- Authors: Yuval Filmus, Steve Hanneke, Idan Mehalel and Shay Moran
- Abstract summary: We show that the optimal mistake bound under bandit feedback is at most $O(k)$ times higher than the optimal mistake bound in the full information case.
We also present nearly optimal bounds of $\tilde{\Theta}(k)$ on the gap between randomized and deterministic learners.
- Score: 32.29254118429081
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Consider the domain of multiclass classification within the adversarial
online setting. What is the price of relying on bandit feedback as opposed to
full information? To what extent can an adaptive adversary amplify the loss
compared to an oblivious one? To what extent can a randomized learner reduce
the loss compared to a deterministic one? We study these questions in the
mistake bound model and provide nearly tight answers.
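To make the setting concrete, here is a minimal sketch of the mistake-bound game; the `learner`/`adversary` interface and method names below are invented for illustration, not taken from the paper:

```python
# Minimal sketch of the online multiclass game in the mistake-bound model.
# The learner/adversary interface is hypothetical, introduced only to make
# the protocol concrete.

def run_game(learner, adversary, rounds: int, bandit: bool) -> int:
    """Play `rounds` rounds and return the learner's number of mistakes."""
    mistakes, history = 0, []
    for _ in range(rounds):
        # An adaptive adversary may choose (x, y) based on the learner's
        # past realized predictions; an oblivious one fixes the whole
        # sequence in advance. Neither sees the current prediction.
        x, y = adversary.next_example(history)
        y_hat = learner.predict(x)  # the learner commits to one of k labels
        mistakes += int(y_hat != y)
        history.append(y_hat)
        if bandit:
            # Bandit feedback: a single bit, "correct" or "incorrect".
            learner.update_bandit(x, y_hat, y_hat == y)
        else:
            # Full information: the true label itself is revealed.
            learner.update_full(x, y)
    return mistakes
```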
We demonstrate that the optimal mistake bound under bandit feedback is at
most $O(k)$ times higher than the optimal mistake bound in the full information
case, where $k$ represents the number of labels. This bound is tight and
provides an answer to an open question previously posed and studied by Daniely
and Helbertal ['13] and by Long ['17, '20], who focused on deterministic
learners.
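For intuition on where a factor of $k$ comes from, the classical version-space ("halving-style") learner already gives a deterministic $O(k \log |\mathcal{H}|)$ mistake bound under bandit feedback for a finite realizable class: predict the plurality label of the surviving hypotheses, and on a mistake discard every hypothesis that voted for the rejected label. The sketch below is this textbook background argument, not the paper's algorithm; the names and the toy class are invented for illustration.

```python
import math
import random

def bandit_halving(hypotheses, instances, target):
    """Plurality-vote version-space learner under bandit feedback.

    The plurality label carries at least |V|/k of the votes, so each
    mistake shrinks the version space by a (1 - 1/k) factor; at most
    ~ k * ln|H| mistakes can occur (realizable case).
    """
    version_space = list(hypotheses)
    mistakes = 0
    for x in instances:
        votes = {}
        for h in version_space:
            votes[h(x)] = votes.get(h(x), 0) + 1
        y_hat = max(votes, key=votes.get)  # plurality prediction
        if y_hat == target(x):
            # A *correct* round reveals the true label, so we may keep
            # only the hypotheses that agree with it.
            version_space = [h for h in version_space if h(x) == y_hat]
        else:
            mistakes += 1
            # The only information is "not y_hat": drop its supporters.
            version_space = [h for h in version_space if h(x) != y_hat]
    return mistakes

# Toy demo: k = 5 labels, hypotheses h_j(x) = (x + j) mod k, target = h_3.
k = 5
hypotheses = [lambda x, j=j: (x + j) % k for j in range(8 * k)]
random.seed(0)
instances = [random.randrange(100) for _ in range(200)]
m = bandit_halving(hypotheses, instances, target=lambda x: (x + 3) % k)
print(f"mistakes = {m}, k * ln|H| = {k * math.log(8 * k):.1f}")
```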
Moreover, we present nearly optimal bounds of $\tilde{\Theta}(k)$ on the gap
between randomized and deterministic learners, as well as between adaptive and
oblivious adversaries in the bandit feedback setting. This stands in contrast
to the full information scenario, where adaptive and oblivious adversaries are
equivalent, and the gap in mistake bounds between randomized and deterministic
learners is a constant multiplicative factor of $2$.
In addition, our results imply that in some cases the optimal randomized
mistake bound is approximately the square root of its deterministic counterpart.
Previous results show that this is essentially the smallest it can get.
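For quick reference, the quantitative claims above can be collected in one place. The shorthand $M^{\text{feedback}}_{\text{learner}}$ for the optimal mistake bound of a fixed class is introduced here only for this summary and is not the paper's notation; an analogous $\tilde{\Theta}(k)$ gap (not displayed) separates adaptive from oblivious adversaries under bandit feedback.

```latex
% Shorthand M introduced here for convenience; not the paper's notation.
\begin{align*}
  M^{\mathrm{bandit}}_{\mathrm{det}}  &= O\bigl(k \cdot M^{\mathrm{full}}_{\mathrm{det}}\bigr)
      && \text{(tight: the price of bandit feedback)} \\
  M^{\mathrm{bandit}}_{\mathrm{det}}  &= \tilde{O}\bigl(k \cdot M^{\mathrm{bandit}}_{\mathrm{rand}}\bigr)
      && \text{(randomized vs.\ deterministic: gap } \tilde{\Theta}(k)\text{)} \\
  M^{\mathrm{full}}_{\mathrm{det}}    &\le 2\,M^{\mathrm{full}}_{\mathrm{rand}}
      && \text{(full information: constant factor } 2\text{)} \\
  M^{\mathrm{bandit}}_{\mathrm{rand}} &\approx \sqrt{M^{\mathrm{bandit}}_{\mathrm{det}}}
      && \text{(in some cases; essentially the smallest possible)}
\end{align*}
```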
Related papers
- Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback [58.66941279460248]
Learning from human feedback plays an important role in aligning generative models such as large language models (LLMs).
We study a model within this problem domain: contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary.
We propose a robust contextual dueling bandit algorithm based on uncertainty-weighted maximum likelihood estimation.
arXiv Detail & Related papers (2024-04-16T17:59:55Z)
- Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound of $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm suffers only an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z)
- Optimal Prediction Using Expert Advice and Randomized Littlestone Dimension [32.29254118429081]
A classical result characterizes the optimal mistake bound achievable by deterministic learners using the Littlestone dimension.
We show that the optimal expected mistake bound in learning a class $\mathcal{H}$ equals its randomized Littlestone dimension.
arXiv Detail & Related papers (2023-02-27T14:50:34Z)
- Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}\big(d\sqrt{\sum_{k=1}^{K}\sigma_k^2} + d\big)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
- The price of unfairness in linear bandits with biased feedback [62.25313751895011]
We study the problem of sequential decision making with biased linear bandit feedback.
We show that the worst case regret is higher than the $dT^{1/2}\log(T)$ regret rate obtained under unbiased feedback.
Interestingly, the gap-dependent rates reveal the existence of non-trivial instances where the problem is no more difficult than its unbiased counterpart.
arXiv Detail & Related papers (2022-03-18T08:03:20Z)
- Online Selective Classification with Limited Feedback [82.68009460301585]
We study selective classification in the online learning model, wherein a predictor may abstain from classifying an instance.
A salient aspect of the setting we consider is that the data may be non-realisable, due to which abstention may be a valid long-term action.
We construct simple versioning-based schemes for any $\mu \in (0,1]$ that make at most $T^\mu$ mistakes while incurring $\tilde{O}(T^{1-\mu})$ excess abstention against adaptive adversaries.
arXiv Detail & Related papers (2021-10-27T08:00:53Z)
- Linear Contextual Bandits with Adversarial Corruptions [91.38793800392108]
We study the linear contextual bandit problem in the presence of adversarial corruption.
We present a variance-aware algorithm that is adaptive to the level of adversarial contamination $C$.
arXiv Detail & Related papers (2021-10-25T02:53:24Z)