The Real Price of Bandit Information in Multiclass Classification
- URL: http://arxiv.org/abs/2405.10027v2
- Date: Wed, 19 Jun 2024 09:20:04 GMT
- Title: The Real Price of Bandit Information in Multiclass Classification
- Authors: Liad Erez, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran,
- Abstract summary: We revisit the classical problem of multiclass classification with bandit feedback.
We present a new bandit classification algorithm that guarantees regret $smashwidetildeO(|H|+sqrtT)$.
- Score: 73.17969992976501
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}\left(\min \left\{|H| + \sqrt{T}, \sqrt{KT \log |H|} \right\} \right) }$, where $H$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|H|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.
Related papers
- Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct.
Our main contribution is in designing a novel learning algorithm for the agnostic $(varepsilon,delta)$PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z) - Communication-Constrained Bandits under Additive Gaussian Noise [111.06688156723018]
We study a distributed multi-armed bandit where a client supplies the learner with communication-constrained feedback.
We propose a multi-phase bandit algorithm, $mathttUEtext-UCB++$, which matches this lower bound to a minor additive factor.
arXiv Detail & Related papers (2023-04-25T09:31:20Z) - The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret
and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL)
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z) - Beyond Bandit Feedback in Online Multiclass Classification [17.07011090727996]
We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph.
We introduce Gappletron, the first online multiclass algorithm that works with arbitrary feedback graphs.
arXiv Detail & Related papers (2021-06-07T13:22:30Z) - Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems.
We present a new algorithm which achieves $widetildeO(TsqrtkN + sqrtdkNT)$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z) - Exploiting the Surrogate Gap in Online Multiclass Classification [13.452510519858995]
Gaptron is a randomized first-order algorithm for online multiclass classification.
We show expected mistake bounds with respect to the logistic loss, hinge loss, and the smooth hinge loss with constant regret.
We present a new proof technique that exploits the gap between the zero-one loss and surrogate losses.
arXiv Detail & Related papers (2020-07-24T16:19:02Z) - Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies.
We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB)
arXiv Detail & Related papers (2020-06-17T22:32:19Z) - A Multiclass Classification Approach to Label Ranking [2.6905021039717987]
In multiclass classification, the goal is to learn how to predict a random label $Y$, valued in $mathcalY=1,; ldots,; K $ with $Kgeq 3$.
This article is devoted to the analysis of this statistical learning problem, halfway between multiclass classification and posterior probability estimation.
arXiv Detail & Related papers (2020-02-21T17:12:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.