Related papers: The Real Price of Bandit Information in Multiclass Classification

The Real Price of Bandit Information in Multiclass Classification

URL: http://arxiv.org/abs/2405.10027v2
Date: Wed, 19 Jun 2024 09:20:04 GMT
Title: The Real Price of Bandit Information in Multiclass Classification
Authors: Liad Erez, Alon Cohen, Tomer Koren, Yishay Mansour, Shay Moran,
Abstract summary: We revisit the classical problem of multiclass classification with bandit feedback. We present a new bandit classification algorithm that guarantees regret $smashwidetildeO(|H|+sqrtT)$.
Score: 73.17969992976501
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We revisit the classical problem of multiclass classification with bandit feedback (Kakade, Shalev-Shwartz and Tewari, 2008), where each input classifies to one of $K$ possible labels and feedback is restricted to whether the predicted label is correct or not. Our primary inquiry is with regard to the dependency on the number of labels $K$, and whether $T$-step regret bounds in this setting can be improved beyond the $\smash{\sqrt{KT}}$ dependence exhibited by existing algorithms. Our main contribution is in showing that the minimax regret of bandit multiclass is in fact more nuanced, and is of the form $\smash{\widetilde{\Theta}\left(\min \left\{|H| + \sqrt{T}, \sqrt{KT \log |H|} \right\} \right) }$, where $H$ is the underlying (finite) hypothesis class. In particular, we present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}(|H|+\sqrt{T})}$, improving over classical algorithms for moderately-sized hypothesis classes, and give a matching lower bound establishing tightness of the upper bounds (up to log-factors) in all parameter regimes.

Related papers

Enjoying Non-linearity in Multinomial Logistic Bandits [66.36004256710824]
We consider the multinomial logistic bandit problem, a variant of generalized linear bandits where a learner interacts with an environment.<n>We extend the definition of $kappa_*$ to the multinomial setting and propose an efficient algorithm that leverages the problem's non-linearity.<n>Our method yields a problem-dependent regret bound of order $ smashwidetildemathcalO( Kd sqrtT/kappa_*) $, where $K$ is the number of actions and $kappa_*
arXiv Detail & Related papers (2025-07-07T08:18:25Z)
Bandit Multiclass List Classification [28.483435983018616]
We study the problem of multiclass list classification with (semi-)bandit feedback, where input examples are mapped into subsets of size $m$ of a collection of $K$ possible labels. Our main result is for the $(varepsilon,delta)$-PAC variant of the problem for which we design an algorithm that returns an $varepsilon$-optimal hypothesis.
arXiv Detail & Related papers (2025-02-13T12:13:25Z)
Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic $(varepsilon,delta)$PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z)
Communication-Constrained Bandits under Additive Gaussian Noise [111.06688156723018]
We study a distributed multi-armed bandit where a client supplies the learner with communication-constrained feedback. We propose a multi-phase bandit algorithm, $mathttUEtext-UCB++$, which matches this lower bound to a minor additive factor.
arXiv Detail & Related papers (2023-04-25T09:31:20Z)
The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL) We focus on learning with general function classes and general model classes. We show that a logarithmic regret bound is realizable by algorithms with $O(log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z)
Beyond Bandit Feedback in Online Multiclass Classification [17.07011090727996]
We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph. We introduce Gappletron, the first online multiclass algorithm that works with arbitrary feedback graphs.
arXiv Detail & Related papers (2021-06-07T13:22:30Z)
Impact of Representation Learning in Linear Bandits [83.17684841392754]
We study how representation learning can improve the efficiency of bandit problems. We present a new algorithm which achieves $widetildeO(TsqrtkN + sqrtdkNT)$ regret, where $N$ is the number of rounds we play for each bandit.
arXiv Detail & Related papers (2020-10-13T16:35:30Z)
Exploiting the Surrogate Gap in Online Multiclass Classification [13.452510519858995]
Gaptron is a randomized first-order algorithm for online multiclass classification. We show expected mistake bounds with respect to the logistic loss, hinge loss, and the smooth hinge loss with constant regret. We present a new proof technique that exploits the gap between the zero-one loss and surrogate losses.
arXiv Detail & Related papers (2020-07-24T16:19:02Z)
Stochastic Bandits with Linear Constraints [69.757694218456]
We study a constrained contextual linear bandit setting, where the goal of the agent is to produce a sequence of policies. We propose an upper-confidence bound algorithm for this problem, called optimistic pessimistic linear bandit (OPLB)
arXiv Detail & Related papers (2020-06-17T22:32:19Z)
A Multiclass Classification Approach to Label Ranking [2.6905021039717987]
In multiclass classification, the goal is to learn how to predict a random label $Y$, valued in $mathcalY=1,; ldots,; K $ with $Kgeq 3$. This article is devoted to the analysis of this statistical learning problem, halfway between multiclass classification and posterior probability estimation.
arXiv Detail & Related papers (2020-02-21T17:12:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.