Exploiting the Surrogate Gap in Online Multiclass Classification
- URL: http://arxiv.org/abs/2007.12618v2
- Date: Fri, 26 Feb 2021 11:51:23 GMT
- Title: Exploiting the Surrogate Gap in Online Multiclass Classification
- Authors: Dirk van der Hoeven
- Abstract summary: Gaptron is a randomized first-order algorithm for online multiclass classification.
We show expected mistake bounds with respect to the logistic loss, hinge loss, and the smooth hinge loss with constant regret.
We present a new proof technique that exploits the gap between the zero-one loss and surrogate losses.
- Score: 13.452510519858995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Gaptron, a randomized first-order algorithm for online multiclass
classification. In the full information setting we show expected mistake bounds
with respect to the logistic loss, hinge loss, and the smooth hinge loss with
constant regret, where the expectation is with respect to the learner's
randomness. In the bandit classification setting we show that Gaptron is the
first linear time algorithm with $O(K\sqrt{T})$ expected regret, where $K$ is
the number of classes. Additionally, the expected mistake bound of Gaptron does
not depend on the dimension of the feature vector, contrary to previous
algorithms with $O(K\sqrt{T})$ regret in the bandit classification setting. We
present a new proof technique that exploits the gap between the zero-one loss
and surrogate losses rather than exploiting properties such as exp-concavity or
mixability, which are traditionally used to prove logarithmic or constant
regret bounds.
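To make the scheme concrete, below is a minimal sketch of a Gaptron-style learner: linear scores per class, online gradient descent on a multiclass hinge surrogate, and a randomized prediction that plays the greedy class with probability $1-\gamma$ and a uniformly random class otherwise. The class name `GaptronSketch`, the fixed exploration rate `gamma`, and the plain hinge update are illustrative assumptions rather than the paper's exact algorithm, which ties the exploration probability to the surrogate gap itself. The gap is exploitable because hinge-type surrogates upper-bound the zero-one loss, so the per-round slack $\ell_t(W_t) - \mathbb{1}[\hat{y}_t \neq y_t] \geq 0$ can be spent against the regret of the underlying gradient method.

```python
import numpy as np

class GaptronSketch:
    """Minimal sketch of a Gaptron-style randomized multiclass learner.

    Hypothetical simplifications: a fixed exploration rate ``gamma`` and a
    plain OGD step on the multiclass hinge loss. The actual algorithm couples
    the exploration probability to the surrogate gap, which is essential for
    its constant-regret guarantees.
    """

    def __init__(self, n_features, n_classes, lr=0.1, gamma=0.05, seed=0):
        self.W = np.zeros((n_classes, n_features))  # one weight row per class
        self.lr = lr          # OGD step size
        self.gamma = gamma    # probability of exploring uniformly
        self.K = n_classes
        self.rng = np.random.default_rng(seed)

    def predict(self, x):
        """Randomized prediction: mostly greedy, occasionally uniform."""
        if self.rng.random() < self.gamma:
            return int(self.rng.integers(self.K))   # explore uniformly
        return int(np.argmax(self.W @ x))           # exploit linear scores

    def update(self, x, y):
        """OGD on the hinge surrogate max(0, 1 - (z_y - max_{k != y} z_k))."""
        scores = self.W @ x
        rival_scores = scores.copy()
        rival_scores[y] = -np.inf
        rival = int(np.argmax(rival_scores))
        if 1.0 - (scores[y] - scores[rival]) > 0:   # surrogate loss is active
            self.W[y] += self.lr * x                # pull the true class up
            self.W[rival] -= self.lr * x            # push the rival class down


# Usage on a toy stream: predict, observe the label, then update.
learner = GaptronSketch(n_features=4, n_classes=3)
rng = np.random.default_rng(1)
mistakes = 0
for _ in range(100):
    x = rng.normal(size=4)
    y = int(np.argmax(x[:3]))          # synthetic labels
    y_hat = learner.predict(x)         # randomized prediction
    mistakes += int(y_hat != y)        # zero-one loss is incurred here
    learner.update(x, y)               # full-information update on the surrogate
print(f"mistakes over 100 rounds: {mistakes}")
```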
Related papers
- The Real Price of Bandit Information in Multiclass Classification [73.17969992976501]
We revisit the classical problem of multiclass classification with bandit feedback.
We present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}}(|H|+\sqrt{T})$.
arXiv Detail & Related papers (2024-05-16T12:11:09Z) - Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the total variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Online Structured Prediction with Fenchel--Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss [25.50155563108198]
We study online structured prediction with full-information feedback.
We extend the exploit-the-surrogate-gap framework to online structured prediction with Fenchel--Young losses.
arXiv Detail & Related papers (2024-02-13T02:36:41Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K}\sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret
and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z) - Beyond Bandit Feedback in Online Multiclass Classification [17.07011090727996]
We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph.
We introduce Gappletron, the first online multiclass algorithm that works with arbitrary feedback graphs.
arXiv Detail & Related papers (2021-06-07T13:22:30Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z) - A Regret bound for Non-stationary Multi-Armed Bandits with Fairness
Constraints [7.716156977428555]
We present a new algorithm, Fair Upper Confidence Bound with Exploration (Fair-UCBe), for solving a slowly varying $k$-armed bandit problem.
We show that the performance of our algorithm in the non-stationary case approaches that of its stationary counterpart as the variation in the environment tends to zero.
arXiv Detail & Related papers (2020-12-24T18:12:01Z) - Simultaneously Learning Stochastic and Adversarial Episodic MDPs with
Known Transition [38.28925339231888]
We develop the first algorithm with a "best-of-both-worlds" guarantee.
It achieves $\mathcal{O}(\log T)$ regret when the losses are stochastic.
More generally, it achieves $\tilde{\mathcal{O}}(\sqrt{C})$ regret in an intermediate setting.
arXiv Detail & Related papers (2020-06-10T01:59:34Z)