Exploiting the Surrogate Gap in Online Multiclass Classification
- URL: http://arxiv.org/abs/2007.12618v2
- Date: Fri, 26 Feb 2021 11:51:23 GMT
- Title: Exploiting the Surrogate Gap in Online Multiclass Classification
- Authors: Dirk van der Hoeven
- Abstract summary: Gaptron is a randomized first-order algorithm for online multiclass classification.
We show expected mistake bounds with respect to the logistic loss, hinge loss, and the smooth hinge loss with constant regret.
We present a new proof technique that exploits the gap between the zero-one loss and surrogate losses.
- Score: 13.452510519858995
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Gaptron, a randomized first-order algorithm for online multiclass
classification. In the full information setting we show expected mistake bounds
with respect to the logistic loss, hinge loss, and the smooth hinge loss with
constant regret, where the expectation is with respect to the learner's
randomness. In the bandit classification setting we show that Gaptron is the
first linear time algorithm with $O(K\sqrt{T})$ expected regret, where $K$ is
the number of classes. Additionally, the expected mistake bound of Gaptron does
not depend on the dimension of the feature vector, contrary to previous
algorithms with $O(K\sqrt{T})$ regret in the bandit classification setting. We
present a new proof technique that exploits the gap between the zero-one loss
and surrogate losses rather than exploiting properties such as exp-concavity or
mixability, which are traditionally used to prove logarithmic or constant
regret bounds.
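To make the scheme concrete, below is a minimal sketch of a Gaptron-style learner: linear scores per class, online gradient descent on a multiclass hinge surrogate, and a randomized prediction that plays the greedy class with probability $1-\gamma$ and a uniformly random class otherwise. The class name `GaptronSketch`, the fixed exploration rate `gamma`, and the plain hinge update are illustrative assumptions rather than the paper's exact algorithm, which ties the exploration probability to the surrogate gap itself. The gap is exploitable because hinge-type surrogates upper-bound the zero-one loss, so the per-round slack $\ell_t(W_t) - \mathbb{1}[\hat{y}_t \neq y_t] \geq 0$ can be spent against the regret of the underlying gradient method.

```python
import numpy as np

class GaptronSketch:
    """Minimal sketch of a Gaptron-style randomized multiclass learner.

    Hypothetical simplifications: a fixed exploration rate ``gamma`` and a
    plain OGD step on the multiclass hinge loss. The actual algorithm couples
    the exploration probability to the surrogate gap, which is essential for
    its constant-regret guarantees.
    """

    def __init__(self, n_features, n_classes, lr=0.1, gamma=0.05, seed=0):
        self.W = np.zeros((n_classes, n_features))  # one weight row per class
        self.lr = lr          # OGD step size
        self.gamma = gamma    # probability of exploring uniformly
        self.K = n_classes
        self.rng = np.random.default_rng(seed)

    def predict(self, x):
        """Randomized prediction: mostly greedy, occasionally uniform."""
        if self.rng.random() < self.gamma:
            return int(self.rng.integers(self.K))   # explore uniformly
        return int(np.argmax(self.W @ x))           # exploit linear scores

    def update(self, x, y):
        """OGD on the hinge surrogate max(0, 1 - (z_y - max_{k != y} z_k))."""
        scores = self.W @ x
        rival_scores = scores.copy()
        rival_scores[y] = -np.inf
        rival = int(np.argmax(rival_scores))
        if 1.0 - (scores[y] - scores[rival]) > 0:   # surrogate loss is active
            self.W[y] += self.lr * x                # pull the true class up
            self.W[rival] -= self.lr * x            # push the rival class down


# Usage on a toy stream: predict, observe the label, then update.
learner = GaptronSketch(n_features=4, n_classes=3)
rng = np.random.default_rng(1)
mistakes = 0
for _ in range(100):
    x = rng.normal(size=4)
    y = int(np.argmax(x[:3]))          # synthetic labels
    y_hat = learner.predict(x)         # randomized prediction
    mistakes += int(y_hat != y)        # zero-one loss is incurred here
    learner.update(x, y)               # full-information update on the surrogate
print(f"mistakes over 100 rounds: {mistakes}")
```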
Related papers
- The Real Price of Bandit Information in Multiclass Classification [73.17969992976501]
We revisit the classical problem of multiclass classification with bandit feedback.
We present a new bandit classification algorithm that guarantees regret $\smash{\widetilde{O}}(|H|+\sqrt{T})$.
arXiv Detail & Related papers (2024-05-16T12:11:09Z) - Variance-Dependent Regret Bounds for Non-stationary Linear Bandits [52.872628573907434]
We propose algorithms that utilize the variance of the reward distribution as well as the total variation budget $B_K$, and show that they can achieve tighter regret upper bounds.
We introduce two novel algorithms: Restarted Weighted$\text{OFUL}^+$ and Restarted $\text{SAVE}^+$.
Notably, when the total variance $V_K$ is much smaller than $K$, our algorithms outperform previous state-of-the-art results on non-stationary linear bandits under different settings.
arXiv Detail & Related papers (2024-03-15T23:36:55Z) - Online Structured Prediction with Fenchel--Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss [25.50155563108198]
We study online structured prediction with full-information feedback.
We extend the exploit-the-surrogate-gap framework to online structured prediction with Fenchel--Young losses.
arXiv Detail & Related papers (2024-02-13T02:36:41Z) - Variance-Aware Regret Bounds for Stochastic Contextual Dueling Bandits [53.281230333364505]
This paper studies the problem of contextual dueling bandits, where the binary comparison of dueling arms is generated from a generalized linear model (GLM).
We propose a new SupLinUCB-type algorithm that enjoys computational efficiency and a variance-aware regret bound $\tilde{O}\big(d\sqrt{\sum_{t=1}^{T}\sigma_t^2} + d\big)$.
Our regret bound naturally aligns with the intuitive expectation: in scenarios where the comparison is deterministic, the algorithm only suffers an $\tilde{O}(d)$ regret.
arXiv Detail & Related papers (2023-10-02T08:15:52Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K}\sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret
and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z) - Beyond Bandit Feedback in Online Multiclass Classification [17.07011090727996]
We study the problem of online multiclass classification in a setting where the learner's feedback is determined by an arbitrary directed graph.
We introduce Gappletron, the first online multiclass algorithm that works with arbitrary feedback graphs.
arXiv Detail & Related papers (2021-06-07T13:22:30Z) - Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z) - A Regret bound for Non-stationary Multi-Armed Bandits with Fairness
Constraints [7.716156977428555]
We present a new algorithm, Fair Upper Confidence Bound with Exploration (Fair-UCBe), for solving a slowly varying $k$-armed bandit problem.
We show that the performance of our algorithm in the non-stationary case approaches that of its stationary counterpart as the variation in the environment tends to zero.
arXiv Detail & Related papers (2020-12-24T18:12:01Z) - Simultaneously Learning Stochastic and Adversarial Episodic MDPs with
Known Transition [38.28925339231888]
We develop the first algorithm with a "best-of-both-worlds" guarantee.
It achieves $\mathcal{O}(\log T)$ regret when the losses are stochastic.
More generally, it achieves $\tilde{\mathcal{O}}(\sqrt{C})$ regret in an intermediate setting.
arXiv Detail & Related papers (2020-06-10T01:59:34Z)