Online Structured Prediction with Fenchel--Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss
- URL: http://arxiv.org/abs/2402.08180v3
- Date: Tue, 22 Oct 2024 09:23:43 GMT
- Title: Online Structured Prediction with Fenchel--Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss
- Authors: Shinsaku Sakaue, Han Bao, Taira Tsuchiya, Taihei Oki
- Abstract summary: We study online structured prediction with full-information feedback.
We extend the exploit-the-surrogate-gap framework to online structured prediction with Fenchel--Young losses.
- Score: 25.50155563108198
- Abstract: This paper studies online structured prediction with full-information feedback. For online multiclass classification, Van der Hoeven (2020) established \emph{finite} surrogate regret bounds, which are independent of the time horizon, by introducing an elegant \emph{exploit-the-surrogate-gap} framework. However, this framework has been limited to multiclass classification primarily because it relies on a classification-specific procedure for converting estimated scores to outputs. We extend the exploit-the-surrogate-gap framework to online structured prediction with \emph{Fenchel--Young losses}, a large family of surrogate losses that includes the logistic loss for multiclass classification as a special case, obtaining finite surrogate regret bounds in various structured prediction problems. To this end, we propose and analyze \emph{randomized decoding}, which converts estimated scores to general structured outputs. Moreover, by applying our decoding to online multiclass classification with the logistic loss, we obtain a surrogate regret bound of $O(\| \mathbf{U} \|_\mathrm{F}^2)$, where $\mathbf{U}$ is the best offline linear estimator and $\| \cdot \|_\mathrm{F}$ denotes the Frobenius norm. This bound is tight up to logarithmic factors and improves the previous bound of $O(d\| \mathbf{U} \|_\mathrm{F}^2)$ due to Van der Hoeven (2020) by a factor of $d$, the number of classes.
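As a concrete illustration of the multiclass special case, the sketch below decodes a label by sampling from the softmax of the estimated scores; for the logistic loss, the softmax is the gradient of the log-sum-exp potential that generates this Fenchel--Young loss. This is a minimal sketch under stated assumptions: the function name `randomized_decode`, the uniform-mixing parameter `gamma`, and the linear-estimator setup are illustrative and do not reproduce the paper's exact decoding scheme.

```python
import numpy as np

def randomized_decode(scores, gamma=0.1, rng=None):
    """Sample a label from a smoothed softmax of estimated scores.

    Illustrative sketch only: softmax(scores) is the gradient of
    log-sum-exp (the potential behind the logistic loss), and sampling
    the output from it, mixed with a little uniform mass of weight
    `gamma`, is one simple randomized decoder. The paper's actual
    randomized-decoding procedure differs in its details.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = scores - scores.max()            # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()      # softmax probabilities
    d = p.size
    q = (1.0 - gamma) * p + gamma / d    # mix in uniform exploration
    return int(rng.choice(d, p=q))

# Hypothetical usage with a linear estimator W (5 classes, 10 features).
W = np.random.randn(5, 10)
x = np.random.randn(10)
label = randomized_decode(W @ x)
```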
Related papers
- The Implicit Bias of Gradient Descent on Separable Multiclass Data [38.05903703331163]
We employ the framework of Permutation Equivariant and Relative Margin-based (PERM) losses to introduce a multiclass extension of the exponential tail property.
Our proof techniques closely mirror those of the binary case, thus illustrating the power of the PERM framework for bridging the binary-multiclass gap.
arXiv Detail & Related papers (2024-11-02T19:39:21Z) - The Real Price of Bandit Information in Multiclass Classification [73.17969992976501]
We revisit the classical problem of multiclass classification with bandit feedback.
We present a new bandit classification algorithm that guarantees regret $\widetilde{O}(|H| + \sqrt{T})$.
arXiv Detail & Related papers (2024-05-16T12:11:09Z) - A Universal Growth Rate for Learning with Smooth Surrogate Losses [30.389055604165222]
We prove a square-root growth rate near zero for smooth margin-based surrogate losses in binary classification.
We extend this analysis to multi-class classification with a series of novel results.
arXiv Detail & Related papers (2024-05-09T17:59:55Z) - Generalization Properties of Adversarial Training for $\ell_0$-Bounded Adversarial Attacks [47.22918498465056]
In this paper, we aim to theoretically characterize the performance of adversarial training for an important class of neural networks.
Deriving a generalization bound in this setting poses two main challenges.
arXiv Detail & Related papers (2024-02-05T22:57:33Z) - Unified Binary and Multiclass Margin-Based Classification [27.28814893730265]
We show that a broad range of multiclass loss functions, including many popular ones, can be expressed in the relative margin form.
We then analyze the class of Fenchel--Young losses, and expand the set of these losses that are known to be classification-calibrated.
arXiv Detail & Related papers (2023-11-29T16:24:32Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$, and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave, and convex loss functions, respectively.
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - Neural Collapse Inspired Feature-Classifier Alignment for Few-Shot Class Incremental Learning [120.53458753007851]
Few-shot class-incremental learning (FSCIL) has been a challenging problem as only a few training samples are accessible for each novel class in the new sessions.
We deal with this misalignment dilemma in FSCIL inspired by the recently discovered phenomenon named neural collapse.
We propose a neural-collapse-inspired framework for FSCIL. Experiments on the miniImageNet, CUB-200, and CIFAR-100 datasets demonstrate that our framework outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-02-06T18:39:40Z) - GistNet: a Geometric Structure Transfer Network for Long-Tailed Recognition [95.93760490301395]
Long-tailed recognition is a problem where the number of examples per class is highly unbalanced.
GistNet is proposed to address this, using constellations of classifier parameters to encode the class geometry.
A new learning algorithm for GeometrIc Structure Transfer (GIST) is then proposed; it combines class-balanced and random-sampling loss functions so that overfitting to the popular classes is restricted to the geometric parameters, where it can be leveraged to transfer class geometry from popular to few-shot classes.
arXiv Detail & Related papers (2021-05-01T00:37:42Z) - Agnostic Learning of Halfspaces with Gradient Descent via Soft Margins [92.7662890047311]
We show that gradient descent finds halfspaces with classification error $\tilde{O}(\mathsf{OPT}^{1/2}) + \varepsilon$ in $\mathrm{poly}(d, 1/\varepsilon)$ time and sample complexity.
arXiv Detail & Related papers (2020-10-01T16:48:33Z) - Exploiting the Surrogate Gap in Online Multiclass Classification [13.452510519858995]
Gaptron is a randomized first-order algorithm for online multiclass classification.
We show expected mistake bounds with respect to the logistic loss, hinge loss, and smooth hinge loss, each with constant (horizon-independent) regret.
We present a new proof technique that exploits the gap between the zero-one loss and surrogate losses.
arXiv Detail & Related papers (2020-07-24T16:19:02Z)
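To make the exploited gap concrete: if the output is sampled from a softmax distribution $p$ and the true label is $y$, the expected zero-one loss is $1 - p_y$, which never exceeds the logistic loss $-\log p_y$ (since $1 - u \le -\log u$ for $u \in (0, 1]$). The slack between the two is the surrogate gap that absorbs the regret of the underlying online learner. The snippet below is only a numerical check of this elementary inequality, not Gaptron or the decoding algorithm of the paper above.

```python
import numpy as np

# Numerically check 1 - p[y] <= -log p[y]: the expected zero-one loss of
# sampling from the softmax never exceeds the logistic loss; their
# difference is the "surrogate gap" these frameworks exploit.
rng = np.random.default_rng(0)
for _ in range(5):
    scores = rng.normal(size=4)              # estimated scores for 4 classes
    p = np.exp(scores - scores.max())
    p /= p.sum()                             # softmax probabilities
    y = int(rng.integers(4))                 # an arbitrary "true" label
    mistake_prob = 1.0 - p[y]                # E[zero-one loss] under sampling
    logistic = -np.log(p[y])                 # logistic (log) loss of p at y
    assert mistake_prob <= logistic + 1e-12
    print(f"P(mistake)={mistake_prob:.3f}  logistic={logistic:.3f}  "
          f"gap={logistic - mistake_prob:.3f}")
```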
This list is automatically generated from the titles and abstracts of the papers on this site.