Non-Stationary Online Structured Prediction with Surrogate Losses
- URL: http://arxiv.org/abs/2510.07086v1
- Date: Wed, 08 Oct 2025 14:43:44 GMT
- Title: Non-Stationary Online Structured Prediction with Surrogate Losses
- Authors: Shinsaku Sakaue, Han Bao, Yuzhou Cao,
- Abstract summary: We prove a bound of the form $F_T + C (1 + P_T)$ on the cumulative target loss. Our core idea is to synthesize the dynamic regret bound of online gradient descent (OGD) with the technique of exploiting the surrogate gap. Our analysis also sheds light on a new Polyak-style learning rate for OGD, which systematically offers target-loss guarantees and exhibits promising empirical performance.
- Score: 22.848052937383244
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online structured prediction, including online classification as a special case, is the task of sequentially predicting labels from input features. Therein the surrogate regret -- the cumulative excess of the target loss (e.g., 0-1 loss) over the surrogate loss (e.g., logistic loss) of the fixed best estimator -- has gained attention, particularly because it often admits a finite bound independent of the time horizon $T$. However, such guarantees break down in non-stationary environments, where every fixed estimator may incur the surrogate loss growing linearly with $T$. We address this by proving a bound of the form $F_T + C(1 + P_T)$ on the cumulative target loss, where $F_T$ is the cumulative surrogate loss of any comparator sequence, $P_T$ is its path length, and $C > 0$ is some constant. This bound depends on $T$ only through $F_T$ and $P_T$, often yielding much stronger guarantees in non-stationary environments. Our core idea is to synthesize the dynamic regret bound of the online gradient descent (OGD) with the technique of exploiting the surrogate gap. Our analysis also sheds light on a new Polyak-style learning rate for OGD, which systematically offers target-loss guarantees and exhibits promising empirical performance. We further extend our approach to a broader class of problems via the convolutional Fenchel--Young loss. Finally, we prove a lower bound showing that the dependence on $F_T$ and $P_T$ is tight.
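To make the recipe concrete, below is a minimal sketch of online gradient descent with a Polyak-style learning rate for online multiclass classification, using the logistic loss as the surrogate for the 0-1 target loss. The function name `ogd_polyak_multiclass` and the specific step size (current surrogate loss divided by squared gradient norm) are illustrative assumptions; the paper's learning rate additionally exploits the surrogate gap, so this is a sketch of the general idea rather than the exact method.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                         # numerical stability
    e = np.exp(z)
    return e / e.sum()

def ogd_polyak_multiclass(stream, n_features, n_classes, eps=1e-12):
    """Online multiclass classification via OGD with a Polyak-style step size.

    `stream` yields (x, y) pairs: x is an (n_features,) array, y an integer label.
    Returns the cumulative 0-1 (target) loss and logistic (surrogate) loss.
    Illustrative sketch only; not the paper's exact learning rate.
    """
    W = np.zeros((n_classes, n_features))   # linear estimator
    target_loss, surrogate_loss = 0.0, 0.0
    for x, y in stream:
        scores = W @ x
        y_hat = int(np.argmax(scores))      # predicted label
        target_loss += float(y_hat != y)    # 0-1 target loss

        p = softmax(scores)
        loss = -np.log(p[y] + eps)          # logistic surrogate loss
        surrogate_loss += loss

        grad = np.outer(p, x)               # gradient of the surrogate w.r.t. W
        grad[y] -= x

        # Polyak-style learning rate: current surrogate loss over squared
        # gradient norm (assumed form; the paper refines this via the surrogate gap).
        eta = loss / (np.sum(grad ** 2) + eps)
        W -= eta * grad
    return target_loss, surrogate_loss
```

Feeding this a stream whose label-generating parameters drift over time mimics the non-stationary setting; the comparator sequence and its path length $P_T$ in the bound then track that drift.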
Related papers
- Fundamental Novel Consistency Theory: $H$-Consistency Bounds [19.493449206135296]
In machine learning, the loss functions optimized during training often differ from the target loss that defines task performance. We present an in-depth study of the target loss estimation error relative to the surrogate loss estimation error. Our analysis leads to $H$-consistency bounds, which are guarantees accounting for the hypothesis set $H$.
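For reference, $H$-consistency bounds typically take the following general form, relating the two estimation errors within the hypothesis set $H$ through a non-decreasing function $\Gamma$ (a sketch of the standard statement; complete versions also carry minimizability-gap terms):

```latex
% Generic H-consistency bound: for all h in H,
\mathcal{R}_{\ell_{\mathrm{target}}}(h) - \inf_{h' \in H} \mathcal{R}_{\ell_{\mathrm{target}}}(h')
\;\le\;
\Gamma\!\left( \mathcal{R}_{\ell_{\mathrm{surrogate}}}(h) - \inf_{h' \in H} \mathcal{R}_{\ell_{\mathrm{surrogate}}}(h') \right)
```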
arXiv Detail & Related papers (2025-12-28T11:02:20Z) - Multiclass Loss Geometry Matters for Generalization of Gradient Descent in Separable Classification [27.33243506775655]
We study the generalization performance of unregularized methods for separable linear classification. We show that convergence rates are crucially influenced by the geometry of the loss template.
arXiv Detail & Related papers (2025-05-28T13:39:14Z) - Any-stepsize Gradient Descent for Separable Data under Fenchel--Young Losses [17.835960292396255]
We show arbitrary-stepsize gradient descent convergence for a general loss function based on the framework of Fenchel--Young losses. We argue that this better rate is possible because of the separation margin of loss functions, instead of the self-bounding property.
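For context, the Fenchel--Young loss generated by a convex regularizer $\Omega$ is standardly defined as below (this is the definition from the Fenchel--Young loss literature, not notation specific to this particular paper); choosing $\Omega$ as the negative Shannon entropy over the simplex recovers the logistic loss.

```latex
% Fenchel--Young loss generated by a convex regularizer \Omega
\ell_{\Omega}(\theta; y) \;=\; \Omega^{*}(\theta) + \Omega(y) - \langle \theta, y \rangle,
\qquad
\Omega^{*}(\theta) \;=\; \sup_{\mu} \left\{ \langle \theta, \mu \rangle - \Omega(\mu) \right\}.
```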
arXiv Detail & Related papers (2025-02-07T12:52:12Z) - Revisiting Projection-Free Online Learning with Time-Varying Constraints [35.573654458435854]
We investigate constrained online convex optimization, in which decisions must belong to a fixed and typically complicated domain. Several projection-free methods have been proposed with an $\mathcal{O}(T^{3/4} \sqrt{\log T})$ regret bound and an $\mathcal{O}(T^{3/4} \sqrt{\log T})$ cumulative constraint violation (CCV) bound for general convex losses. In this paper, we improve this result and further establish novel regret and CCV bounds when loss functions are strongly convex.
arXiv Detail & Related papers (2025-01-27T13:38:51Z) - An Optimistic Algorithm for Online Convex Optimization with Adversarial Constraints [55.2480439325792]
We study Online Convex Optimization (OCO) with adversarial constraints. We focus on a setting where the algorithm has access to predictions of the loss and constraint functions. Our results show that we can improve the current best bounds of $O(\sqrt{T})$ regret and $\tilde{O}(\sqrt{T})$ cumulative constraint violations.
arXiv Detail & Related papers (2024-12-11T03:06:42Z) - Convergence Rate Analysis of LION [54.28350823319057]
We show that LION converges at a rate scaling with $\sqrt{d}$ and the iteration count $K$, measured by a gradient-based Karush--Kuhn--Tucker (KKT) stationarity criterion.
We show that LION can achieve lower loss and higher performance compared to standard SGD.
arXiv Detail & Related papers (2024-11-12T11:30:53Z) - Realizable $H$-Consistent and Bayes-Consistent Loss Functions for Learning to Defer [30.389055604165222]
We introduce a broad family of surrogate losses, parameterized by a non-increasing function $\Psi$, and establish their realizable $H$-consistency under mild conditions.
For cost functions based on classification error, we show that these losses admit $H$-consistency bounds when the hypothesis set is symmetric and complete.
arXiv Detail & Related papers (2024-07-18T17:35:03Z) - Universal Online Learning with Gradient Variations: A Multi-layer Online Ensemble Approach [57.92727189589498]
We propose an online convex optimization approach with two different levels of adaptivity.
We obtain $\mathcal{O}(\log V_T)$, $\mathcal{O}(d \log V_T)$ and $\hat{\mathcal{O}}(\sqrt{V_T})$ regret bounds for strongly convex, exp-concave and convex loss functions.
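Here $V_T$ is the gradient-variation quantity standardly used in this line of work (stated for reference under the usual definition):

```latex
% Gradient variation of the online loss sequence f_1, ..., f_T over domain X
V_T \;=\; \sum_{t=2}^{T} \sup_{x \in \mathcal{X}} \left\| \nabla f_t(x) - \nabla f_{t-1}(x) \right\|_2^2 .
```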
arXiv Detail & Related papers (2023-07-17T09:55:35Z) - Non-stationary Online Convex Optimization with Arbitrary Delays [50.46856739179311]
This paper investigates delayed online convex optimization (OCO) in non-stationary environments.
We first propose a simple algorithm, namely DOGD, which performs a gradient descent step for each delayed gradient according to their arrival order.
We develop an improved algorithm, which reduces those dynamic regret bounds achieved by DOGD to $O(\sqrt{\bar{d}T(P_T+1)})$.
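A minimal sketch of the DOGD idea described above: the learner plays its current iterate each round and then applies one gradient-descent step per delayed gradient, in arrival order. The function name and the fixed step size `eta` are illustrative assumptions; the cited paper uses carefully tuned step sizes (and a further meta-algorithm for the improved bound).

```python
import numpy as np

def dogd(grad_arrivals, dim, eta, project=lambda w: w):
    """Delayed online gradient descent (sketch).

    `grad_arrivals[t]` lists the gradients (arrays of shape (dim,)) that become
    available at round t, possibly computed from decisions at earlier rounds.
    Returns the sequence of decisions played.
    """
    w = np.zeros(dim)
    decisions = []
    for arrived in grad_arrivals:
        decisions.append(w.copy())          # decision played this round
        for g in arrived:                   # feedback processed in arrival order
            w = project(w - eta * g)        # one step per delayed gradient
    return decisions
```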
arXiv Detail & Related papers (2023-05-20T07:54:07Z) - The Adversarial Consistency of Surrogate Risks for Binary Classification [20.03511985572199]
Adversarial training seeks to minimize the expected $0$-$1$ loss when each example can be maliciously corrupted within a small ball.
We give a simple and complete characterization of the set of surrogate loss functions that are consistent.
Our results reveal that the class of adversarially consistent surrogates is substantially smaller than in the standard setting.
arXiv Detail & Related papers (2023-05-17T05:27:40Z) - Label Distributionally Robust Losses for Multi-class Classification:
Consistency, Robustness and Adaptivity [55.29408396918968]
We study a family of loss functions named label-distributionally robust (LDR) losses for multi-class classification.
Our contributions include both consistency and robustness by establishing top-$k$ consistency of LDR losses for multi-class classification.
We propose a new adaptive LDR loss that automatically adapts the individualized temperature parameter to the noise degree of class label of each instance.
arXiv Detail & Related papers (2021-12-30T00:27:30Z) - Dynamic Regret of Convex and Smooth Functions [93.71361250701075]
We investigate online convex optimization in non-stationary environments.
We choose the dynamic regret as the performance measure.
We show that it is possible to further enhance the dynamic regret by exploiting the smoothness condition.
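Dynamic regret, the performance measure referenced above (and the source of the path length $P_T$ appearing in the main paper's bound), is standardly defined against an arbitrary comparator sequence $u_1, \dots, u_T$:

```latex
% Dynamic regret against a comparator sequence u_1, ..., u_T, and its path length
\mathrm{D\text{-}Regret}_T(u_1,\dots,u_T) \;=\; \sum_{t=1}^{T} f_t(x_t) - \sum_{t=1}^{T} f_t(u_t),
\qquad
P_T \;=\; \sum_{t=2}^{T} \left\| u_t - u_{t-1} \right\|_2 .
```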
arXiv Detail & Related papers (2020-07-07T14:10:57Z) - On the Almost Sure Convergence of Stochastic Gradient Descent in
Non-Convex Problems [75.58134963501094]
This paper analyzes the trajectories of stochastic gradient descent (SGD) in non-convex problems.
We show that SGD avoids strict saddle points/manifolds with probability $1$ under a broad range of step-size policies.
arXiv Detail & Related papers (2020-06-19T14:11:26Z)