Small Gradient Norm Regret for Online Convex Optimization
- URL: http://arxiv.org/abs/2601.13519v2
- Date: Wed, 21 Jan 2026 23:05:14 GMT
- Title: Small Gradient Norm Regret for Online Convex Optimization
- Authors: Wenzhi Gao, Chang He, Madeleine Udell
- Abstract summary: We show that the $G^\star$ regret strictly refines the existing $L^\star$ (small loss) regret. We extend our results to dynamic regret and bandit settings.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces a new problem-dependent regret measure for online convex optimization with smooth losses. The notion, which we call the $G^\star$ regret, depends on the cumulative squared gradient norm evaluated at the decision in hindsight $\sum_{t=1}^T \|\nabla \ell(x^\star)\|^2$. We show that the $G^\star$ regret strictly refines the existing $L^\star$ (small loss) regret, and that it can be arbitrarily sharper when the losses have vanishing curvature around the hindsight decision. We establish upper and lower bounds on the $G^\star$ regret and extend our results to dynamic regret and bandit settings. As a byproduct, we refine the existing convergence analysis of stochastic optimization algorithms in the interpolation regime. Some experiments validate our theoretical findings.
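The $G^\star$ quantity in the abstract can be illustrated numerically. The following sketch (hypothetical, not from the paper) computes $\sum_{t=1}^T \|\nabla \ell_t(x^\star)\|^2$ and the small-loss quantity $\sum_{t=1}^T \ell_t(x^\star)$ for simple smooth quadratic losses; the standard inequality $\|\nabla \ell(x)\|^2 \le 2L\,\ell(x)$ for nonnegative $L$-smooth losses is what makes $G^\star$ at most a constant multiple of $L^\star$.

```python
import numpy as np

# Illustrative sketch: compare the G* quantity (cumulative squared gradient
# norm at the hindsight comparator x*) with the L* quantity (cumulative loss
# at x*) for 1-smooth quadratic losses l_t(x) = 0.5 * (x - a_t)^2.
rng = np.random.default_rng(0)
a = rng.normal(size=100)        # hypothetical loss centers a_t
x_star = a.mean()               # hindsight minimizer of sum_t l_t

g_star = np.sum((x_star - a) ** 2)        # sum_t ||grad l_t(x*)||^2
l_star = np.sum(0.5 * (x_star - a) ** 2)  # sum_t l_t(x*)

# For these 1-smooth losses, ||grad l_t(x*)||^2 = 2 * l_t(x*) term by term,
# so g_star <= 2 * l_star, consistent with G* refining L*.
assert g_star <= 2 * l_star + 1e-9
```

For these quadratics the two quantities coincide up to the smoothness constant; the paper's point is that $G^\star$ can be arbitrarily smaller when curvature vanishes around $x^\star$.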
Related papers
- Swap Regret Minimization Through Response-Based Approachability [66.39400409563976]
We consider the problem of minimizing different notions of swap regret in online optimization. We develop a significantly simpler, computationally efficient algorithm that guarantees $O(d^{3/2}\sqrt{T})$ linear swap regret for a general convex set and $O(d\sqrt{T})$ when the set is centrally symmetric.
arXiv Detail & Related papers (2026-02-05T23:43:25Z) - Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games [60.871651115241406]
A considerable chasm has been looming for decades between theory and practice in zero-sum game solving through first-order methods. We propose a new scale-invariant and parameter-free variant of PRM$^+$, which we call IREG-PRM$^+$. We show that it achieves $T^{-1/2}$ best-iterate and $T^{-1}$ optimal convergence guarantees, while also being on par with PRM$^+$ on benchmark games.
arXiv Detail & Related papers (2025-10-06T00:33:20Z) - Exploiting Curvature in Online Convex Optimization with Delayed Feedback [6.390468088226495]
We study the online convex optimization problem with curved losses and delayed feedback. We propose a variant of follow-the-regularized-leader that obtains regret of order $\min\{\sigma_{\max}\ln T, \sqrt{d_{\mathrm{tot}}}\}$. We then consider exp-concave losses and extend the Online Newton Step algorithm to handle delays with adaptive learning rate tuning.
arXiv Detail & Related papers (2025-06-09T09:49:54Z) - Near-optimal Regret Using Policy Optimization in Online MDPs with Aggregate Bandit Feedback [49.84060509296641]
We study online finite-horizon Markov Decision Processes with adversarially changing loss and aggregate bandit feedback (a.k.a. full-bandit). Under this type of feedback, the agent observes only the total loss incurred over the entire trajectory, rather than the individual losses at each intermediate step within the trajectory. We introduce the first Policy Optimization algorithms for this setting.
arXiv Detail & Related papers (2025-02-06T12:03:24Z) - Improved Regret for Bandit Convex Optimization with Delayed Feedback [50.46856739179311]
We study bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed after a delay.
We develop a novel algorithm and prove that it enjoys a regret bound of $O(\sqrt{n}T^{3/4}+\sqrt{dT})$ in general.
We show that the proposed algorithm can improve the regret bound to $O((nT)^{2/3}\log^{1/3}T+d\log T)$ for strongly convex functions.
arXiv Detail & Related papers (2024-02-14T13:08:26Z) - Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime [74.52487417350221]
We consider online learning problems in the realizable setting, where there is a zero-loss solution.
We propose new Differentially Private (DP) algorithms that obtain near-optimal regret bounds.
arXiv Detail & Related papers (2023-02-27T21:19:24Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Online Strongly Convex Optimization with Unknown Delays [30.931538196386672]
We investigate the problem of online convex optimization with unknown delays.
We first extend the delayed variant of OGD for strongly convex functions.
We establish a better regret bound of $O(d\log T)$, where $d$ is the maximum delay.
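The delayed OGD variant summarized above can be sketched in a few lines. This is a hypothetical illustration, not the paper's algorithm: gradients arrive after a fixed delay, and each arriving gradient is applied with the standard $1/(\mu k)$ step size used under strong convexity, where $k$ counts updates performed so far.

```python
import numpy as np
from collections import deque

# Hypothetical sketch of delayed online gradient descent for strongly convex
# losses l_t(x) = 0.5 * ||x - a_t||^2: the gradient from round t only
# arrives `delay` rounds later, and each arriving gradient is applied with
# step size 1/(mu * k), the standard choice for mu-strongly convex losses.
def delayed_ogd(a, delay=3, mu=1.0):
    x = np.zeros_like(a[0])
    pending = deque()                   # gradients waiting to arrive
    updates = 0
    xs = []
    for t, a_t in enumerate(a, start=1):
        xs.append(x.copy())
        pending.append(x - a_t)         # grad of l_t at the point played
        if t > delay:                   # feedback from round t - delay arrives
            g = pending.popleft()
            updates += 1
            x = x - g / (mu * updates)
    return np.array(xs)

rng = np.random.default_rng(1)
a = rng.normal(size=(200, 2)) + 5.0     # synthetic loss centers
xs = delayed_ogd(a, delay=3)
# Later iterates should approach the mean of the a_t (hindsight minimizer).
```

The delay only postpones when each gradient is consumed; the step-size schedule is indexed by the number of gradients actually applied, which is one natural way to handle unknown delays.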
arXiv Detail & Related papers (2021-03-21T10:16:15Z) - Lazy OCO: Online Convex Optimization on a Switching Budget [34.936641201844054]
We study a variant of online convex optimization where the player is permitted to switch decisions at most $S$ times in expectation throughout $T$ rounds.
Similar problems have been addressed in prior work for the discrete decision set setting, and more recently in the continuous setting but only with an adaptive adversary.
arXiv Detail & Related papers (2021-02-07T14:47:19Z) - Projection-free Online Learning over Strongly Convex Sets [24.517908972536432]
We study the special case of online learning over strongly convex sets, for which we first prove that OFW can enjoy a better regret bound of $O(T^{2/3})$ for general convex losses.
We show that it achieves a regret bound of $O(T^{2/3})$ over general convex sets and a better regret bound of $O(\sqrt{T})$ over strongly convex sets.
arXiv Detail & Related papers (2020-10-16T05:42:50Z) - Naive Exploration is Optimal for Online LQR [49.681825576239355]
We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been
arXiv Detail & Related papers (2020-01-27T03:44:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.