Small Gradient Norm Regret for Online Convex Optimization
- URL: http://arxiv.org/abs/2601.13519v2
- Date: Wed, 21 Jan 2026 23:05:14 GMT
- Title: Small Gradient Norm Regret for Online Convex Optimization
- Authors: Wenzhi Gao, Chang He, Madeleine Udell
- Abstract summary: We show that the $G^\star$ regret strictly refines the existing $L^\star$ (small loss) regret. We extend our results to dynamic regret and bandit settings.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: This paper introduces a new problem-dependent regret measure for online convex optimization with smooth losses. The notion, which we call the $G^\star$ regret, depends on the cumulative squared gradient norm evaluated at the decision in hindsight $\sum_{t=1}^T \|\nabla \ell(x^\star)\|^2$. We show that the $G^\star$ regret strictly refines the existing $L^\star$ (small loss) regret, and that it can be arbitrarily sharper when the losses have vanishing curvature around the hindsight decision. We establish upper and lower bounds on the $G^\star$ regret and extend our results to dynamic regret and bandit settings. As a byproduct, we refine the existing convergence analysis of stochastic optimization algorithms in the interpolation regime. Some experiments validate our theoretical findings.
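The $G^\star$ quantity in the abstract can be illustrated numerically. The following sketch (hypothetical, not from the paper) computes $\sum_{t=1}^T \|\nabla \ell_t(x^\star)\|^2$ and the small-loss quantity $\sum_{t=1}^T \ell_t(x^\star)$ for simple smooth quadratic losses; the standard inequality $\|\nabla \ell(x)\|^2 \le 2L\,\ell(x)$ for nonnegative $L$-smooth losses is what makes $G^\star$ at most a constant multiple of $L^\star$.

```python
import numpy as np

# Illustrative sketch: compare the G* quantity (cumulative squared gradient
# norm at the hindsight comparator x*) with the L* quantity (cumulative loss
# at x*) for 1-smooth quadratic losses l_t(x) = 0.5 * (x - a_t)^2.
rng = np.random.default_rng(0)
a = rng.normal(size=100)        # hypothetical loss centers a_t
x_star = a.mean()               # hindsight minimizer of sum_t l_t

g_star = np.sum((x_star - a) ** 2)        # sum_t ||grad l_t(x*)||^2
l_star = np.sum(0.5 * (x_star - a) ** 2)  # sum_t l_t(x*)

# For these 1-smooth losses, ||grad l_t(x*)||^2 = 2 * l_t(x*) term by term,
# so g_star <= 2 * l_star, consistent with G* refining L*.
assert g_star <= 2 * l_star + 1e-9
```

For these quadratics the two quantities coincide up to the smoothness constant; the paper's point is that $G^\star$ can be arbitrarily smaller when curvature vanishes around $x^\star$.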
Related papers
- Swap Regret Minimization Through Response-Based Approachability [66.39400409563976]
We consider the problem of minimizing different notions of swap regret in online optimization. We develop a significantly simpler, computationally efficient algorithm that guarantees $O(d^{3/2}\sqrt{T})$ linear swap regret for a general convex set and $O(d\sqrt{T})$ when the set is centrally symmetric.
arXiv Detail & Related papers (2026-02-05T23:43:25Z) - Scale-Invariant Regret Matching and Online Learning with Optimal Convergence: Bridging Theory and Practice in Zero-Sum Games [60.871651115241406]
A considerable chasm has been looming for decades between theory and practice in zero-sum game solving through first-order methods. We propose a new scale-invariant and parameter-free variant of PRM$^+$, which we call IREG-PRM$^+$. We show that it achieves $T^{-1/2}$ best-iterate and $T^{-1}$ optimal convergence guarantees, while also being on par with PRM$^+$ on benchmark games.
arXiv Detail & Related papers (2025-10-06T00:33:20Z) - Exploiting Curvature in Online Convex Optimization with Delayed Feedback [6.390468088226495]
We study the online convex optimization problem with curved losses and delayed feedback. We propose a variant of follow-the-regularized-leader that obtains regret of order $\min\{\sigma_{\max}\ln T, \sqrt{d_{\mathrm{tot}}}\}$. We then consider exp-concave losses and extend the Online Newton Step algorithm to handle delays with adaptive learning rate tuning.
arXiv Detail & Related papers (2025-06-09T09:49:54Z) - Near-optimal Regret Using Policy Optimization in Online MDPs with Aggregate Bandit Feedback [49.84060509296641]
We study online finite-horizon Markov Decision Processes with adversarially changing loss and aggregate bandit feedback (a.k.a. full-bandit). Under this type of feedback, the agent observes only the total loss incurred over the entire trajectory, rather than the individual losses at each intermediate step within the trajectory. We introduce the first Policy Optimization algorithms for this setting.
arXiv Detail & Related papers (2025-02-06T12:03:24Z) - Improved Regret for Bandit Convex Optimization with Delayed Feedback [50.46856739179311]
We study bandit convex optimization (BCO) with delayed feedback, where only the loss value of the action is revealed after a delay.
We develop a novel algorithm and prove that it enjoys a regret bound of $O(\sqrt{n}T^{3/4}+\sqrt{dT})$ in general.
We show that the proposed algorithm can improve the regret bound to $O((nT)^{2/3}\log^{1/3}T+d\log T)$ for strongly convex functions.
arXiv Detail & Related papers (2024-02-14T13:08:26Z) - Near-Optimal Algorithms for Private Online Optimization in the Realizable Regime [74.52487417350221]
We consider online learning problems in the realizable setting, where there is a zero-loss solution.
We propose new Differentially Private (DP) algorithms that obtain near-optimal regret bounds.
arXiv Detail & Related papers (2023-02-27T21:19:24Z) - Horizon-Free and Variance-Dependent Reinforcement Learning for Latent Markov Decision Processes [62.90204655228324]
We study regret minimization for reinforcement learning (RL) in Latent Markov Decision Processes (LMDPs) with context in hindsight.
We design a novel model-based algorithmic framework which can be instantiated with both a model-optimistic and a value-optimistic solver.
arXiv Detail & Related papers (2022-10-20T21:32:01Z) - Online Strongly Convex Optimization with Unknown Delays [30.931538196386672]
We investigate the problem of online convex optimization with unknown delays.
We first extend the delayed variant of OGD for strongly convex functions.
We establish a better regret bound of $O(d\log T)$, where $d$ is the maximum delay.
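The delayed OGD variant summarized above can be sketched in a few lines. This is a hypothetical illustration, not the paper's algorithm: gradients arrive after a fixed delay, and each arriving gradient is applied with the standard $1/(\mu k)$ step size used under strong convexity, where $k$ counts updates performed so far.

```python
import numpy as np
from collections import deque

# Hypothetical sketch of delayed online gradient descent for strongly convex
# losses l_t(x) = 0.5 * ||x - a_t||^2: the gradient from round t only
# arrives `delay` rounds later, and each arriving gradient is applied with
# step size 1/(mu * k), the standard choice for mu-strongly convex losses.
def delayed_ogd(a, delay=3, mu=1.0):
    x = np.zeros_like(a[0])
    pending = deque()                   # gradients waiting to arrive
    updates = 0
    xs = []
    for t, a_t in enumerate(a, start=1):
        xs.append(x.copy())
        pending.append(x - a_t)         # grad of l_t at the point played
        if t > delay:                   # feedback from round t - delay arrives
            g = pending.popleft()
            updates += 1
            x = x - g / (mu * updates)
    return np.array(xs)

rng = np.random.default_rng(1)
a = rng.normal(size=(200, 2)) + 5.0     # synthetic loss centers
xs = delayed_ogd(a, delay=3)
# Later iterates should approach the mean of the a_t (hindsight minimizer).
```

The delay only postpones when each gradient is consumed; the step-size schedule is indexed by the number of gradients actually applied, which is one natural way to handle unknown delays.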
arXiv Detail & Related papers (2021-03-21T10:16:15Z) - Lazy OCO: Online Convex Optimization on a Switching Budget [34.936641201844054]
We study a variant of online convex optimization where the player is permitted to switch decisions at most $S$ times in expectation throughout $T$ rounds.
Similar problems have been addressed in prior work for the discrete decision set setting, and more recently in the continuous setting but only with an adaptive adversary.
arXiv Detail & Related papers (2021-02-07T14:47:19Z) - Projection-free Online Learning over Strongly Convex Sets [24.517908972536432]
We study the special case of online learning over strongly convex sets, for which we first prove that OFW can enjoy a better regret bound of $O(T^{2/3})$ for general convex losses.
We show that it achieves a regret bound of $O(T^{2/3})$ over general convex sets and a better regret bound of $O(\sqrt{T})$ over strongly convex sets.
arXiv Detail & Related papers (2020-10-16T05:42:50Z) - Naive Exploration is Optimal for Online LQR [49.681825576239355]
We show that the optimal regret scales as $\widetilde{\Theta}(\sqrt{d_{\mathbf{u}}^2 d_{\mathbf{x}} T})$, where $T$ is the number of time steps, $d_{\mathbf{u}}$ is the dimension of the input space, and $d_{\mathbf{x}}$ is the dimension of the system state.
Our lower bounds rule out the possibility of a $\mathrm{poly}(\log T)$-regret algorithm, which had been
arXiv Detail & Related papers (2020-01-27T03:44:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.