Optimistic Online-to-Batch Conversions for Accelerated Convergence and Universality
- URL: http://arxiv.org/abs/2511.06597v1
- Date: Mon, 10 Nov 2025 01:07:51 GMT
- Title: Optimistic Online-to-Batch Conversions for Accelerated Convergence and Universality
- Authors: Yu-Hu Yan, Peng Zhao, Zhi-Hua Zhou
- Abstract summary: We study offline convex optimization with smooth objectives, where the classical Nesterov's Accelerated Gradient method achieves the optimal convergence. We propose novel optimistic online-to-batch conversions that incorporate optimism theoretically into the analysis. We achieve the optimal accelerated convergence rate for strongly convex and smooth objectives for the first time through the lens of online-to-batch conversion.
- Score: 61.14065222161553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we study offline convex optimization with smooth objectives, where the classical Nesterov's Accelerated Gradient (NAG) method achieves the optimal accelerated convergence. Extensive research has aimed to understand NAG from various perspectives, and a recent line of work approaches this from the viewpoint of online learning and online-to-batch conversion, emphasizing the role of optimistic online algorithms for acceleration. In this work, we contribute to this perspective by proposing novel optimistic online-to-batch conversions that incorporate optimism theoretically into the analysis, thereby significantly simplifying the online algorithm design while preserving the optimal convergence rates. Specifically, we demonstrate the effectiveness of our conversions through the following results: (i) when combined with simple online gradient descent, our optimistic conversion achieves the optimal accelerated convergence; (ii) our conversion also applies to strongly convex objectives, and by leveraging both optimistic online-to-batch conversion and optimistic online algorithms, we achieve the optimal accelerated convergence rate for strongly convex and smooth objectives, for the first time through the lens of online-to-batch conversion; (iii) our optimistic conversion can achieve universality to smoothness -- applicable to both smooth and non-smooth objectives without requiring knowledge of the smoothness coefficient -- and remains as efficient as non-universal methods by using only one gradient query in each iteration. Finally, we highlight the effectiveness of our optimistic online-to-batch conversions by a precise correspondence with NAG.
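To make the setting concrete, the sketch below pairs standard NAG with a generic, non-optimistic anytime online-to-batch conversion driven by plain online gradient descent (OGD). The function names, step sizes, and weighting scheme (alpha_t = t) are illustrative assumptions; the paper's optimistic conversion itself is not reproduced here, the code only shows the classical scaffolding that the optimistic conversion modifies.

```python
import numpy as np

def nag(grad_f, x0, L, T):
    """Nesterov's Accelerated Gradient for a convex, L-smooth objective
    (the classical method the paper's conversions are related to)."""
    x_prev = x0.copy()
    x = x0.copy()
    for t in range(1, T + 1):
        y = x + (t - 1) / (t + 2) * (x - x_prev)  # momentum / look-ahead point
        x_prev, x = x, y - grad_f(y) / L          # gradient step at the look-ahead point
    return x

def anytime_online_to_batch_ogd(grad_f, x0, T, lr):
    """A generic (non-optimistic) anytime online-to-batch conversion driven by
    plain OGD.  With this classical conversion, plain OGD only yields the
    unaccelerated rate; the paper's contribution is to move optimism into the
    conversion so that a simple OGD update already attains the accelerated
    rate (the exact optimistic correction is not reproduced here)."""
    w = x0.copy()                        # online learner's (OGD's) decision
    weighted_sum = np.zeros_like(x0, dtype=float)
    total_weight = 0.0
    x = x0.copy()
    for t in range(1, T + 1):
        alpha = float(t)                 # polynomial weights alpha_t = t (illustrative choice)
        total_weight += alpha
        weighted_sum += alpha * w
        x = weighted_sum / total_weight  # query point: weighted average of online iterates
        g = grad_f(x)                    # one gradient query per round, as in the paper
        w = w - lr * alpha * g           # OGD step on the linear loss <alpha_t * g, w>
    return x
```

Running both routines on a simple quadratic illustrates the point the abstract makes precise: in the online-to-batch view, acceleration hinges on where optimism enters (the online learner or, as proposed here, the conversion), not on a more complicated online algorithm.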
Related papers
- Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization [66.08522228989634]
We establish the first global convergence result of neural networks for two stage least squares (2SLS) approach in nonparametric instrumental variable regression (NPIV). This is achieved by adopting a lifted perspective through mean-field Langevin dynamics (MFLD).
arXiv Detail & Related papers (2025-11-18T17:51:17Z) - Gradient-Variation Online Adaptivity for Accelerated Optimization with Hölder Smoothness [31.293577718154225]
We investigate online learning with Hölder smooth functions, a general class encompassing both smooth and non-smooth functions. We design the corresponding gradient-variation online learning algorithm whose regret smoothly interpolates between the optimal guarantees in smooth and non-smooth regimes. We address this by integrating online adaptivity with a detection-based guess-and-check procedure, which, for the first time, yields a universal offline method.
arXiv Detail & Related papers (2025-11-04T05:27:57Z) - General framework for online-to-nonconvex conversion: Schedule-free SGD is also effective for nonconvex optimization [40.254487017289975]
This work investigates the effectiveness of schedule-free methods, developed by A. Defazio et al.
Specifically, we show that schedule-free SGD is also effective for nonsmooth nonconvex optimization problems.
arXiv Detail & Related papers (2024-11-11T15:25:48Z) - Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications to fast-rate convergence in games and extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - LLM as a Complementary Optimizer to Gradient Descent: A Case Study in Prompt Tuning [69.95292905263393]
We show that gradient-based optimizers and high-level LLM-based optimizers are complementary to each other and can effectively collaborate in a combined optimization framework.
arXiv Detail & Related papers (2024-05-30T06:24:14Z) - An Automatic Learning Rate Schedule Algorithm for Achieving Faster
Convergence and Steeper Descent [10.061799286306163]
We investigate the convergence behavior of the delta-bar-delta algorithm in real-world neural network optimization.
To address any potential convergence challenges, we propose a novel approach called RDBD (Regrettable Delta-Bar-Delta)
Our approach allows for prompt correction of biased learning rate adjustments and ensures the convergence of the optimization process.
arXiv Detail & Related papers (2023-10-17T14:15:57Z) - Efficient Federated Learning via Local Adaptive Amended Optimizer with
Linear Speedup [90.26270347459915]
We propose a novel momentum-based algorithm that combines global gradient descent with a locally adaptive amended optimizer.
LADA could greatly reduce the communication rounds and achieves higher accuracy than several baselines.
arXiv Detail & Related papers (2023-07-30T14:53:21Z) - Nesterov Meets Optimism: Rate-Optimal Separable Minimax Optimization [108.35402316802765]
We propose a new first-order optimization algorithm -- Accelerated Gradient-Optimistic Gradient (AG-OG) Ascent.
We show that AG-OG achieves the optimal convergence rate (up to a constant) for a variety of settings.
We also extend our algorithm to achieve the optimal convergence rate in both bi-SC-SC and bi-C-SC settings.
arXiv Detail & Related papers (2022-10-31T17:59:29Z) - Accelerated Federated Learning with Decoupled Adaptive Optimization [53.230515878096426]
The federated learning (FL) framework enables clients to collaboratively learn a shared model while keeping the training data private on clients.
Recently, many efforts have been made to generalize centralized adaptive optimization methods, such as SGDM, Adam, AdaGrad, etc., to federated settings.
This work aims to develop novel adaptive optimization methods for FL from the perspective of dynamics of ordinary differential equations (ODEs)
arXiv Detail & Related papers (2022-07-14T22:46:43Z)