A Tale of Two Geometries: Adaptive Optimizers and Non-Euclidean Descent
- URL: http://arxiv.org/abs/2511.20584v1
- Date: Tue, 25 Nov 2025 18:13:53 GMT
- Title: A Tale of Two Geometries: Adaptive Optimizers and Non-Euclidean Descent
- Authors: Shuo Xie, Tianhao Wang, Beining Wu, Zhiyuan Li
- Abstract summary: Adaptive optimizers can reduce to normalized steepest descent (NSD) when only adapting to the current gradient. In the convex setting, adaptive optimizers are governed by a stronger adaptive smoothness condition. We show that adaptive smoothness enables acceleration of adaptive optimizers with Nesterov momentum in the convex setting.
- Score: 12.958474356193427
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Adaptive optimizers can reduce to normalized steepest descent (NSD) when only adapting to the current gradient, suggesting a close connection between the two algorithmic families. A key distinction between their analyses, however, lies in the geometries, e.g., smoothness notions, they rely on. In the convex setting, adaptive optimizers are governed by a stronger adaptive smoothness condition, while NSD relies on the standard notion of smoothness. We extend the theory of adaptive smoothness to the nonconvex setting and show that it precisely characterizes the convergence of adaptive optimizers. Moreover, we establish that adaptive smoothness enables acceleration of adaptive optimizers with Nesterov momentum in the convex setting, a guarantee unattainable under standard smoothness for certain non-Euclidean geometry. We further develop an analogous comparison for stochastic optimization by introducing adaptive gradient variance, which parallels adaptive smoothness and leads to dimension-free convergence guarantees that cannot be achieved under standard gradient variance for certain non-Euclidean geometry.
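As a minimal sketch of the reduction the abstract alludes to (the function names and the Adam-style simplification below are illustrative assumptions, not the paper's construction): when an Adam-like optimizer adapts only to the current gradient, i.e. both moment averages are switched off, its update divides the gradient entrywise by its own magnitude, which coincides with sign descent, the normalized steepest descent step with respect to the $\ell_\infty$ norm.

```python
import numpy as np

def adam_like_step(x, grad, lr=0.1, eps=0.0):
    # Adam with beta1 = beta2 = 0 (illustrative simplification): the
    # preconditioner is just |grad|, so the update is grad / |grad|.
    return x - lr * grad / (np.abs(grad) + eps)

def nsd_linf_step(x, grad, lr=0.1):
    # Normalized steepest descent w.r.t. the l_inf norm is sign descent.
    return x - lr * np.sign(grad)

x = np.array([1.5, -0.7, 2.0])
g = np.array([0.3, -1.2, 0.05])
# With eps = 0 the two updates agree entrywise (for nonzero gradient entries).
print(adam_like_step(x, g))
print(nsd_linf_step(x, g))
```

This only makes the NSD connection concrete; the paper's contribution concerns the geometries (adaptive smoothness and adaptive gradient variance) under which such updates converge.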
Related papers
- Adaptive Conditional Gradient Descent [0.5524804393257919]
This paper presents a novel adaptive step-size strategy for optimization. We show that Conditional Gradient and non-Euclidean Normalized Steepest Descent algorithms equipped with it exhibit competitive performance.
arXiv Detail & Related papers (2025-10-13T14:11:10Z) - Mirror Descent Under Generalized Smoothness [23.5387392871236]
We introduce a new $\ell_*$-smoothness concept that measures the norm of the Hessian in terms of a general norm and its dual. We establish convergence for mirror-descent-type algorithms, matching the rates known under classic smoothness.
arXiv Detail & Related papers (2025-02-02T11:23:10Z) - Adaptive Gradient Normalization and Independent Sampling for (Stochastic) Generalized-Smooth Optimization [23.962901840695462]
We show that existing algorithms are not fully adapted to the generalized-smooth geometry. Experiments show the fast convergence of our algorithm on sampling problems.
arXiv Detail & Related papers (2024-10-17T21:52:00Z) - Gradient-Variation Online Learning under Generalized Smoothness [56.38427425920781]
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions.
Recent efforts in neural network optimization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms.
We provide applications to fast-rate convergence in games and to extended adversarial optimization.
arXiv Detail & Related papers (2024-08-17T02:22:08Z) - SUPER-ADAM: Faster and Universal Framework of Adaptive Gradients [99.13839450032408]
It is desirable to design a universal framework for adaptive algorithms to solve general problems.
In particular, our novel framework provides convergence-analysis support for adaptive methods in the nonconvex setting.
arXiv Detail & Related papers (2021-06-15T15:16:28Z) - Adaptive Gradient Methods for Constrained Convex Optimization and
Variational Inequalities [32.51470158863247]
Our main algorithms, AdaACSA and AdaAGD+, are accelerated adaptive methods for constrained convex optimization.
We complement them with a simpler algorithm AdaGrad+ which enjoys the same features, and achieves the standard non-accelerated convergence rate.
arXiv Detail & Related papers (2020-07-17T09:10:21Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\widetilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z) - Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization [71.03797261151605]
Adaptivity is an important yet under-studied property in modern optimization theory.
Our algorithm is proved to simultaneously achieve the best-available convergence rate for non-PL objectives while outperforming existing algorithms for PL objectives.
arXiv Detail & Related papers (2020-02-13T05:42:27Z) - GradientDICE: Rethinking Generalized Offline Estimation of Stationary
Values [75.17074235764757]
We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution.
GenDICE is the state-of-the-art for estimating such density ratios.
arXiv Detail & Related papers (2020-01-29T22:10:11Z) - On the Convergence of Adaptive Gradient Methods for Nonconvex Optimization [80.03647903934723]
We prove convergence guarantees in expectation for adaptive gradient methods on nonconvex objectives.
Our analyses shed light on a better understanding of adaptive gradient methods in nonconvex optimization.
arXiv Detail & Related papers (2018-08-16T20:25:28Z)