Glocal Smoothness: Line Search can really help!
- URL: http://arxiv.org/abs/2506.12648v1
- Date: Sat, 14 Jun 2025 22:18:53 GMT
- Title: Glocal Smoothness: Line Search can really help!
- Authors: Curtis Fox, Aaron Mishkin, Sharan Vaswani, Mark Schmidt
- Abstract summary: Iteration complexities for first-order optimization algorithms are typically stated in terms of a global Lipschitz constant of the gradient. Many objective functions that arise in practice have regions with small Lipschitz constants where larger step sizes can be used. We show that glocal smoothness can lead to improved complexities for the Polyak and AdGD step sizes.
- Score: 13.442002218246037
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Iteration complexities for first-order optimization algorithms are typically stated in terms of a global Lipschitz constant of the gradient, and near-optimal results are achieved using fixed step sizes. But many objective functions that arise in practice have regions with small Lipschitz constants where larger step sizes can be used. Many local Lipschitz assumptions have been proposed, which have led to results showing that adaptive step sizes and/or line searches yield improved convergence rates over fixed step sizes. However, these faster rates tend to depend on the iterates of the algorithm, which makes it difficult to compare the iteration complexities of different methods. We consider a simple characterization of global and local ("glocal") smoothness that only depends on properties of the function. This allows upper bounds on iteration complexities in terms of iterate-independent constants and enables us to compare iteration complexities between algorithms. Under this assumption it is straightforward to show the advantages of line searches over fixed step sizes, and that in some settings, gradient descent with line search has a better iteration complexity than accelerated methods with fixed step sizes. We further show that glocal smoothness can lead to improved complexities for the Polyak and AdGD step sizes, as well as other algorithms including coordinate optimization, stochastic gradient methods, accelerated gradient methods, and non-linear conjugate gradient methods.
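To make the abstract's contrast concrete, below is a minimal sketch (not code from the paper) comparing gradient descent with a fixed 1/L step size against gradient descent with an Armijo backtracking line search on a toy objective f(x) = sqrt(1 + x^2). Its gradient has global Lipschitz constant 1, but the local constant decays like (1 + x^2)^(-3/2), so far from the minimizer a line search can accept much larger steps. The objective, starting point, and line-search parameters are illustrative assumptions, not choices from the paper.

```python
import numpy as np

# Toy objective: globally 1-smooth, but locally much flatter away from x = 0,
# so a line search can exploit the small local Lipschitz constant.
def f(x):
    return np.sqrt(1.0 + x**2)

def grad(x):
    return x / np.sqrt(1.0 + x**2)

def gd_fixed(x0, L=1.0, iters=100):
    # Gradient descent with the conservative global step size 1/L.
    x = x0
    for _ in range(iters):
        x = x - (1.0 / L) * grad(x)
    return x

def gd_armijo(x0, iters=100, eta0=1e3, c=1e-4, beta=0.5):
    # Gradient descent with Armijo backtracking: start from a large trial
    # step and halve it until the sufficient-decrease condition holds.
    x = x0
    for _ in range(iters):
        g = grad(x)
        eta = eta0
        while f(x - eta * g) > f(x) - c * eta * g**2:
            eta *= beta
        x = x - eta * g
    return x

if __name__ == "__main__":
    x0 = 1e4  # start far from the minimizer, where the function is locally flat
    print("fixed 1/L  :", gd_fixed(x0, iters=100))
    print("line search:", gd_armijo(x0, iters=100))
```

Starting from x0 = 1e4, the fixed 1/L method moves by at most one unit per iteration (since the gradient magnitude is bounded by 1), while the backtracking method crosses the flat region in a handful of iterations; this is the kind of gap the glocal-smoothness analysis is meant to capture.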
Related papers
- Nesterov Finds GRAAL: Optimal and Adaptive Gradient Method for Convex Optimization [17.227158587717934]
Adaptive gradient methods, including GRAAL (Malitsky, 2020), have been developed. We prove that our algorithm achieves the desired optimal convergence rate for Lipschitz smooth functions. In contrast to existing methods, it does so with an arbitrary, even excessively small, initial step size, at the cost of a logarithmic additive term in the complexity.
arXiv Detail & Related papers (2025-07-13T23:07:45Z) - Efficient line search for optimizing Area Under the ROC Curve in gradient descent [2.094821665776961]
We study the piecewise linear/constant nature of the Area Under Min (AUM) of false positive and false negative rates.
We propose new efficient path-following algorithms for choosing the learning rate which is optimal for each step of descent.
arXiv Detail & Related papers (2024-10-11T08:59:06Z) - Optimization on a Finer Scale: Bounded Local Subgradient Variation Perspective [17.5796206457693]
We study nonsmooth optimization problems under bounded local subgradient variation.
The resulting class of objective functions encapsulates the classes of objectives traditionally studied in optimization.
arXiv Detail & Related papers (2024-03-24T22:42:40Z) - ELRA: Exponential learning rate adaption gradient descent optimization
method [83.88591755871734]
We present a novel, fast (exponential rate), ab initio (hyper-parameter-free) gradient-based adaptation method.
The main idea of the method is to adapt the step size $\alpha$ by situational awareness.
It can be applied to problems of any dimension $n$ and scales only linearly with $n$.
arXiv Detail & Related papers (2023-09-12T14:36:13Z) - Accelerated First-Order Optimization under Nonlinear Constraints [61.98523595657983]
We exploit analogies between first-order algorithms for constrained optimization and non-smooth dynamical systems to design a new class of accelerated first-order algorithms. An important property of these algorithms is that constraints are expressed in terms of velocities instead of positions.
arXiv Detail & Related papers (2023-02-01T08:50:48Z) - Convergence of ease-controlled Random Reshuffling gradient Algorithms under Lipschitz smoothness [0.0]
We consider minimizing the average of a very large number of smooth, possibly non-convex functions, and we use two widely used minibatch frameworks to tackle this problem.
We define ease-controlled modifications of IG/RR schemes, which require a light additional computational effort.
We compare our implementation with both a full-batch gradient method (i.e. L-BFGS) and a standard implementation of IG/RR methods, showing that the algorithms require a similar computational effort.
arXiv Detail & Related papers (2022-12-04T15:26:36Z) - Big-Step-Little-Step: Efficient Gradient Methods for Objectives with
Multiple Scales [45.994415581007054]
We consider the problem of minimizing a function $f : \mathbb{R}^d \rightarrow \mathbb{R}$ which is implicitly decomposable as the sum of $m$ unknown non-interacting smooth, strongly convex functions.
We provide new gradient-based methods for efficiently solving a broad class of ill-conditioned optimization problems.
arXiv Detail & Related papers (2021-11-04T20:09:12Z) - Hybrid Trilinear and Bilinear Programming for Aligning Partially
Overlapping Point Sets [85.71360365315128]
In many applications, we need algorithms that can align partially overlapping point sets and are invariant to the corresponding transformations, as in the RPM algorithm.
We first show that the objective is a cubic polynomial function. We then utilize the convex envelopes of trilinear and bilinear monomial transformations to derive its lower bound.
We next develop a branch-and-bound (BnB) algorithm which only branches over the transformation variables and runs efficiently.
arXiv Detail & Related papers (2021-01-19T04:24:23Z) - Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth
Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only a single data point at each step.
Our results are expressed in the form of simultaneous primal and dual side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z) - Functions with average smoothness: structure, algorithms, and learning [12.362670630646804]
We define a local slope at each point and gauge the function complexity as the average of these values.
Since the mean can be dramatically smaller than the maximum, this complexity measure can yield considerably sharper generalization bounds.
We discover a surprisingly rich and analytic structure in the function class we define.
arXiv Detail & Related papers (2020-07-13T10:06:58Z) - Gradient Free Minimax Optimization: Variance Reduction and Faster
Convergence [120.9336529957224]
In this paper, we study the gradient-free (zeroth-order) minimax optimization problem.
We show that a novel zeroth-order variance reduced descent algorithm achieves the best known query complexity.
arXiv Detail & Related papers (2020-06-16T17:55:46Z) - Exploiting Higher Order Smoothness in Derivative-free Optimization and
Continuous Bandits [99.70167985955352]
We study the problem of zero-order optimization of a strongly convex function.
We consider a randomized approximation of the projected gradient descent algorithm.
Our results imply that the zero-order algorithm is nearly optimal in terms of sample complexity and the problem parameters.
arXiv Detail & Related papers (2020-06-14T10:42:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.