A Study of Condition Numbers for First-Order Optimization
- URL: http://arxiv.org/abs/2012.05782v2
- Date: Fri, 25 Dec 2020 17:50:46 GMT
- Title: A Study of Condition Numbers for First-Order Optimization
- Authors: Charles Guille-Escuret and Baptiste Goujaud and Manuela Girotti and
Ioannis Mitliagkas
- Abstract summary: We introduce a class of perturbations quantified via a new norm, called *-norm.
We show that smoothness and strong convexity can be heavily impacted by arbitrarily small perturbations.
We propose a notion of continuity of the metrics, which is essential for a robust tuning strategy.
- Score: 12.072067586666382
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The study of first-order optimization algorithms (FOA) typically starts with
assumptions on the objective functions, most commonly smoothness and strong
convexity. These metrics are used to tune the hyperparameters of FOA. We
introduce a class of perturbations quantified via a new norm, called *-norm. We
show that adding a small perturbation to the objective function has an
equivalently small impact on the behavior of any FOA, which suggests that it
should have a minor impact on the tuning of the algorithm. However, we show
that smoothness and strong convexity can be heavily impacted by arbitrarily
small perturbations, leading to excessively conservative tunings and
convergence issues. In view of these observations, we propose a notion of
continuity of the metrics, which is essential for a robust tuning strategy.
Since smoothness and strong convexity are not continuous, we propose a
comprehensive study of existing alternative metrics which we prove to be
continuous. We describe their mutual relations and provide their guaranteed
convergence rates for the Gradient Descent algorithm accordingly tuned. Finally
we discuss how our work impacts the theoretical understanding of FOA and their
performances.
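To make the tuning issue concrete, here is a minimal sketch (Python, not taken from the paper): gradient descent on a quadratic is tuned with the classical step size 2/(L+mu), and a sinusoidal perturbation of tiny amplitude but high frequency inflates the smoothness constant L, forcing a far more conservative step size even though the function values barely change. The quadratic, the perturbation, and the tuning rule are illustrative assumptions; the paper's *-norm and its continuous alternative metrics are not implemented here.
```python
# A minimal sketch (not from the paper): an arbitrarily small perturbation can blow up
# the smoothness constant L, and hence the step size that L-based tuning allows, while
# barely changing what gradient descent actually does.
import numpy as np

MU, L = 1.0, 10.0        # strong convexity and smoothness of the clean quadratic
EPS, FREQ = 1e-6, 1e5    # perturbation amplitude and frequency (illustrative choices)

def grad_clean(x):
    """Gradient of f(x1, x2) = (MU*x1^2 + L*x2^2) / 2."""
    return np.array([MU * x[0], L * x[1]])

def grad_perturbed(x):
    """Gradient of f + EPS*sum(sin(FREQ*x)): the perturbation's values are O(EPS),
    its gradient is O(EPS*FREQ), but its curvature is O(EPS*FREQ**2)."""
    return grad_clean(x) + EPS * FREQ * np.cos(FREQ * x)

def gradient_descent(grad, x0, step, iters=200):
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x = x - step * grad(x)
    return x

L_perturbed = L + EPS * FREQ ** 2             # smoothness constant after perturbation
step_clean = 2.0 / (L + MU)                   # classical tuning for the clean problem
step_conservative = 2.0 / (L_perturbed + MU)  # tuning forced by the inflated L

x0 = [1.0, 1.0]
print("step sizes:", step_clean, "vs", step_conservative)
print("perturbed GD, clean tuning:       ", gradient_descent(grad_perturbed, x0, step_clean))
print("perturbed GD, conservative tuning:", gradient_descent(grad_perturbed, x0, step_conservative))
```
On this toy problem the clean tuning still drives the perturbed iterates into a small neighborhood of the minimizer, whereas the conservatively tuned run barely moves from the initial point in the same number of iterations, mirroring the excessively conservative tunings discussed in the abstract.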
Related papers
- Error Feedback under $(L_0,L_1)$-Smoothness: Normalization and Momentum [56.37522020675243]
We provide the first proof of convergence for normalized error feedback algorithms across a wide range of machine learning problems.
We show that due to their larger allowable stepsizes, our new normalized error feedback algorithms outperform their non-normalized counterparts on various tasks.
arXiv Detail & Related papers (2024-10-22T10:19:27Z) - Improving Kernel-Based Nonasymptotic Simultaneous Confidence Bands [0.0]
The paper studies the problem of constructing nonparametric simultaneous confidence bands with nonasymptotic and distribution-free guarantees.
The approach is based on the theory of Paley-Wiener kernel reproducing Hilbert spaces.
arXiv Detail & Related papers (2024-01-28T22:43:33Z) - Generalization Error Bounds for Noisy, Iterative Algorithms via Maximal
Leakage [24.40306100502023]
We adopt an information-theoretic framework to analyze the generalization behavior of a class of noisy learning algorithms.
We show how the assumptions on the update function affect the optimal choice of the noise.
arXiv Detail & Related papers (2023-02-28T12:13:57Z) - High-Probability Bounds for Stochastic Optimization and Variational
Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the usage of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z) - Large deviations rates for stochastic gradient descent with strongly
convex functions [11.247580943940916]
We provide a formal framework for the study of general high-probability bounds for stochastic gradient descent (SGD).
We derive a large deviations upper bound for SGD with strongly convex functions.
arXiv Detail & Related papers (2022-11-02T09:15:26Z) - STORM+: Fully Adaptive SGD with Momentum for Nonconvex Optimization [74.1615979057429]
We investigate stochastic nonconvex optimization problems where the objective is an expectation over smooth loss functions.
Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
arXiv Detail & Related papers (2021-11-01T15:43:36Z) - High Probability Complexity Bounds for Non-Smooth Stochastic Optimization with Heavy-Tailed Noise [51.31435087414348]
It is essential to theoretically guarantee that algorithms provide small objective residual with high probability.
Existing methods for non-smooth convex optimization have complexity bounds with dependence on confidence level.
We propose novel stepsize rules for two methods with gradient clipping.
arXiv Detail & Related papers (2021-06-10T17:54:21Z) - Noisy Gradient Descent Converges to Flat Minima for Nonconvex Matrix
Factorization [36.182992409810446]
This paper investigates the importance of noise in nonconvex optimization problems.
We show that noisy gradient descent converges to flat global minima, with an implicit bias determined by the injected noise.
arXiv Detail & Related papers (2021-02-24T17:50:17Z) - Linear Last-iterate Convergence in Constrained Saddle-point Optimization [48.44657553192801]
We significantly expand the understanding of last-iterate convergence for Optimistic Gradient Descent Ascent (OGDA) and Optimistic Multiplicative Weights Update (OMWU).
We show that when the equilibrium is unique, linear last-iterate convergence is achieved with a learning rate whose value is set to a universal constant.
We show that bilinear games over any polytope satisfy this condition and OGDA converges exponentially fast even without the unique equilibrium assumption.
arXiv Detail & Related papers (2020-06-16T20:53:04Z) - Convergence of adaptive algorithms for weakly convex constrained
optimization [59.36386973876765]
We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope.
Our analysis works with mini-batch size of $1$, constant first and second order moment parameters, and possibly unbounded optimization domains.
arXiv Detail & Related papers (2020-06-11T17:43:19Z)