A Local Convergence Theory for the Stochastic Gradient Descent Method in
Non-Convex Optimization With Non-isolated Local Minima
- URL: http://arxiv.org/abs/2203.10973v2
- Date: Thu, 24 Mar 2022 13:51:57 GMT
- Title: A Local Convergence Theory for the Stochastic Gradient Descent Method in
Non-Convex Optimization With Non-isolated Local Minima
- Authors: Taehee Ko and Xiantao Li
- Abstract summary: The presence of non-isolated minima presents a unique challenge that has remained under-explored.
In this paper, we study the local convergence of the stochastic gradient descent method to non-isolated global minima.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-convex loss functions arise frequently in modern machine learning, and
for the theoretical analysis of stochastic optimization methods, the presence
of non-isolated minima presents a unique challenge that has remained
under-explored. In this paper, we study the local convergence of the stochastic
gradient descent method to non-isolated global minima. Under mild assumptions,
we estimate the probability that the iterates stay near the minima by
adopting the notion of stochastic stability. After establishing such stability,
we present a lower bound on the complexity in terms of various error criteria for
a given error tolerance $\epsilon$ and a failure probability $\gamma$.
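To make the challenge concrete, here is a minimal sketch of the phenomenon studied above: SGD run near a connected set of non-isolated minima, with the stay-near-the-minima probability estimated by Monte Carlo. The loss $f(x) = (\|x\|^2 - 1)^2/4$ (whose global minima form a circle), the additive Gaussian gradient noise, and all constants are illustrative assumptions, not the paper's setup.

```python
import numpy as np

# Toy loss f(x) = (||x||^2 - 1)^2 / 4: its global minima form the whole
# circle {||x|| = 1}, i.e. they are non-isolated, so analyses that assume
# an isolated (or strongly convex) minimum do not apply.
def grad(x):
    return (x @ x - 1.0) * x  # gradient of (||x||^2 - 1)^2 / 4

def stays_near_minima(x0, eta=0.01, sigma=0.1, steps=5000, radius=0.2, rng=None):
    """Run SGD with additive Gaussian gradient noise and report whether every
    iterate stays within `radius` of the minimizer set {||x|| = 1}."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0.copy()
    for _ in range(steps):
        x = x - eta * (grad(x) + sigma * rng.standard_normal(x.shape))
        if abs(np.linalg.norm(x) - 1.0) > radius:
            return False  # the iterate left the neighborhood of the minima
    return True

# Monte Carlo estimate of the stochastic-stability probability: start close
# to the minima and count the fraction of runs that never leave.
rng = np.random.default_rng(0)
x0 = np.array([1.05, 0.0])
trials = 200
stay = sum(stays_near_minima(x0, rng=rng) for _ in range(trials))
print(f"estimated P(iterates stay near the minima) = {stay / trials:.2f}")
```

Because the minima form a continuum, the iterates can drift along the circle while remaining close to it; the estimated probability above is exactly the kind of quantity a stochastic-stability analysis bounds.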
Related papers
- On the Hardness of Meaningful Local Guarantees in Nonsmooth Nonconvex Optimization [37.41427897807821]
We show hardness results for nonsmooth nonconvex optimization under cryptographic assumptions.
Local algorithms acting on Lipschitz functions cannot, in the worst case, provide meaningful local guarantees in terms of function value in subexponential time.
arXiv Detail & Related papers (2024-09-16T14:35:00Z)
- High-probability minimax lower bounds [2.5993680263955947]
We introduce the notion of a minimax quantile, and seek to articulate its dependence on the quantile level.
We develop high-probability variants of the classical Le Cam and Fano methods, as well as a technique to convert local minimax risk lower bounds to lower bounds on minimax quantiles.
arXiv Detail & Related papers (2024-06-19T11:15:01Z)
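For concreteness, one plausible formalization of the central object (the notation $\hat\theta$, $\Theta$, loss $L$, and level $\gamma$ are assumptions of this sketch, not necessarily the paper's):
$$
M_\gamma \;=\; \inf_{\hat\theta}\, \sup_{\theta \in \Theta}\, \inf\Bigl\{ t \ge 0 \,:\, \mathbb{P}_\theta\bigl( L(\hat\theta, \theta) > t \bigr) \le \gamma \Bigr\},
$$
that is, the best achievable worst-case $(1-\gamma)$-quantile of the loss; a high-probability lower bound is then a lower bound on $M_\gamma$ rather than on the expected-loss minimax risk $\inf_{\hat\theta} \sup_{\theta} \mathbb{E}_\theta\, L(\hat\theta, \theta)$.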
- A Unified Theory of Stochastic Proximal Point Methods without Smoothness [52.30944052987393]
Proximal point methods have attracted considerable interest owing to their numerical stability and robustness against imperfect tuning.
This paper presents a comprehensive analysis of a broad range of variants of the stochastic proximal point method (SPPM).
arXiv Detail & Related papers (2024-05-24T21:09:19Z)
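As a concrete instance of one SPPM step, here is a hedged sketch for least squares, where the proximal subproblem has a closed form; the component choice $f_i(x) = \tfrac12 (a_i^\top x - b_i)^2$, the constant step size gamma, and uniform sampling are illustrative assumptions (the paper's setting is broader and covers non-smooth losses).

```python
import numpy as np

def sppm_least_squares(A, b, gamma=0.5, n_steps=2000, rng=None):
    """Stochastic proximal point sketch for f(x) = (1/2n) sum_i (a_i^T x - b_i)^2.
    Each step exactly solves the proximal subproblem for one sampled component:
        x_{k+1} = argmin_z 0.5*(a_i^T z - b_i)^2 + ||z - x_k||^2 / (2*gamma),
    which for this quadratic has the closed form used below."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(n_steps):
        i = rng.integers(n)
        a, bi = A[i], b[i]
        x = x - gamma * (a @ x - bi) / (1.0 + gamma * (a @ a)) * a
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 5))
x_star = rng.standard_normal(5)
b = A @ x_star  # noiseless targets: f is minimized at x_star
print("error:", np.linalg.norm(sppm_least_squares(A, b, rng=rng) - x_star))
```

Unlike the plain SGD step $x - \gamma (a^\top x - b)\,a$, the proximal step above contracts the sampled component's residual for any $\gamma > 0$, which reflects the robustness to imperfect tuning mentioned in the summary.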
- Riemannian stochastic optimization methods avoid strict saddle points [68.80251170757647]
We show that the policies under study avoid strict saddle points / submanifolds with probability 1.
This result provides an important sanity check as it shows that, almost always, the limit state of an algorithm can only be a local minimizer.
arXiv Detail & Related papers (2023-11-04T11:12:24Z)
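The following Euclidean toy simulation (my simplification; the paper's setting is Riemannian manifolds with retractions) illustrates the qualitative claim: noisy gradient iterates started exactly at a strict saddle still end up at a minimizer in every run.

```python
import numpy as np

# f(x, y) = (x^2 - 1)^2 / 4 + y^2 / 2 has a strict saddle at the origin
# (Hessian diag(-1, 1)) and isolated minima at (+1, 0) and (-1, 0).
def grad(p):
    x, y = p
    return np.array([(x * x - 1.0) * x, y])

def run_noisy_gd(eta=0.01, sigma=0.01, steps=20000, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    p = np.zeros(2)  # start exactly at the strict saddle point
    for _ in range(steps):
        p = p - eta * (grad(p) + sigma * rng.standard_normal(2))
    return p

rng = np.random.default_rng(2)
finals = np.array([run_noisy_gd(rng=rng) for _ in range(100)])
# Noise pushes the iterate off the saddle's unstable direction, so every
# run should terminate near x = +1 or x = -1, never at the saddle itself.
frac = np.mean(np.abs(np.abs(finals[:, 0]) - 1.0) < 0.1)
print(f"fraction of runs ending near a minimizer: {frac:.2f}")
```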
- A Stability Principle for Learning under Non-Stationarity [1.1510009152620668]
We develop a versatile framework for statistical learning in non-stationary environments.
At the heart of our analysis lie two novel components: a measure of similarity between functions and a segmentation technique for dividing the non-stationary data sequence into quasi-stationary pieces.
arXiv Detail & Related papers (2023-10-27T17:53:53Z)
- High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance [59.211456992422136]
We propose algorithms with high-probability convergence results under less restrictive assumptions.
These results justify the use of the considered methods for solving problems that do not fit standard functional classes in optimization.
arXiv Detail & Related papers (2023-02-02T10:37:23Z)
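A common device behind high-probability guarantees under unbounded gradient-noise variance is gradient clipping. The sketch below is a generic clipped-SGD loop; the quadratic objective, clipping level, and Student-t noise are my illustrative assumptions rather than the paper's specific algorithms.

```python
import numpy as np

def clipped_sgd(grad_oracle, x0, eta=0.01, clip_level=1.0, steps=20000):
    """SGD with gradient clipping: the update direction is rescaled to
    g * min(1, clip_level / ||g||), so a single heavy-tailed gradient
    sample cannot move the iterate arbitrarily far."""
    x = x0.copy()
    for _ in range(steps):
        g = grad_oracle(x)
        norm = np.linalg.norm(g)
        if norm > clip_level:
            g = g * (clip_level / norm)
        x = x - eta * g
    return x

# Quadratic f(x) = ||x||^2 / 2 with Student-t(2) gradient noise, which has
# unbounded variance; rare huge samples would destabilize plain SGD.
rng = np.random.default_rng(3)
grad_oracle = lambda x: x + rng.standard_t(df=2, size=x.shape)
x_hat = clipped_sgd(grad_oracle, x0=np.full(10, 5.0))
print("distance to optimum:", np.linalg.norm(x_hat))
```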
- Beyond the Edge of Stability via Two-step Gradient Updates [49.03389279816152]
Gradient Descent (GD) is a powerful workhorse of modern machine learning.
GD's ability to find local minimisers is only guaranteed for losses with Lipschitz gradients.
This work focuses on simple, yet representative, learning problems via analysis of two-step gradient updates.
arXiv Detail & Related papers (2022-06-08T21:32:50Z)
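A one-dimensional toy (my choice of loss and step size, not the paper's examples) shows why two-step updates are the natural object: take $f(x) = (x^2 - 1)^2/4$, whose minimizer $x = 1$ has curvature $f''(1) = 2$, and run GD with a step size beyond the classical threshold $2/f''(1) = 1$.

```python
# GD on f(x) = (x^2 - 1)^2 / 4; one step is x -> x - eta * (x^3 - x).
def step(x, eta):
    return x - eta * (x**3 - x)

eta = 1.2   # beyond the edge of stability: |1 - eta * f''(1)| = 1.4 > 1
x = 1.1
traj = []
for _ in range(300):
    x = step(x, eta)
    traj.append(x)

# The one-step map is unstable at x = 1, yet the iterates do not diverge:
# they settle into a period-2 orbit around the minimizer, i.e. a stable
# fixed point of the two-step map step(step(x, eta), eta).
print("trajectory tail:", [round(v, 4) for v in traj[-6:]])
```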
- Probabilistic learning inference of boundary value problem with uncertainties based on Kullback-Leibler divergence under implicit constraints [0.0]
We present a general methodology of a probabilistic learning inference that allows for estimating a posterior probability model for a boundary value problem from a prior probability model.
A statistical surrogate model of the implicit mapping, which represents the constraints, is introduced.
A second part presents an application that illustrates the proposed theory and also contributes to the three-dimensional homogenization of heterogeneous linear elastic media.
arXiv Detail & Related papers (2022-02-10T16:00:10Z)
- Optimal variance-reduced stochastic approximation in Banach spaces [114.8734960258221]
We study the problem of estimating the fixed point of a contractive operator defined on a separable Banach space.
We establish non-asymptotic bounds for both the operator defect and the estimation error.
arXiv Detail & Related papers (2022-01-21T02:46:57Z)
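For context, here is the classical (non-variance-reduced) stochastic approximation baseline that such results refine; the contraction T, the Gaussian noise model, and the 1/(k+1) step sizes are illustrative assumptions, and the paper's variance-reduced estimator and Banach-space generality are not reproduced here.

```python
import numpy as np

def stochastic_fixed_point(T_noisy, x0, steps=20000):
    """Classical stochastic approximation x_{k+1} = (1 - a_k) x_k + a_k T_hat(x_k)
    for a contractive operator T observed with noise: the baseline that
    variance-reduced schemes improve on."""
    x = x0.copy()
    for k in range(1, steps + 1):
        a = 1.0 / (k + 1)              # diminishing step size
        x = (1 - a) * x + a * T_noisy(x)
    return x

# Example contraction T(x) = 0.5 * x + c with fixed point x* = 2c, observed
# through additive Gaussian noise.
rng = np.random.default_rng(4)
c = np.ones(3)
T_noisy = lambda x: 0.5 * x + c + 0.1 * rng.standard_normal(3)
x_hat = stochastic_fixed_point(T_noisy, np.zeros(3))
print("estimation error:", np.linalg.norm(x_hat - 2 * c))
```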
- Nonconvex Stochastic Scaled-Gradient Descent and Generalized Eigenvector Problems [98.34292831923335]
Motivated by the problem of online correlation analysis, we propose the Stochastic Scaled-Gradient Descent (SSD) algorithm.
We bring these ideas together in an application to online correlation analysis, deriving for the first time an optimal one-time-scale algorithm with an explicit rate of local convergence to normality.
arXiv Detail & Related papers (2021-12-29T18:46:52Z)
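To illustrate the generalized eigenvector setting, here is a generalized-Oja-style sketch of stochastic scaled-gradient ascent on the Rayleigh quotient $x^\top A x / x^\top B x$; whether this matches the paper's exact SSD recursion and one-time-scale analysis is an assumption, and the Gaussian stream, B = I, and step size are mine.

```python
import numpy as np

def scaled_gradient_eig(samples, B, eta=0.02):
    """Stochastic ascent on the generalized Rayleigh quotient x^T A x / x^T B x,
    where A is only seen through streaming samples a_t with E[a_t a_t^T] = A;
    each iterate is renormalized back to the constraint set x^T B x = 1."""
    d = B.shape[0]
    x = np.ones(d)
    x = x / np.sqrt(x @ B @ x)
    for a in samples:
        s = a @ x
        x = x + eta * (s * a - s * s * (B @ x))  # unbiased gradient estimate
        x = x / np.sqrt(x @ B @ x)               # project back to x^T B x = 1
    return x

rng = np.random.default_rng(5)
d = 5
L = rng.standard_normal((d, d))
A = L @ L.T + np.eye(d)                          # SPD "covariance" to estimate
B = np.eye(d)                                    # trivial metric for simplicity
samples = rng.multivariate_normal(np.zeros(d), A, size=20000)
x_hat = scaled_gradient_eig(samples, B)
top = np.linalg.eigh(A)[1][:, -1]                # true top (generalized) eigenvector
print("alignment |<x_hat, v_top>|:", abs(x_hat @ top))
```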