Tackling benign nonconvexity with smoothing and stochastic gradients
- URL: http://arxiv.org/abs/2202.09052v1
- Date: Fri, 18 Feb 2022 07:27:32 GMT
- Title: Tackling benign nonconvexity with smoothing and stochastic gradients
- Authors: Harsh Vardhan, Sebastian U. Stich
- Abstract summary: Non-convex optimization problems are common in machine learning, especially in Deep Learning.
In this paper we show how stochastic gradient descent can be used to solve such problems.
- Score: 28.197694894254305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Non-convex optimization problems are ubiquitous in machine learning,
especially in Deep Learning. While such complex problems can often be
successfully optimized in practice by using stochastic gradient descent (SGD),
theoretical analysis cannot adequately explain this success. In particular, the
standard analyses do not show global convergence of SGD on non-convex
functions, and instead show convergence to stationary points (which can also be
local minima or saddle points). We identify a broad class of nonconvex
functions for which we can show that perturbed SGD (gradient descent perturbed
by stochastic noise -- covering SGD as a special case) converges to a global
minimum (or a neighborhood thereof), in contrast to gradient descent without
noise that can get stuck in local minima far from a global solution. For
example, on non-convex functions that are relatively close to a convex-like
(strongly convex or PL) function we show that SGD can converge linearly to a
global optimum.
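The contrast the abstract draws between plain gradient descent and its noise-perturbed variant can be sketched on a toy objective: a strongly convex quadratic plus a bounded oscillation, so the function is non-convex but close to a strongly convex one. The function, step size, and noise level below are illustrative assumptions, not taken from the paper:

```python
import math
import random

def f(x):
    # Toy nonconvex objective: strongly convex x**2 plus a bounded
    # oscillation that creates shallow local minima (assumed example).
    return x ** 2 + 10.0 * (1.0 - math.cos(x))

def grad(x):
    return 2.0 * x + 10.0 * math.sin(x)

def gd(x0, lr=0.05, steps=5000):
    """Deterministic gradient descent."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def perturbed_gd(x0, lr=0.05, sigma=3.0, steps=5000, seed=0):
    """Gradient descent on gradients perturbed by Gaussian noise."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x -= lr * (grad(x) + rng.gauss(0.0, sigma))
    return x

# Plain GD started at x = 4.5 descends into the local minimum near x ≈ 4.9.
x_gd = gd(4.5)
# The perturbed iterates cross the shallow barrier and settle near the
# global minimum at x = 0.
x_pgd = perturbed_gd(4.5)
```

The injected noise supplies enough exploration to cross the small barriers around the spurious minima, while the much larger barrier surrounding the global minimum makes re-escape from its neighborhood extremely unlikely.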
Related papers
- On the Hardness of Meaningful Local Guarantees in Nonsmooth Nonconvex Optimization [37.41427897807821]
We study the complexity of nonsmooth nonconvex optimization under cryptographic hardness assumptions.
Local algorithms acting on Lipschitz functions cannot, in the worst case, provide meaningful local guarantees in terms of function value in subexponential time.
arXiv Detail & Related papers (2024-09-16T14:35:00Z) - Demystifying the Myths and Legends of Nonconvex Convergence of SGD [17.445810977264067]
Stochastic gradient descent (SGD) and its variants are the main workhorses for solving large-scale optimization problems.
Through our analyses, we address certain myths and legends related to the nonconvex convergence of SGD.
arXiv Detail & Related papers (2023-10-19T17:58:59Z) - On Convergence of Incremental Gradient for Non-Convex Smooth Functions [63.51187646914962]
In machine learning and network optimization, algorithms like shuffled SGD are popular because they minimize the number of cache misses and exhibit good cache locality.
This paper delves into the convergence properties of SGD algorithms with arbitrary data ordering.
arXiv Detail & Related papers (2023-05-30T17:47:27Z) - Decentralized Riemannian Algorithm for Nonconvex Minimax Problems [82.50374560598493]
Minimax optimization algorithms have been developed to solve many problems in training neural networks.
In this paper, we propose two types of decentralized Riemannian minimax algorithms.
For the stochastic setting, we propose DRSGDA and prove that our method achieves a gradient complexity guarantee.
arXiv Detail & Related papers (2023-02-08T01:42:45Z) - On the Convergence to a Global Solution of Shuffling-Type Gradient
Algorithms [18.663264755108703]
The stochastic gradient descent (SGD) algorithm is the method of choice in many machine learning tasks.
In this paper, we show that shuffling-type SGD achieves the same desired computational complexity as in the convex setting.
arXiv Detail & Related papers (2022-06-13T01:25:59Z) - Benign Underfitting of Stochastic Gradient Descent [72.38051710389732]
We study to what extent stochastic gradient descent (SGD) may be understood as a "conventional" learning rule that achieves generalization performance by obtaining a good fit to the training data.
We analyze the closely related with-replacement SGD, for which an analogous phenomenon does not occur and prove that its population risk does in fact converge at the optimal rate.
arXiv Detail & Related papers (2022-02-27T13:25:01Z) - Global Convergence and Stability of Stochastic Gradient Descent [0.0]
We show that SGD converges to a stationary point under nearly arbitrary nonconvexity and noise models.
We show that SGD can be applied to a wide range of problems with global convergence guarantees.
arXiv Detail & Related papers (2021-10-04T19:00:50Z) - Stability and Generalization of Stochastic Gradient Methods for Minimax
Problems [71.60601421935844]
Many machine learning problems can be formulated as minimax problems, such as Generative Adversarial Networks (GANs).
We provide a comprehensive generalization analysis of stochastic gradient methods for minimax problems.
arXiv Detail & Related papers (2021-05-08T22:38:00Z) - Faster Convergence of Stochastic Gradient Langevin Dynamics for
Non-Log-Concave Sampling [110.88857917726276]
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave.
At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-reversible Markov Chain.
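The SGLD update referenced above can be illustrated with a minimal sketch: a gradient step on a potential plus injected Gaussian noise whose scale is tied to the step size and an inverse temperature. The double-well potential and all parameters below are illustrative assumptions, not taken from the paper:

```python
import math
import random

def grad_U(x):
    # Potential U(x) = (x**2 - 1)**2, a double well whose Gibbs density
    # exp(-beta * U(x)) is non-log-concave (assumed toy example).
    return 4.0 * x * (x * x - 1.0)

def sgld(steps=20000, eta=0.01, beta=2.0, seed=0):
    """Collect SGLD iterates targeting the density ∝ exp(-beta * U(x))."""
    rng = random.Random(seed)
    noise_scale = math.sqrt(2.0 * eta / beta)
    x, samples = 1.0, []
    for _ in range(steps):
        # Langevin update: gradient step plus injected Gaussian noise.
        x += -eta * grad_U(x) + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples

samples = sgld()[1000:]  # discard a burn-in prefix
```

Because the injected noise lets the chain hop between the two wells, the collected samples cover both modes near x = -1 and x = +1; plain gradient descent on U started at x = 1 would never leave the right well.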
arXiv Detail & Related papers (2020-10-19T15:23:18Z) - An improved convergence analysis for decentralized online stochastic
non-convex optimization [17.386715847732468]
In this paper, we show that a gradient-tracking technique, applied under a Łojasiewicz-type condition, satisfies the conditions needed for convergence.
The resulting rates are not only immediately applicable but also match the currently known best convergence rates.
arXiv Detail & Related papers (2020-08-10T15:29:13Z) - Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses [52.039438701530905]
We provide sharp upper and lower bounds for several forms of stochastic gradient descent (SGD) on arbitrary Lipschitz nonsmooth convex losses.
Our bounds allow us to derive a new algorithm for differentially private nonsmooth convex optimization with optimal excess population risk.
arXiv Detail & Related papers (2020-06-12T02:45:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.