Quasi-potential theory for escape problem: Quantitative sharpness effect
on SGD's escape from local minima
- URL: http://arxiv.org/abs/2111.04004v1
- Date: Sun, 7 Nov 2021 05:00:35 GMT
- Title: Quasi-potential theory for escape problem: Quantitative sharpness effect
on SGD's escape from local minima
- Authors: Hikaru Ibayashi and Masaaki Imaizumi
- Abstract summary: We develop a quantitative theory on the escape problem of a stochastic gradient descent (SGD) algorithm.
We investigate the effect of the sharpness of loss surfaces on the escape of SGD from local minima in neural networks.
- Score: 10.990447273771592
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We develop a quantitative theory on an escape problem of a stochastic
gradient descent (SGD) algorithm and investigate the effect of sharpness of
loss surfaces on the escape. Deep learning has achieved tremendous success in
various domains; however, it has also left various theoretical questions open.
One typical question is why SGD can find parameters that generalize well over
non-convex loss surfaces. The escape problem, which asks how efficiently SGD
escapes from local minima, is one approach to this question. In this paper, we
develop a quasi-potential theory for the escape
problem, by applying a theory of stochastic dynamical systems. We show that the
quasi-potential theory can handle both geometric properties of loss surfaces
and a covariance structure of gradient noise in a unified manner, while they
have been separately studied in previous works. Our theoretical results imply
that (i) the sharpness of loss surfaces contributes to the slow escape of an
SGD, and (ii) the SGD's noise structure cancels the effect and exponentially
accelerates the escape. We also conduct experiments to empirically validate our
theory using neural networks trained with real data.
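As a hedged illustration of claims (i) and (ii), the following is a standard Freidlin-Wentzell-style sketch of the kind of statement a quasi-potential theory makes; the notation (loss L, learning rate \eta, gradient-noise covariance \Sigma, neighborhood D of the local minimum \theta^{*}) is ours and only approximates the paper's setting, not its exact theorem.

  Modeling SGD as the small-noise diffusion
    d\theta_t = -\nabla L(\theta_t)\, dt + \sqrt{\eta}\, \Sigma(\theta_t)^{1/2}\, dW_t ,
  the mean escape time from D scales as
    \mathbb{E}[\tau_{\mathrm{esc}}] \asymp \exp\!\big( \Delta V / \eta \big),
  where the quasi-potential barrier is the cost of the cheapest exit path,
    \Delta V = \inf_{T>0,\ \phi:\, \theta^{*} \to \partial D}
      \frac{1}{2} \int_0^{T}
      \big( \dot{\phi}_t + \nabla L(\phi_t) \big)^{\top} \Sigma(\phi_t)^{-1}
      \big( \dot{\phi}_t + \nabla L(\phi_t) \big)\, dt .

In this form, sharper curvature makes \nabla L larger along exit paths and hence raises \Delta V (slower escape), while a noise covariance \Sigma that is large in sharp directions shrinks the same integrand, which is the cancellation and exponential acceleration described in (i) and (ii).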
Related papers
- On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss also allows for considering more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - A Theoretical Analysis of Noise Geometry in Stochastic Gradient Descent [9.064667124987068]
We study noise geometry for minibatch stochastic gradient descent (SGD), a phenomenon where noise aligns favorably with the geometry of the local landscape.
We propose two metrics, derived from analyzing how noise influences the loss and subspace projection dynamics, to quantify the alignment strength.
arXiv Detail & Related papers (2023-10-01T14:58:20Z) - Stability and Generalization Analysis of Gradient Methods for Shallow
Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z) - Generalization Bounds for Stochastic Gradient Langevin Dynamics: A
Unified View via Information Leakage Analysis [49.402932368689775]
We present a unified view via information leakage analysis to investigate the generalization bounds of SGLD.
We also conduct various numerical studies to assess the information leakage issue of SGLD.
arXiv Detail & Related papers (2021-12-14T06:45:52Z) - On the Double Descent of Random Features Models Trained with SGD [78.0918823643911]
We study properties of random features (RF) regression in high dimensions optimized by stochastic gradient descent (SGD).
We derive precise non-asymptotic error bounds of RF regression under both constant and adaptive step-size SGD setting.
We observe the double descent phenomenon both theoretically and empirically.
arXiv Detail & Related papers (2021-10-13T17:47:39Z) - Noise and Fluctuation of Finite Learning Rate Stochastic Gradient
Descent [3.0079490585515343]
Stochastic gradient descent (SGD) is relatively well understood in the vanishing learning rate regime.
We propose to study the basic properties of SGD and its variants in the non-vanishing learning rate regime.
arXiv Detail & Related papers (2020-12-07T12:31:43Z) - Direction Matters: On the Implicit Bias of Stochastic Gradient Descent
with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z) - Dynamic of Stochastic Gradient Descent with State-Dependent Noise [84.64013284862733]
Stochastic gradient descent (SGD) and its variants are mainstream methods for training deep neural networks.
We show that the covariance of the noise of SGD in the local region of the local minima is a quadratic function of the state.
We propose a novel power-law dynamic with state-dependent diffusion to approximate the dynamic of SGD.
arXiv Detail & Related papers (2020-06-24T13:34:38Z) - Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks [27.54155197562196]
We show that the trajectories of stochastic gradient descent (SGD) can be well approximated by a Feller process.
We propose a "capacity metric" to measure the success of such generalizations.
arXiv Detail & Related papers (2020-06-16T16:57:12Z) - Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum
under Heavy-Tailed Gradient Noise [39.9241638707715]
We show that FULD has similarities with natural gradient methods and discuss their role in deep learning.
arXiv Detail & Related papers (2020-02-13T18:04:27Z) - How neural networks find generalizable solutions: Self-tuned annealing
in deep learning [7.372592187197655]
We find a robust inverse relation between the weight variance and the landscape flatness for all SGD-based learning algorithms.
Our study indicates that SGD attains a self-tuned landscape-dependent annealing strategy to find generalizable solutions at the flat minima of the landscape.
arXiv Detail & Related papers (2020-01-06T17:35:54Z)
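To connect the main paper's claims with the flat-minima picture in the entries above, here is a minimal toy simulation; all choices (the function escape_steps, sharpness a, learning rate eta, noise scale, basin boundary at |x| = 1) are hypothetical and for illustration only, not the authors' experimental setup. It compares noisy gradient steps with fixed-magnitude noise against noise whose magnitude grows with the curvature, a crude analogue of SGD's sharpness-aligned gradient noise.

import numpy as np

def escape_steps(a, eta=0.01, noise_scale=1.3, curvature_scaled=False,
                 max_steps=100_000, seed=0):
    """Steps until noisy gradient dynamics leave a toy basin L(x) = a * x**2.

    Escape is declared when |x| >= 1, so the barrier height equals a.
    Hypothetical toy setup for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = 0.0
    # Noise magnitude: fixed (isotropic analogue) or growing with curvature a.
    sigma = noise_scale * (np.sqrt(a) if curvature_scaled else 1.0)
    for step in range(1, max_steps + 1):
        grad = 2.0 * a * x                              # gradient of a * x**2
        x = x - eta * grad + np.sqrt(eta) * sigma * rng.normal()
        if abs(x) >= 1.0:                               # left the basin
            return step
    return max_steps                                    # no escape within budget

if __name__ == "__main__":
    for a in (1.0, 4.0, 16.0):                          # increasing sharpness
        iso = np.median([escape_steps(a, seed=s) for s in range(10)])
        cur = np.median([escape_steps(a, curvature_scaled=True, seed=s)
                         for s in range(10)])
        print(f"sharpness a={a:5.1f}  fixed-noise escape ~{iso:7.0f} steps  "
              f"curvature-scaled noise ~{cur:7.0f} steps")

With fixed noise, escape times grow rapidly with the sharpness a (the sharpest basin typically does not escape within the step budget), whereas scaling the noise with the curvature keeps the escape time roughly constant, mirroring the sharpness and noise-structure effects described in the abstract.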