Deep learning, stochastic gradient descent and diffusion maps
- URL: http://arxiv.org/abs/2204.01365v2
- Date: Wed, 6 Apr 2022 09:55:33 GMT
- Title: Deep learning, stochastic gradient descent and diffusion maps
- Authors: Carmina Fjellström and Kaj Nyström
- Abstract summary: Stochastic gradient descent (SGD) is widely used in deep learning due to its computational efficiency.
It has been observed that most eigenvalues of the Hessian of the loss functions on the loss landscape of over-parametrized deep networks are close to zero.
Although the parameter space is very high-dimensional, these findings seem to indicate that the SGD dynamics may mainly live on a low-dimensional manifold.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic gradient descent (SGD) is widely used in deep learning due to its
computational efficiency but a complete understanding of why SGD performs so
well remains a major challenge. It has been observed empirically that most
eigenvalues of the Hessian of the loss functions on the loss landscape of
over-parametrized deep networks are close to zero, while only a small number of
eigenvalues are large. Zero eigenvalues indicate zero diffusion along the
corresponding directions. This indicates that the process of minima selection
mainly happens in the relatively low-dimensional subspace corresponding to top
eigenvalues of the Hessian. Although the parameter space is very
high-dimensional, these findings seem to indicate that the SGD dynamics may
mainly live on a low-dimensional manifold. In this paper we pursue a truly
data-driven approach to the problem of getting a potentially deeper
understanding of the high-dimensional parameter surface, and in particular of
the landscape traced out by SGD, by analyzing the data generated through SGD,
or any other optimizer for that matter, in order to possibly discover (local)
low-dimensional representations of the optimization landscape. As our vehicle
for the exploration we use diffusion maps introduced by R. Coifman and
coauthors.
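To make the abstract's approach concrete, the following is a minimal, self-contained sketch (not the authors' code) of computing a diffusion-map embedding from iterates recorded along an SGD run. Only NumPy is assumed; the toy anisotropic quadratic loss, step size, and kernel bandwidth are illustrative choices, not taken from the paper.

```python
# Hedged sketch: diffusion maps (Coifman-style) applied to recorded SGD iterates.
import numpy as np

def diffusion_map(X, eps, n_components=2, alpha=1.0, t=1):
    """Diffusion-map embedding of the rows of X.

    X: (n_samples, n_features) array, e.g. SGD iterates in parameter space.
    eps: Gaussian kernel bandwidth.
    alpha: density-normalization exponent (alpha=1 approximates Laplace-Beltrami).
    t: diffusion time.
    """
    # Pairwise squared distances and Gaussian kernel.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)

    # Density normalization: K_alpha = D^-alpha K D^-alpha.
    d = K.sum(axis=1)
    K_alpha = K / np.outer(d ** alpha, d ** alpha)

    # Row-normalize to a Markov transition matrix P.
    P = K_alpha / K_alpha.sum(axis=1, keepdims=True)

    # Eigendecomposition; the trivial eigenpair (eigenvalue ~1) is dropped.
    eigvals, eigvecs = np.linalg.eig(P)
    order = np.argsort(-eigvals.real)
    eigvals, eigvecs = eigvals[order].real, eigvecs[:, order].real

    # Diffusion coordinates: non-trivial eigenvectors scaled by eigenvalue^t.
    return eigvecs[:, 1:n_components + 1] * (eigvals[1:n_components + 1] ** t)

# Toy example: SGD on a quadratic loss whose Hessian has two large eigenvalues
# and many near-zero ones, so the iterates should concentrate near a
# low-dimensional subspace.
rng = np.random.default_rng(0)
dim = 50
hess_diag = np.concatenate([np.array([10.0, 5.0]), 1e-3 * np.ones(dim - 2)])
w = rng.normal(size=dim)
iterates = []
for _ in range(500):
    grad = hess_diag * w + 0.1 * rng.normal(size=dim)   # noisy gradient
    w = w - 0.01 * grad                                  # SGD step
    iterates.append(w.copy())

embedding = diffusion_map(np.array(iterates), eps=1.0)
print(embedding.shape)   # (500, 2) low-dimensional diffusion coordinates
```

On this toy problem most coordinates barely move, so the two diffusion coordinates should capture most of the variation along the trajectory; with real network training the same recipe would be applied to checkpointed parameter vectors.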
Related papers
- Risk Bounds of Accelerated SGD for Overparameterized Linear Regression [75.27846230182885]
Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning.
Existing optimization theory can only explain the faster convergence of ASGD, but cannot explain its better generalization.
arXiv Detail & Related papers (2023-11-23T23:02:10Z)
- Learning in latent spaces improves the predictive accuracy of deep neural operators [0.0]
L-DeepONet is an extension of standard DeepONet, which leverages latent representations of high-dimensional PDE input and output functions identified with suitable autoencoders.
We show that L-DeepONet outperforms the standard approach in terms of both accuracy and computational efficiency across diverse time-dependent PDEs.
arXiv Detail & Related papers (2023-04-15T17:13:09Z)
- Solving High-Dimensional PDEs with Latent Spectral Models [74.1011309005488]
We present Latent Spectral Models (LSM) toward an efficient and precise solver for high-dimensional PDEs.
Inspired by classical spectral methods in numerical analysis, we design a neural spectral block to solve PDEs in the latent space.
LSM achieves consistent state-of-the-art and yields a relative gain of 11.5% averaged on seven benchmarks.
arXiv Detail & Related papers (2023-01-30T04:58:40Z)
- Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence [100.6913091147422]
Existing rotated object detectors are mostly inherited from the horizontal detection paradigm.
In this paper, we are motivated to change the design of the rotation regression loss from an induction paradigm to a deduction methodology.
arXiv Detail & Related papers (2021-06-03T14:29:19Z)
- Understanding Long Range Memory Effects in Deep Neural Networks [10.616643031188248]
Stochastic gradient descent (SGD) is of fundamental importance in deep learning.
In this study, we argue that stochastic gradient noise (SGN) is neither Gaussian nor stable. Instead, we propose that SGD can be viewed as a discretization of an SDE driven by fractional Brownian motion (FBM).
arXiv Detail & Related papers (2021-05-05T13:54:26Z)
- Robust Differentiable SVD [117.35644933471401]
Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms.
Instability arises in the presence of eigenvalues that are close to each other.
We show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using power iteration (PI) without relying on an iterative process.
arXiv Detail & Related papers (2021-04-08T15:04:15Z)
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
- Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification [47.23063195722975]
Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM).
Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model.
We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace. (A minimal sketch of this projection idea appears after this list.)
arXiv Detail & Related papers (2020-07-07T22:31:01Z)
- Online stochastic gradient descent on non-convex losses from high-dimensional inference [2.2344764434954256]
Stochastic gradient descent (SGD) is a popular algorithm for optimization problems in high-dimensional tasks.
In this paper we produce an estimator of non-trivial correlation from data.
We illustrate our approach by applying it to a set of tasks such as phase retrieval, and estimation for generalized models.
arXiv Detail & Related papers (2020-03-23T17:34:06Z)
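As a companion to the Projected DP-SGD entry above, here is a minimal sketch of the general projection idea: clip each gradient, add privacy noise only inside a low-dimensional subspace, and map the result back to the full parameter space. The subspace here is taken from the top singular directions of a stand-in gradient history, and the dimensions, clip norm, noise scale, and learning rate are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of noise reduction via gradient-subspace projection (not the
# authors' implementation): perturb gradients only inside a k-dimensional
# subspace instead of the full p-dimensional parameter space.
import numpy as np

rng = np.random.default_rng(0)
p, k = 10_000, 20                      # ambient dimension, subspace dimension
clip_norm, noise_scale, lr = 1.0, 0.5, 0.1

# Subspace basis V (p x k) with orthonormal columns, e.g. from the top-k
# singular directions of previously observed gradients (random stand-ins here).
history = rng.normal(size=(100, p))
_, _, vt = np.linalg.svd(history, full_matrices=False)
V = vt[:k].T                           # (p, k); rows of vt are orthonormal

def projected_noisy_step(w, grad):
    # Clip the gradient to bound its sensitivity, as in standard DP-SGD.
    grad = grad / max(1.0, np.linalg.norm(grad) / clip_norm)
    # Project onto the k-dimensional subspace, add noise there, map back.
    coeffs = V.T @ grad                                    # (k,)
    noisy = coeffs + noise_scale * clip_norm * rng.normal(size=k)
    return w - lr * (V @ noisy)

w = np.zeros(p)
g = rng.normal(size=p)                 # stand-in for a per-batch gradient
w = projected_noisy_step(w, g)
print(np.linalg.norm(w))
```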
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.