Deep learning, stochastic gradient descent and diffusion maps
- URL: http://arxiv.org/abs/2204.01365v2
- Date: Wed, 6 Apr 2022 09:55:33 GMT
- Title: Deep learning, stochastic gradient descent and diffusion maps
- Authors: Carmina Fjellström and Kaj Nyström
- Abstract summary: Stochastic gradient descent (SGD) is widely used in deep learning due to its computational efficiency.
It has been observed that most eigenvalues of the Hessian of the loss functions on the loss landscape of over-parametrized deep networks are close to zero.
Although the parameter space is very high-dimensional, these findings seem to indicate that the SGD dynamics may mainly live on a low-dimensional manifold.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Stochastic gradient descent (SGD) is widely used in deep learning due to its
computational efficiency but a complete understanding of why SGD performs so
well remains a major challenge. It has been observed empirically that most
eigenvalues of the Hessian of the loss functions on the loss landscape of
over-parametrized deep networks are close to zero, while only a small number of
eigenvalues are large. Zero eigenvalues indicate zero diffusion along the
corresponding directions. This indicates that the process of minima selection
mainly happens in the relatively low-dimensional subspace corresponding to top
eigenvalues of the Hessian. Although the parameter space is very
high-dimensional, these findings seem to indicate that the SGD dynamics may
mainly live on a low-dimensional manifold. In this paper we pursue a truly
data-driven approach to the problem of getting a potentially deeper
understanding of the high-dimensional parameter surface, and in particular of
the landscape traced out by SGD, by analyzing the data generated through SGD,
or any other optimizer for that matter, in order to possibly discover (local)
low-dimensional representations of the optimization landscape. As our vehicle
for the exploration we use diffusion maps introduced by R. Coifman and
coauthors.
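To make the abstract's approach concrete, the following is a minimal, self-contained sketch (not the authors' code) of computing a diffusion-map embedding from iterates recorded along an SGD run. Only NumPy is assumed; the toy anisotropic quadratic loss, step size, and kernel bandwidth are illustrative choices, not taken from the paper.

```python
# Hedged sketch: diffusion maps (Coifman-style) applied to recorded SGD iterates.
import numpy as np

def diffusion_map(X, eps, n_components=2, alpha=1.0, t=1):
    """Diffusion-map embedding of the rows of X.

    X: (n_samples, n_features) array, e.g. SGD iterates in parameter space.
    eps: Gaussian kernel bandwidth.
    alpha: density-normalization exponent (alpha=1 approximates Laplace-Beltrami).
    t: diffusion time.
    """
    # Pairwise squared distances and Gaussian kernel.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / eps)

    # Density normalization: K_alpha = D^-alpha K D^-alpha.
    d = K.sum(axis=1)
    K_alpha = K / np.outer(d ** alpha, d ** alpha)

    # Row-normalize to a Markov transition matrix P.
    P = K_alpha / K_alpha.sum(axis=1, keepdims=True)

    # Eigendecomposition; the trivial eigenpair (eigenvalue ~1) is dropped.
    eigvals, eigvecs = np.linalg.eig(P)
    order = np.argsort(-eigvals.real)
    eigvals, eigvecs = eigvals[order].real, eigvecs[:, order].real

    # Diffusion coordinates: non-trivial eigenvectors scaled by eigenvalue^t.
    return eigvecs[:, 1:n_components + 1] * (eigvals[1:n_components + 1] ** t)

# Toy example: SGD on a quadratic loss whose Hessian has two large eigenvalues
# and many near-zero ones, so the iterates should concentrate near a
# low-dimensional subspace.
rng = np.random.default_rng(0)
dim = 50
hess_diag = np.concatenate([np.array([10.0, 5.0]), 1e-3 * np.ones(dim - 2)])
w = rng.normal(size=dim)
iterates = []
for _ in range(500):
    grad = hess_diag * w + 0.1 * rng.normal(size=dim)   # noisy gradient
    w = w - 0.01 * grad                                  # SGD step
    iterates.append(w.copy())

embedding = diffusion_map(np.array(iterates), eps=1.0)
print(embedding.shape)   # (500, 2) low-dimensional diffusion coordinates
```

On this toy problem most coordinates barely move, so the two diffusion coordinates should capture most of the variation along the trajectory; with real network training the same recipe would be applied to checkpointed parameter vectors.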
Related papers
- Risk Bounds of Accelerated SGD for Overparameterized Linear Regression [75.27846230182885]
Accelerated stochastic gradient descent (ASGD) is a workhorse in deep learning.
Existing optimization theory can only explain the faster convergence of ASGD, but cannot explain its better generalization.
arXiv Detail & Related papers (2023-11-23T23:02:10Z)
- Learning in latent spaces improves the predictive accuracy of deep neural operators [0.0]
L-DeepONet is an extension of standard DeepONet, which leverages latent representations of high-dimensional PDE input and output functions identified with suitable autoencoders.
We show that L-DeepONet outperforms the standard approach in terms of both accuracy and computational efficiency across diverse time-dependent PDEs.
arXiv Detail & Related papers (2023-04-15T17:13:09Z)
- Solving High-Dimensional PDEs with Latent Spectral Models [74.1011309005488]
We present Latent Spectral Models (LSM) toward an efficient and precise solver for high-dimensional PDEs.
Inspired by classical spectral methods in numerical analysis, we design a neural spectral block to solve PDEs in the latent space.
LSM achieves consistent state-of-the-art and yields a relative gain of 11.5% averaged on seven benchmarks.
arXiv Detail & Related papers (2023-01-30T04:58:40Z)
- Learning High-Precision Bounding Box for Rotated Object Detection via Kullback-Leibler Divergence [100.6913091147422]
Existing rotated object detectors are mostly inherited from the horizontal detection paradigm.
In this paper, we are motivated to change the design of the rotation regression loss from an induction paradigm to a deduction methodology.
arXiv Detail & Related papers (2021-06-03T14:29:19Z)
- Understanding Long Range Memory Effects in Deep Neural Networks [10.616643031188248]
Stochastic gradient descent (SGD) is of fundamental importance in deep learning.
In this study, we argue that stochastic gradient noise (SGN) is neither Gaussian nor stable. Instead, we propose that SGD can be viewed as a discretization of an SDE driven by fractional Brownian motion (FBM).
arXiv Detail & Related papers (2021-05-05T13:54:26Z)
- Robust Differentiable SVD [117.35644933471401]
Eigendecomposition of symmetric matrices is at the heart of many computer vision algorithms.
Instability arises in the presence of eigenvalues that are close to each other.
We show that the Taylor expansion of the SVD gradient is theoretically equivalent to the gradient obtained using power iteration (PI) without relying on an iterative process.
arXiv Detail & Related papers (2021-04-08T15:04:15Z)
- Direction Matters: On the Implicit Bias of Stochastic Gradient Descent with Moderate Learning Rate [105.62979485062756]
This paper attempts to characterize the particular regularization effect of SGD in the moderate learning rate regime.
We show that SGD converges along the large eigenvalue directions of the data matrix, while GD goes after the small eigenvalue directions.
arXiv Detail & Related papers (2020-11-04T21:07:52Z)
- Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification [47.23063195722975]
Differentially private SGD (DP-SGD) is one of the most popular methods for solving differentially private empirical risk minimization (ERM).
Due to its noisy perturbation on each gradient update, the error rate of DP-SGD scales with the ambient dimension $p$, the number of parameters in the model.
We propose Projected DP-SGD that performs noise reduction by projecting the noisy gradients to a low-dimensional subspace. (A minimal sketch of this projection idea appears after this list.)
arXiv Detail & Related papers (2020-07-07T22:31:01Z)
- Online stochastic gradient descent on non-convex losses from high-dimensional inference [2.2344764434954256]
Stochastic gradient descent (SGD) is a popular algorithm for optimization problems in high-dimensional tasks.
In this paper we produce an estimator of non-trivial correlation from data.
We illustrate our approach by applying it to a set of tasks such as phase retrieval, and estimation for generalized models.
arXiv Detail & Related papers (2020-03-23T17:34:06Z)
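As a companion to the Projected DP-SGD entry above, here is a minimal sketch of the general projection idea: clip each gradient, add privacy noise only inside a low-dimensional subspace, and map the result back to the full parameter space. The subspace here is taken from the top singular directions of a stand-in gradient history, and the dimensions, clip norm, noise scale, and learning rate are illustrative assumptions, not the paper's exact algorithm.

```python
# Hedged sketch of noise reduction via gradient-subspace projection (not the
# authors' implementation): perturb gradients only inside a k-dimensional
# subspace instead of the full p-dimensional parameter space.
import numpy as np

rng = np.random.default_rng(0)
p, k = 10_000, 20                      # ambient dimension, subspace dimension
clip_norm, noise_scale, lr = 1.0, 0.5, 0.1

# Subspace basis V (p x k) with orthonormal columns, e.g. from the top-k
# singular directions of previously observed gradients (random stand-ins here).
history = rng.normal(size=(100, p))
_, _, vt = np.linalg.svd(history, full_matrices=False)
V = vt[:k].T                           # (p, k); rows of vt are orthonormal

def projected_noisy_step(w, grad):
    # Clip the gradient to bound its sensitivity, as in standard DP-SGD.
    grad = grad / max(1.0, np.linalg.norm(grad) / clip_norm)
    # Project onto the k-dimensional subspace, add noise there, map back.
    coeffs = V.T @ grad                                    # (k,)
    noisy = coeffs + noise_scale * clip_norm * rng.normal(size=k)
    return w - lr * (V @ noisy)

w = np.zeros(p)
g = rng.normal(size=p)                 # stand-in for a per-batch gradient
w = projected_noisy_step(w, g)
print(np.linalg.norm(w))
```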
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.