Related papers: What is the long-run distribution of stochastic gradient descent? A large deviations analysis

URL: http://arxiv.org/abs/2406.09241v1
Date: Thu, 13 Jun 2024 15:44:23 GMT
Title: What is the long-run distribution of stochastic gradient descent? A large deviations analysis
Authors: Waïss Azizian, Franck Iutzeler, Jérôme Malick, Panayotis Mertikopoulos
Abstract summary: We show that, in the long run, the problem's critical region is visited exponentially more often than any non-critical region. All other connected components of critical points are visited with frequency that is exponentially proportional to their energy level.
Score: 29.642830843568525
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be visited by SGD, and by how much. Using an approach based on the theory of large deviations and randomly perturbed dynamical systems, we show that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics with temperature equal to the method's step-size and energy levels determined by the problem's objective and the statistics of the noise. In particular, we show that, in the long run, (a) the problem's critical region is visited exponentially more often than any non-critical region; (b) the iterates of SGD are exponentially concentrated around the problem's minimum energy state (which does not always coincide with the global minimum of the objective); (c) all other connected components of critical points are visited with frequency that is exponentially proportional to their energy level; and, finally (d) any component of local maximizers or saddle points is "dominated" by a component of local minimizers which is visited exponentially more often.
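The Boltzmann-Gibbs picture in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the asymmetric double-well objective, the constant Gaussian gradient noise, and every function name and parameter value are assumptions chosen for illustration. In this special case the energy is assumed to be proportional to the objective, so the predicted long-run density is proportional to exp(-2 f(x) / (eta * sigma^2)); the paper's general energy levels also depend on the noise statistics and need not reduce to this form.

```python
import numpy as np

# Hypothetical 1-D objective: an asymmetric double well whose left well
# (near x = -1) is deeper than its right well (near x = +1).
def f(x):
    return (x**2 - 1.0) ** 2 + 0.3 * x

def grad_f(x):
    return 4.0 * x**3 - 4.0 * x + 0.3

def simulate_left_fraction(eta=0.05, sigma=3.0, n_chains=2000,
                           n_steps=4000, seed=0):
    """Run independent constant-step SGD chains, all started in the
    shallower right well, and return the fraction that end with x < 0."""
    rng = np.random.default_rng(seed)
    x = np.ones(n_chains)
    for _ in range(n_steps):
        # Assumed noise model: additive Gaussian noise on the gradient.
        noise = sigma * rng.standard_normal(n_chains)
        x = x - eta * (grad_f(x) + noise)
    return float(np.mean(x < 0.0))

def gibbs_left_fraction(eta=0.05, sigma=3.0):
    """Mass that the density exp(-2 f / (eta * sigma^2)) puts on x < 0,
    computed on a uniform grid (the grid spacing cancels in the ratio)."""
    grid = np.linspace(-3.0, 3.0, 6001)
    weights = np.exp(-2.0 * f(grid) / (eta * sigma**2))
    return float(weights[grid < 0.0].sum() / weights.sum())

left_sim = simulate_left_fraction()
left_gibbs = gibbs_left_fraction()
print(f"simulated fraction in deeper well:  {left_sim:.3f}")
print(f"Gibbs-type prediction for that well: {left_gibbs:.3f}")
```

Under these assumptions most chains should end up in the deeper well even though all of them start in the shallower one. The two printed numbers are only expected to agree roughly, since the simulation uses a finite step size and a finite horizon.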

Related papers

Flow-Based Non-stationary Temporal Regime Causal Structure Learning [49.77103348208835]
We introduce FANTOM, a unified framework for causal discovery. It handles non-stationary processes along with non-Gaussian and heteroscedastic noises. It simultaneously infers the number of regimes and their corresponding indices and learns each regime's Directed Acyclic Graph.
arXiv Detail & Related papers (2025-06-20T15:12:43Z)
Convergence, Sticking and Escape: Stochastic Dynamics Near Critical Points in SGD [0.0]
We study the convergence properties and escape dynamics of Stochastic Gradient Descent (SGD) in one-dimensional landscapes. Our main focus is to identify the time scales on which SGD reliably moves from an initial point to the local minimum in the same "basin". Overall, our findings present a nuanced view of SGD's transitions between local maxima and minima, influenced by both noise characteristics and the underlying function geometry.
arXiv Detail & Related papers (2025-05-24T06:00:45Z)
Statistics of systemwide correlations in the random-field XXZ chain: Importance of rare events in the many-body localized phase [0.0]
Long-distance spin-spin correlations are investigated across the phase diagram of the random-field XXZ model. We show that longitudinal correlations exhibit markedly different behavior, revealing distinct physical regimes. Our findings shed light on the systemwide instabilities and raise important questions about the impact of such rare but large long-range correlations on the stability of the MBL phase.
arXiv Detail & Related papers (2024-10-14T09:37:44Z)
Thermalization Dynamics in Closed Quantum Many Body Systems: a Precision Large Scale Exact Diagonalization Study [0.0]
We study the finite-size deviation between the resulting equilibrium state and the thermal state. We find that the deviations are well described by the eigenstate thermalization hypothesis. We also find that local observables relax towards equilibrium exponentially with a relaxation time scale that grows linearly with system length.
arXiv Detail & Related papers (2024-09-27T15:58:05Z)
Highly complex novel critical behavior from the intrinsic randomness of quantum mechanical measurements on critical ground states -- a controlled renormalization group analysis [0.0]
We consider the effects of weak measurements on the quantum critical ground state of the one-dimensional tricritical and critical quantum Ising model. By employing a controlled renormalization group analysis we find that each problem exhibits highly complex novel scaling behavior.
arXiv Detail & Related papers (2024-09-03T17:59:04Z)
Universality in the tripartite information after global quenches: spin flip and semilocal charges [0.0]
We study stationary states emerging after global quenches in which the time evolution is under local Hamiltonians. We show that a localized perturbation in the initial state can turn an exponential decay of spatial correlations in the stationary state into an algebraic decay.
arXiv Detail & Related papers (2023-07-04T17:44:56Z)
Convergence of mean-field Langevin dynamics: Time and space discretization, stochastic gradient, and variance reduction [49.66486092259376]
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin dynamics that incorporates a distribution-dependent drift. Recent works have shown that MFLD globally minimizes an entropy-regularized convex functional in the space of measures. We provide a framework to prove a uniform-in-time propagation of chaos for MFLD that takes into account the errors due to finite-particle approximation, time-discretization, and gradient approximation.
arXiv Detail & Related papers (2023-06-12T16:28:11Z)
Localization in the random XXZ quantum spin chain [55.2480439325792]
We study the many-body localization (MBL) properties of the Heisenberg XXZ spin-$\frac12$ chain in a random magnetic field. We prove that the system exhibits localization in any given energy interval at the bottom of the spectrum in a nontrivial region of the parameter space.
arXiv Detail & Related papers (2022-10-26T17:25:13Z)
From Gradient Flow on Population Loss to Learning with Stochastic Gradient Descent [50.4531316289086]
Stochastic Gradient Descent (SGD) has been the method of choice for learning large-scale non-convex models. An overarching theme of the paper is providing general conditions under which SGD converges, assuming that gradient flow (GF) on the population loss converges. We provide a unified analysis for GD/SGD not only for classical settings like convex losses, but also for more complex problems including Phase Retrieval and Matrix sq-root.
arXiv Detail & Related papers (2022-10-13T03:55:04Z)
Role of boundary conditions in the full counting statistics of topological defects after crossing a continuous phase transition [62.997667081978825]
We analyze the role of boundary conditions in the statistics of topological defects. We show that for fast and moderate quenches, the cumulants of the kink number distribution present a universal scaling with the quench rate.
arXiv Detail & Related papers (2022-07-08T09:55:05Z)
Emergence of Fermi's Golden Rule [55.73970798291771]
Fermi's Golden Rule (FGR) applies in the limit where an initial quantum state is weakly coupled to a continuum of other final states overlapping its energy. Here we investigate what happens away from this limit, where the set of final states is discrete, with a nonzero mean level spacing.
arXiv Detail & Related papers (2022-06-01T18:35:21Z)
Understanding Long Range Memory Effects in Deep Neural Networks [10.616643031188248]
Stochastic gradient descent (SGD) is of fundamental importance in deep learning. In this study, we argue that stochastic gradient noise (SGN) is neither Gaussian nor stable. Instead, we propose that SGD can be viewed as a discretization of an SDE driven by fractional Brownian motion (FBM).
arXiv Detail & Related papers (2021-05-05T13:54:26Z)
Dynamic of Stochastic Gradient Descent with State-Dependent Noise [84.64013284862733]
Stochastic gradient descent (SGD) and its variants are mainstream methods to train deep neural networks. We show that the covariance of the noise of SGD in the local region of the local minima is a quadratic function of the state. We propose a novel power-law dynamic with state-dependent diffusion to approximate the dynamic of SGD.
arXiv Detail & Related papers (2020-06-24T13:34:38Z)

This list is automatically generated from the titles and abstracts of the papers on this site.