High-dimensional scaling limits and fluctuations of online least-squares SGD with smooth covariance
- URL: http://arxiv.org/abs/2304.00707v2
- Date: Wed, 3 Apr 2024 18:44:32 GMT
- Title: High-dimensional scaling limits and fluctuations of online least-squares SGD with smooth covariance
- Authors: Krishnakumar Balasubramanian, Promit Ghosal, Ye He
- Abstract summary: We derive high-dimensional scaling limits and fluctuations for the online least-squares Stochastic Gradient Descent (SGD) algorithm.
Our results have several applications, including characterization of the limiting mean-square estimation or prediction errors and their fluctuations.
- Score: 16.652085114513273
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We derive high-dimensional scaling limits and fluctuations for the online least-squares Stochastic Gradient Descent (SGD) algorithm by taking the properties of the data-generating model explicitly into consideration. Our approach treats the SGD iterates as an interacting particle system, where the expected interaction is characterized by the covariance structure of the input. Assuming smoothness conditions on moments up to order eight, and without explicitly assuming Gaussianity, we establish the high-dimensional scaling limits and fluctuations in the form of infinite-dimensional Ordinary Differential Equations (ODEs) or Stochastic Differential Equations (SDEs). Our results reveal a precise three-step phase transition of the iterates: they go from ballistic, to diffusive, and finally to purely random behavior as the noise variance moves from the low-, to the moderate-, and finally to the very-high-noise setting. In the low-noise setting, we further characterize the precise fluctuations of the (scaled) iterates as infinite-dimensional SDEs. We also show the existence and uniqueness of solutions to the derived limiting ODEs and SDEs. Our results have several applications, including characterization of the limiting mean-square estimation or prediction errors and their fluctuations, which can be obtained by analytically or numerically solving the limiting equations.
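For intuition, the following is a minimal sketch of the setting the abstract describes, not the paper's construction: one-pass (online) least-squares SGD on a linear model, tracking the mean-square estimation error across low, moderate, and very high noise variance. It assumes isotropic Gaussian inputs and a step size of order 1/d for simplicity, whereas the paper allows general smooth covariance without Gaussianity; all function names and parameter values are illustrative.

```python
# A minimal sketch (not the paper's construction): online least-squares SGD,
#   theta_{k+1} = theta_k - eta * (<x_k, theta_k> - y_k) * x_k,
# with y_k = <x_k, theta*> + sqrt(sigma2) * eps_k.
# Assumptions: isotropic Gaussian inputs, step size eta ~ 1/d.
import numpy as np

def online_sgd_errors(dim=200, n_steps=2000, sigma2=0.1, seed=0):
    """Return the trajectory of the mean-square estimation error ||theta_k - theta*||^2."""
    rng = np.random.default_rng(seed)
    theta_star = rng.standard_normal(dim) / np.sqrt(dim)  # ground-truth parameter
    theta = np.zeros(dim)                                 # SGD iterate
    eta = 1.0 / dim                                       # step size scaled with dimension
    errors = np.empty(n_steps)
    for k in range(n_steps):
        x = rng.standard_normal(dim)                      # fresh sample each step (online)
        y = x @ theta_star + np.sqrt(sigma2) * rng.standard_normal()
        theta -= eta * (x @ theta - y) * x                # one stochastic gradient step
        errors[k] = np.sum((theta - theta_star) ** 2)
    return errors

# Low, moderate, and very high noise: heuristically probing the abstract's
# ballistic / diffusive / purely random regimes.
for sigma2 in (1e-3, 1e-1, 1e1):
    print(f"sigma^2 = {sigma2:g}: final MSE ~ {online_sgd_errors(sigma2=sigma2)[-1]:.4f}")
```

Overlaying such empirical error curves on a numerical solution of the limiting ODE/SDE is one way the abstract's final application (computing limiting mean-square errors by solving the limiting equations) could be exercised in practice.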
Related papers
- Differentially Private Gradient Flow based on the Sliced Wasserstein Distance [59.1056830438845]
We introduce a novel differentially private generative modeling approach based on a gradient flow in the space of probability measures.
Experiments show that our proposed model can generate higher-fidelity data at a low privacy budget.
arXiv Detail & Related papers (2023-12-13T15:47:30Z) - Noise in the reverse process improves the approximation capabilities of diffusion models [27.65800389807353]
In Score-based Generative Models (SGMs), the state of the art in generative modeling, stochastic reverse processes are known to perform better than their deterministic counterparts.
This paper delves into the heart of this phenomenon, comparing neural ordinary differential equations (ODEs) and neural stochastic differential equations (SDEs) as reverse processes.
We analyze the ability of neural SDEs to approximate trajectories of the Fokker-Planck equation, revealing the advantages of stochasticity.
arXiv Detail & Related papers (2023-12-13T02:39:10Z) - Gaussian Mixture Solvers for Diffusion Models [84.83349474361204]
We introduce Gaussian Mixture Solvers (GMS), a novel class of SDE-based solvers for diffusion models.
Our solver outperforms numerous SDE-based solvers in terms of sample quality in image generation and stroke-based synthesis.
arXiv Detail & Related papers (2023-11-02T02:05:38Z) - A Geometric Perspective on Diffusion Models [57.27857591493788]
We inspect the ODE-based sampling of a popular variance-exploding SDE.
We establish a theoretical relationship between the optimal ODE-based sampling and the classic mean-shift (mode-seeking) algorithm.
arXiv Detail & Related papers (2023-05-31T15:33:16Z) - Score-based Diffusion Models in Function Space [140.792362459734]
Diffusion models have recently emerged as a powerful framework for generative modeling.
We introduce a mathematically rigorous framework called Denoising Diffusion Operators (DDOs) for training diffusion models in function space.
We show that the corresponding discretized algorithm generates accurate samples at a fixed cost independent of the data resolution.
arXiv Detail & Related papers (2023-02-14T23:50:53Z) - High-dimensional limit theorems for SGD: Effective dynamics and critical scaling [6.950316788263433]
We prove limit theorems for the trajectories of summary statistics of stochastic gradient descent (SGD).
We show a critical scaling regime for the step-size, below which the effective ballistic dynamics matches gradient flow for the population loss.
Around the fixed points of this effective dynamics, the corresponding diffusive limits can be quite complex and even degenerate.
arXiv Detail & Related papers (2022-06-08T17:42:18Z) - Parsimonious Physics-Informed Random Projection Neural Networks for Initial-Value Problems of ODEs and index-1 DAEs [0.0]
We present a physics-informed neural network based on random projections for the numerical solution of initial-value problems (IVPs) of nonlinear ODEs in linear-implicit form and index-1 DAEs.
Based on previous works on random projections, we prove the approximation capability of the scheme for ODEs in the canonical form and index-1 DAEs in the semiexplicit form.
arXiv Detail & Related papers (2022-03-10T12:34:46Z) - Continuous-time stochastic gradient descent for optimizing over the stationary distribution of stochastic differential equations [7.65995376636176]
We develop a new continuous-time stochastic gradient descent method for optimizing over the stationary distribution of stochastic differential equation (SDE) models.
We rigorously prove convergence of the online forward propagation algorithm for linear SDE models and present its numerical results for nonlinear examples.
arXiv Detail & Related papers (2022-02-14T11:45:22Z) - Mean-Square Analysis with An Application to Optimal Dimension Dependence of Langevin Monte Carlo [60.785586069299356]
This work provides a general framework for the non-asymptotic analysis of sampling error in the 2-Wasserstein distance.
Our theoretical analysis is further validated by numerical experiments.
arXiv Detail & Related papers (2021-09-08T18:00:05Z) - Optimal oracle inequalities for solving projected fixed-point equations [53.31620399640334]
We study methods that use a collection of random observations to compute approximate solutions by searching over a known low-dimensional subspace of the Hilbert space.
We show how our results precisely characterize the error of a class of temporal difference learning methods for the policy evaluation problem with linear function approximation.
arXiv Detail & Related papers (2020-12-09T20:19:32Z) - Probabilistic learning on manifolds constrained by nonlinear partial differential equations for small datasets [0.0]
A novel extension of the Probabilistic Learning on Manifolds (PLoM) is presented.
It makes it possible to synthesize solutions to a wide range of nonlinear boundary value problems.
Three applications are presented.
arXiv Detail & Related papers (2020-10-27T14:34:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.