Stochastic Modified Flows for Riemannian Stochastic Gradient Descent
- URL: http://arxiv.org/abs/2402.03467v1
- Date: Fri, 2 Feb 2024 14:29:38 GMT
- Title: Stochastic Modified Flows for Riemannian Stochastic Gradient Descent
- Authors: Benjamin Gess, Sebastian Kassing, Nimit Rana
- Abstract summary: We show that RSGD can be approximated by the solution to the RSMF driven by an infinite-dimensional Wiener process.
The RSGD is built using the concept of a retraction map, that is, a cost-efficient approximation of the exponential map.
We prove quantitative bounds for the weak error of the diffusion approximation under assumptions on the retraction map, the geometry of the manifold, and the random estimators of the gradient.
- Score: 0.6445605125467574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We give quantitative estimates for the rate of convergence of Riemannian
stochastic gradient descent (RSGD) to Riemannian gradient flow and to a
diffusion process, the so-called Riemannian stochastic modified flow (RSMF).
Using tools from stochastic differential geometry we show that, in the small
learning rate regime, RSGD can be approximated by the solution to the RSMF
driven by an infinite-dimensional Wiener process. The RSMF accounts for the
random fluctuations of RSGD and, thereby, increases the order of approximation
compared to the deterministic Riemannian gradient flow. The RSGD is built using
the concept of a retraction map, that is, a cost-efficient approximation of the
exponential map, and we prove quantitative bounds for the weak error of the
diffusion approximation under assumptions on the retraction map, the geometry
of the manifold, and the random estimators of the gradient.
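The abstract describes RSGD as a stochastic Riemannian gradient step followed by a retraction, i.e. a cheap surrogate for the exponential map. The following minimal sketch, not the authors' code, illustrates that scheme on the unit sphere with the normalization retraction; the Rayleigh-quotient objective, the Gaussian gradient noise, and all hyperparameters are illustrative assumptions.

```python
# Minimal sketch of Riemannian SGD on the unit sphere S^{d-1}, where the
# exponential map is replaced by the cheaper normalization retraction
# R_x(v) = (x + v) / ||x + v||. Objective, noise model, and hyperparameters
# are illustrative assumptions, not taken from the paper.
import numpy as np

def project_to_tangent(x, g):
    """Project an ambient-space gradient g onto the tangent space at x."""
    return g - np.dot(x, g) * x

def retraction(x, v):
    """Normalization retraction: first-order approximation of exp_x(v)."""
    y = x + v
    return y / np.linalg.norm(y)

def rsgd(x0, stochastic_grad, lr=1e-2, steps=1000, rng=None):
    """Riemannian SGD: project the noisy gradient, step, then retract."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0 / np.linalg.norm(x0)
    for _ in range(steps):
        g = stochastic_grad(x, rng)           # noisy ambient gradient
        riem_grad = project_to_tangent(x, g)  # Riemannian gradient
        x = retraction(x, -lr * riem_grad)    # stay on the manifold
    return x

if __name__ == "__main__":
    # Example: minimize the Rayleigh quotient f(x) = x^T A x on the sphere,
    # with additive Gaussian noise standing in for a mini-batch estimator.
    d = 10
    rng = np.random.default_rng(0)
    M = rng.standard_normal((d, d))
    A = M @ M.T                               # symmetric PSD matrix
    noisy_grad = lambda x, r: 2 * A @ x + 0.1 * r.standard_normal(d)
    x_star = rsgd(rng.standard_normal(d), noisy_grad, lr=1e-2, steps=5000, rng=rng)
    # f(x*) approaches the smallest eigenvalue of A, up to gradient noise.
    print("f(x*) =", x_star @ A @ x_star)
```

On the sphere the exact exponential map exp_x(v) = cos(||v||) x + sin(||v||) v/||v|| is available, but the normalization retraction agrees with it to first order at a fraction of the cost, which is the trade-off the abstract refers to.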
Related papers
- On the Computation of the Gaussian Rate-Distortion-Perception Function [10.564071872770146]
We study the computation of the rate-distortion-perception function (RDPF) for a multivariate Gaussian source under mean squared error (MSE) distortion.
We provide the associated algorithmic realization, as well as the convergence and the rate of convergence characterization.
We corroborate our results with numerical simulations and draw connections to existing results.
arXiv Detail & Related papers (2023-11-15T18:34:03Z)
- Moreau Envelope ADMM for Decentralized Weakly Convex Optimization [55.2289666758254]
This paper proposes a proximal variant of the alternating direction method of multipliers (ADMM) for distributed optimization.
The results of our numerical experiments indicate that our method is faster and more robust than widely-used approaches.
arXiv Detail & Related papers (2023-08-31T14:16:30Z)
- Curvature-Independent Last-Iterate Convergence for Games on Riemannian Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z)
- Riemannian Laplace approximations for Bayesian neural networks [3.6990978741464904]
We propose a simple parametric approximate posterior that adapts to the shape of the true posterior.
We show that our approach consistently improves over the conventional Laplace approximation across tasks.
arXiv Detail & Related papers (2023-06-12T14:44:22Z)
- Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent [1.2031796234206138]
We propose new limiting dynamics for stochastic gradient descent in the small learning rate regime, called stochastic modified flows.
These SDEs are driven by a cylindrical Brownian motion and improve on the so-called modified equations by having regular diffusion coefficients and by matching the multi-point statistics (see the illustrative display after this list).
arXiv Detail & Related papers (2023-02-14T15:33:59Z)
- Mean-field Variational Inference via Wasserstein Gradient Flow [8.05603983337769]
Variational inference, such as the mean-field (MF) approximation, requires certain conjugacy structures for efficient computation.
We introduce a general computational framework to implement MF variational inference for Bayesian models, with or without latent variables, using the Wasserstein gradient flow (WGF).
We propose a new constraint-free function approximation method using neural networks to numerically realize our algorithm.
arXiv Detail & Related papers (2022-07-17T04:05:32Z)
- Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z)
- Moment evolution equations and moment matching for stochastic image EPDiff [68.97335984455059]
Models of image deformation allow study of time-continuous effects transforming images by deforming the image domain.
Applications include medical image analysis with both population trends and random subject specific variation.
We use moment approximations of the corresponding Ito diffusion to construct estimators for statistical inference of the parameters in the full model.
arXiv Detail & Related papers (2021-10-07T11:08:11Z)
- A diffusion-map-based algorithm for gradient computation on manifolds and applications [0.0]
We recover the gradient of a given function defined on interior points of a Riemannian submanifold in the Euclidean space.
This approach is based on the estimates of the Laplace-Beltrami operator proposed in the diffusion-maps theory.
arXiv Detail & Related papers (2021-08-16T09:35:22Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z)
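As a companion to the "Stochastic Modified Flows, Mean-Field Limits and Dynamics of Stochastic Gradient Descent" entry above: in the Euclidean case, the first-order stochastic modified equation that such limiting SDEs refine takes the following form, where eta is the learning rate, f the loss, Sigma(x) the covariance of the gradient noise, and W a Brownian motion. This display is an illustrative reconstruction under standard assumptions, not a formula quoted from that paper.

```latex
% Illustrative first-order Euclidean stochastic modified equation
% (eta = learning rate, f = loss, Sigma(x) = gradient-noise covariance):
\[
  \mathrm{d}X_t
    = -\nabla\!\left( f(X_t) + \frac{\eta}{4}\,\lVert \nabla f(X_t) \rVert^2 \right) \mathrm{d}t
    + \sqrt{\eta}\,\Sigma(X_t)^{1/2}\,\mathrm{d}W_t .
\]
```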