Continuous-time Riemannian SGD and SVRG Flows on Wasserstein Probabilistic Space
- URL: http://arxiv.org/abs/2401.13530v3
- Date: Fri, 24 May 2024 08:04:03 GMT
- Title: Continuous-time Riemannian SGD and SVRG Flows on Wasserstein Probabilistic Space
- Authors: Mingyang Yi, Bohan Wang
- Abstract summary: We extend the gradient flow on Wasserstein space into the stochastic gradient descent (SGD) flow and the stochastic variance reduction gradient (SVRG) flow.
By leveraging the properties of Wasserstein space, we construct stochastic differential equations (SDEs) that approximate the corresponding discrete dynamics in Euclidean space.
The convergence rates of our Riemannian stochastic flows are proven and match the results in Euclidean space.
- Score: 17.13355049019388
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, optimization on the Riemannian manifold has provided new insights to the optimization community. In this regard, the manifold taken as the probability measure metric space equipped with the second-order Wasserstein distance is of particular interest, since optimization on it can be linked to practical sampling processes. In general, the standard (continuous) optimization method on Wasserstein space is Riemannian gradient flow (i.e., Langevin dynamics when minimizing KL divergence). In this paper, we aim to enrich the continuous optimization methods in the Wasserstein space by extending the gradient flow on it into the stochastic gradient descent (SGD) flow and stochastic variance reduction gradient (SVRG) flow. The two flows in Euclidean space are standard continuous stochastic methods, while their Riemannian counterparts are unexplored. By leveraging the properties of Wasserstein space, we construct stochastic differential equations (SDEs) to approximate the corresponding discrete dynamics of the desired Riemannian stochastic methods in Euclidean space. Then, our probability measure flows are obtained via the Fokker-Planck equation. Finally, the convergence rates of our Riemannian stochastic flows are proven, which match the results in Euclidean space.
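For context, the gradient-flow/Langevin correspondence invoked in the abstract can be written out explicitly. The display below records the standard identities (not results specific to this paper): for a target density $\pi \propto e^{-V}$, the law of the overdamped Langevin SDE solves the Fokker-Planck equation, which is the Wasserstein-2 gradient flow of the KL divergence.

```latex
% Overdamped Langevin dynamics for the target \pi \propto e^{-V}:
\[
  \mathrm{d}X_t = -\nabla V(X_t)\,\mathrm{d}t + \sqrt{2}\,\mathrm{d}W_t .
\]
% The law \rho_t of X_t evolves by the Fokker--Planck equation,
\[
  \partial_t \rho_t
  = \nabla\!\cdot(\rho_t \nabla V) + \Delta \rho_t
  = \nabla\!\cdot\!\Big(\rho_t\,\nabla \tfrac{\delta}{\delta\rho}\,\mathrm{KL}(\rho_t \,\|\, \pi)\Big),
  \qquad
  \mathrm{KL}(\rho\,\|\,\pi) = \int \rho \log\frac{\rho}{\pi}\,\mathrm{d}x ,
\]
% i.e. the Riemannian (Wasserstein-2) gradient flow of the KL divergence.
```

The paper constructs SDEs that approximate the discrete SGD and SVRG dynamics and then obtains its probability measure flows from the corresponding Fokker-Planck equations. A minimal Euclidean sketch of SGD- and SVRG-type Langevin chains (an illustration only, not the paper's construction) appears after the related-papers list below.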
Related papers
- Riemannian Federated Learning via Averaging Gradient Stream [8.75592575216789]
This paper develops and analyzes an efficient Riemannian Federated Averaging Gradient Stream (RFedAGS) algorithm.
Numerical simulations conducted on synthetic and real-world data demonstrate the performance of the proposed RFedAGS.
arXiv Detail & Related papers (2024-09-11T12:28:42Z) - Streamlining in the Riemannian Realm: Efficient Riemannian Optimization
with Loopless Variance Reduction [4.578425862931332]
This study focuses on the crucial variance reduction mechanism used in both Euclidean and Riemannian settings.
Motivated by Euclidean methods, we introduce Riemannian (R-) methods that replace the outer loop with computation triggered by a coin flip.
Using this framework, we demonstrate its applicability to various important settings.
arXiv Detail & Related papers (2024-03-11T12:49:37Z) - Scaling Riemannian Diffusion Models [68.52820280448991]
We show that our method enables us to scale to high dimensional tasks on nontrivial manifolds.
We model QCD densities on $SU(n)$ lattices and contrastively learned embeddings on high dimensional hyperspheres.
arXiv Detail & Related papers (2023-10-30T21:27:53Z) - Curvature-Independent Last-Iterate Convergence for Games on Riemannian
Manifolds [77.4346324549323]
We show that a step size agnostic to the curvature of the manifold achieves a curvature-independent and linear last-iterate convergence rate.
To the best of our knowledge, the possibility of curvature-independent rates and/or last-iterate convergence has not been considered before.
arXiv Detail & Related papers (2023-06-29T01:20:44Z) - First-Order Algorithms for Min-Max Optimization in Geodesic Metric
Spaces [93.35384756718868]
Min-max algorithms have been analyzed in the Euclidean setting.
We prove that the Riemannian corrected extragradient (RCEG) method achieves last-iterate convergence at a linear rate.
arXiv Detail & Related papers (2022-06-04T18:53:44Z) - Variational Wasserstein gradient flow [9.901677207027806]
We propose a scalable proximal gradient type algorithm for Wasserstein gradient flow.
Our framework covers all the classical Wasserstein gradient flows including the heat equation and the porous medium equation.
arXiv Detail & Related papers (2021-12-04T20:27:31Z) - Sliced-Wasserstein Gradient Flows [15.048733056992855]
Minimizing functionals in the space of probability distributions can be done with Wasserstein gradient flows.
This work proposes to use gradient flows in the space of probability measures endowed with the sliced-Wasserstein distance.
arXiv Detail & Related papers (2021-10-21T08:34:26Z) - Large-Scale Wasserstein Gradient Flows [84.73670288608025]
We introduce a scalable scheme to approximate Wasserstein gradient flows.
Our approach relies on input convex neural networks (ICNNs) to discretize the JKO steps.
As a result, we can sample from the measure at each step of the gradient diffusion and compute its density.
arXiv Detail & Related papers (2021-06-01T19:21:48Z) - Stochastic Normalizing Flows [52.92110730286403]
We introduce stochastic normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs).
Using the theory of rough paths, the underlying Brownian motion is treated as a latent variable and approximated, enabling efficient training of neural SDEs.
These SDEs can be used for constructing efficient chains to sample from the underlying distribution of a given dataset.
arXiv Detail & Related papers (2020-02-21T20:47:55Z) - The Wasserstein Proximal Gradient Algorithm [23.143814848127295]
Wasserstein gradient flows are continuous time dynamics that define curves of steepest descent to minimize an objective function over the space of probability measures.
We propose a Forward Backward (FB) discretization scheme that can tackle the case where the objective function is the sum of a smooth term and a nonsmooth geodesically convex term.
arXiv Detail & Related papers (2020-02-07T22:19:32Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
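As referenced after the abstract, the following is a minimal Euclidean sketch of the discrete stochastic dynamics that stochastic-gradient and variance-reduced Langevin chains follow. It is an illustration under assumed toy choices (a quadratic finite-sum potential; hypothetical names such as `sgld`, `svrg_ld`, and `snapshot_every`), not the construction used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy finite-sum potential V(x) = (1/n) * sum_i 0.5 * ||x - y_i||^2,
# so the target density exp(-V) is (up to constants) a Gaussian centred at mean(y).
n, d = 256, 2
y = rng.normal(loc=1.0, scale=0.5, size=(n, d))

def batch_grad(x, idx):
    # Minibatch estimate of grad V at x, averaging the selected components.
    return np.mean(x - y[idx], axis=0)

def full_grad(x):
    # Exact gradient of V at x.
    return x - y.mean(axis=0)

def sgld(steps=2000, eta=1e-2, batch=16):
    """Stochastic gradient Langevin chain: minibatch drift plus Gaussian noise."""
    x = np.zeros(d)
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        x = x - eta * batch_grad(x, idx) + np.sqrt(2.0 * eta) * rng.normal(size=d)
    return x

def svrg_ld(steps=2000, eta=1e-2, batch=16, snapshot_every=100):
    """SVRG-style Langevin chain: minibatch drift corrected by a snapshot control variate."""
    x = np.zeros(d)
    x_snap, mu = x.copy(), full_grad(x)
    for k in range(steps):
        if k % snapshot_every == 0:          # refresh the snapshot and its full gradient
            x_snap, mu = x.copy(), full_grad(x)
        idx = rng.integers(0, n, size=batch)
        g = batch_grad(x, idx) - batch_grad(x_snap, idx) + mu   # variance-reduced estimate
        x = x - eta * g + np.sqrt(2.0 * eta) * rng.normal(size=d)
    return x

print("SGLD sample:   ", sgld())
print("SVRG-LD sample:", svrg_ld())
```

The SVRG-style chain keeps a snapshot iterate and the full gradient at that snapshot, so each minibatch estimate is corrected by a control variate; this is the kind of variance-reduction mechanism whose continuous-time, Wasserstein-space analogue the paper studies.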