Particle-based Variational Inference with Generalized Wasserstein
Gradient Flow
- URL: http://arxiv.org/abs/2310.16516v1
- Date: Wed, 25 Oct 2023 10:05:42 GMT
- Title: Particle-based Variational Inference with Generalized Wasserstein
Gradient Flow
- Authors: Ziheng Cheng, Shiyue Zhang, Longlin Yu, Cheng Zhang
- Abstract summary: We propose a ParVI framework, called generalized Wasserstein gradient descent (GWG)
We show that GWG exhibits strong convergence guarantees.
We also provide an adaptive version that automatically chooses Wasserstein metric to accelerate convergence.
- Score: 32.37056212527921
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Particle-based variational inference methods (ParVIs) such as Stein
variational gradient descent (SVGD) update the particles based on the
kernelized Wasserstein gradient flow for the Kullback-Leibler (KL) divergence.
However, the design of kernels is often non-trivial and can be restrictive for
the flexibility of the method. Recent works show that functional gradient flow
approximations with quadratic form regularization terms can improve
performance. In this paper, we propose a ParVI framework, called generalized
Wasserstein gradient descent (GWG), based on a generalized Wasserstein gradient
flow of the KL divergence, which can be viewed as a functional gradient method
with a broader class of regularizers induced by convex functions. We show that
GWG exhibits strong convergence guarantees. We also provide an adaptive version
that automatically chooses the Wasserstein metric to accelerate convergence. In
experiments, we demonstrate the effectiveness and efficiency of the proposed
framework on both simulated and real data problems.
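For context, the kernelized Wasserstein gradient flow that SVGD follows can be written as a simple particle update. The sketch below is the standard SVGD step with an RBF kernel and median-heuristic bandwidth, i.e. the baseline the abstract starts from, not the proposed GWG algorithm; the Gaussian target, step size, and iteration count are illustrative assumptions.

```python
# Minimal SVGD sketch (NumPy): one step of the kernelized Wasserstein gradient
# flow of the KL divergence. This is the standard baseline described in the
# abstract, not GWG itself; the target and hyperparameters are illustrative.
import numpy as np

def svgd_step(x, score, step_size=0.1):
    """One SVGD update for particles x of shape (n, d); score(x) = grad log p(x)."""
    n, _ = x.shape
    sq_dists = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    h = np.median(sq_dists) / np.log(n + 1.0) + 1e-8        # median-heuristic bandwidth
    k = np.exp(-sq_dists / h)                               # RBF kernel matrix, (n, n)
    grad_logp = score(x)                                    # (n, d)
    attraction = k @ grad_logp                              # pulls particles toward high density
    repulsion = 2.0 / h * (k.sum(axis=1, keepdims=True) * x - k @ x)  # keeps particles spread out
    return x + step_size * (attraction + repulsion) / n

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    particles = rng.normal(loc=5.0, scale=2.0, size=(200, 2))
    score = lambda z: -z                                    # score of a standard 2-D Gaussian
    for _ in range(1000):
        particles = svgd_step(particles, score)
    print(particles.mean(axis=0), particles.var(axis=0))    # should approach 0 and 1
```

GWG keeps this particle-update structure but replaces the fixed kernel with velocity fields obtained from a broader class of regularizers induced by convex functions, which is where the abstract's added flexibility comes from.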
Related papers
- Functional Gradient Flows for Constrained Sampling [29.631753643887237]
We propose a new functional gradient ParVI method for constrained sampling, called constrained functional gradient flow (CFG).
We also present novel numerical strategies to handle the boundary integral term arising from the domain constraints.
arXiv Detail & Related papers (2024-10-30T16:20:48Z) - Semi-Implicit Functional Gradient Flow [30.32233517392456]
We propose a functional gradient ParVI method that uses perturbed particles as the approximation family.
The corresponding functional gradient flow, which can be estimated via denoising score matching, exhibits strong theoretical convergence guarantees.
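Since the summary mentions denoising score matching as the estimator, here is a generic DSM loss in PyTorch for reference; the network, noise level, and shapes are illustrative assumptions rather than the paper's exact semi-implicit construction.

```python
# Generic denoising score matching (DSM) loss sketch (PyTorch). Training a small
# network with this loss estimates the score of the Gaussian-perturbed particle
# distribution; sigma and score_net are illustrative assumptions.
import torch

def dsm_loss(score_net, x, sigma=0.1):
    noise = torch.randn_like(x)
    x_tilde = x + sigma * noise                  # perturbed particles
    target = -noise / sigma                      # score of the perturbation kernel q_sigma(x_tilde | x)
    pred = score_net(x_tilde)                    # network's score estimate at x_tilde
    return 0.5 * ((pred - target) ** 2).sum(dim=1).mean()
```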
arXiv Detail & Related papers (2024-10-23T15:00:30Z) - Adaptive Federated Learning Over the Air [108.62635460744109]
We propose a federated version of adaptive gradient methods, particularly AdaGrad and Adam, within the framework of over-the-air model training.
Our analysis shows that the AdaGrad-based training algorithm converges to a stationary point at the rate of $\mathcal{O}\!\left(\ln(T) / T^{1 - \frac{1}{\alpha}}\right)$.
arXiv Detail & Related papers (2024-03-11T09:10:37Z) - Bridging the Gap Between Variational Inference and Wasserstein Gradient
Flows [6.452626686361619]
We bridge the gap between variational inference and Wasserstein gradient flows.
Under certain conditions, the Bures-Wasserstein gradient flow can be recast as the Euclidean gradient flow.
We also offer an alternative perspective on the path-derivative gradient, framing it as a distillation procedure to the Wasserstein gradient flow.
arXiv Detail & Related papers (2023-10-31T00:10:19Z) - Particle-based Variational Inference with Preconditioned Functional
Gradient Flow [13.519223374081648]
We propose a new particle-based variational inference algorithm called preconditioned functional gradient flow (PFG).
PFG has several advantages over Stein variational gradient descent (SVGD).
Non-linear function classes such as neural networks can be incorporated to estimate the gradient flow.
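For reference, the quadratically regularized functional-gradient objective underlying this family of methods (standard background, not quoted from the paper) chooses the velocity field as

```latex
% Velocity field from a regularized first-variation objective (standard
% quadratic-penalty form; PFG preconditions it, and GWG above replaces the
% quadratic penalty with a general convex regularizer).
v^{\ast} \;=\; \operatorname*{arg\,max}_{f \in \mathcal{F}}\;
\mathbb{E}_{x \sim \rho}\!\left[\, f(x)^{\top} \nabla \log \pi(x) + \nabla \!\cdot\! f(x) \,\right]
\;-\; \tfrac{1}{2}\, \mathbb{E}_{x \sim \rho}\!\left[\, \lVert f(x) \rVert^{2} \,\right].
```

Maximizing over all smooth fields gives the exact Wasserstein gradient $\nabla \log \pi - \nabla \log \rho$; restricting $\mathcal{F}$ to an RKHS yields SVGD-type updates, while neural network classes give functional gradient flows such as PFG.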
arXiv Detail & Related papers (2022-11-25T08:31:57Z) - Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z) - Variational Wasserstein gradient flow [9.901677207027806]
We propose a scalable proximal gradient type algorithm for Wasserstein gradient flow.
Our framework covers all the classical Wasserstein gradient flows including the heat equation and the porous medium equation.
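The "proximal gradient type" step referenced here is, in its classical form, the JKO scheme (standard background, not a quote from the paper): each update solves

```latex
% JKO / Wasserstein proximal step with step size tau for an energy functional F.
% F = entropy gives the heat equation; F = KL(. || pi) gives the Fokker-Planck flow.
\rho_{k+1} \;=\; \operatorname*{arg\,min}_{\rho}\; F(\rho) \;+\; \frac{1}{2\tau}\, W_2^{2}(\rho,\, \rho_k).
```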
arXiv Detail & Related papers (2021-12-04T20:27:31Z) - Large-Scale Wasserstein Gradient Flows [84.73670288608025]
We introduce a scalable scheme to approximate Wasserstein gradient flows.
Our approach relies on input convex neural networks (ICNNs) to discretize the JKO steps.
As a result, we can sample from the measure at each step of the gradient diffusion and compute its density.
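Below is a minimal input convex neural network in PyTorch, the building block this approach uses to parametrize each JKO step; layer sizes, the softplus activation, and the weight-clamping trick are illustrative assumptions, and the gradient of the learned convex potential acts as the candidate transport map.

```python
# Minimal input convex neural network (ICNN) sketch (PyTorch). Convexity in x is
# kept by non-negative hidden-to-hidden and output weights plus convex,
# non-decreasing activations; sizes and clamping are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    def __init__(self, dim, hidden=64, layers=3):
        super().__init__()
        self.Wx = nn.ModuleList([nn.Linear(dim, hidden) for _ in range(layers)])
        self.Wz = nn.ModuleList([nn.Linear(hidden, hidden, bias=False) for _ in range(layers - 1)])
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):
        z = F.softplus(self.Wx[0](x))
        for wx, wz in zip(self.Wx[1:], self.Wz):
            # Clamping keeps hidden-to-hidden weights non-negative, so the output stays convex in x.
            z = F.softplus(wx(x) + F.linear(z, wz.weight.clamp(min=0.0)))
        return F.linear(z, self.out.weight.clamp(min=0.0), self.out.bias)  # scalar convex potential

# The gradient of the convex potential is the candidate transport map for a JKO step.
x = torch.randn(8, 2, requires_grad=True)
potential = ICNN(dim=2)(x).sum()
transport_map = torch.autograd.grad(potential, x)[0]
print(transport_map.shape)  # torch.Size([8, 2])
```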
arXiv Detail & Related papers (2021-06-01T19:21:48Z) - Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization [106.70006655990176]
A distributional optimization problem arises widely in machine learning and statistics.
We propose a novel particle-based algorithm, dubbed as variational transport, which approximately performs Wasserstein gradient descent.
We prove that when the objective function satisfies a functional version of the Polyak-Lojasiewicz (PL) condition (Polyak, 1963) and smoothness conditions, variational transport converges linearly.
arXiv Detail & Related papers (2020-12-21T18:33:13Z) - SLEIPNIR: Deterministic and Provably Accurate Feature Expansion for
Gaussian Process Regression with Derivatives [86.01677297601624]
We propose a novel approach for scaling GP regression with derivatives based on quadrature Fourier features.
We prove deterministic, non-asymptotic and exponentially fast decaying error bounds which apply for both the approximated kernel as well as the approximated posterior.
arXiv Detail & Related papers (2020-03-05T14:33:20Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.