Bridging the Gap Between Variational Inference and Wasserstein Gradient Flows
- URL: http://arxiv.org/abs/2310.20090v1
- Date: Tue, 31 Oct 2023 00:10:19 GMT
- Title: Bridging the Gap Between Variational Inference and Wasserstein Gradient Flows
- Authors: Mingxuan Yi, Song Liu
- Abstract summary: We bridge the gap between variational inference and Wasserstein gradient flows.
Under certain conditions, the Bures-Wasserstein gradient flow can be recast as a Euclidean gradient flow.
We also offer an alternative perspective on the path-derivative gradient, framing it as a distillation procedure to the Wasserstein gradient flow.
- Score: 6.452626686361619
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Variational inference is a technique that approximates a target distribution
by optimizing within the parameter space of variational families. Wasserstein
gradient flows, on the other hand, describe optimization within the space of
probability measures, whose elements do not necessarily admit a parametric
density function. In this paper, we bridge the gap between these two methods. We
demonstrate that, under certain conditions, the Bures-Wasserstein gradient flow
can be recast as a Euclidean gradient flow whose forward Euler scheme is
the standard black-box variational inference algorithm. Specifically, the
vector field of the gradient flow is generated via the path-derivative gradient
estimator. We also offer an alternative perspective on the path-derivative
gradient, framing it as a distillation procedure to the Wasserstein gradient
flow. Distillations can be extended to encompass $f$-divergences and
non-Gaussian variational families. This extension yields a new gradient
estimator for $f$-divergences, readily implementable using contemporary machine
learning libraries like PyTorch or TensorFlow.
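As an illustration of the abstract's central claim, the sketch below shows what a black-box variational inference update driven by the path-derivative (stop-gradient) estimator might look like in PyTorch; here the SGD update on the variational parameters plays the role of the forward Euler step. This is not the authors' code: the target log_pi, the diagonal-Gaussian family, and all hyperparameters are illustrative assumptions.

    import torch

    def log_pi(x):
        # Illustrative target: standard Gaussian log-density (up to an additive constant).
        return -0.5 * (x ** 2).sum(dim=-1)

    d = 2
    m = torch.zeros(d, requires_grad=True)      # variational mean
    log_s = torch.zeros(d, requires_grad=True)  # log standard deviation
    opt = torch.optim.SGD([m, log_s], lr=1e-2)  # SGD update = forward Euler step on the parameters

    for step in range(1000):
        eps = torch.randn(128, d)
        x = m + torch.exp(log_s) * eps          # reparameterized samples from q = N(m, diag(exp(log_s))^2)

        # Path-derivative (stop-gradient) estimator: detach the variational parameters
        # inside log q(x) so that gradients flow only through the sample path x.
        m_d = m.detach()
        s_d = torch.exp(log_s).detach()
        log_q = (-0.5 * ((x - m_d) / s_d) ** 2 - torch.log(s_d)).sum(dim=-1)

        loss = (log_q - log_pi(x)).mean()       # Monte Carlo reverse KL, up to an additive constant
        opt.zero_grad()
        loss.backward()
        opt.step()

The paper's distillation view is stated to extend beyond this Gaussian reverse-KL setting to f-divergences and non-Gaussian variational families; the corresponding estimator is given in the paper, not in this sketch.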
Related papers
- Adversarial flows: A gradient flow characterization of adversarial attacks [1.8749305679160366]
A popular method to perform adversarial attacks on neural networks is the so-called fast gradient sign method.
We show convergence of the discretization to the associated gradient flow.
arXiv Detail & Related papers (2024-06-08T07:05:26Z)
- Differentially Private Gradient Flow based on the Sliced Wasserstein Distance [59.1056830438845]
We introduce a novel differentially private generative modeling approach based on a gradient flow in the space of probability measures.
Experiments show that our proposed model can generate higher-fidelity data at a low privacy budget.
arXiv Detail & Related papers (2023-12-13T15:47:30Z)
- Particle-based Variational Inference with Generalized Wasserstein Gradient Flow [32.37056212527921]
We propose a ParVI framework called generalized Wasserstein gradient descent (GWG).
We show that GWG exhibits strong convergence guarantees.
We also provide an adaptive version that automatically chooses the Wasserstein metric to accelerate convergence.
arXiv Detail & Related papers (2023-10-25T10:05:42Z)
- Sampling via Gradient Flows in the Space of Probability Measures [10.892894776497165]
Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development.
This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows.
arXiv Detail & Related papers (2023-10-05T15:20:35Z)
- Gradient Flows for Sampling: Mean-Field Models, Gaussian Approximations and Affine Invariance [10.153270126742369]
We study gradient flows in both probability density space and Gaussian space.
The flow in the Gaussian space may be understood as a Gaussian approximation of the flow.
arXiv Detail & Related papers (2023-02-21T21:44:08Z)
- Variational Wasserstein gradient flow [9.901677207027806]
We propose a scalable proximal gradient type algorithm for Wasserstein gradient flow.
Our framework covers all the classical Wasserstein gradient flows including the heat equation and the porous medium equation.
arXiv Detail & Related papers (2021-12-04T20:27:31Z)
- Differentiable Annealed Importance Sampling and the Perils of Gradient Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z)
- Large-Scale Wasserstein Gradient Flows [84.73670288608025]
We introduce a scalable scheme to approximate Wasserstein gradient flows.
Our approach relies on input convex neural networks (ICNNs) to discretize the JKO steps.
As a result, we can sample from the measure at each step of the gradient diffusion and compute its density.
arXiv Detail & Related papers (2021-06-01T19:21:48Z)
- Variational Transport: A Convergent Particle-Based Algorithm for Distributional Optimization [106.70006655990176]
A distributional optimization problem arises widely in machine learning and statistics.
We propose a novel particle-based algorithm, dubbed variational transport, which approximately performs Wasserstein gradient descent.
We prove that when the objective function satisfies a functional version of the Polyak-Lojasiewicz (PL) condition (Polyak, 1963) and smoothness conditions, variational transport converges linearly.
arXiv Detail & Related papers (2020-12-21T18:33:13Z)
- Channel-Directed Gradients for Optimization of Convolutional Neural Networks [50.34913837546743]
We introduce optimization methods for convolutional neural networks that can be used to improve existing gradient-based optimization in terms of generalization error.
We show that defining the gradients along the output channel direction leads to a performance boost, while other directions can be detrimental.
arXiv Detail & Related papers (2020-08-25T00:44:09Z)
- A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, Wasserstein gradient flow is a smoother and near-optimal numerical scheme to approximate real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.