Sampling via Gradient Flows in the Space of Probability Measures
- URL: http://arxiv.org/abs/2310.03597v3
- Date: Sat, 9 Mar 2024 15:35:46 GMT
- Title: Sampling via Gradient Flows in the Space of Probability Measures
- Authors: Yifan Chen, Daniel Zhengyu Huang, Jiaoyang Huang, Sebastian Reich,
Andrew M Stuart
- Abstract summary: Recent work shows that algorithms derived by considering gradient flows in the space of probability measures open up new avenues for algorithm development.
This paper makes three contributions to this sampling approach by scrutinizing the design components of such gradient flows.
- Score: 10.892894776497165
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Sampling a target probability distribution with an unknown normalization
constant is a fundamental challenge in computational science and engineering.
Recent work shows that algorithms derived by considering gradient flows in the
space of probability measures open up new avenues for algorithm development.
This paper makes three contributions to this sampling approach by scrutinizing
the design components of such gradient flows. Any instantiation of a gradient
flow for sampling needs an energy functional and a metric to determine the
flow, as well as numerical approximations of the flow to derive algorithms. Our
first contribution is to show that the Kullback-Leibler divergence, as an
energy functional, has the unique property (among all f-divergences) that
gradient flows resulting from it do not depend on the normalization constant of
the target distribution. Our second contribution is to study the choice of
metric from the perspective of invariance. The Fisher-Rao metric is known as
the unique choice (up to scaling) that is diffeomorphism invariant. As a
computationally tractable alternative, we introduce a relaxed, affine
invariance property for the metrics and gradient flows. In particular, we
construct various affine invariant Wasserstein and Stein gradient flows. Affine
invariant gradient flows are shown to behave more favorably than their
non-affine-invariant counterparts when sampling highly anisotropic
distributions, in theory and by using particle methods. Our third contribution
is to study, and develop efficient algorithms based on Gaussian approximations
of the gradient flows; this leads to an alternative to particle methods. We
establish connections between various Gaussian approximate gradient flows,
discuss their relation to gradient methods arising from parametric variational
inference, and study their convergence properties both theoretically and
numerically.
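As orientation on the first contribution, the following short derivation (standard notation, not quoted from the paper) shows why gradient flows of the Kullback-Leibler energy do not see the normalization constant. Write the target as \pi = \tilde\pi / Z with \tilde\pi computable and Z unknown. Then

    \mathrm{KL}(\rho \,\|\, \pi) = \int \rho \log\frac{\rho}{\pi}\,dx,
    \qquad
    \frac{\delta \mathrm{KL}}{\delta \rho} = \log\frac{\rho}{\pi} + 1
        = \log\rho - \log\tilde\pi + (\log Z + 1),

so Z enters only through an additive constant. A gradient flow is driven by derivatives of this first variation; for example the Wasserstein gradient flow

    \partial_t \rho = \nabla\cdot\Big(\rho\,\nabla\frac{\delta \mathrm{KL}}{\delta \rho}\Big)
        = \nabla\cdot\big(\rho\,\nabla(\log\rho - \log\tilde\pi)\big)

is the Fokker-Planck equation and involves only \nabla\log\tilde\pi. For a general f-divergence \int \pi\, f(\rho/\pi)\,dx the first variation is f'(\rho/\pi), in which Z appears inside the nonlinearity f' rather than as an additive constant; this is the heuristic behind the uniqueness claim for KL.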
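For the second contribution, the snippet below is a minimal particle-method sketch of one way to obtain affine-invariant behavior: precondition Langevin dynamics (the standard particle discretization of the Wasserstein gradient flow of KL) by the ensemble covariance. The target density, step size, particle number, and the omission of finite-ensemble correction terms are all assumptions made for illustration; this is in the spirit of affine-invariant ensemble samplers, not a reproduction of the paper's exact algorithm.

    import numpy as np

    def grad_log_prob(x):
        # Hypothetical unnormalized target: a zero-mean Gaussian with
        # covariance diag(100.0, 0.01), i.e. a highly anisotropic density
        # known only up to its normalization constant.
        cov_inv = np.diag([1.0 / 100.0, 1.0 / 0.01])
        return -x @ cov_inv

    def affine_invariant_langevin_step(X, dt, rng):
        # One Euler-Maruyama step of ensemble-preconditioned Langevin dynamics.
        # X has shape (J, d): J interacting particles in d dimensions.
        # Preconditioning the drift and the noise by the ensemble covariance C
        # makes the update equivariant (in distribution) under affine maps
        # x -> A x + b; finite-ensemble correction terms are omitted here.
        J, d = X.shape
        C = np.cov(X, rowvar=False) + 1e-8 * np.eye(d)   # ensemble covariance
        sqrtC = np.linalg.cholesky(C)
        drift = grad_log_prob(X) @ C                      # C * grad log pi_tilde
        noise = rng.standard_normal((J, d)) @ sqrtC.T     # noise with covariance C
        return X + dt * drift + np.sqrt(2.0 * dt) * noise

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 2))                     # J = 200 particles
    for _ in range(4000):
        X = affine_invariant_langevin_step(X, dt=0.005, rng=rng)
    print(np.cov(X, rowvar=False))  # compare with the target covariance diag(100.0, 0.01)

Because both the drift and the noise are scaled by the ensemble covariance C, applying an affine change of variables to the target and to the initial ensemble yields the correspondingly mapped dynamics, so one step size serves well-conditioned and highly anisotropic targets alike; dropping the two preconditioning factors recovers plain (non-affine-invariant) unadjusted Langevin, whose usable step size degrades with the conditioning of the target.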
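For the third contribution, one commonly studied Gaussian approximation restricts the Wasserstein gradient flow of KL to the family N(m, C) (the Bures-Wasserstein geometry), which reduces the flow to ODEs for the mean and covariance. The form below is the one that appears in the Gaussian variational inference literature and is given only as orientation; the paper compares several Gaussian approximate flows, so this should not be read as its exact equations.

    \dot m_t = \mathbb{E}_{X\sim N(m_t, C_t)}\big[\nabla \log \tilde\pi(X)\big],
    \qquad
    \dot C_t = 2I + \mathbb{E}_{X\sim N(m_t, C_t)}\big[\nabla^2 \log \tilde\pi(X)\big]\, C_t
             + C_t\, \mathbb{E}_{X\sim N(m_t, C_t)}\big[\nabla^2 \log \tilde\pi(X)\big].

Only the gradient and Hessian of log \tilde\pi enter, consistent with the normalization-independence of the KL energy, and the expectations can be estimated by quadrature or a small Monte Carlo sample, which is what makes such Gaussian flows an inexpensive alternative to particle methods.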
Related papers
- Generalizing Stochastic Smoothing for Differentiation and Gradient Estimation [59.86921150579892]
We deal with the problem of gradient estimation for differentiable relaxations of algorithms, operators, simulators, and other non-differentiable functions.
We develop variance reduction strategies for differentiable sorting and ranking, differentiable shortest-paths on graphs, differentiable rendering for pose estimation, as well as differentiable cryo-ET simulations.
arXiv Detail & Related papers (2024-10-10T17:10:00Z) - Adversarial flows: A gradient flow characterization of adversarial attacks [1.8749305679160366]
A popular method for performing adversarial attacks on neural networks is the so-called fast gradient sign method.
We show convergence of the discretization to the associated gradient flow.
arXiv Detail & Related papers (2024-06-08T07:05:26Z) - Bridging the Gap Between Variational Inference and Wasserstein Gradient
Flows [6.452626686361619]
We bridge the gap between variational inference and Wasserstein gradient flows.
Under certain conditions, the Bures-Wasserstein gradient flow can be recast as a Euclidean gradient flow.
We also offer an alternative perspective on the path-derivative gradient, framing it as a distillation procedure to the Wasserstein gradient flow.
arXiv Detail & Related papers (2023-10-31T00:10:19Z) - Variational Gaussian filtering via Wasserstein gradient flows [6.023171219551961]
We present a novel approach to approximate Gaussian and mixture-of-Gaussians filtering.
Our method relies on a variational approximation via a gradient-flow representation.
arXiv Detail & Related papers (2023-03-11T12:22:35Z) - Gradient Flows for Sampling: Mean-Field Models, Gaussian Approximations and Affine Invariance [10.153270126742369]
We study gradient flows in both probability density space and Gaussian space.
The flow in the Gaussian space may be understood as a Gaussian approximation of the flow.
arXiv Detail & Related papers (2023-02-21T21:44:08Z) - Efficient CDF Approximations for Normalizing Flows [64.60846767084877]
We build upon the diffeomorphic properties of normalizing flows to estimate the cumulative distribution function (CDF) over a closed region.
Our experiments on popular flow architectures and UCI datasets show a marked improvement in sample efficiency as compared to traditional estimators.
arXiv Detail & Related papers (2022-02-23T06:11:49Z) - Variational Wasserstein gradient flow [9.901677207027806]
We propose a scalable proximal gradient type algorithm for Wasserstein gradient flow.
Our framework covers all the classical Wasserstein gradient flows including the heat equation and the porous medium equation.
arXiv Detail & Related papers (2021-12-04T20:27:31Z) - Differentiable Annealed Importance Sampling and the Perils of Gradient
Noise [68.44523807580438]
Annealed importance sampling (AIS) and related algorithms are highly effective tools for marginal likelihood estimation.
Differentiability is a desirable property as it would admit the possibility of optimizing marginal likelihood as an objective.
We propose a differentiable algorithm by abandoning Metropolis-Hastings steps, which further unlocks mini-batch computation.
arXiv Detail & Related papers (2021-07-21T17:10:14Z) - Large-Scale Wasserstein Gradient Flows [84.73670288608025]
We introduce a scalable scheme to approximate Wasserstein gradient flows.
Our approach relies on input convex neural networks (ICNNs) to discretize the JKO steps.
As a result, we can sample from the measure at each step of the gradient flow and compute its density.
arXiv Detail & Related papers (2021-06-01T19:21:48Z) - Variational Transport: A Convergent Particle-BasedAlgorithm for Distributional Optimization [106.70006655990176]
Distributional optimization problems arise widely in machine learning and statistics.
We propose a novel particle-based algorithm, dubbed variational transport, which approximately performs Wasserstein gradient descent.
We prove that when the objective function satisfies a functional version of the Polyak-Lojasiewicz (PL) condition (Polyak, 1963) and a smoothness condition, variational transport converges linearly.
arXiv Detail & Related papers (2020-12-21T18:33:13Z) - A Near-Optimal Gradient Flow for Learning Neural Energy-Based Models [93.24030378630175]
We propose a novel numerical scheme to optimize the gradient flows for learning energy-based models (EBMs).
We derive a second-order Wasserstein gradient flow of the global relative entropy from the Fokker-Planck equation.
Compared with existing schemes, the Wasserstein gradient flow is a smoother and near-optimal numerical scheme for approximating real data densities.
arXiv Detail & Related papers (2019-10-31T02:26:20Z)