KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint
Support
- URL: http://arxiv.org/abs/2106.08929v1
- Date: Wed, 16 Jun 2021 16:37:43 GMT
- Authors: Pierre Glaser, Michael Arbel, Arthur Gretton
- Abstract summary: We study the gradient flow for a relaxed approximation to the Kullback-Leibler divergence between a moving source and a fixed target distribution.
This approximation, termed the KALE (KL approximate lower-bound estimator), solves a regularized version of the Fenchel dual problem defining the KL over a restricted class of functions.
- Score: 27.165565512841656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the gradient flow for a relaxed approximation to the
Kullback-Leibler (KL) divergence between a moving source and a fixed target
distribution. This approximation, termed the KALE (KL approximate lower-bound
estimator), solves a regularized version of the Fenchel dual problem defining
the KL over a restricted class of functions. When using a Reproducing Kernel
Hilbert Space (RKHS) to define the function class, we show that the KALE
continuously interpolates between the KL and the Maximum Mean Discrepancy
(MMD). Like the MMD and other Integral Probability Metrics, the KALE remains
well defined for mutually singular distributions. Nonetheless, the KALE
inherits from the limiting KL a greater sensitivity to mismatch in the support
of the distributions, compared with the MMD. These two properties make the KALE
gradient flow particularly well suited when the target distribution is
supported on a low-dimensional manifold. Under an assumption of sufficient
smoothness of the trajectories, we show the global convergence of the KALE
flow. We propose a particle implementation of the flow given initial samples
from the source and the target distribution, which we use to empirically
confirm the KALE's properties.
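The Fenchel dual problem mentioned in the abstract can be illustrated on a toy discrete example. The sketch below is not the authors' implementation: it assumes the standard dual form KL(P || Q) = sup_h { 1 + E_P[h] - E_Q[exp(h)] }, uses two hypothetical three-point distributions `p` and `q` with shared support, and omits the RKHS restriction and the regularization that define the KALE. It only checks that the unregularized dual objective attains the KL at h = log(p/q) and is no larger at other candidates.

```python
import numpy as np

def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions with full, shared support."""
    return float(np.sum(p * np.log(p / q)))

def dual_objective(h, p, q):
    """Unregularized Fenchel dual objective 1 + E_P[h] - E_Q[exp(h)]."""
    return float(1.0 + np.sum(p * h) - np.sum(q * np.exp(h)))

# Hypothetical toy distributions (illustrative values, not from the paper).
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])

# The dual objective is concave in h and maximized at the log density ratio.
h_star = np.log(p / q)
kl = kl_divergence(p, q)
dual_at_optimum = dual_objective(h_star, p, q)

# Any other candidate h yields a lower bound on the KL.
rng = np.random.default_rng(0)
perturbed = [
    dual_objective(h_star + 0.5 * rng.standard_normal(p.size), p, q)
    for _ in range(5)
]

print("KL:", kl)
print("dual at h*:", dual_at_optimum)
print("dual at perturbed h:", perturbed)
```

Because every candidate h gives a lower bound, restricting h to a smaller function class and adding a penalty, as the KALE does, keeps the quantity finite even when the density ratio is not defined, which is the mutually singular case the abstract highlights.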
Related papers
- Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective [29.27113653850964]
We provide a theoretical understanding of the Lipschitz continuity and second momentum properties of the diffusion process.
Our results provide deeper theoretical insights into the dynamics of the diffusion process under common data distributions.
arXiv Detail & Related papers (2024-05-26T03:32:27Z) - Soft-constrained Schrödinger Bridge: a Stochastic Control Approach [4.922305511803267]
Schrödinger bridge can be viewed as a continuous-time control problem where the goal is to find an optimally controlled diffusion process.
We propose to generalize this problem by allowing the terminal distribution to differ from the target but penalizing the Kullback-Leibler divergence between the two distributions.
One application is the development of robust generative diffusion models.
arXiv Detail & Related papers (2024-03-04T04:10:24Z) - Non-asymptotic Convergence of Discrete-time Diffusion Models: New Approach and Improved Rate [49.97755400231656]
We establish convergence guarantees for substantially larger classes of distributions under DT diffusion processes.
We then specialize our results to a number of interesting classes of distributions with explicit parameter dependencies.
We propose a novel accelerated sampler and show that it improves the convergence rates of the corresponding regular sampler by orders of magnitude with respect to all system parameters.
arXiv Detail & Related papers (2024-02-21T16:11:47Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Convergence and concentration properties of constant step-size SGD
through Markov chains [0.0]
We consider the optimization of a smooth and strongly convex objective using constant step-size stochastic gradient descent (SGD).
We show that, for unbiased gradient estimates with mildly controlled variance, the iteration converges to an invariant distribution in total variation distance.
All our results are non-asymptotic and their consequences are discussed through a few applications.
arXiv Detail & Related papers (2023-06-20T12:36:28Z) - Concentration Bounds for Discrete Distribution Estimation in KL
Divergence [21.640337031842368]
We show that the deviation from mean scales as $\sqrt{k}/n$ when $n \ge k$, improving upon the best prior result of $k/n$.
We also establish a matching lower bound that shows that our bounds are tight up to polylogarithmic factors.
arXiv Detail & Related papers (2023-02-14T07:17:19Z) - Score-Based Diffusion meets Annealed Importance Sampling [89.92133671626327]
Annealed Importance Sampling remains one of the most effective methods for marginal likelihood estimation.
We leverage recent progress in score-based generative modeling to approximate the optimal extended target distribution for AIS proposals.
arXiv Detail & Related papers (2022-08-16T12:13:29Z) - Variational Refinement for Importance Sampling Using the Forward
Kullback-Leibler Divergence [77.06203118175335]
Variational Inference (VI) is a popular alternative to exact sampling in Bayesian inference.
Importance sampling (IS) is often used to fine-tune and de-bias the estimates of approximate Bayesian inference procedures.
We propose a novel combination of optimization and sampling techniques for approximate Bayesian inference.
arXiv Detail & Related papers (2021-06-30T11:00:24Z) - Federated Functional Gradient Boosting [75.06942944563572]
We study functional minimization in Federated Learning.
For both FFGB.C and FFGB.L, the radii of convergence shrink to zero as the feature distributions become more homogeneous.
arXiv Detail & Related papers (2021-03-11T21:49:19Z) - Distributionally Robust Bayesian Quadrature Optimization [60.383252534861136]
We study BQO under distributional uncertainty in which the underlying probability distribution is unknown except for a limited set of its i.i.d. samples.
A standard BQO approach maximizes the Monte Carlo estimate of the true expected objective given the fixed sample set.
We propose a novel posterior sampling based algorithm, namely distributionally robust BQO (DRBQO) for this purpose.
arXiv Detail & Related papers (2020-01-19T12:00:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.