Statistical and Geometrical properties of regularized Kernel Kullback-Leibler divergence
- URL: http://arxiv.org/abs/2408.16543v1
- Date: Thu, 29 Aug 2024 14:01:30 GMT
- Title: Statistical and Geometrical properties of regularized Kernel Kullback-Leibler divergence
- Authors: Clémentine Chazal, Anna Korba, Francis Bach
- Abstract summary: We study the statistical and geometrical properties of the Kullback-Leibler divergence with kernel covariance operators (KKL) introduced by Bach [2022].
Unlike the classical Kullback-Leibler (KL) divergence, which involves density ratios, the KKL compares probability distributions through covariance operators (embeddings) in a reproducing kernel Hilbert space (RKHS).
This novel divergence hence shares parallel but distinct aspects with both the standard Kullback-Leibler divergence between probability distributions and kernel embedding metrics such as the maximum mean discrepancy.
- Score: 7.273481485032721
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we study the statistical and geometrical properties of the Kullback-Leibler divergence with kernel covariance operators (KKL) introduced by Bach [2022]. Unlike the classical Kullback-Leibler (KL) divergence, which involves density ratios, the KKL compares probability distributions through their covariance operators (embeddings) in a reproducing kernel Hilbert space (RKHS) and computes the quantum Kullback-Leibler divergence between them. This novel divergence hence shares parallel but distinct aspects with both the standard Kullback-Leibler divergence between probability distributions and kernel embedding metrics such as the maximum mean discrepancy. A limitation of the original KKL divergence is that it is not defined for distributions with disjoint supports. To solve this problem, we propose a regularized variant that guarantees the divergence is well defined for all distributions. We derive bounds that quantify the deviation of the regularized KKL from the original one, as well as finite-sample bounds. In addition, we provide a closed-form expression for the regularized KKL when the distributions consist of finite sets of points, which makes it implementable. Furthermore, we derive a Wasserstein gradient descent scheme for the KKL divergence in the case of discrete distributions and empirically study its ability to transport a set of points to a target distribution.
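The abstract states that the regularized KKL admits a closed-form expression when both distributions are supported on finite sets of points. The sketch below only illustrates the general idea under stated assumptions: it approximates the kernel covariance operators with explicit random Fourier features (Gaussian kernel, bandwidth $\sigma$) and evaluates the quantum relative entropy $\mathrm{tr}\big(\Sigma_P(\log \Sigma_P - \log \Sigma_Q^\varepsilon)\big)$ between the resulting matrices, with a simple mixture regularizer $\Sigma_Q^\varepsilon = (1-\varepsilon)\Sigma_Q + \varepsilon\,\Sigma_P$. The feature map, the choice of regularizer, and all function names are illustrative assumptions, not the paper's closed form, which works directly with kernel Gram matrices.

```python
# Minimal sketch (illustrative assumptions, not the paper's closed form):
# approximate RKHS covariance operators with random Fourier features and
# evaluate tr(S_p (log S_p - log S_q_eps)), S_q_eps = (1 - eps) S_q + eps S_p.
import numpy as np

def rff_features(x, n_features=256, sigma=1.0, seed=0):
    """Random Fourier features approximating a Gaussian kernel (assumed)."""
    rng = np.random.default_rng(seed)           # same seed => shared feature map
    w = rng.normal(scale=1.0 / sigma, size=(x.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ w + b)

def covariance_operator(feats):
    """Empirical (uncentered) covariance of the feature embeddings."""
    return feats.T @ feats / feats.shape[0]

def regularized_kkl(x_p, x_q, eps=1e-2, n_features=256, sigma=1.0):
    """Quantum relative entropy between regularized empirical covariances."""
    s_p = covariance_operator(rff_features(x_p, n_features, sigma))
    s_q = covariance_operator(rff_features(x_q, n_features, sigma))
    s_q_eps = (1.0 - eps) * s_q + eps * s_p     # assumed mixture regularizer

    # tr(S_p log S_p) = sum_i l_i log l_i, with the convention 0 log 0 = 0.
    lp = np.clip(np.linalg.eigh(s_p)[0], 0.0, None)
    entropy_term = np.sum(lp * np.log(np.clip(lp, 1e-12, None)))

    # tr(S_p log S_q_eps); tiny eigenvalues are floored for numerical stability.
    lq, uq = np.linalg.eigh(s_q_eps)
    log_s_q = uq @ np.diag(np.log(np.clip(lq, 1e-12, None))) @ uq.T
    return entropy_term - np.trace(s_p @ log_s_q)

# Example: two 2-D point clouds with different means.
x_p = np.random.default_rng(1).normal(size=(200, 2))
x_q = np.random.default_rng(2).normal(loc=2.0, size=(200, 2))
print(regularized_kkl(x_p, x_q))
```

Under the same assumptions, the Wasserstein gradient descent scheme mentioned above could be mimicked by differentiating this quantity with respect to the source points using an automatic differentiation framework and moving the points along the negative gradient; the paper derives the exact scheme for the discrete KKL.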
Related papers
- Convergence of Continuous Normalizing Flows for Learning Probability Distributions [10.381321024264484]
Continuous normalizing flows (CNFs) are a generative method for learning probability distributions.
We study the theoretical properties of CNFs with linear regularity in learning probability distributions from a finite random sample.
We present a convergence analysis framework that encompasses the error due to velocity estimation, the discretization error, and the early stopping error.
arXiv Detail & Related papers (2024-03-31T03:39:04Z) - Adaptive Annealed Importance Sampling with Constant Rate Progress [68.8204255655161]
Annealed Importance Sampling (AIS) synthesizes weighted samples from an intractable distribution.
We propose the Constant Rate AIS algorithm and its efficient implementation for $\alpha$-divergences.
arXiv Detail & Related papers (2023-06-27T08:15:28Z) - Targeted Separation and Convergence with Kernel Discrepancies [61.973643031360254]
Kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or (ii) control weak convergence to P.
In this article we derive new sufficient and necessary conditions to ensure (i) and (ii).
For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels (a minimal empirical MMD computation is sketched after this list).
arXiv Detail & Related papers (2022-09-26T16:41:16Z) - Wrapped Distributions on homogeneous Riemannian manifolds [58.720142291102135]
Control over the distributions' properties, such as parameters, symmetry, and modality, yields a family of flexible distributions.
We empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model.
arXiv Detail & Related papers (2022-04-20T21:25:21Z) - Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process.
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
arXiv Detail & Related papers (2021-09-24T14:48:20Z) - A Note on Optimizing Distributions using Kernel Mean Embeddings [94.96262888797257]
Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
arXiv Detail & Related papers (2021-06-18T08:33:45Z) - KALE Flow: A Relaxed KL Gradient Flow for Probabilities with Disjoint Support [27.165565512841656]
We study the gradient flow for a relaxed approximation to the Kullback-Leibler divergence between a moving source and a fixed target distribution.
This approximation, termed the KALE (KL approximate lower-bound estimator), solves a regularized version of the Fenchel dual problem defining the KL over a restricted class of functions.
arXiv Detail & Related papers (2021-06-16T16:37:43Z) - $\alpha$-Geodesical Skew Divergence [5.3556221126231085]
The asymmetric skew divergence smooths one of the distributions by mixing it, to a degree determined by the parameter $\lambda$, with the other distribution.
Such divergence is an approximation of the KL divergence that does not require the target distribution to be absolutely continuous with respect to the source distribution.
arXiv Detail & Related papers (2021-03-31T13:27:58Z) - Independent Gaussian Distributions Minimize the Kullback-Leibler (KL) Divergence from Independent Gaussian Distributions [23.249999313567624]
This note is on a property of the Kullback-Leibler (KL) divergence.
The primary purpose of this note is to serve as a reference for papers that need to use this property, in whole or in part.
arXiv Detail & Related papers (2020-11-04T22:05:45Z) - Kullback-Leibler divergence between quantum distributions, and its upper-bound [1.2183405753834562]
This work presents an upper bound on the value that the Kullback-Leibler (KL) divergence can reach for a class of probability distributions called quantum distributions (QD).
Retrieving an upper bound for the entropic divergence is shown to be possible under the condition that the compared distributions are quantum distributions over the same quantum value, which makes them comparable.
arXiv Detail & Related papers (2020-08-13T14:42:13Z) - Distribution-free binary classification: prediction sets, confidence intervals and calibration [106.50279469344937]
We study three notions of uncertainty quantification -- calibration, confidence intervals and prediction sets -- for binary classification in the distribution-free setting.
We derive confidence intervals for binned probabilities for both fixed-width and uniform-mass binning.
As a consequence of our 'tripod' theorems, these confidence intervals for binned probabilities lead to distribution-free calibration.
arXiv Detail & Related papers (2020-06-18T14:17:29Z)
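As referenced in the kernel-discrepancy entry above, the following is a minimal, self-contained sketch of the empirical (biased, V-statistic) maximum mean discrepancy, using the standard identity $\mathrm{MMD}^2(P,Q) = \mathbb{E}[k(x,x')] + \mathbb{E}[k(y,y')] - 2\,\mathbb{E}[k(x,y)]$ between kernel mean embeddings; the Gaussian kernel, the bandwidth, and the function names are illustrative assumptions.

```python
# Minimal sketch of the biased (V-statistic) empirical MMD^2 under an
# assumed Gaussian kernel, for comparison with the covariance-based KKL above.
import numpy as np

def gaussian_gram(x, y, sigma=1.0):
    """Gram matrix K[i, j] = exp(-||x_i - y_j||^2 / (2 sigma^2))."""
    sq = np.sum(x**2, 1)[:, None] + np.sum(y**2, 1)[None, :] - 2.0 * x @ y.T
    return np.exp(-sq / (2.0 * sigma**2))

def mmd_squared(x, y, sigma=1.0):
    """MMD^2(P, Q) = E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)] on samples."""
    return (gaussian_gram(x, x, sigma).mean()
            + gaussian_gram(y, y, sigma).mean()
            - 2.0 * gaussian_gram(x, y, sigma).mean())

x = np.random.default_rng(1).normal(size=(200, 2))
y = np.random.default_rng(2).normal(loc=2.0, size=(200, 2))
print(mmd_squared(x, y))
```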