A Note on Optimizing Distributions using Kernel Mean Embeddings
- URL: http://arxiv.org/abs/2106.09994v1
- Date: Fri, 18 Jun 2021 08:33:45 GMT
- Title: A Note on Optimizing Distributions using Kernel Mean Embeddings
- Authors: Boris Muzellec, Francis Bach, Alessandro Rudi
- Abstract summary: Kernel mean embeddings represent probability measures by their infinite-dimensional mean embeddings in a reproducing kernel Hilbert space.
We show that when the kernel is characteristic, distributions with a kernel sum-of-squares density are dense.
We provide algorithms to optimize such distributions in the finite-sample setting.
- Score: 94.96262888797257
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Kernel mean embeddings are a popular tool that consists in representing
probability measures by their infinite-dimensional mean embeddings in a
reproducing kernel Hilbert space. When the kernel is characteristic, mean
embeddings can be used to define a distance between probability measures, known
as the maximum mean discrepancy (MMD). A well-known advantage of mean
embeddings and MMD is their low computational cost and low sample complexity.
However, kernel mean embeddings have had limited applications to problems that
consist in optimizing distributions, due to the difficulty of characterizing
which Hilbert space vectors correspond to a probability distribution. In this
note, we propose to leverage the kernel sums-of-squares parameterization of
positive functions of Marteau-Ferey et al. [2020] to fit distributions in the
MMD geometry. First, we show that when the kernel is characteristic,
distributions with a kernel sum-of-squares density are dense. Then, we provide
algorithms to optimize such distributions in the finite-sample setting, which
we illustrate in a density fitting numerical experiment.
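As a concrete illustration of the MMD side of this setup, here is a minimal sketch (not the authors' code; the Gaussian kernel and its bandwidth are illustrative choices) of the standard unbiased estimator of the squared MMD between two samples:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2(P, Q) from samples X ~ P, Y ~ Q."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, sigma)
    Kyy = gaussian_kernel(Y, Y, sigma)
    Kxy = gaussian_kernel(X, Y, sigma)
    # Drop diagonal terms so the within-sample averages are unbiased.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))  # sample from P
Y = rng.normal(0.5, 1.0, size=(200, 2))  # sample from a shifted Q
print(mmd2_unbiased(X, Y))               # strictly positive in expectation
```

In the kernel sum-of-squares model referenced above, a density takes (roughly) the finite-sample form p(x) = sum_{i,j} A_{ij} k(x, x_i) k(x, x_j) with A positive semidefinite, which guarantees nonnegativity by construction; a discrepancy such as the MMD above then serves as the fitting loss.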
Related papers
- Learning to Embed Distributions via Maximum Kernel Entropy [0.0]
Empirical data can often be regarded as samples from a set of probability distributions.
Kernel methods have emerged as a natural approach for learning to classify these distributions.
We propose a novel objective for the unsupervised learning of a data-dependent distribution kernel.
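For context, the classical baseline for a kernel between sample-represented distributions is the inner product of their mean embeddings, estimated by averaging pairwise kernel values; a minimal sketch of that standard construction (not the paper's learned, data-dependent kernel):

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def distribution_kernel(X, Y, sigma=1.0):
    """K(P, Q) = <mu_P, mu_Q>, estimated by the mean of k(x_i, y_j)."""
    return gaussian_kernel(X, Y, sigma).mean()

# Gram matrix over sample-represented distributions; usable with any
# kernel classifier (e.g., an SVM) to classify distributions.
rng = np.random.default_rng(0)
dists = [rng.normal(mu, 1.0, size=(100, 2)) for mu in (0.0, 0.0, 2.0)]
G = np.array([[distribution_kernel(P, Q) for Q in dists] for P in dists])
print(np.round(G, 3))  # the two mean-0 samples are the most similar pair
```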
arXiv Detail & Related papers (2024-08-01T13:34:19Z)
- Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority over the state of the art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
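The abstract does not spell out the sampler itself; as background, here is a minimal sketch of the consensus-ADMM skeleton that such distributed schemes build on, applied to the deterministic analogue (ridge regression over data shards) rather than to sampling. All sizes and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, rho, lam = 5, 4, 1.0, 0.1
x_true = rng.normal(size=d)
# Each worker holds a private shard (A_i, b_i) of one regression problem.
shards = []
for _ in range(n_workers):
    A = rng.normal(size=(50, d))
    shards.append((A, A @ x_true + 0.1 * rng.normal(size=50)))

x = np.zeros((n_workers, d))  # local primal variables
u = np.zeros((n_workers, d))  # scaled dual variables
z = np.zeros(d)               # consensus variable

for _ in range(100):
    # Local updates (run in parallel in a distributed deployment).
    for i, (A, b) in enumerate(shards):
        x[i] = np.linalg.solve(2 * A.T @ A + rho * np.eye(d),
                               2 * A.T @ b + rho * (z - u[i]))
    # Consensus update; lam is the ridge penalty placed on z.
    z = rho * (x + u).sum(0) / (2 * lam + n_workers * rho)
    # Dual ascent on the consensus constraint x_i = z.
    u += x - z

print(np.linalg.norm(z - x_true))  # small residual error
```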
arXiv Detail & Related papers (2024-01-29T02:08:40Z)
- Variance-Aware Estimation of Kernel Mean Embedding [8.277998582564784]
We show how to speed-up convergence by leveraging variance information in the reproducing kernel Hilbert space.
We show that even when such information is a priori unknown, we can efficiently estimate it from the data.
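As a baseline for what is being improved, here is a minimal sketch of the plain empirical kernel mean embedding together with a data-driven estimate of its pointwise variance; the paper's variance-aware construction itself is not reproduced here.

```python
import numpy as np

def gaussian_kernel(X, T, sigma=1.0):
    sq = np.sum(X**2, 1)[:, None] + np.sum(T**2, 1)[None, :] - 2 * X @ T.T
    return np.exp(-sq / (2 * sigma**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))        # samples from P
T = rng.uniform(-3, 3, size=(5, 2))  # points at which to evaluate mu_P

K = gaussian_kernel(X, T)
mu_hat = K.mean(axis=0)                   # empirical KME, mu_hat(t)
var_hat = K.var(axis=0, ddof=1) / len(X)  # estimated variance of mu_hat(t)
for m, v in zip(mu_hat, var_hat):
    print(f"mu_hat = {m:.4f} +/- {np.sqrt(v):.4f}")
```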
arXiv Detail & Related papers (2022-10-13T01:58:06Z)
- Gaussian Processes on Distributions based on Regularized Optimal Transport [2.905751301655124]
We present a novel kernel over the space of probability measures, based on the dual formulation of regularized optimal transport.
We prove that this construction yields a valid kernel, by using Hilbertian norms.
We provide theoretical guarantees on the behaviour of a Gaussian process based on this kernel.
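As background, a generic sketch of a regularized-OT-based similarity between empirical measures (Sinkhorn iterations followed by an exponential map). This is illustrative only: positive definiteness of such similarities is not automatic, and establishing a valid kernel is precisely the paper's contribution.

```python
import numpy as np

def sinkhorn_cost(X, Y, eps=0.5, n_iter=200):
    """Entropy-regularized OT cost between uniform empirical measures."""
    n, m = len(X), len(Y)
    C = ((X[:, None, :] - Y[None, :, :])**2).sum(-1)  # squared-distance cost
    K = np.exp(-C / eps)
    a, b = np.ones(n) / n, np.ones(m) / m             # uniform weights
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                           # Sinkhorn fixed point
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]                   # transport plan
    return (P * C).sum()

def ot_similarity(X, Y, eps=0.5, gamma=1.0):
    """Exponential similarity built on the regularized OT cost."""
    return np.exp(-sinkhorn_cost(X, Y, eps) / gamma)

rng = np.random.default_rng(0)
P = rng.normal(0.0, 1.0, size=(80, 2))
Q = rng.normal(0.2, 1.0, size=(80, 2))
print(ot_similarity(P, Q))
```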
arXiv Detail & Related papers (2022-10-12T20:30:23Z)
- Targeted Separation and Convergence with Kernel Discrepancies [61.973643031360254]
Kernel-based discrepancy measures are required to (i) separate a target P from other probability measures or (ii) control weak convergence to P.
In this article, we derive new sufficient and necessary conditions ensuring (i) and (ii).
For MMDs on separable metric spaces, we characterize those kernels that separate Bochner embeddable measures and introduce simple conditions for separating all measures with unbounded kernels.
arXiv Detail & Related papers (2022-09-26T16:41:16Z)
- Optimal Scaling for Locally Balanced Proposals in Discrete Spaces [65.14092237705476]
We show that the efficiency of Metropolis-Hastings (M-H) algorithms in discrete spaces can be characterized by an acceptance rate that is independent of the target distribution.
Knowledge of the optimal acceptance rate allows one to automatically tune the neighborhood size of a proposal distribution in a discrete space, directly analogous to step-size control in continuous spaces.
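A minimal sketch of that tuning idea, using a plain symmetric bit-flip proposal rather than the paper's locally balanced proposals; the neighborhood size (number of flipped coordinates) is adapted toward a fixed target acceptance rate, whose value here is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50
theta = rng.normal(size=d)   # target: p(x) proportional to exp(theta . x)

x = rng.integers(0, 2, size=d)
k = 5.0                      # continuous neighborhood-size parameter
target_rate = 0.574          # illustrative target acceptance rate
accepts, n_steps = 0.0, 20000

for t in range(1, n_steps + 1):
    flips = rng.choice(d, size=int(np.clip(np.round(k), 1, d)), replace=False)
    x_new = x.copy()
    x_new[flips] ^= 1        # flip the chosen coordinates
    # Symmetric proposal, so the M-H ratio is just the density ratio.
    accepted = float(np.log(rng.random()) < theta @ (x_new - x))
    if accepted:
        x = x_new
    accepts += accepted
    # Robbins-Monro update with diminishing adaptation: grow k when
    # accepting too often, shrink it otherwise.
    k = np.clip(k * np.exp((accepted - target_rate) / np.sqrt(t)), 1.0, d)

print(f"final k = {k:.1f}, acceptance rate = {accepts / n_steps:.3f}")
```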
arXiv Detail & Related papers (2022-09-16T22:09:53Z)
- Density-Based Clustering with Kernel Diffusion [59.4179549482505]
A naive density corresponding to the indicator function of a unit $d$-dimensional Euclidean ball is commonly used in density-based clustering algorithms.
We propose a new kernel diffusion density function, which is adaptive to data of varying local distributional characteristics and smoothness.
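A minimal sketch contrasting the naive fixed-radius ball density with a per-point adaptive-bandwidth alternative; the adaptive variant is an illustrative stand-in, not the paper's kernel diffusion density.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two clusters with very different local scales.
X = np.vstack([rng.normal(0.0, 0.3, size=(100, 2)),
               rng.normal(3.0, 1.0, size=(100, 2))])
D = np.sqrt(((X[:, None, :] - X[None, :, :])**2).sum(-1))

# Naive density: neighbor count inside a fixed-radius ball (DBSCAN-style).
r = 0.5
naive = (D < r).sum(axis=1) - 1  # exclude the point itself

# Adaptive alternative: Gaussian weights with a per-point k-NN bandwidth,
# so the density tracks the local scale of each cluster.
knn = 10
h = np.sort(D, axis=1)[:, knn]   # distance to the k-th nearest neighbor
adaptive = np.exp(-(D / h[:, None])**2).sum(axis=1) / h**2

print(naive[:3], np.round(adaptive[:3], 2))
```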
arXiv Detail & Related papers (2021-10-11T09:00:33Z)
- Optimal policy evaluation using kernel-based temporal difference methods [78.83926562536791]
We use reproducing kernel Hilbert spaces for estimating the value function of an infinite-horizon discounted Markov reward process (MRP).
We derive a non-asymptotic upper bound on the error with explicit dependence on the eigenvalues of the associated kernel operator.
We prove minimax lower bounds over sub-classes of MRPs.
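A minimal sketch of a common regularized kernel temporal-difference estimator on a toy one-dimensional MRP; the paper's exact estimator and its eigenvalue-dependent bounds are not reproduced here.

```python
import numpy as np

def k(A, B, sigma=1.0):
    """Gaussian kernel between 1-D state arrays."""
    return np.exp(-(A[:, None] - B[None, :])**2 / (2 * sigma**2))

rng = np.random.default_rng(0)
gamma, lam, n = 0.95, 1e-3, 400

# Toy MRP: s' = 0.9 s + noise, reward r(s) = s, so V(s) = s / (1 - 0.9 gamma).
S = rng.uniform(-2, 2, size=n)
S_next = 0.9 * S + 0.1 * rng.normal(size=n)
R = S.copy()

# Model V(s) = sum_j alpha_j k(S_j, s); enforce the sampled Bellman
# equations V(S_i) - gamma V(S'_i) = R_i with ridge regularization:
# (K - gamma K_next + lam I) alpha = R.
K = k(S, S)
K_next = k(S_next, S)
alpha = np.linalg.solve(K - gamma * K_next + lam * np.eye(n), R)

s_test = np.array([1.0])
v_hat = (k(s_test, S) @ alpha)[0]
print(v_hat, 1.0 / (1.0 - 0.9 * gamma))  # estimate vs. closed-form value
```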
arXiv Detail & Related papers (2021-09-24T14:48:20Z)
- Riemannian Gaussian distributions, random matrix ensembles and diffusion kernels [0.0]
We show how to compute marginals of the probability density functions on random-matrix-type symmetric spaces.
We also show how the probability density functions are a particular case of diffusion kernels of the Karlin-McGregor type, describing non-intersecting processes in the Weyl chamber of Lie groups.
arXiv Detail & Related papers (2020-11-27T11:41:29Z)
- Schoenberg-Rao distances: Entropy-based and geometry-aware statistical Hilbert distances [12.729120803225065]
We study a class of statistical Hilbert distances that we term the Schoenberg-Rao distances.
We derive novel closed-form distances between mixtures of Gaussian distributions.
Our method constitutes a practical alternative to Wasserstein distances and we illustrate its efficiency on a broad range of machine learning tasks.
arXiv Detail & Related papers (2020-02-19T18:48:33Z)