Information Theory with Kernel Methods
- URL: http://arxiv.org/abs/2202.08545v1
- Date: Thu, 17 Feb 2022 09:42:29 GMT
- Title: Information Theory with Kernel Methods
- Authors: Francis Bach (SIERRA)
- Abstract summary: We show that the von Neumann entropy and relative entropy of these operators are intimately related to the usual notions of Shannon entropy and relative entropy.
They come together with efficient estimation algorithms from various oracles on the probability distributions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the analysis of probability distributions through their
associated covariance operators from reproducing kernel Hilbert spaces. We show
that the von Neumann entropy and relative entropy of these operators are
intimately related to the usual notions of Shannon entropy and relative
entropy, and share many of their properties. They come together with efficient
estimation algorithms from various oracles on the probability distributions. We
also consider product spaces and show that for tensor product kernels, we can
define notions of mutual information and joint entropies, which can then
characterize independence perfectly, but only partially conditional
independence. We finally show how these new notions of relative entropy lead to
new upper bounds on log partition functions, which can be used together with
convex optimization within variational inference methods, providing a new
family of probabilistic inference methods.
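To make the central objects concrete, here is a minimal sketch (not the paper's reference implementation) of estimating the von Neumann entropy of an empirical kernel covariance operator from samples. It relies on the standard fact that the nonzero eigenvalues of the empirical covariance operator coincide with those of the normalized Gram matrix $K/n$; the Gaussian kernel, bandwidth, and sample sizes below are illustrative assumptions.

```python
import numpy as np

def rbf_gram(X, bandwidth=1.0):
    """Gram matrix K_ij = exp(-||x_i - x_j||^2 / (2 * bandwidth^2))."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth**2))

def kernel_von_neumann_entropy(X, bandwidth=1.0, eps=1e-12):
    """Von Neumann entropy -sum_i lam_i log lam_i of the empirical covariance
    operator, computed from the eigenvalues of K / n (which sum to 1 because
    k(x, x) = 1 for the Gaussian kernel)."""
    n = X.shape[0]
    lam = np.linalg.eigvalsh(rbf_gram(X, bandwidth) / n)
    lam = np.clip(lam, eps, None)
    lam /= lam.sum()  # guard against numerical drift
    return float(-np.sum(lam * np.log(lam)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A more spread-out sample has a flatter kernel spectrum, hence larger entropy.
    print(kernel_von_neumann_entropy(rng.normal(scale=0.1, size=(200, 2))))
    print(kernel_von_neumann_entropy(rng.normal(scale=2.0, size=(200, 2))))
```

For tensor product kernels on product spaces (the setting in which the paper defines mutual information and joint entropies), the Gram matrix of the product kernel is the entrywise product of the marginal Gram matrices, so the same spectral computation applies.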
Related papers
- Conditioning of Banach Space Valued Gaussian Random Variables: An Approximation Approach Based on Martingales [8.81121308982678]
We investigate the conditional distributions of two Banach space valued, jointly Gaussian random variables.
We show that their means and covariances are determined by a general finite-dimensional approximation scheme based upon a martingale approach.
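For illustration only (this is not the paper's martingale-based scheme), the finite-dimensional Gaussian conditioning that such approximations reduce to can be written in a few lines; all names and shapes below are hypothetical.

```python
import numpy as np

def condition_gaussian(m1, m2, S11, S12, S22, x2):
    """Conditional law of x1 given an observed x2 for jointly Gaussian (x1, x2):
       E[x1 | x2]   = m1 + S12 S22^{-1} (x2 - m2)
       Cov[x1 | x2] = S11 - S12 S22^{-1} S12^T
    """
    gain = np.linalg.solve(S22, S12.T).T  # equals S12 S22^{-1} since S22 is symmetric
    mean = m1 + gain @ (x2 - m2)
    cov = S11 - gain @ S12.T
    return mean, cov
```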
arXiv Detail & Related papers (2024-04-04T13:57:44Z) - Distributed Markov Chain Monte Carlo Sampling based on the Alternating Direction Method of Multipliers [143.6249073384419]
In this paper, we propose a distributed sampling scheme based on the alternating direction method of multipliers.
We provide both theoretical guarantees of our algorithm's convergence and experimental evidence of its superiority to the state-of-the-art.
In simulation, we deploy our algorithm on linear and logistic regression tasks and illustrate its fast convergence compared to existing gradient-based methods.
arXiv Detail & Related papers (2024-01-29T02:08:40Z) - Towards Entanglement Entropy of Random Large-N Theories [0.0]
We use the replica approach and the notion of shifted Matsubara frequency to compute von Neumann and Rényi entanglement entropies.
We demonstrate the flexibility of the method by applying it to examples of a two-site problem in the presence of decoherence.
arXiv Detail & Related papers (2023-03-03T18:21:54Z) - Gaussian Processes on Distributions based on Regularized Optimal Transport [2.905751301655124]
We present a novel kernel over the space of probability measures based on the dual formulation of optimal regularized transport.
We prove that this construction yields a valid kernel by using the Hilbert norms.
We provide theoretical guarantees on the behaviour of a Gaussian process based on this kernel.
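As a rough, hedged sketch of the kind of object involved (a generic entropic-OT similarity between empirical measures, not the paper's dual-formulation construction, and not guaranteed to be a positive-definite kernel), one can run Sinkhorn iterations and exponentiate the regularized cost; the regularization and scale parameters below are arbitrary.

```python
import numpy as np

def sinkhorn_cost(X, Y, reg=0.1, n_iter=300):
    """Entropic-regularized OT cost between uniform empirical measures on X and Y
    (plain Sinkhorn iterations; a log-domain implementation is safer for small reg)."""
    C = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=-1)  # squared-Euclidean cost
    Kmat = np.exp(-C / reg)
    a = np.full(len(X), 1.0 / len(X))
    b = np.full(len(Y), 1.0 / len(Y))
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (Kmat.T @ u)
        u = a / (Kmat @ v)
    P = u[:, None] * Kmat * v[None, :]  # transport plan
    return float(np.sum(P * C))

def ot_similarity(X, Y, reg=0.1, scale=1.0):
    """Gaussian-type similarity built from the regularized cost (illustrative only)."""
    return float(np.exp(-sinkhorn_cost(X, Y, reg) / scale))
```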
arXiv Detail & Related papers (2022-10-12T20:30:23Z) - Sum-of-Squares Relaxations for Information Theory and Variational Inference [0.0]
We consider extensions of the Shannon relative entropy, referred to as $f$-divergences.
We derive a sequence of convex relaxations for computing these divergences.
We provide more efficient relaxations based on spectral information divergences from quantum information theory.
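For reference, the standard definition being extended here (a textbook fact, not a result of that paper) is

$$ D_f(p \,\|\, q) \;=\; \int f\!\left(\frac{\mathrm{d}p}{\mathrm{d}q}\right) \mathrm{d}q, \qquad f \text{ convex},\; f(1)=0, $$

with $f(t) = t \log t$ recovering the Shannon relative entropy (Kullback-Leibler divergence).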
arXiv Detail & Related papers (2022-06-27T13:22:40Z) - Wrapped Distributions on homogeneous Riemannian manifolds [58.720142291102135]
Control over the distributions' properties, such as parameters, symmetry, and modality, yields a family of flexible distributions.
We empirically validate our approach by utilizing our proposed distributions within a variational autoencoder and a latent space network model.
arXiv Detail & Related papers (2022-04-20T21:25:21Z) - A Stochastic Newton Algorithm for Distributed Convex Optimization [62.20732134991661]
We analyze a Newton algorithm for homogeneous distributed convex optimization, where each machine can calculate gradients of the same population objective.
We show that our method can reduce the number, and frequency, of required communication rounds compared to existing methods without hurting performance.
arXiv Detail & Related papers (2021-10-07T17:51:10Z) - Rényi divergence inequalities via interpolation, with applications to generalised entropic uncertainty relations [91.3755431537592]
We investigate quantum Rényi entropic quantities, specifically those derived from the 'sandwiched' divergence.
We present Rényi mutual information decomposition rules, a new approach to the Rényi conditional entropy tripartite chain rules, and a more general bipartite comparison.
arXiv Detail & Related papers (2021-06-19T04:06:23Z) - Entropy and relative entropy from information-theoretic principles [24.74754293747645]
We find that every relative entropy must lie between the Rényi divergences of order $0$ and $\infty$.
Our main result is a one-to-one correspondence between entropies and relative entropies.
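For context, the standard (discrete-case) definitions behind that statement are

$$ D_\alpha(p \,\|\, q) \;=\; \frac{1}{\alpha-1} \log \sum_x p(x)^{\alpha} q(x)^{1-\alpha}, $$

with limiting orders $D_0(p\,\|\,q) = -\log q(\{x : p(x) > 0\})$ and $D_\infty(p\,\|\,q) = \log \max_x p(x)/q(x)$; the quoted result places every admissible relative entropy between these two extremes.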
arXiv Detail & Related papers (2020-06-19T14:50:44Z) - A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
We show that value-based methods such as TD($\lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
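To make the phrase "update rules" concrete, here is a minimal tabular Q-learning step in its standard textbook form; it only illustrates the kind of rule being analyzed (the distributional analysis itself is that paper's contribution), and the step size and discount below are placeholders.

```python
import numpy as np

def q_learning_step(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One sampled Bellman-optimality update:
       Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (r + gamma * max_a' Q(s', a'))."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * target
    return Q
```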
arXiv Detail & Related papers (2020-03-27T05:13:29Z) - Profile Entropy: A Fundamental Measure for the Learnability and Compressibility of Discrete Distributions [63.60499266361255]
We show that for samples of discrete distributions, profile entropy is a fundamental measure unifying the concepts of estimation, inference, and compression.
Specifically, profile entropy a) determines the speed of estimating the distribution relative to the best natural estimator; b) characterizes the rate of inferring all symmetric properties compared with the best estimator over any label-invariant distribution collection; c) serves as the limit of profile compression.
arXiv Detail & Related papers (2020-02-26T17:49:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.