Reducing the Variance of Variational Estimates of Mutual Information by
Limiting the Critic's Hypothesis Space to RKHS
- URL: http://arxiv.org/abs/2011.08651v1
- Date: Tue, 17 Nov 2020 14:32:48 GMT
- Title: Reducing the Variance of Variational Estimates of Mutual Information by
Limiting the Critic's Hypothesis Space to RKHS
- Authors: P Aditya Sreekar, Ujjwal Tiwari and Anoop Namboodiri
- Abstract summary: Mutual information (MI) is an information-theoretic measure of dependency between two random variables.
Recent methods realize the parametric probability distribution, or critic, as a neural network to approximate unknown density ratios.
We argue that the high variance characteristic is due to the uncontrolled complexity of the critic's hypothesis space.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mutual information (MI) is an information-theoretic measure of dependency
between two random variables. Several methods to estimate MI from samples of
two random variables with unknown underlying probability distributions have
been proposed in the literature. Recent methods realize the parametric
probability distribution, or critic, as a neural network to approximate
unknown density ratios. The approximated density ratios are used to estimate
different variational lower bounds of MI. While these methods provide reliable
estimates when the true MI is low, they produce high-variance estimates when
the true MI is high. We argue that this high-variance characteristic is due to
the uncontrolled complexity of the critic's hypothesis space. In support of
this argument, we use the data-driven Rademacher complexity of the hypothesis
space associated with the critic's architecture to analyse the generalisation
error bounds of variational lower-bound estimates of MI. In the proposed work,
we show that it is possible to negate the high-variance characteristic of
these estimators by constraining the critic's hypothesis space to a
Reproducing Kernel Hilbert Space (RKHS), which corresponds to a kernel learned
using Automated Spectral Kernel Learning (ASKL). By analysing the
aforementioned generalisation error bounds, we augment the overall
optimisation objective with an effective regularisation term. We empirically
demonstrate the efficacy of this regularisation in enforcing a proper
bias-variance tradeoff on four variational lower bounds, namely NWJ, MINE, JS
and SMILE.
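To make the setting concrete, the following is a minimal sketch (not the authors' implementation) of one variational lower bound the paper studies, NWJ, with a small neural-network critic in PyTorch. The critic architecture, hyperparameters, and the simple output-magnitude penalty standing in for the RKHS-based regularisation are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Small MLP critic f(x, y); the architecture is an illustrative assumption."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def nwj_lower_bound(critic, x, y):
    """NWJ bound: E_p(x,y)[f] - e^{-1} E_p(x)p(y)[exp(f)] <= I(X; Y).
    Samples from the product of marginals are obtained by shuffling y."""
    joint = critic(x, y)
    marginal = critic(x, y[torch.randperm(y.size(0))])
    return joint.mean() - torch.exp(marginal).mean() / math.e

# Toy training loop on correlated Gaussians (illustrative only).
dim, batch = 5, 256
critic = Critic(dim)
opt = torch.optim.Adam(critic.parameters(), lr=1e-4)
for step in range(2000):
    x = torch.randn(batch, dim)
    y = x + 0.5 * torch.randn(batch, dim)  # y is a noisy copy of x
    bound = nwj_lower_bound(critic, x, y)
    # Penalising the critic's output magnitude loosely stands in for
    # controlling the critic's complexity; the weight 1e-3 is an assumption.
    loss = -bound + 1e-3 * critic(x, y).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"estimated MI lower bound: {bound.item():.3f}")
```

In the paper's approach the critic is instead constrained to an RKHS learned with ASKL, and the regularisation term is derived from the generalisation error bound rather than chosen ad hoc.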
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters; a minimal sketch of the plain matching idea follows this entry.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
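For context, here is a minimal sketch of the plain one-nearest-neighbor matching estimator of an average treatment effect that such modifications build on; the synthetic data and covariates are illustrative assumptions, not the paper's estimator.

```python
import numpy as np
from scipy.spatial import cKDTree

def one_nn_matching_ate(X_t, y_t, X_c, y_c):
    """Plain 1-NN matching estimator of the average treatment effect:
    each unit is matched to its nearest neighbor in the opposite group,
    and unit-level outcome differences are averaged."""
    _, idx_c = cKDTree(X_c).query(X_t)  # nearest control for each treated unit
    _, idx_t = cKDTree(X_t).query(X_c)  # nearest treated unit for each control
    effects_t = y_t - y_c[idx_c]        # imputed counterfactuals for treated
    effects_c = y_t[idx_t] - y_c        # imputed counterfactuals for controls
    return np.concatenate([effects_t, effects_c]).mean()

rng = np.random.default_rng(0)
X_t = rng.normal(0.2, 1.0, size=(500, 3))
X_c = rng.normal(0.0, 1.0, size=(500, 3))
y_t = X_t.sum(axis=1) + 1.0 + rng.normal(size=500)  # true effect is 1
y_c = X_c.sum(axis=1) + rng.normal(size=500)
print(f"1-NN matching ATE estimate: {one_nn_matching_ate(X_t, y_t, X_c, y_c):.3f}")
```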
- Estimation of mutual information via quantum kernel method [0.0]
Estimating mutual information (MI) plays a critical role in investigating the relationship among multiple random variables with a nonlinear correlation.
We propose a method for estimating mutual information using the quantum kernel.
arXiv Detail & Related papers (2023-10-19T00:53:16Z)
- Adaptive learning of density ratios in RKHS [3.047411947074805]
Estimating the ratio of two probability densities from finitely many observations is a central problem in machine learning and statistics.
We analyze a large class of density ratio estimation methods that minimize a regularized Bregman divergence between the true density ratio and a model in a reproducing kernel Hilbert space; a minimal sketch of one such estimator follows this entry.
arXiv Detail & Related papers (2023-07-30T08:18:39Z)
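As an illustration of this family, here is a minimal least-squares density-ratio fit in an RKHS spanned by Gaussian kernels; the squared loss is the quadratic member of the Bregman family, and the kernel width, regularisation strength, and centre count are illustrative assumptions.

```python
import numpy as np

def gauss_kernel(X, C, sigma):
    """Gaussian kernel matrix between samples X and centres C."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def least_squares_ratio(X_num, X_den, sigma=1.0, lam=1e-3, n_centres=100):
    """Fit r(x) ~ p_num(x) / p_den(x) as a kernel expansion by minimizing
    (1/2) E_den[r^2] - E_num[r] + (lam/2) ||alpha||^2, which has the
    closed-form solution alpha = (H + lam I)^{-1} h."""
    rng = np.random.default_rng(0)
    C = X_num[rng.choice(len(X_num), min(n_centres, len(X_num)), replace=False)]
    Phi_den = gauss_kernel(X_den, C, sigma)
    Phi_num = gauss_kernel(X_num, C, sigma)
    H = Phi_den.T @ Phi_den / len(X_den)  # second moment under the denominator
    h = Phi_num.mean(axis=0)              # first moment under the numerator
    alpha = np.linalg.solve(H + lam * np.eye(len(C)), h)
    return lambda X: gauss_kernel(X, C, sigma) @ alpha

# Ratio of two shifted Gaussians; the true ratio exp(x/2 - 1/8) is known.
rng = np.random.default_rng(1)
X_num = rng.normal(0.5, 1.0, size=(1000, 1))
X_den = rng.normal(0.0, 1.0, size=(1000, 1))
ratio = least_squares_ratio(X_num, X_den)
print("estimated ratio at x = 0.5:", float(ratio(np.array([[0.5]]))))
```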
- Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z)
- Keep it Tighter -- A Story on Analytical Mean Embeddings [0.6445605125467574]
Kernel techniques are among the most popular and flexible approaches in data science.
The mean embedding gives rise to a divergence measure referred to as the maximum mean discrepancy (MMD).
In this paper we focus on the problem of MMD estimation when the mean embedding of one of the underlying distributions is available analytically; a sketch of the standard sample-based MMD estimator follows this entry.
arXiv Detail & Related papers (2021-10-15T21:29:27Z)
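For reference, this is a minimal sketch of the standard unbiased sample-based MMD^2 estimator with an RBF kernel (the paper's analytical-embedding variant replaces one sample average with an exact expectation); the kernel width and data are illustrative assumptions.

```python
import numpy as np

def rbf(X, Y, sigma=1.0):
    """Gaussian RBF kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2(P, Q) from X ~ P and Y ~ Q:
    E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)], diagonal terms removed."""
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf(X, X, sigma), rbf(Y, Y, sigma), rbf(X, Y, sigma)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))
Y = rng.normal(0.3, 1.0, size=(500, 2))  # slightly shifted distribution
print(f"unbiased MMD^2 estimate: {mmd2_unbiased(X, Y):.4f}")
```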
- Neural Estimation of Statistical Divergences [24.78742908726579]
A modern method for estimating statistical divergences relies on parametrizing an empirical variational form by a neural network (NN).
In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation.
We show that neural estimators with a slightly different NN growth-rate are near minimax rate-optimal, achieving the parametric convergence rate up to logarithmic factors.
arXiv Detail & Related papers (2021-10-07T17:42:44Z)
- Statistical inference using Regularized M-estimation in the reproducing kernel Hilbert space for handling missing data [0.76146285961466]
We first use kernel ridge regression to develop an imputation method for handling item nonresponse.
A nonparametric propensity score estimator using the reproducing kernel Hilbert space is also developed.
The proposed method is applied to analyze air pollution data measured in Beijing, China; a minimal kernel-ridge imputation sketch follows this entry.
arXiv Detail & Related papers (2021-07-15T14:51:39Z)
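The following is a minimal sketch of kernel-ridge imputation for item nonresponse on synthetic data; the response mechanism, kernel, and regularisation settings are illustrative assumptions, not the paper's estimator.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Synthetic data with item nonresponse: y is missing for some units, and the
# response probability depends on the covariates, so the respondent-only mean
# is biased.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + np.sin(X[:, 0]) + rng.normal(scale=0.3, size=1000)
observed = rng.random(1000) < 1.0 / (1.0 + np.exp(-X[:, 0]))

# Fit kernel ridge regression on respondents, then impute nonrespondents.
krr = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5)
krr.fit(X[observed], y[observed])
y_imputed = y.copy()
y_imputed[~observed] = krr.predict(X[~observed])

print(f"true mean:                  {y.mean():.3f}")
print(f"mean from respondents only: {y[observed].mean():.3f}")
print(f"mean after KRR imputation:  {y_imputed.mean():.3f}")
```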
- Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization [69.07420650261649]
We introduce a novel, simple, and powerful contrastive MI estimator named FLO.
Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently.
The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
arXiv Detail & Related papers (2021-07-02T15:20:41Z)
- Bayesian Uncertainty Estimation of Learned Variational MRI Reconstruction [63.202627467245584]
We introduce a Bayesian variational framework to quantify the model-immanent (epistemic) uncertainty.
We demonstrate that our approach yields competitive results for undersampled MRI reconstruction.
arXiv Detail & Related papers (2021-02-12T18:08:14Z)
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
arXiv Detail & Related papers (2020-06-09T23:26:15Z)
- Minimax Optimal Estimation of KL Divergence for Continuous Distributions [56.29748742084386]
Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains.
One simple and effective estimator is based on the k nearest neighbor distances between these samples; a sketch of this estimator follows this entry.
arXiv Detail & Related papers (2020-02-26T16:37:37Z)
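For reference, this is a minimal sketch of the classical k-NN KL divergence estimator on synthetic data; the choice of k and the test distributions are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def kl_knn(X, Y, k=5):
    """k-NN estimator of KL(P || Q) from X ~ P (n x d) and Y ~ Q (m x d):
    D ~= (d/n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1)),
    where rho_k(i) is the k-NN distance of x_i within X (excluding itself)
    and nu_k(i) is the k-NN distance from x_i to the sample Y."""
    n, d = X.shape
    m = Y.shape[0]
    rho = cKDTree(X).query(X, k=k + 1)[0][:, -1]  # k+1 because x_i matches itself
    nu = cKDTree(Y).query(X, k=k)[0]
    nu = nu[:, -1] if k > 1 else nu               # k-th neighbor distance
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(2000, 1))
Y = rng.normal(1.0, 1.0, size=(2000, 1))
# True KL between N(0, 1) and N(1, 1) is 0.5.
print(f"k-NN KL estimate: {kl_knn(X, Y):.3f}")
```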