Reducing the Variance of Variational Estimates of Mutual Information by
Limiting the Critic's Hypothesis Space to RKHS
- URL: http://arxiv.org/abs/2011.08651v1
- Date: Tue, 17 Nov 2020 14:32:48 GMT
- Title: Reducing the Variance of Variational Estimates of Mutual Information by
Limiting the Critic's Hypothesis Space to RKHS
- Authors: P Aditya Sreekar, Ujjwal Tiwari and Anoop Namboodiri
- Abstract summary: Mutual information (MI) is an information-theoretic measure of dependency between two random variables.
Recent methods realize the parametric probability distribution, or critic, as a neural network to approximate unknown density ratios.
We argue that the high variance characteristic is due to the uncontrolled complexity of the critic's hypothesis space.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mutual information (MI) is an information-theoretic measure of dependency
between two random variables. Several methods to estimate MI from samples of
two random variables with unknown underlying probability distributions have
been proposed in the literature. Recent methods realize the parametric
probability distribution, or critic, as a neural network to approximate
unknown density ratios. The approximated density ratios are used to estimate
different variational lower bounds of MI. While these methods provide reliable
estimates when the true MI is low, they produce high-variance estimates when
the true MI is high. We argue that this high-variance characteristic is due to
the uncontrolled complexity of the critic's hypothesis space. In support of
this argument, we use the data-driven Rademacher complexity of the hypothesis
space associated with the critic's architecture to analyse the generalisation
error bounds of variational lower-bound estimates of MI. In the proposed work,
we show that it is possible to negate the high-variance characteristic of
these estimators by constraining the critic's hypothesis space to a
Reproducing Kernel Hilbert Space (RKHS), which corresponds to a kernel learned
using Automated Spectral Kernel Learning (ASKL). By analysing the
aforementioned generalisation error bounds, we augment the overall
optimisation objective with an effective regularisation term. We empirically
demonstrate the efficacy of this regularisation in enforcing a proper
bias-variance tradeoff on four variational lower bounds, namely NWJ, MINE, JS
and SMILE.
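To make the setting concrete, the following is a minimal sketch (not the authors' implementation) of one variational lower bound the paper studies, NWJ, with a small neural-network critic in PyTorch. The critic architecture, hyperparameters, and the simple output-magnitude penalty standing in for the RKHS-based regularisation are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class Critic(nn.Module):
    """Small MLP critic f(x, y); the architecture is an illustrative assumption."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y):
        return self.net(torch.cat([x, y], dim=-1)).squeeze(-1)

def nwj_lower_bound(critic, x, y):
    """NWJ bound: E_p(x,y)[f] - e^{-1} E_p(x)p(y)[exp(f)] <= I(X; Y).
    Samples from the product of marginals are obtained by shuffling y."""
    joint = critic(x, y)
    marginal = critic(x, y[torch.randperm(y.size(0))])
    return joint.mean() - torch.exp(marginal).mean() / math.e

# Toy training loop on correlated Gaussians (illustrative only).
dim, batch = 5, 256
critic = Critic(dim)
opt = torch.optim.Adam(critic.parameters(), lr=1e-4)
for step in range(2000):
    x = torch.randn(batch, dim)
    y = x + 0.5 * torch.randn(batch, dim)  # y is a noisy copy of x
    bound = nwj_lower_bound(critic, x, y)
    # Penalising the critic's output magnitude loosely stands in for
    # controlling the critic's complexity; the weight 1e-3 is an assumption.
    loss = -bound + 1e-3 * critic(x, y).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"estimated MI lower bound: {bound.item():.3f}")
```

In the paper's approach the critic is instead constrained to an RKHS learned with ASKL, and the regularisation term is derived from the generalisation error bound rather than chosen ad hoc.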
Related papers
- Multivariate root-n-consistent smoothing parameter free matching estimators and estimators of inverse density weighted expectations [51.000851088730684]
We develop novel modifications of nearest-neighbor and matching estimators which converge at the parametric $\sqrt{n}$-rate.
We stress that our estimators do not involve nonparametric function estimators and, in particular, do not rely on sample-size-dependent smoothing parameters; a minimal sketch of the plain matching idea follows this entry.
arXiv Detail & Related papers (2024-07-11T13:28:34Z)
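For context, here is a minimal sketch of the plain one-nearest-neighbor matching estimator of an average treatment effect that such modifications build on; the synthetic data and covariates are illustrative assumptions, not the paper's estimator.

```python
import numpy as np
from scipy.spatial import cKDTree

def one_nn_matching_ate(X_t, y_t, X_c, y_c):
    """Plain 1-NN matching estimator of the average treatment effect:
    each unit is matched to its nearest neighbor in the opposite group,
    and unit-level outcome differences are averaged."""
    _, idx_c = cKDTree(X_c).query(X_t)  # nearest control for each treated unit
    _, idx_t = cKDTree(X_t).query(X_c)  # nearest treated unit for each control
    effects_t = y_t - y_c[idx_c]        # imputed counterfactuals for treated
    effects_c = y_t[idx_t] - y_c        # imputed counterfactuals for controls
    return np.concatenate([effects_t, effects_c]).mean()

rng = np.random.default_rng(0)
X_t = rng.normal(0.2, 1.0, size=(500, 3))
X_c = rng.normal(0.0, 1.0, size=(500, 3))
y_t = X_t.sum(axis=1) + 1.0 + rng.normal(size=500)  # true effect is 1
y_c = X_c.sum(axis=1) + rng.normal(size=500)
print(f"1-NN matching ATE estimate: {one_nn_matching_ate(X_t, y_t, X_c, y_c):.3f}")
```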
- Estimation of mutual information via quantum kernel method [0.0]
Estimating mutual information (MI) plays a critical role in investigating the relationship among multiple random variables with a nonlinear correlation.
We propose a method for estimating mutual information using the quantum kernel.
arXiv Detail & Related papers (2023-10-19T00:53:16Z)
- Adaptive learning of density ratios in RKHS [3.047411947074805]
Estimating the ratio of two probability densities from finitely many observations is a central problem in machine learning and statistics.
We analyze a large class of density ratio estimation methods that minimize a regularized Bregman divergence between the true density ratio and a model in a reproducing kernel Hilbert space; a minimal sketch of one such estimator follows this entry.
arXiv Detail & Related papers (2023-07-30T08:18:39Z)
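As an illustration of this family, here is a minimal least-squares density-ratio fit in an RKHS spanned by Gaussian kernels; the squared loss is the quadratic member of the Bregman family, and the kernel width, regularisation strength, and centre count are illustrative assumptions.

```python
import numpy as np

def gauss_kernel(X, C, sigma):
    """Gaussian kernel matrix between samples X and centres C."""
    d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def least_squares_ratio(X_num, X_den, sigma=1.0, lam=1e-3, n_centres=100):
    """Fit r(x) ~ p_num(x) / p_den(x) as a kernel expansion by minimizing
    (1/2) E_den[r^2] - E_num[r] + (lam/2) ||alpha||^2, which has the
    closed-form solution alpha = (H + lam I)^{-1} h."""
    rng = np.random.default_rng(0)
    C = X_num[rng.choice(len(X_num), min(n_centres, len(X_num)), replace=False)]
    Phi_den = gauss_kernel(X_den, C, sigma)
    Phi_num = gauss_kernel(X_num, C, sigma)
    H = Phi_den.T @ Phi_den / len(X_den)  # second moment under the denominator
    h = Phi_num.mean(axis=0)              # first moment under the numerator
    alpha = np.linalg.solve(H + lam * np.eye(len(C)), h)
    return lambda X: gauss_kernel(X, C, sigma) @ alpha

# Ratio of two shifted Gaussians; the true ratio exp(x/2 - 1/8) is known.
rng = np.random.default_rng(1)
X_num = rng.normal(0.5, 1.0, size=(1000, 1))
X_den = rng.normal(0.0, 1.0, size=(1000, 1))
ratio = least_squares_ratio(X_num, X_den)
print("estimated ratio at x = 0.5:", float(ratio(np.array([[0.5]]))))
```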
- Equivariance Discovery by Learned Parameter-Sharing [153.41877129746223]
We study how to discover interpretable equivariances from data.
Specifically, we formulate this discovery process as an optimization problem over a model's parameter-sharing schemes.
Also, we theoretically analyze the method for Gaussian data and provide a bound on the mean squared gap between the studied discovery scheme and the oracle scheme.
arXiv Detail & Related papers (2022-04-07T17:59:19Z)
- Keep it Tighter -- A Story on Analytical Mean Embeddings [0.6445605125467574]
Kernel techniques are among the most popular and flexible approaches in data science.
The mean embedding gives rise to a divergence measure referred to as the maximum mean discrepancy (MMD).
In this paper we focus on the problem of MMD estimation when the mean embedding of one of the underlying distributions is available analytically; a sketch of the standard sample-based MMD estimator follows this entry.
arXiv Detail & Related papers (2021-10-15T21:29:27Z)
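For reference, this is a minimal sketch of the standard unbiased sample-based MMD^2 estimator with an RBF kernel (the paper's analytical-embedding variant replaces one sample average with an exact expectation); the kernel width and data are illustrative assumptions.

```python
import numpy as np

def rbf(X, Y, sigma=1.0):
    """Gaussian RBF kernel matrix k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def mmd2_unbiased(X, Y, sigma=1.0):
    """Unbiased estimate of MMD^2(P, Q) from X ~ P and Y ~ Q:
    E[k(x, x')] + E[k(y, y')] - 2 E[k(x, y)], diagonal terms removed."""
    m, n = len(X), len(Y)
    Kxx, Kyy, Kxy = rbf(X, X, sigma), rbf(Y, Y, sigma), rbf(X, Y, sigma)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(500, 2))
Y = rng.normal(0.3, 1.0, size=(500, 2))  # slightly shifted distribution
print(f"unbiased MMD^2 estimate: {mmd2_unbiased(X, Y):.4f}")
```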
- Neural Estimation of Statistical Divergences [24.78742908726579]
A modern method for estimating statistical divergences relies on parametrizing an empirical variational form by a neural network (NN).
In particular, there is a fundamental tradeoff between the two sources of error involved: approximation and empirical estimation.
We show that neural estimators with a slightly different NN growth-rate are near minimax rate-optimal, achieving the parametric convergence rate up to logarithmic factors.
arXiv Detail & Related papers (2021-10-07T17:42:44Z)
- Statistical inference using Regularized M-estimation in the reproducing kernel Hilbert space for handling missing data [0.76146285961466]
We first use kernel ridge regression to develop an imputation method for handling item nonresponse.
A nonparametric propensity score estimator using the reproducing kernel Hilbert space is also developed.
The proposed method is applied to analyze air pollution data measured in Beijing, China; a minimal kernel-ridge imputation sketch follows this entry.
arXiv Detail & Related papers (2021-07-15T14:51:39Z)
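The following is a minimal sketch of kernel-ridge imputation for item nonresponse on synthetic data; the response mechanism, kernel, and regularisation settings are illustrative assumptions, not the paper's estimator.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

# Synthetic data with item nonresponse: y is missing for some units, and the
# response probability depends on the covariates, so the respondent-only mean
# is biased.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + np.sin(X[:, 0]) + rng.normal(scale=0.3, size=1000)
observed = rng.random(1000) < 1.0 / (1.0 + np.exp(-X[:, 0]))

# Fit kernel ridge regression on respondents, then impute nonrespondents.
krr = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.5)
krr.fit(X[observed], y[observed])
y_imputed = y.copy()
y_imputed[~observed] = krr.predict(X[~observed])

print(f"true mean:                  {y.mean():.3f}")
print(f"mean from respondents only: {y[observed].mean():.3f}")
print(f"mean after KRR imputation:  {y_imputed.mean():.3f}")
```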
- Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization [69.07420650261649]
We introduce a novel, simple, and powerful contrastive MI estimator named FLO.
Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently.
The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
arXiv Detail & Related papers (2021-07-02T15:20:41Z)
- Bayesian Uncertainty Estimation of Learned Variational MRI Reconstruction [63.202627467245584]
We introduce a Bayesian variational framework to quantify the model-immanent (epistemic) uncertainty.
We demonstrate that our approach yields competitive results for undersampled MRI reconstruction.
arXiv Detail & Related papers (2021-02-12T18:08:14Z)
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
arXiv Detail & Related papers (2020-06-09T23:26:15Z)
- Minimax Optimal Estimation of KL Divergence for Continuous Distributions [56.29748742084386]
Estimating Kullback-Leibler divergence from independent and identically distributed samples is an important problem in various domains.
One simple and effective estimator is based on the k nearest neighbor distances between these samples; a sketch of this estimator follows this entry.
arXiv Detail & Related papers (2020-02-26T16:37:37Z)
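For reference, this is a minimal sketch of the classical k-NN KL divergence estimator on synthetic data; the choice of k and the test distributions are illustrative assumptions.

```python
import numpy as np
from scipy.spatial import cKDTree

def kl_knn(X, Y, k=5):
    """k-NN estimator of KL(P || Q) from X ~ P (n x d) and Y ~ Q (m x d):
    D ~= (d/n) * sum_i log(nu_k(i) / rho_k(i)) + log(m / (n - 1)),
    where rho_k(i) is the k-NN distance of x_i within X (excluding itself)
    and nu_k(i) is the k-NN distance from x_i to the sample Y."""
    n, d = X.shape
    m = Y.shape[0]
    rho = cKDTree(X).query(X, k=k + 1)[0][:, -1]  # k+1 because x_i matches itself
    nu = cKDTree(Y).query(X, k=k)[0]
    nu = nu[:, -1] if k > 1 else nu               # k-th neighbor distance
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(2000, 1))
Y = rng.normal(1.0, 1.0, size=(2000, 1))
# True KL between N(0, 1) and N(1, 1) is 0.5.
print(f"k-NN KL estimate: {kl_knn(X, Y):.3f}")
```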