Improving Mutual Information Estimation with Annealed and Energy-Based Bounds
- URL: http://arxiv.org/abs/2303.06992v1
- Date: Mon, 13 Mar 2023 10:47:24 GMT
- Title: Improving Mutual Information Estimation with Annealed and Energy-Based Bounds
- Authors: Rob Brekelmans, Sicong Huang, Marzyeh Ghassemi, Greg Ver Steeg, Roger Grosse, Alireza Makhzani
- Abstract summary: Mutual information (MI) is a fundamental quantity in information theory and machine learning.
We present a unifying view of existing MI bounds from the perspective of importance sampling.
We propose three novel bounds based on this approach.
- Score: 20.940022170594816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mutual information (MI) is a fundamental quantity in information theory and
machine learning. However, direct estimation of MI is intractable, even if the
true joint probability density for the variables of interest is known, as it
involves estimating a potentially high-dimensional log partition function. In
this work, we present a unifying view of existing MI bounds from the
perspective of importance sampling, and propose three novel bounds based on
this approach. Since accurate estimation of MI without density information
requires a sample size exponential in the true MI, we assume either a single
marginal or the full joint density information is known. In settings where the
full joint density is available, we propose Multi-Sample Annealed Importance
Sampling (AIS) bounds on MI, which we demonstrate can tightly estimate large
values of MI in our experiments. In settings where only a single marginal
distribution is known, we propose Generalized IWAE (GIWAE) and MINE-AIS bounds.
Our GIWAE bound unifies variational and contrastive bounds in a single
framework that generalizes InfoNCE, IWAE, and Barber-Agakov bounds. Our
MINE-AIS method improves upon existing energy-based methods such as MINE-DV and
MINE-F by directly optimizing a tighter lower bound on MI. MINE-AIS uses MCMC
sampling to estimate gradients for training and Multi-Sample AIS for evaluating
the bound. Our methods are particularly suitable for evaluating MI in deep
generative models, since explicit forms of the marginal or joint densities are
often available. We evaluate our bounds on estimating the MI of VAEs and GANs
trained on the MNIST and CIFAR datasets, and showcase significant gains over
existing bounds in these challenging settings with high ground truth MI.
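As a rough illustration of the importance-sampling view (a toy linear-Gaussian sketch of ours, not the paper's code): when the joint density is known, averaging importance weights over prior samples gives a multi-sample upper bound on MI, while mixing the positive sample into the contrast set gives an InfoNCE-style lower bound that cannot exceed log K; closing that gap is what the Multi-Sample AIS bounds are for.

```python
import numpy as np
from scipy.special import logsumexp

# Toy joint with known densities: z ~ N(0, I_d), x = z + sigma * N(0, I_d).
# Closed form for this model: I(x; z) = (d / 2) * log(1 + 1 / sigma^2).
rng = np.random.default_rng(0)
d, sigma, K, n = 5, 0.5, 512, 2000
true_mi = 0.5 * d * np.log(1 + 1 / sigma**2)

def log_p_x_given_z(x, z):
    # log N(x; z, sigma^2 I), summed over the d dimensions
    return (-0.5 * np.sum((x - z) ** 2, axis=-1) / sigma**2
            - 0.5 * d * np.log(2 * np.pi * sigma**2))

upper, lower = [], []
for _ in range(n):
    z_pos = rng.standard_normal(d)              # latent of the joint draw
    x = z_pos + sigma * rng.standard_normal(d)  # paired observation
    z_neg = rng.standard_normal((K, d))         # K independent prior draws
    lw_pos = log_p_x_given_z(x, z_pos)
    lw_neg = log_p_x_given_z(x, z_neg)
    # Jensen: E[log mean_k w_k] <= log p(x), so this averages to an MI upper bound
    upper.append(lw_pos - (logsumexp(lw_neg) - np.log(K)))
    # Mixing the positive weight in flips the bias: an InfoNCE-style lower bound
    lw_mix = np.append(lw_neg[: K - 1], lw_pos)
    lower.append(lw_pos - (logsumexp(lw_mix) - np.log(K)))

print(f"true MI {true_mi:.3f}  upper {np.mean(upper):.3f}  "
      f"lower {np.mean(lower):.3f} (cap ~ log K = {np.log(K):.2f})")
```

Here the true MI (about 4.0 nats) sits below log K, so the two bounds bracket it; once the true MI exceeds log K, the contrastive lower bound saturates, which is the regime the annealed bounds target.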
Related papers
- Detecting Training Data of Large Language Models via Expectation Maximization [62.28028046993391]
Membership inference attacks (MIAs) aim to determine whether a specific instance was part of a target model's training data.
Applying MIAs to large language models (LLMs) presents unique challenges due to the massive scale of pre-training data and the ambiguous nature of membership.
We introduce EM-MIA, a novel MIA method for LLMs that iteratively refines membership scores and prefix scores via an expectation-maximization algorithm.
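As in the hypothetical sketch below (an illustration of ours; the summary does not specify EM-MIA's actual update rules, so every function here is a placeholder), the method alternates between refining the two kinds of scores:

```python
import numpy as np

# Hypothetical sketch of the alternating refinement loop; the update rules
# below are illustrative stand-ins, not EM-MIA's actual scoring functions.
def em_mia_sketch(scores, n_iters=10):
    """scores: (n_candidates, n_prefixes) raw log-likelihood scores."""
    membership = np.full(scores.shape[0], 0.5)   # initial membership beliefs
    for _ in range(n_iters):
        # E-step stand-in: prefix scores weighted by current beliefs
        prefix = membership @ scores / membership.sum()
        # M-step stand-in: re-score candidates against calibrated prefixes
        centered = (scores - prefix).mean(axis=1)
        membership = 1.0 / (1.0 + np.exp(-centered))  # squash to [0, 1]
    return membership
```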
arXiv Detail & Related papers (2024-10-10T03:31:16Z)
- Mutual Information Estimation via Normalizing Flows [39.58317527488534]
We propose a novel approach to the problem of mutual information estimation.
The estimator maps the original data to a target distribution for which MI is easier to estimate.
We additionally explore target distributions with known closed-form expressions for MI.
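A minimal sketch of the underlying trick (a toy of ours, with hand-picked Gaussianizing maps standing in for trained flows): MI is invariant under invertible transformations applied to each variable separately, so once the data are mapped to a jointly Gaussian pair, MI has a closed form.

```python
import numpy as np

# MI is invariant under invertible per-variable maps, so if flows carry
# (x, y) to a jointly Gaussian (u, v), then
#   I(x; y) = I(u; v) = -0.5 * log(1 - rho^2),  rho = corr(u, v).
rng = np.random.default_rng(0)
z = rng.standard_normal(10_000)
x = np.exp(z + 0.3 * rng.standard_normal(10_000))   # lognormal "data"
y = (z + 0.8 * rng.standard_normal(10_000)) ** 3    # cubed "data"

# Stand-ins for trained flows: here the Gaussianizing maps are known.
u, v = np.log(x), np.cbrt(y)
rho = np.corrcoef(u, v)[0, 1]
print(f"estimated MI: {-0.5 * np.log(1 - rho**2):.3f} nats")
```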
arXiv Detail & Related papers (2024-03-04T16:28:04Z)
- A robust estimator of mutual information for deep learning interpretability [2.574652392763709]
We present GMM-MI, an algorithm that can be applied to both discrete and continuous settings.
We extensively validate GMM-MI on toy data for which the ground truth MI is known.
We then demonstrate the use of our MI estimator in the context of representation learning.
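A minimal version of the recipe (our sketch using scikit-learn, not the released GMM-MI code): fit a mixture to joint samples, then Monte-Carlo the MI integrand, using the fact that each marginal of a Gaussian mixture is itself a one-dimensional mixture.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
z = rng.standard_normal(5000)
data = np.column_stack([z + 0.5 * rng.standard_normal(5000),
                        z + 0.5 * rng.standard_normal(5000)])

gmm = GaussianMixture(n_components=3, random_state=0).fit(data)
log_joint = gmm.score_samples(data)                 # log p(x, y)

def marginal_logpdf(vals, dim):
    # weighted 1-D component densities along one axis of the fitted GMM
    comp = [np.log(w) + norm.logpdf(vals, m[dim], np.sqrt(c[dim, dim]))
            for w, m, c in zip(gmm.weights_, gmm.means_, gmm.covariances_)]
    return logsumexp(comp, axis=0)

mi = np.mean(log_joint
             - marginal_logpdf(data[:, 0], 0)
             - marginal_logpdf(data[:, 1], 1))
print(f"MI estimate: {mi:.3f} nats")
```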
arXiv Detail & Related papers (2022-10-31T18:00:02Z)
- Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization [69.07420650261649]
We introduce a novel, simple, and powerful contrastive MI estimator named FLO.
Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently.
The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
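For context (our addition; the summary does not state FLO's objective): the canonical contrastive predecessor, InfoNCE, contrasts one positive pair against K - 1 negatives and can never report more than log K nats, one of the limitations FLO is positioned against.

```latex
I(X; Y) \;\ge\; I_{\mathrm{NCE}}
:= \mathbb{E}_{(x, y_1) \sim p(x, y),\; y_{2:K} \sim p(y)}
\!\left[
  \log \frac{e^{f(x, y_1)}}{\frac{1}{K} \sum_{k=1}^{K} e^{f(x, y_k)}}
\right],
\qquad I_{\mathrm{NCE}} \le \log K \ \text{for any critic } f.
```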
arXiv Detail & Related papers (2021-07-02T15:20:41Z)
- Decomposed Mutual Information Estimation for Contrastive Representation Learning [66.52795579973484]
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context.
We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews.
This expression contains a sum of unconditional and conditional MI terms, each measuring a modest chunk of the total MI.
We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting.
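The decomposition rests on the chain rule of mutual information (a standard identity, stated here in our notation for a view split as y = (y_1, y_2)):

```latex
I(x; y) = I(x; y_1) + I(x; y_2 \mid y_1),
\qquad\text{and, iterating over subviews,}\qquad
I(x; y) = \sum_{j} I\!\left(x;\, y_j \mid y_{<j}\right).
```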
arXiv Detail & Related papers (2021-06-25T03:19:25Z)
- CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information [105.73798100327667]
We propose a novel Contrastive Log-ratio Upper Bound (CLUB) of mutual information.
We provide a theoretical analysis of the properties of CLUB and its variational approximation.
Based on this upper bound, we introduce a MI minimization training scheme and further accelerate it with a negative sampling strategy.
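The CLUB objective is simple to state: with a variational conditional q(y|x), the bound is E_{p(x,y)}[log q(y|x)] - E_{p(x)p(y)}[log q(y|x)]. A minimal sampled version (our sketch, with the true conditional standing in for a learned q):

```python
import numpy as np

def club_estimate(x, y, log_q):
    """Sampled CLUB bound: mean log q over positive pairs minus over all pairs.

    x, y  : paired samples from p(x, y), shape (n, d)
    log_q : vectorized log q(y | x), returning shape (n,)
    """
    n = len(x)
    positive = log_q(x, y).mean()           # E_p(x,y)[log q(y|x)]
    xi = np.repeat(x, n, axis=0)            # all (x_i, y_j) pairs
    yj = np.tile(y, (n, 1))                 # approximate p(x)p(y)
    negative = log_q(xi, yj).mean()
    return positive - negative

# Toy check: y = x + 0.5 * noise, with the true conditional used as q(y|x).
rng = np.random.default_rng(0)
x = rng.standard_normal((1000, 1))
y = x + 0.5 * rng.standard_normal((1000, 1))
log_q = lambda a, b: (-0.5 * ((b - a) / 0.5) ** 2
                      - np.log(0.5 * np.sqrt(2 * np.pi))).sum(axis=1)
print(f"CLUB: {club_estimate(x, y, log_q):.3f}  "
      f"(true MI = {0.5 * np.log(5):.3f} nats)")
```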
arXiv Detail & Related papers (2020-06-22T05:36:16Z)
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
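For reference (standard definitions, not specific to this paper): point-wise dependency is the integrand whose expectation under the joint density gives MI.

```latex
\mathrm{PD}(x, y) = \frac{p(x, y)}{p(x)\,p(y)},
\qquad
I(X; Y) = \mathbb{E}_{p(x, y)}\!\left[\log \mathrm{PD}(x, y)\right].
```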
arXiv Detail & Related papers (2020-06-09T23:26:15Z)
- Mutual Information Gradient Estimation for Representation Learning [56.08429809658762]
Mutual Information (MI) plays an important role in representation learning.
Recent advances establish tractable and scalable MI estimators to discover useful representations.
We propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on the score estimation of implicit distributions.
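A sketch of why score estimation is sufficient here (our gloss on the summary, not the paper's exact derivation): MI splits into entropies, and for a reparameterized representation x = g_theta(epsilon), each entropy gradient involves only the score of the implicit distribution.

```latex
I(x; y) = H(x) + H(y) - H(x, y),
\qquad
\nabla_\theta H(x)
  = -\,\mathbb{E}_{\epsilon}\!\left[
      \left.\nabla_x \log p_\theta(x)\right|_{x = g_\theta(\epsilon)}
      \cdot \nabla_\theta\, g_\theta(\epsilon)
    \right].
% The direct dependence of log p_theta on theta integrates to zero, so
% estimating the score of the implicit distribution yields MI gradients
% without ever estimating MI itself.
```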
arXiv Detail & Related papers (2020-05-03T16:05:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.