Mutual Information Gradient Estimation for Representation Learning
- URL: http://arxiv.org/abs/2005.01123v1
- Date: Sun, 3 May 2020 16:05:58 GMT
- Title: Mutual Information Gradient Estimation for Representation Learning
- Authors: Liangjian Wen, Yiji Zhou, Lirong He, Mingyuan Zhou, Zenglin Xu
- Abstract summary: Mutual Information (MI) plays an important role in representation learning.
Recent advances establish tractable and scalable MI estimators to discover useful representations.
We propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on the score estimation of implicit distributions.
- Score: 56.08429809658762
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mutual Information (MI) plays an important role in representation learning.
However, MI is unfortunately intractable in continuous and high-dimensional settings.
Recent advances establish tractable and scalable MI estimators to discover useful
representations. However, most existing methods cannot provide accurate, low-variance
estimates of MI when the MI is large. We argue that directly estimating the gradients
of MI is more appealing for representation learning than estimating MI itself. To this
end, we propose the Mutual Information Gradient Estimator (MIGE) for representation
learning, based on score estimation of implicit distributions. MIGE yields tight and
smooth gradient estimates of MI in high-dimensional, large-MI settings. We expand the
applications of MIGE to both unsupervised learning of deep representations based on
InfoMax and the Information Bottleneck method. Experimental results indicate
significant performance improvements in learning useful representations.
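To make this concrete, here is a rough sketch of the idea in our own notation; it is a reconstruction from the abstract under stated assumptions, not the paper's exact formulation. Assume a reparameterized stochastic encoder z = g_\theta(x, \epsilon) with noise \epsilon \sim p(\epsilon). The MI between input and representation splits into entropy terms whose parameter gradients involve only score functions of implicit distributions:

  \nabla_\theta I(X; Z) = \nabla_\theta H(Z) - \nabla_\theta H(Z \mid X)
  \nabla_\theta H(Z) = -\,\mathbb{E}_{p(x)\,p(\epsilon)}\big[ \nabla_z \log p(z)\big|_{z = g_\theta(x,\epsilon)}^{\top}\, \nabla_\theta g_\theta(x, \epsilon) \big]

(the term \mathbb{E}[\nabla_\theta \log p_\theta(z)] vanishes, so only the pathwise part remains), and the gradient of H(Z \mid X) takes the analogous form with the conditional score \nabla_z \log p(z \mid x). The scores of these implicit distributions are not available in closed form but can be estimated from samples, for instance with a spectral Stein gradient estimator; plugging such estimates in yields a direct, sample-based estimate of \nabla_\theta I(X; Z) that can drive InfoMax or Information Bottleneck objectives without ever forming an estimate of the (possibly large) MI value itself.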
Related papers
- Understanding Probe Behaviors through Variational Bounds of Mutual Information [53.520525292756005]
We provide guidelines for linear probing by constructing a novel mathematical framework leveraging information theory.
First, we connect probing with the variational bounds of mutual information (MI) to relax the probe design, equating linear probing with fine-tuning.
We show that the intermediate representations can have the biggest MI estimate because of the tradeoff between better separability and decreasing MI.
arXiv Detail & Related papers (2023-12-15T18:38:18Z)
- Improving Mutual Information Estimation with Annealed and Energy-Based Bounds [20.940022170594816]
Mutual information (MI) is a fundamental quantity in information theory and machine learning.
We present a unifying view of existing MI bounds from the perspective of importance sampling.
We propose three novel bounds based on this approach.
arXiv Detail & Related papers (2023-03-13T10:47:24Z)
- Decomposed Mutual Information Estimation for Contrastive Representation Learning [66.52795579973484]
Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context.
We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews.
This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI (see the chain-rule sketch after this list).
We show that DEMI can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting.
arXiv Detail & Related papers (2021-06-25T03:19:25Z)
- Are deep learning models superior for missing data imputation in large surveys? Evidence from an empirical comparison [5.994312110645453]
Multiple imputation (MI) is the state-of-the-art approach for dealing with missing data arising from non-response in sample surveys.
Recent MI methods based on deep learning models have been developed with encouraging results in small studies.
This paper provides a framework for using simulations based on real survey data and several performance metrics to compare MI methods.
arXiv Detail & Related papers (2021-03-14T16:24:04Z)
- CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information [105.73798100327667]
We propose a novel Contrastive Log-ratio Upper Bound (CLUB) of mutual information.
We provide a theoretical analysis of the properties of CLUB and its variational approximation.
Based on this upper bound, we introduce an MI minimization training scheme and further accelerate it with a negative sampling strategy (a sketch of the bound follows after this list).
arXiv Detail & Related papers (2020-06-22T05:36:16Z)
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval tasks.
arXiv Detail & Related papers (2020-06-09T23:26:15Z)
- What Makes for Good Views for Contrastive Learning? [90.49736973404046]
We argue that we should reduce the mutual information (MI) between views while keeping task-relevant information intact.
We devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
As a by-product, we achieve a new state-of-the-art accuracy on unsupervised pre-training for ImageNet classification.
arXiv Detail & Related papers (2020-05-20T17:59:57Z)
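For the decomposed-estimation entry above, the identity behind the decomposition is the standard chain rule of mutual information, stated here in our own notation rather than quoted from that paper: splitting one view y into progressively more informed subviews y_1, ..., y_K gives

  I(x; y_{1:K}) = \sum_{k=1}^{K} I(x; y_k \mid y_{1:k-1})

so the total MI becomes a sum of unconditional and conditional terms, each small enough to be estimated reliably by a contrastive bound.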
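For the CLUB entry, the bound it refers to has, as we recall it (notation ours; the cited paper is authoritative), the form

  I_{\mathrm{CLUB}}(X; Z) := \mathbb{E}_{p(x,z)}\big[\log p(z \mid x)\big] - \mathbb{E}_{p(x)}\,\mathbb{E}_{p(z)}\big[\log p(z \mid x)\big] \ \ge\ I(X; Z)

with the usually intractable conditional p(z | x) replaced by a variational approximation q_\phi(z | x) in practice; minimizing this upper bound then serves as a tractable surrogate for the MI minimization training scheme mentioned in the summary.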