Decomposed Mutual Information Estimation for Contrastive Representation Learning
- URL: http://arxiv.org/abs/2106.13401v1
- Date: Fri, 25 Jun 2021 03:19:25 GMT
- Title: Decomposed Mutual Information Estimation for Contrastive Representation Learning
- Authors: Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Phil Bachman, Remi Tachet
- Abstract summary: Recent contrastive representation learning methods rely on estimating mutual information (MI) between multiple views of an underlying context.
We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews.
This expression contains a sum of unconditional and conditional MI terms, each measuring modest chunks of the total MI.
We show that DEMI (Decomposed Estimation of Mutual Information) can capture a larger amount of MI than standard non-decomposed contrastive bounds in a synthetic setting.
- Score: 66.52795579973484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent contrastive representation learning methods rely on estimating mutual
information (MI) between multiple views of an underlying context. E.g., we can
derive multiple views of a given image by applying data augmentation, or we can
split a sequence into views comprising the past and future of some step in the
sequence. Contrastive lower bounds on MI are easy to optimize, but have a
strong underestimation bias when estimating large amounts of MI. We propose
decomposing the full MI estimation problem into a sum of smaller estimation
problems by splitting one of the views into progressively more informed
subviews and by applying the chain rule on MI between the decomposed views.
This expression contains a sum of unconditional and conditional MI terms, each
measuring modest chunks of the total MI, which facilitates approximation via
contrastive bounds. To maximize the sum, we formulate a contrastive lower bound
on the conditional MI which can be approximated efficiently. We refer to our
general approach as Decomposed Estimation of Mutual Information (DEMI). We show
that DEMI can capture a larger amount of MI than standard non-decomposed
contrastive bounds in a synthetic setting, and learns better representations in
a vision domain and for dialogue generation.
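The chain-rule decomposition described in the abstract can be written out as below. This is a sketch in our own notation, assuming the view y is split into nested subviews y_1, ..., y_K = y, where each y_{k-1} is a deterministic function of y_k (for example, a shorter prefix of a sequence). The conditional bound shown is the generic conditional InfoNCE form and assumes exact negatives from p(y_k | y_{k-1}); the paper's efficient approximation of that bound is not reproduced here.

```latex
\begin{align*}
% One chain-rule step: y_{k-1} is determined by y_k, so adding it changes nothing
I(x; y_k) &= I(x; y_{k-1}, y_k) = I(x; y_{k-1}) + I(x; y_k \mid y_{k-1}) \\
% Telescoping the steps for k = 2, ..., K gives the decomposed objective
I(x; y) &= I(x; y_1) + \sum_{k=2}^{K} I(x; y_k \mid y_{k-1}) \\
% Conditional InfoNCE for one term, with positive y_k^{(1)} \sim p(y_k \mid x, y_{k-1}),
% negatives y_k^{(2)}, \dots, y_k^{(N)} \sim p(y_k \mid y_{k-1}), and any critic f
I(x; y_k \mid y_{k-1}) &\ge
  \mathbb{E}\left[ \log \frac{e^{f(x,\, y_k^{(1)},\, y_{k-1})}}
  {\frac{1}{N} \sum_{j=1}^{N} e^{f(x,\, y_k^{(j)},\, y_{k-1})}} \right]
\end{align*}
```

Any InfoNCE-style term with N candidates is at most log N, so a single contrastive bound on I(x; y) saturates at log N, while a sum of K such terms can reach up to K log N.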
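The underestimation bias mentioned in the abstract, and why splitting the estimate helps, can be illustrated with a small NumPy toy. This is a sketch under stated assumptions, not the paper's synthetic experiment: x and y are d independent pairs of correlated Gaussians (so the true MI is known in closed form), the critic is the analytic optimal log density ratio rather than a learned network, and the function names `gaussian_log_ratio` and `infonce` are hypothetical. The toy takes one shortcut: because the dimensions are independent, p(y_2 | y_1) = p(y_2) for the second half of the dimensions, so ordinary in-batch negatives are valid conditional negatives. In general the conditional term needs negatives from p(y_k | y_{k-1}), which is the part the abstract says the authors approximate.

```python
import numpy as np


def gaussian_log_ratio(x, y, rho):
    # Pairwise scores f(x_i, y_j) = log p(y_j | x_i) - log p(y_j) for the toy
    # model y = rho * x + sqrt(1 - rho^2) * noise (the optimal critic).
    var = 1.0 - rho ** 2
    diff = y[None, :, :] - rho * x[:, None, :]  # shape (n, n, d)
    per_dim = -0.5 * np.log(var) - diff ** 2 / (2 * var) + y[None, :, :] ** 2 / 2
    return per_dim.sum(axis=-1)  # shape (n, n)


def infonce(scores):
    # scores[i, j] = f(x_i, y_j): positives on the diagonal, the rest of row i
    # are in-batch negatives. Returns mean_i [f_ii - log mean_j exp(f_ij)],
    # which can never exceed log n.
    m = scores.max(axis=1, keepdims=True)  # stabilise the log-mean-exp
    log_mean_exp = m[:, 0] + np.log(np.exp(scores - m).mean(axis=1))
    return float(np.mean(np.diag(scores) - log_mean_exp))


rng = np.random.default_rng(0)
n, d, rho = 128, 20, 0.9
x = rng.standard_normal((n, d))
y = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal((n, d))

true_mi = -0.5 * d * np.log(1 - rho ** 2)  # about 16.6 nats
full = infonce(gaussian_log_ratio(x, y, rho))  # one bound on I(x; y)

# Decomposed: y_1 = first half of the dimensions, then y itself.
h = d // 2
term1 = infonce(gaussian_log_ratio(x[:, :h], y[:, :h], rho))  # I(x; y_1)
term2 = infonce(gaussian_log_ratio(x[:, h:], y[:, h:], rho))  # I(x; y | y_1), toy shortcut
decomposed = term1 + term2

print(f"true MI      {true_mi:6.2f} nats")
print(f"log n cap    {np.log(n):6.2f}")
print(f"full InfoNCE {full:6.2f}")
print(f"decomposed   {decomposed:6.2f} (cap 2 log n = {2 * np.log(n):.2f})")
```

With n = 128 the single InfoNCE value cannot exceed log 128 (about 4.85 nats), far below the true MI, while the two-term sum has a ceiling of 2 log n. Neither recovers the full value here; more subviews raise the ceiling further.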
Related papers
- Constrained Multiview Representation for Self-supervised Contrastive Learning [4.817827522417457]
We introduce a novel approach predicated on representation distance-based mutual information (MI) for measuring the significance of different views.
We harness multi-view representations extracted from the frequency domain, re-evaluating their significance based on mutual information.
Date: 2024-02-05T19:09:33Z
- Understanding Probe Behaviors through Variational Bounds of Mutual Information [53.520525292756005]
We provide guidelines for linear probing by constructing a novel mathematical framework leveraging information theory.
First, we connect probing with the variational bounds of mutual information (MI) to relax the probe design, equating linear probing with fine-tuning.
We show that the intermediate representations can have the biggest MI estimate because of the tradeoff between better separability and decreasing MI.
Date: 2023-12-15T18:38:18Z
- Exploiting Pseudo Image Captions for Multimodal Summarization [26.033681302592207]
Cross-modal contrastive learning in vision language pretraining faces the challenge of (partial) false negatives.
We propose a contrastive learning strategy regulated by progressively refined cross-modal similarity, to more accurately optimize MI between an image/text anchor and its negative texts/images.
Date: 2023-05-09T14:47:25Z
- Improving Mutual Information Estimation with Annealed and Energy-Based Bounds [20.940022170594816]
Mutual information (MI) is a fundamental quantity in information theory and machine learning.
We present a unifying view of existing MI bounds from the perspective of importance sampling.
We propose three novel bounds based on this approach.
Date: 2023-03-13T10:47:24Z
- Learning Multimodal VAEs through Mutual Supervision [72.77685889312889]
MEME combines information between modalities implicitly through mutual supervision.
We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes.
Date: 2021-06-23T17:54:35Z
- CLUB: A Contrastive Log-ratio Upper Bound of Mutual Information [105.73798100327667]
We propose a novel Contrastive Log-ratio Upper Bound (CLUB) of mutual information.
We provide a theoretical analysis of the properties of CLUB and its variational approximation.
Based on this upper bound, we introduce a MI minimization training scheme and further accelerate it with a negative sampling strategy.
Date: 2020-06-22T05:36:16Z
- What Makes for Good Views for Contrastive Learning? [90.49736973404046]
We argue that we should reduce the mutual information (MI) between views while keeping task-relevant information intact.
We devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
As a by-product, we achieve a new state-of-the-art accuracy on unsupervised pre-training for ImageNet classification.
Date: 2020-05-20T17:59:57Z
- Mutual Information Gradient Estimation for Representation Learning [56.08429809658762]
Mutual Information (MI) plays an important role in representation learning.
Recent advances establish tractable and scalable MI estimators to discover useful representation.
We propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on the score estimation of implicit distributions.
Date: 2020-05-03T16:05:58Z
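The CLUB entry in the related-papers list describes a contrastive log-ratio upper bound on MI. The sketch below is a minimal NumPy illustration of its sample form: the mean of log q(y_i | x_i) over matched pairs minus the mean of log q(y_j | x_i) over all pairs. It assumes a toy Gaussian model and plugs in the true conditional p(y | x) in place of CLUB's learned variational q(y | x). With the exact conditional, the population value is a guaranteed upper bound; with a learned q it is only approximately one. The names `log_cond` and `club_estimate` are hypothetical.

```python
import numpy as np


def log_cond(x, y, rho):
    # log p(y_j | x_i) for every pair (i, j) under the toy Gaussian model
    # y = rho * x + sqrt(1 - rho^2) * noise; returns an (n, n) matrix.
    var = 1.0 - rho ** 2
    diff = y[None, :, :] - rho * x[:, None, :]  # shape (n, n, d)
    return (-0.5 * np.log(2 * np.pi * var) - diff ** 2 / (2 * var)).sum(axis=-1)


def club_estimate(x, y, rho):
    # Matched-pair average minus all-pairs average of log q(y | x), i.e. a
    # sample version of E_{p(x,y)}[log q(y|x)] - E_{p(x)p(y)}[log q(y|x)].
    lq = log_cond(x, y, rho)
    return float(np.mean(np.diag(lq)) - np.mean(lq))


rng = np.random.default_rng(0)
n, d, rho = 1000, 3, 0.5
x = rng.standard_normal((n, d))
y = rho * x + np.sqrt(1 - rho ** 2) * rng.standard_normal((n, d))

true_mi = -0.5 * d * np.log(1 - rho ** 2)  # about 0.43 nats
club_population = d * rho ** 2 / (1 - rho ** 2)  # 1.0 for these settings
est = club_estimate(x, y, rho)
print(f"true MI {true_mi:.3f}  CLUB estimate {est:.3f}  population CLUB {club_population:.3f}")
```

For this toy the population CLUB value works out to d·rho²/(1 − rho²), which always exceeds the true MI of −(d/2)·log(1 − rho²); the gap between them is the slack of the upper bound.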