Contrastive Predictive Coding Done Right for Mutual Information Estimation
- URL: http://arxiv.org/abs/2510.25983v1
- Date: Wed, 29 Oct 2025 21:33:59 GMT
- Title: Contrastive Predictive Coding Done Right for Mutual Information Estimation
- Authors: J. Jon Ryu, Pavan Yeddanapudi, Xiangxiang Xu, Gregory W. Wornell
- Abstract summary: We show why InfoNCE should not be regarded as a valid MI estimator. We introduce a simple modification, which we refer to as InfoNCE-anchor, for accurate MI estimation. We generalize our framework using proper scoring rules, which recover InfoNCE-anchor as a special case when the log score is employed.
- Score: 21.046609494716865
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The InfoNCE objective, originally introduced for contrastive representation learning, has become a popular choice for mutual information (MI) estimation, despite its indirect connection to MI. In this paper, we demonstrate why InfoNCE should not be regarded as a valid MI estimator, and we introduce a simple modification, which we refer to as InfoNCE-anchor, for accurate MI estimation. Our modification introduces an auxiliary anchor class, enabling consistent density ratio estimation and yielding a plug-in MI estimator with significantly reduced bias. Beyond this, we generalize our framework using proper scoring rules, which recover InfoNCE-anchor as a special case when the log score is employed. This formulation unifies a broad spectrum of contrastive objectives, including NCE, InfoNCE, and $f$-divergence variants, under a single principled framework. Empirically, we find that InfoNCE-anchor with the log score achieves the most accurate MI estimates; however, in self-supervised representation learning experiments, we find that the anchor does not improve the downstream task performance. These findings corroborate that contrastive representation learning benefits not from accurate MI estimation per se, but from the learning of structured density ratios.
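To make the abstract's point concrete, here is a minimal NumPy sketch of the vanilla InfoNCE estimate computed from a critic score matrix. This is an illustration of the standard InfoNCE bound only, not the paper's InfoNCE-anchor construction (whose anchor-class details are in the paper); it shows the log N ceiling that makes plain InfoNCE a biased MI estimator for large MI.

```python
import numpy as np

def logsumexp(a, axis):
    """Numerically stable log-sum-exp along the given axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def infonce_bound(scores):
    """InfoNCE estimate of MI from an N x N critic score matrix.

    scores[i, j] = f(x_i, y_j), with diagonal entries the positive pairs.
    The resulting value is capped at log N no matter how large the true MI
    is, which is the bias the paper's anchor modification targets.
    """
    n = scores.shape[0]
    # Per-row log-softmax: log-probability of picking the positive pair
    # among the N candidates in that row.
    log_probs = scores.diagonal() - logsumexp(scores, axis=1)
    return np.mean(log_probs) + np.log(n)
```

An uninformative critic (all scores equal) gives an estimate of 0, while a perfectly discriminative critic saturates at log N, illustrating why the estimate cannot exceed log(batch size).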
Related papers
- Observationally Informed Adaptive Causal Experimental Design [55.998153710215654]
We propose Active Residual Learning, a new paradigm that leverages the observational model as a foundational prior. This approach shifts the experimental focus from learning target causal quantities from scratch to efficiently estimating the residuals required to correct observational bias. Experiments on synthetic and semi-synthetic benchmarks demonstrate that R-Design significantly outperforms baselines.
arXiv Detail & Related papers (2026-03-04T06:52:37Z) - RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training [59.493415006017635]
Pre-trained Multi-modal Large Language Models (MLLMs) provide a knowledge-rich foundation for post-training. Current evaluation relies on testing after supervised fine-tuning, which introduces laborious additional training and autoregressive decoding costs. We propose RADAR, an efficient ability-centric evaluation framework for Revealing Asymmetric Development of Abilities in MLLM pRe-training.
arXiv Detail & Related papers (2026-02-13T12:56:31Z) - FMMI: Flow Matching Mutual Information Estimation [43.51440237740181]
We introduce a novel Mutual Information (MI) estimator that fundamentally reframes the discriminative approach. Instead of training a classifier to discriminate between joint and marginal distributions, we learn a normalizing flow that transforms one into the other.
arXiv Detail & Related papers (2025-11-11T18:34:33Z) - Redefining Machine Unlearning: A Conformal Prediction-Motivated Approach [11.609354498110358]
Machine unlearning seeks to remove the influence of specified data from a trained model. In this paper, we find that the data misclassified across UA and MIA still have their ground truth labels included in the prediction set. We propose two novel metrics inspired by conformal prediction that more reliably evaluate forgetting quality.
arXiv Detail & Related papers (2025-01-31T18:58:43Z) - An Information Theoretic Evaluation Metric For Strong Unlearning [20.143627174765985]
We introduce the Information Difference Index (IDI), a novel white-box metric inspired by information theory.
IDI quantifies retained information in intermediate features by measuring mutual information between those features and the labels to be forgotten.
Our experiments demonstrate that IDI effectively measures the degree of unlearning across various datasets and architectures.
arXiv Detail & Related papers (2024-05-28T06:57:01Z) - Improving importance estimation in covariate shift for providing accurate prediction error [0.0]
The Kullback-Leibler Importance Estimation Procedure (KLIEP) is capable of estimating importance in a promising way.
This paper explores the potential performance improvement if target information is considered in the computation of the importance.
arXiv Detail & Related papers (2024-02-02T14:39:39Z) - Uncertainty Estimation by Fisher Information-based Evidential Deep Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce the Fisher Information Matrix (FIM) to measure the informativeness of the evidence carried by each sample; based on this, we dynamically reweight the objective's loss terms so that the network focuses on representation learning for uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z) - InfoNCE is variational inference in a recognition parameterised model [32.45282187405337]
We show that the InfoNCE objective is equivalent to the ELBO in a new class of probabilistic generative models.
In particular, we show that in the infinite sample limit, and for a particular choice of prior, the actual InfoNCE objective is equal to the ELBO.
arXiv Detail & Related papers (2021-07-06T09:24:57Z) - Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization [69.07420650261649]
We introduce a novel, simple, and powerful contrastive MI estimator named FLO.
Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently.
The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
arXiv Detail & Related papers (2021-07-02T15:20:41Z) - Learning Calibrated Uncertainties for Domain Shift: A Distributionally
Robust Learning Approach [150.8920602230832]
We propose a framework for learning calibrated uncertainties under domain shifts.
In particular, the density ratio estimation reflects the closeness of a target (test) sample to the source (training) distribution.
We show that our proposed method generates calibrated uncertainties that benefit downstream tasks.
arXiv Detail & Related papers (2020-10-08T02:10:54Z) - What Makes for Good Views for Contrastive Learning? [90.49736973404046]
We argue that we should reduce the mutual information (MI) between views while keeping task-relevant information intact.
We devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
As a by-product, we achieve a new state-of-the-art accuracy on unsupervised pre-training for ImageNet classification.
arXiv Detail & Related papers (2020-05-20T17:59:57Z) - Mutual Information Gradient Estimation for Representation Learning [56.08429809658762]
Mutual Information (MI) plays an important role in representation learning.
Recent advances establish tractable and scalable MI estimators for discovering useful representations.
We propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on the score estimation of implicit distributions.
arXiv Detail & Related papers (2020-05-03T16:05:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.