InfoNCE is variational inference in a recognition parameterised model
- URL: http://arxiv.org/abs/2107.02495v3
- Date: Thu, 10 Aug 2023 08:16:52 GMT
- Title: InfoNCE is variational inference in a recognition parameterised model
- Authors: Laurence Aitchison and Stoil Ganev
- Abstract summary: We show that the InfoNCE objective is equivalent to the ELBO in a new class of probabilistic generative model.
In particular, we show that in the infinite sample limit, and for a particular choice of prior, the actual InfoNCE objective is equal to the ELBO.
- Score: 32.45282187405337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Here, we show that the InfoNCE objective is equivalent to the ELBO in a new class of probabilistic generative model, the recognition parameterised model (RPM). When we learn the optimal prior, the RPM ELBO becomes equal to the mutual information (MI; up to a constant), establishing a connection to pre-existing self-supervised learning methods such as InfoNCE. However, practical InfoNCE methods do not use the MI as an objective; the MI is invariant to arbitrary invertible transformations, so using an MI objective can lead to highly entangled representations (Tschannen et al., 2019). Instead, the actual InfoNCE objective is a simplified lower bound on the MI which is loose even in the infinite sample limit. Thus, an objective that works (i.e. the actual InfoNCE objective) appears to be motivated as a loose bound on an objective that does not work (i.e. the true MI, which gives arbitrarily entangled representations). We give an alternative motivation for the actual InfoNCE objective. In particular, we show that in the infinite sample limit, and for a particular choice of prior, the actual InfoNCE objective is equal to the ELBO (up to a constant); and the ELBO is equal to the marginal likelihood with a deterministic recognition model. Thus, we argue that our VAE perspective gives a better motivation for InfoNCE than MI, as the actual InfoNCE objective is only loosely bounded by the MI, but is equal to the ELBO/marginal likelihood (up to a constant).
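For concreteness, the "actual InfoNCE objective" discussed above is the standard contrastive cross-entropy over K paired samples, with the associated MI bound (Oord et al., 2018; Poole et al., 2019):

$$\mathcal{L}_{\text{InfoNCE}} \;=\; -\,\mathbb{E}\!\left[\frac{1}{K}\sum_{i=1}^{K}\log\frac{e^{f(x_i,\,y_i)}}{\sum_{j=1}^{K}e^{f(x_i,\,y_j)}}\right], \qquad I(x;\,y)\;\geq\;\log K-\mathcal{L}_{\text{InfoNCE}},$$

so the implied MI estimate can never exceed log K, which is one concrete way to see the looseness the abstract describes. The sketch below is a minimal PyTorch illustration under our own assumptions; the name `info_nce_loss`, the temperature, and the cosine-similarity critic are illustrative choices, not the authors' code.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z_x: torch.Tensor, z_y: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """Minimal InfoNCE sketch (illustrative, not the paper's implementation).

    z_x, z_y: [K, d] encoder outputs for K paired samples (x_i, y_i),
    using a cosine-similarity critic f(x, y) = <z_x, z_y> / temperature.
    """
    z_x = F.normalize(z_x, dim=-1)
    z_y = F.normalize(z_y, dim=-1)
    logits = z_x @ z_y.t() / temperature  # [K, K]; entry (i, j) scores the pair (x_i, y_j)
    targets = torch.arange(z_x.size(0), device=z_x.device)
    # Cross-entropy against the diagonal: each x_i must pick out its true
    # partner y_i from the K - 1 negatives. The implied MI estimate,
    # log K minus this loss, is capped at log K (the looseness noted above).
    return F.cross_entropy(logits, targets)
```

On the paper's account, minimising this loss is best read not as tightening the MI bound above, but as maximising the RPM ELBO, to which it is equal (up to a constant, in the infinite sample limit and for a particular choice of prior).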
Related papers
- Contrastive Predictive Coding Done Right for Mutual Information Estimation [21.046609494716865]
We show why InfoNCE should not be regarded as a valid MI estimator. We introduce a simple modification, which we refer to as InfoNCE-anchor, for accurate MI estimation. We generalize our framework using proper scoring rules, which recover InfoNCE-anchor as a special case when the log score is employed.
arXiv Detail & Related papers (2025-10-29T21:33:59Z)
- Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking [78.69179041551014]
We propose an information-theoretic reward modeling framework based on the Information Bottleneck principle. We show that InfoRM filters out preference-irrelevant information to alleviate reward misgeneralization. We also introduce IBL, a distribution-level regularization that penalizes such deviations, effectively expanding the optimization landscape.
arXiv Detail & Related papers (2025-10-15T15:51:59Z)
- Evaluating Membership Inference Attacks and Defenses in Federated Learning [23.080346952364884]
Membership Inference Attacks (MIAs) pose a growing threat to privacy preservation in federated learning.
This paper evaluates existing MIAs and the corresponding defense strategies.
arXiv Detail & Related papers (2024-02-09T09:58:35Z)
- Improving Mutual Information Estimation with Annealed and Energy-Based Bounds [20.940022170594816]
Mutual information (MI) is a fundamental quantity in information theory and machine learning.
We present a unifying view of existing MI bounds from the perspective of importance sampling.
We propose three novel bounds based on this approach.
arXiv Detail & Related papers (2023-03-13T10:47:24Z)
- How to select an objective function using information theory [0.0]
In machine learning or scientific computing, model performance is measured with an objective function.
Under the information-theoretic paradigm, the ultimate objective is to maximize information (and minimize uncertainty) as opposed to any specific utility.
We argue that this paradigm is well-suited to models that have many uses and no definite utility, like the large Earth system models used to understand the effects of climate change.
arXiv Detail & Related papers (2022-12-10T04:05:54Z)
- Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function.
arXiv Detail & Related papers (2021-05-27T17:51:34Z)
- How Does Data Augmentation Affect Privacy in Machine Learning? [94.52721115660626]
We propose new membership inference attacks that utilize the information of augmented data.
We establish the optimal membership inference when the model is trained with augmented data.
arXiv Detail & Related papers (2020-07-21T02:21:10Z)
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
arXiv Detail & Related papers (2020-06-09T23:26:15Z)
- VMI-VAE: Variational Mutual Information Maximization Framework for VAE With Discrete and Continuous Priors [5.317548969642376]
The Variational Autoencoder (VAE) is a scalable method for learning latent variable models of complex data.
We propose a Variational Mutual Information Maximization Framework for VAE to address this issue.
arXiv Detail & Related papers (2020-05-28T12:44:23Z)
- What Makes for Good Views for Contrastive Learning? [90.49736973404046]
We argue that we should reduce the mutual information (MI) between views while keeping task-relevant information intact.
We devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
As a by-product, we achieve a new state-of-the-art accuracy on unsupervised pre-training for ImageNet classification.
arXiv Detail & Related papers (2020-05-20T17:59:57Z)
- Mutual Information Gradient Estimation for Representation Learning [56.08429809658762]
Mutual Information (MI) plays an important role in representation learning.
Recent advances establish tractable and scalable MI estimators to discover useful representations.
We propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on the score estimation of implicit distributions.
arXiv Detail & Related papers (2020-05-03T16:05:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.