InfoNCE is variational inference in a recognition parameterised model
- URL: http://arxiv.org/abs/2107.02495v3
- Date: Thu, 10 Aug 2023 08:16:52 GMT
- Title: InfoNCE is variational inference in a recognition parameterised model
- Authors: Laurence Aitchison and Stoil Ganev
- Abstract summary: We show that the InfoNCE objective is equivalent to the ELBO in a new class of probabilistic generative model.
In particular, we show that in the infinite sample limit, and for a particular choice of prior, the actual InfoNCE objective is equal to the ELBO.
- Score: 32.45282187405337
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Here, we show that the InfoNCE objective is equivalent to the ELBO in a new class of probabilistic generative model, the recognition parameterised model (RPM). When we learn the optimal prior, the RPM ELBO becomes equal to the mutual information (MI; up to a constant), establishing a connection to pre-existing self-supervised learning methods such as InfoNCE. However, practical InfoNCE methods do not use the MI as an objective; the MI is invariant to arbitrary invertible transformations, so using an MI objective can lead to highly entangled representations (Tschannen et al., 2019). Instead, the actual InfoNCE objective is a simplified lower bound on the MI which is loose even in the infinite sample limit. Thus, an objective that works (i.e. the actual InfoNCE objective) appears to be motivated as a loose bound on an objective that does not work (i.e. the true MI, which gives arbitrarily entangled representations). We give an alternative motivation for the actual InfoNCE objective. In particular, we show that in the infinite sample limit, and for a particular choice of prior, the actual InfoNCE objective is equal to the ELBO (up to a constant); and the ELBO is equal to the marginal likelihood with a deterministic recognition model. Thus, we argue that our VAE perspective gives a better motivation for InfoNCE than MI, as the actual InfoNCE objective is only loosely bounded by the MI, but is equal to the ELBO/marginal likelihood (up to a constant).
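For concreteness, the "actual InfoNCE objective" discussed above is the standard contrastive cross-entropy over K paired samples, with the associated MI bound (Oord et al., 2018; Poole et al., 2019):

$$\mathcal{L}_{\text{InfoNCE}} \;=\; -\,\mathbb{E}\!\left[\frac{1}{K}\sum_{i=1}^{K}\log\frac{e^{f(x_i,\,y_i)}}{\sum_{j=1}^{K}e^{f(x_i,\,y_j)}}\right], \qquad I(x;\,y)\;\geq\;\log K-\mathcal{L}_{\text{InfoNCE}},$$

so the implied MI estimate can never exceed log K, which is one concrete way to see the looseness the abstract describes. The sketch below is a minimal PyTorch illustration under our own assumptions; the name `info_nce_loss`, the temperature, and the cosine-similarity critic are illustrative choices, not the authors' code.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z_x: torch.Tensor, z_y: torch.Tensor,
                  temperature: float = 0.1) -> torch.Tensor:
    """Minimal InfoNCE sketch (illustrative, not the paper's implementation).

    z_x, z_y: [K, d] encoder outputs for K paired samples (x_i, y_i),
    using a cosine-similarity critic f(x, y) = <z_x, z_y> / temperature.
    """
    z_x = F.normalize(z_x, dim=-1)
    z_y = F.normalize(z_y, dim=-1)
    logits = z_x @ z_y.t() / temperature  # [K, K]; entry (i, j) scores the pair (x_i, y_j)
    targets = torch.arange(z_x.size(0), device=z_x.device)
    # Cross-entropy against the diagonal: each x_i must pick out its true
    # partner y_i from the K - 1 negatives. The implied MI estimate,
    # log K minus this loss, is capped at log K (the looseness noted above).
    return F.cross_entropy(logits, targets)
```

On the paper's account, minimising this loss is best read not as tightening the MI bound above, but as maximising the RPM ELBO, to which it is equal (up to a constant, in the infinite sample limit and for a particular choice of prior).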
Related papers
- Contrastive Predictive Coding Done Right for Mutual Information Estimation [21.046609494716865]
We show why InfoNCE should not be regarded as a valid MI estimator. We introduce a simple modification, which we refer to as InfoNCE-anchor, for accurate MI estimation. We generalize our framework using proper scoring rules, which recover InfoNCE-anchor as a special case when the log score is employed.
arXiv Detail & Related papers (2025-10-29T21:33:59Z)
- Information-Theoretic Reward Modeling for Stable RLHF: Detecting and Mitigating Reward Hacking [78.69179041551014]
We propose an information-theoretic reward modeling framework based on the Information Bottleneck principle. We show that InfoRM filters out preference-irrelevant information to alleviate reward misgeneralization. We also introduce IBL, a distribution-level regularization that penalizes such deviations, effectively expanding the optimization landscape.
arXiv Detail & Related papers (2025-10-15T15:51:59Z)
- Evaluating Membership Inference Attacks and Defenses in Federated Learning [23.080346952364884]
Membership Inference Attacks (MIAs) pose a growing threat to privacy preservation in federated learning.
This paper evaluates existing MIAs and the corresponding defense strategies.
arXiv Detail & Related papers (2024-02-09T09:58:35Z)
- Improving Mutual Information Estimation with Annealed and Energy-Based Bounds [20.940022170594816]
Mutual information (MI) is a fundamental quantity in information theory and machine learning.
We present a unifying view of existing MI bounds from the perspective of importance sampling.
We propose three novel bounds based on this approach.
arXiv Detail & Related papers (2023-03-13T10:47:24Z)
- How to select an objective function using information theory [0.0]
In machine learning or scientific computing, model performance is measured with an objective function.
Under the information-theoretic paradigm, the ultimate objective is to maximize information (and minimize uncertainty) as opposed to any specific utility.
We argue that this paradigm is well-suited to models that have many uses and no definite utility, like the large Earth system models used to understand the effects of climate change.
arXiv Detail & Related papers (2022-12-10T04:05:54Z)
- Adversarial Intrinsic Motivation for Reinforcement Learning [60.322878138199364]
We investigate whether the Wasserstein-1 distance between a policy's state visitation distribution and a target distribution can be utilized effectively for reinforcement learning tasks.
Our approach, termed Adversarial Intrinsic Motivation (AIM), estimates this Wasserstein-1 distance through its dual objective and uses it to compute a supplemental reward function.
arXiv Detail & Related papers (2021-05-27T17:51:34Z)
- How Does Data Augmentation Affect Privacy in Machine Learning? [94.52721115660626]
We propose new membership inference attacks that utilize the information of augmented data.
We establish the optimal membership inference when the model is trained with augmented data.
arXiv Detail & Related papers (2020-07-21T02:21:10Z)
- Neural Methods for Point-wise Dependency Estimation [129.93860669802046]
We focus on estimating point-wise dependency (PD), which quantitatively measures how likely two outcomes co-occur.
We demonstrate the effectiveness of our approaches in 1) MI estimation, 2) self-supervised representation learning, and 3) cross-modal retrieval task.
arXiv Detail & Related papers (2020-06-09T23:26:15Z)
- VMI-VAE: Variational Mutual Information Maximization Framework for VAE With Discrete and Continuous Priors [5.317548969642376]
The Variational Autoencoder (VAE) is a scalable method for learning latent variable models of complex data.
We propose a Variational Mutual Information Maximization Framework for VAE to address this issue.
arXiv Detail & Related papers (2020-05-28T12:44:23Z)
- What Makes for Good Views for Contrastive Learning? [90.49736973404046]
We argue that we should reduce the mutual information (MI) between views while keeping task-relevant information intact.
We devise unsupervised and semi-supervised frameworks that learn effective views by aiming to reduce their MI.
As a by-product, we achieve a new state-of-the-art accuracy on unsupervised pre-training for ImageNet classification.
arXiv Detail & Related papers (2020-05-20T17:59:57Z)
- Mutual Information Gradient Estimation for Representation Learning [56.08429809658762]
Mutual Information (MI) plays an important role in representation learning.
Recent advances establish tractable and scalable MI estimators to discover useful representations.
We propose the Mutual Information Gradient Estimator (MIGE) for representation learning based on the score estimation of implicit distributions.
arXiv Detail & Related papers (2020-05-03T16:05:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.