$f$-MICL: Understanding and Generalizing InfoNCE-based Contrastive
Learning
- URL: http://arxiv.org/abs/2402.10150v1
- Date: Thu, 15 Feb 2024 17:57:54 GMT
- Title: $f$-MICL: Understanding and Generalizing InfoNCE-based Contrastive
Learning
- Authors: Yiwei Lu,Guojun Zhang,Sun Sun,Hongyu Guo,Yaoliang Yu
- Abstract summary: In contrastive learning, a widely-adopted objective function is InfoNCE, which uses the heuristic cosine similarity for the representation comparison.
In this paper, we aim at answering two intriguing questions: (1) Can we go beyond the KL-based objective? (2) Besides the popular cosine similarity, can we design a better similarity function?
We provide answers by generalizing the KL-based mutual information to the $f$-Mutual Information in Contrastive Learning ($f$-MICL) using the $f$-divergences.
- Score: 37.45319637345343
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In self-supervised contrastive learning, a widely-adopted objective function
is InfoNCE, which uses the heuristic cosine similarity for the representation
comparison, and is closely related to maximizing the Kullback-Leibler
(KL)-based mutual information. In this paper, we aim at answering two
intriguing questions: (1) Can we go beyond the KL-based objective? (2) Besides
the popular cosine similarity, can we design a better similarity function? We
provide answers to both questions by generalizing the KL-based mutual
information to the $f$-Mutual Information in Contrastive Learning ($f$-MICL)
using the $f$-divergences. To answer the first question, we provide a wide
range of $f$-MICL objectives which share the nice properties of InfoNCE (e.g.,
alignment and uniformity), and meanwhile result in similar or even superior
performance. For the second question, assuming that the joint feature
distribution is proportional to the Gaussian kernel, we derive an $f$-Gaussian
similarity with better interpretability and empirical performance. Finally, we
identify close relationships between the $f$-MICL objective and several popular
InfoNCE-based objectives. Using benchmark tasks from both vision and natural
language, we empirically evaluate $f$-MICL with different $f$-divergences on
various architectures (SimCLR, MoCo, and MoCo v3) and datasets. We observe that
$f$-MICL generally outperforms the benchmarks and the best-performing
$f$-divergence is task and dataset dependent.
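For context on the baseline that $f$-MICL generalizes: the $f$-mutual information is $I_f(X;Y) = D_f(P_{XY} \,\|\, P_X P_Y)$, which recovers the usual KL-based mutual information when $f(t) = t \log t$. The minimal sketch below shows only the standard SimCLR-style InfoNCE (NT-Xent) loss with cosine similarity and a temperature parameter; the exact $f$-MICL objectives and the $f$-Gaussian similarity are not given in the abstract and are not reproduced here.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z1, z2, temperature=0.5):
    """SimCLR-style InfoNCE (NT-Xent) loss with cosine similarity.

    z1, z2: (N, d) embeddings of two augmented views of the same N inputs.
    This is the KL-based baseline that f-MICL generalizes; the paper's
    f-MICL objectives and f-Gaussian similarity are not shown here.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)            # (2N, d)
    sim = z @ z.t() / temperature             # pairwise cosine similarities
    n = z1.shape[0]
    # Exclude each sample's similarity with itself from the softmax.
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))
    # The positive for row i is its other augmented view: i <-> i + n.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

# Example usage with random embeddings (batch of 256, feature dim 128).
loss = info_nce_loss(torch.randn(256, 128), torch.randn(256, 128))
```

Replacing the cross-entropy (KL-based) bound with a variational bound on another $f$-divergence, and the cosine similarity with a different similarity function, yields the family of objectives the paper studies.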
Related papers
- The All-Seeing Project V2: Towards General Relation Comprehension of the Open World [58.40101895719467]
We present the All-Seeing Project V2, a new model and dataset designed for understanding object relations in images.
We propose the All-Seeing Model V2 that integrates the formulation of text generation, object localization, and relation comprehension into a relation conversation task.
Our model excels not only in perceiving and recognizing all objects within the image but also in grasping the intricate relation graph between them.
arXiv Detail & Related papers (2024-02-29T18:59:17Z)
- Separating common from salient patterns with Contrastive Representation Learning [2.250968907999846]
Contrastive Analysis aims at separating common factors of variation between two datasets.
Current models based on Variational Auto-Encoders have shown poor performance in learning semantically-expressive representations.
We propose to leverage the ability of Contrastive Learning to learn semantically expressive representations well adapted for Contrastive Analysis.
arXiv Detail & Related papers (2024-02-19T08:17:13Z)
- A duality framework for analyzing random feature and two-layer neural networks [7.400520323325074]
We consider the problem of learning functions within the $\mathcal{F}_{p,\pi}$ and Barron spaces.
We establish a dual equivalence between approximation and estimation, and then apply it to study the learning of the preceding function spaces.
arXiv Detail & Related papers (2023-05-09T17:41:50Z)
- Contrastive Learning Is Spectral Clustering On Similarity Graph [12.47963220169677]
We show that contrastive learning with the standard InfoNCE loss is equivalent to spectral clustering on the similarity graph.
Motivated by our theoretical insights, we introduce the Kernel-InfoNCE loss.
arXiv Detail & Related papers (2023-03-27T11:13:35Z)
- Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons [79.98542868281473]
We provide a theoretical framework for Reinforcement Learning with Human Feedback (RLHF).
We show that when training a policy based on the learned reward model, MLE fails while a pessimistic MLE provides policies with improved performance under certain coverage assumptions.
arXiv Detail & Related papers (2023-01-26T18:07:21Z)
- Multi-Task Imitation Learning for Linear Dynamical Systems [50.124394757116605]
We study representation learning for efficient imitation learning over linear systems.
We find that the imitation gap over trajectories generated by the learned target policy is bounded by $\tilde{O}\left( \frac{k n_x H}{N_{\mathrm{shared}}} + \frac{k n_u}{N_{\mathrm{target}}} \right)$.
arXiv Detail & Related papers (2022-12-01T00:14:35Z)
- Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first ever mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2022-09-30T15:01:35Z)
- A Self-Penalizing Objective Function for Scalable Interaction Detection [2.208242292882514]
We tackle the problem of nonparametric variable selection with a focus on discovering interactions between variables.
The trick is to maximize parametrized nonparametric dependence measures which we call metric learning objectives.
arXiv Detail & Related papers (2020-11-24T17:07:49Z)
- Variational Mutual Information Maximization Framework for VAE Latent Codes with Continuous and Discrete Priors [5.317548969642376]
Variational Autoencoder (VAE) is a scalable method for learning directed latent variable models of complex data.
We propose Variational Mutual Information Maximization Framework for VAE to address this issue.
arXiv Detail & Related papers (2020-06-02T09:05:51Z)
- Memory-Augmented Relation Network for Few-Shot Learning [114.47866281436829]
In this work, we investigate a new metric-learning method, Memory-Augmented Relation Network (MRN).
In MRN, we choose the samples that are visually similar from the working context, and perform weighted information propagation to attentively aggregate helpful information from chosen ones to enhance its representation.
We empirically demonstrate that MRN yields significant improvement over its ancestor and achieves competitive or even better performance when compared with other few-shot learning approaches.
arXiv Detail & Related papers (2020-05-09T10:09:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.