Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses
- URL: http://arxiv.org/abs/2405.18045v1
- Date: Tue, 28 May 2024 11:00:41 GMT
- Title: Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses
- Authors: Panagiotis Koromilas, Giorgos Bouritsas, Theodoros Giannakopoulos, Mihalis Nicolaou, Yannis Panagakis
- Abstract summary: We analyse what different contrastive learning (CL) losses actually optimize for.
We introduce a novel CL objective, the Decoupled Hyperspherical Energy Loss (DHEL), which decouples the target hyperspherical energy from the alignment of positive examples.
We show the same results hold for another relevant CL family, namely kernel contrastive learning (KCL), with the additional advantage of the expected loss being independent of batch size.
- Score: 20.273126099815517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What do different contrastive learning (CL) losses actually optimize for? Although multiple CL methods have demonstrated remarkable representation learning capabilities, the differences in their inner workings remain largely opaque. In this work, we analyse several CL families and prove that, under certain conditions, they admit the same minimisers when optimizing either their batch-level objectives or their expectations asymptotically. In both cases, an intimate connection with the hyperspherical energy minimisation (HEM) problem resurfaces. Drawing inspiration from this, we introduce a novel CL objective, coined Decoupled Hyperspherical Energy Loss (DHEL). DHEL simplifies the problem by decoupling the target hyperspherical energy from the alignment of positive examples while preserving the same theoretical guarantees. Going one step further, we show the same results hold for another relevant CL family, namely kernel contrastive learning (KCL), with the additional advantage of the expected loss being independent of batch size, thus identifying the minimisers in the non-asymptotic regime. Empirical results demonstrate improved downstream performance and robustness across combinations of different batch sizes and hyperparameters and reduced dimensionality collapse, on several computer vision datasets.
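To make the objectives discussed in the abstract concrete, below is a minimal PyTorch sketch of the standard InfoNCE loss, a DHEL-style loss that decouples the positive-alignment term from a hyperspherical-energy term computed only among the anchor embeddings, and a Gaussian-potential uniformity term of the kind used in kernel contrastive learning. The exact DHEL and KCL formulations are given in the paper; the function names, temperature values, and the precise placement of the energy term here are assumptions made for illustration, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def infonce(z1, z2, t=0.1):
    """Standard NT-Xent / InfoNCE over a batch of positive pairs (z1[i], z2[i])."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.shape[0]
    z = torch.cat([z1, z2], dim=0)                    # 2n x d embeddings
    sim = z @ z.t() / t                               # pairwise cosine similarities
    sim.fill_diagonal_(float('-inf'))                 # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def dhel_sketch(z1, z2, t=0.1):
    """Hedged sketch of a DHEL-style loss: an alignment term for positive pairs
    plus a hyperspherical-energy (uniformity) term computed only among the
    anchor embeddings z1, so the two terms are decoupled. The paper's exact
    objective may differ in normalisation and constants."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.shape[0]
    align = -(z1 * z2).sum(dim=1).mean() / t          # pull positives together
    sim11 = z1 @ z1.t() / t                           # anchor-anchor similarities only
    mask = ~torch.eye(n, dtype=torch.bool, device=z1.device)
    energy = torch.logsumexp(sim11[mask].view(n, -1), dim=1).mean()
    return align + energy

def gaussian_uniformity(z, s=2.0):
    """Gaussian-potential uniformity term (hyperspherical energy with an RBF
    kernel), in the spirit of kernel-based contrastive losses."""
    z = F.normalize(z, dim=1)
    sq_dists = torch.pdist(z, p=2).pow(2)
    return torch.log(torch.exp(-s * sq_dists).mean())
```

In a training step, `z1` and `z2` would be two augmented views of the same batch passed through the encoder and projection head, and the kernel term can be added as an extra regulariser on the anchor embeddings.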
Related papers
- Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning [48.11265601808718]
We show that standard self-supervised contrastive learning objectives implicitly approximate a supervised variant we call the negatives-only supervised contrastive loss (NSCL). We prove that the gap between the CL and NSCL losses vanishes as the number of semantic classes increases, under a bound that is both label-agnostic and architecture-independent.
arXiv Detail & Related papers (2025-06-04T19:43:36Z)
- Negative-Free Self-Supervised Gaussian Embedding of Graphs [29.26519601854811]
Graph Contrastive Learning (GCL) has emerged as a promising graph self-supervised learning framework.
We propose a negative-free objective to achieve uniformity, inspired by the fact that points distributed according to a normalized isotropic Gaussian are uniformly spread across the unit hypersphere.
Our proposal achieves competitive performance with fewer parameters, shorter training times, and lower memory consumption compared to existing GCL methods.
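To illustrate what a negative-free uniformity term of this flavour could look like, the sketch below pushes a batch of normalised embeddings towards the first- and second-order statistics of the uniform distribution on the unit hypersphere (zero mean, covariance I/d). The penalty form and weighting are assumptions for illustration and may differ from the objective actually used in the cited paper.

```python
import torch
import torch.nn.functional as F

def negative_free_uniformity(z):
    """Hypothetical negative-free uniformity penalty: push the batch of
    normalised embeddings towards zero mean and covariance I/d, the first-
    and second-order statistics of a uniform distribution on the unit
    hypersphere. Illustration only; not the cited paper's objective."""
    z = F.normalize(z, dim=1)
    n, d = z.shape
    mean = z.mean(dim=0)
    cov = (z - mean).t() @ (z - mean) / (n - 1)
    target = torch.eye(d, device=z.device) / d
    return mean.pow(2).sum() + (cov - target).pow(2).sum()
```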
arXiv Detail & Related papers (2024-11-02T07:04:40Z)
- ICL-TSVD: Bridging Theory and Practice in Continual Learning with Pre-trained Models [103.45785408116146]
Continual learning (CL) aims to train a model that can solve multiple tasks presented sequentially.
Recent CL approaches have achieved strong performance by leveraging large pre-trained models that generalize well to downstream tasks.
However, such methods lack theoretical guarantees, making them prone to unexpected failures.
We bridge this gap by integrating an empirically strong approach into a principled framework, designed to prevent forgetting.
arXiv Detail & Related papers (2024-10-01T12:58:37Z)
- Preventing Collapse in Contrastive Learning with Orthonormal Prototypes (CLOP) [0.0]
CLOP is a novel semi-supervised loss function designed to prevent neural collapse by promoting the formation of linear subspaces among class embeddings.
We show that CLOP enhances performance, providing greater stability across different learning rates and batch sizes.
arXiv Detail & Related papers (2024-03-27T15:48:16Z)
- What Makes Quantization for Large Language Models Hard? An Empirical Study from the Lens of Perturbation [55.153595212571375]
Quantization is a technique for improving the memory and computational efficiency of large language models (LLMs).
We propose a new perspective on quantization, viewing it as perturbations added to the weights and activations of LLMs.
We conduct experiments with various artificial perturbations to explore their impact on LLM performance.
arXiv Detail & Related papers (2024-03-11T03:42:51Z)
- Decoupled Contrastive Learning for Long-Tailed Recognition [58.255966442426484]
Supervised Contrastive Loss (SCL) is popular in visual representation learning.
In the scenario of long-tailed recognition, where the number of samples per class is imbalanced, treating the two types of positive samples equally leads to biased optimization of the intra-category distance.
We propose a patch-based self-distillation scheme to transfer knowledge from head to tail classes and relieve the under-representation of tail classes.
arXiv Detail & Related papers (2024-03-10T09:46:28Z)
- In-context Learning and Gradient Descent Revisited [3.085927389171139]
We show that even untrained models achieve comparable ICL-GD similarity scores despite not exhibiting ICL.
Next, we explore a major discrepancy in the flow of information throughout the model between ICL and GD, which we term Layer Causality.
We propose a simple GD-based optimization procedure that respects layer causality, and show it improves similarity scores significantly.
arXiv Detail & Related papers (2023-11-13T21:42:38Z)
- Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of ReLU and Batching [26.994954303270575]
Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy loss for classification.
While prior studies have demonstrated that both losses yield symmetric training representations under balanced data, this symmetry breaks under class imbalances.
This paper presents an intriguing discovery: the introduction of a ReLU activation at the final layer effectively restores the symmetry in SCL-learned representations.
arXiv Detail & Related papers (2023-06-13T17:55:39Z)
- Adversarial Contrastive Learning via Asymmetric InfoNCE [64.42740292752069]
We propose to treat adversarial samples unequally when contrasted with an asymmetric InfoNCE objective.
In this asymmetric fashion, the adverse impact of conflicting objectives between CL and adversarial learning can be effectively mitigated.
Experiments show that our approach consistently outperforms existing Adversarial CL methods.
arXiv Detail & Related papers (2022-07-18T04:14:36Z)
- Decoupled Contrastive Learning [23.25775900388382]
We identify a noticeable negative-positive-coupling (NPC) effect in the widely used cross-entropy (InfoNCE) loss.
By properly addressing the NPC effect, we reach a decoupled contrastive learning (DCL) objective function.
Our approach achieves 66.9% ImageNet top-1 accuracy with batch size 256 within 200 epochs of pre-training, outperforming its baseline SimCLR by 5.1%.
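The negative-positive coupling sits in the denominator of the InfoNCE softmax; the decoupled objective removes the positive pair's term from that denominator. A minimal sketch under that reading (cosine similarities, shared temperature; not the authors' reference implementation):

```python
import torch
import torch.nn.functional as F

def dcl_loss(z1, z2, t=0.1):
    """Decoupled contrastive loss sketch: like InfoNCE over positive pairs
    (z1[i], z2[i]), but the positive similarity is removed from the
    denominator, eliminating the negative-positive coupling (NPC) effect."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.shape[0]
    z = torch.cat([z1, z2], dim=0)                  # 2n x d embeddings
    sim = z @ z.t() / t                             # pairwise cosine similarities
    pos = (z1 * z2).sum(dim=1) / t                  # positive-pair similarities
    pos = torch.cat([pos, pos], dim=0)              # one positive per anchor
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    mask[:n, n:] |= eye                             # drop positive of z1 anchors
    mask[n:, :n] |= eye                             # drop positive of z2 anchors
    sim = sim.masked_fill(mask, float('-inf'))      # denominator over negatives only
    return (torch.logsumexp(sim, dim=1) - pos).mean()
```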
arXiv Detail & Related papers (2021-10-13T16:38:43Z)
- Semi-supervised Contrastive Learning with Similarity Co-calibration [72.38187308270135]
We propose a novel training strategy, termed Semi-supervised Contrastive Learning (SsCL).
SsCL combines the well-known contrastive loss in self-supervised learning with the cross-entropy loss in semi-supervised learning.
We show that SsCL produces more discriminative representations and is beneficial to few-shot learning.
arXiv Detail & Related papers (2021-05-16T09:13:56Z)
- Contrastive Attraction and Contrastive Repulsion for Representation Learning [131.72147978462348]
Contrastive learning (CL) methods learn data representations in a self-supervised manner, where the encoder contrasts each positive sample against multiple negative samples.
Recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet.
We propose a doubly CL strategy that separately compares positive and negative samples within their own groups, and then proceeds with a contrast between positive and negative groups.
arXiv Detail & Related papers (2021-05-08T17:25:08Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising f-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)