Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning
- URL: http://arxiv.org/abs/2506.04411v1
- Date: Wed, 04 Jun 2025 19:43:36 GMT
- Title: Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning
- Authors: Achleshwar Luthra, Tianbao Yang, Tomer Galanti
- Abstract summary: We show that standard self-supervised contrastive learning objectives implicitly approximate a supervised variant we call the negatives-only supervised contrastive loss (NSCL). We prove that the gap between the CL and NSCL losses vanishes as the number of semantic classes increases, under a bound that is both label-agnostic and architecture-independent.
- Score: 48.11265601808718
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite its empirical success, the theoretical foundations of self-supervised contrastive learning (CL) are not yet fully established. In this work, we address this gap by showing that standard CL objectives implicitly approximate a supervised variant we call the negatives-only supervised contrastive loss (NSCL), which excludes same-class contrasts. We prove that the gap between the CL and NSCL losses vanishes as the number of semantic classes increases, under a bound that is both label-agnostic and architecture-independent. We characterize the geometric structure of the global minimizers of the NSCL loss: the learned representations exhibit augmentation collapse, within-class collapse, and class centers that form a simplex equiangular tight frame. We further introduce a new bound on the few-shot error of linear-probing. This bound depends on two measures of feature variability--within-class dispersion and variation along the line between class centers. We show that directional variation dominates the bound and that the within-class dispersion's effect diminishes as the number of labeled samples increases. These properties enable CL and NSCL-trained representations to support accurate few-shot label recovery using simple linear probes. Finally, we empirically validate our theoretical findings: the gap between CL and NSCL losses decays at a rate of $\mathcal{O}(\frac{1}{\#\text{classes}})$; the two losses are highly correlated; minimizing the CL loss implicitly brings the NSCL loss close to the value achieved by direct minimization; and the proposed few-shot error bound provides a tight estimate of probing performance in practice.
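To make the CL/NSCL comparison concrete, the following is a minimal PyTorch sketch (not the authors' released code): `cl_loss` is a standard InfoNCE-style objective over two augmented views, and `nscl_loss` additionally masks same-class samples out of the denominator, as the abstract describes. The function names, temperature value, and batch layout are illustrative assumptions.
```python
# Minimal sketch (not the authors' released code): a standard self-supervised
# contrastive (InfoNCE-style) loss next to the negatives-only supervised variant
# (NSCL) described in the abstract, which removes same-class contrasts.
import torch
import torch.nn.functional as F


def cl_loss(z1, z2, temperature=0.5):
    """Self-supervised contrastive loss over two augmented views.

    z1, z2: (N, d) L2-normalized embeddings of the same N images under two
    augmentations. Each sample's positive is its other view; the remaining
    2N - 2 samples in the batch all act as negatives.
    """
    z = torch.cat([z1, z2], dim=0)                          # (2N, d)
    sim = z @ z.t() / temperature                           # scaled cosine similarities
    n = z1.shape[0]
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))         # never contrast a sample with itself
    pos_idx = torch.arange(2 * n, device=z.device).roll(n)  # index of the other view
    return F.cross_entropy(sim, pos_idx)


def nscl_loss(z1, z2, labels, temperature=0.5):
    """Negatives-only supervised contrastive loss (NSCL): identical to cl_loss,
    except that same-class samples are also excluded from the denominator, so
    they serve as neither positives nor negatives (no class collisions)."""
    z = torch.cat([z1, z2], dim=0)
    y = torch.cat([labels, labels], dim=0)
    sim = z @ z.t() / temperature
    n = z1.shape[0]
    pos_idx = torch.arange(2 * n, device=z.device).roll(n)
    self_mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    exclude = (y.unsqueeze(0) == y.unsqueeze(1)) | self_mask
    exclude[torch.arange(2 * n, device=z.device), pos_idx] = False  # keep the augmentation positive
    sim = sim.masked_fill(exclude, float("-inf"))
    return F.cross_entropy(sim, pos_idx)
```
When a batch spans many classes, the same-class mask removes only a small fraction of the 2N - 2 contrasts per anchor, which is the intuition behind the reported $\mathcal{O}(\frac{1}{\#\text{classes}})$ decay of the CL-NSCL gap.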
Related papers
- A Theoretical Framework for Preventing Class Collapse in Supervised Contrastive Learning [13.790114327022449]
Supervised contrastive learning (SupCL) has emerged as a prominent approach in representation learning. We present theoretically grounded guidelines for SupCL to prevent class collapse in learned representations.
arXiv Detail & Related papers (2025-03-11T09:17:58Z) - Generalized Kullback-Leibler Divergence Loss [105.66549870868971]
We prove that the Kullback-Leibler (KL) Divergence loss is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss. Thanks to the decoupled structure of the DKL loss, we have identified two areas for improvement.
arXiv Detail & Related papers (2025-03-11T04:43:33Z) - Bridging Mini-Batch and Asymptotic Analysis in Contrastive Learning: From InfoNCE to Kernel-Based Losses [20.273126099815517]
We study what different contrastive learning (CL) losses actually optimize for.
We introduce a novel CL objective, the Decoupled Hyperspherical Energy Loss (DHEL).
We show the same results hold for another relevant CL family, namely kernel contrastive learning (KCL), with the additional advantage of the expected loss being independent of batch size.
arXiv Detail & Related papers (2024-05-28T11:00:41Z) - Preventing Collapse in Contrastive Learning with Orthonormal Prototypes (CLOP) [0.0]
CLOP is a novel semi-supervised loss function designed to prevent neural collapse by promoting the formation of linear subspaces among class embeddings.
We show that CLOP enhances performance, providing greater stability across different learning rates and batch sizes.
arXiv Detail & Related papers (2024-03-27T15:48:16Z) - Relaxed Contrastive Learning for Federated Learning [48.96253206661268]
We propose a novel contrastive learning framework to address the challenges of data heterogeneity in federated learning.
Our framework outperforms all existing federated learning approaches by huge margins on the standard benchmarks.
arXiv Detail & Related papers (2024-01-10T04:55:24Z) - Unveiling Vulnerabilities of Contrastive Recommender Systems to Poisoning Attacks [48.911832772464145]
Contrastive learning (CL) has recently gained prominence in the domain of recommender systems.
This paper identifies a vulnerability of CL-based recommender systems: they are more susceptible to poisoning attacks that aim to promote individual items.
arXiv Detail & Related papers (2023-11-30T04:25:28Z) - Hard-Negative Sampling for Contrastive Learning: Optimal Representation Geometry and Neural- vs Dimensional-Collapse [16.42457033976047]
We prove that the losses of Supervised Contrastive Learning (SCL), Hard-SCL (HSCL), and Unsupervised Contrastive Learning (UCL) are minimized by representations that exhibit Neural Collapse (NC). We also prove that for any representation mapping, the HSCL and Hard-UCL (HUCL) losses are lower bounded by the corresponding SCL and UCL losses.
arXiv Detail & Related papers (2023-11-09T04:40:32Z) - Symmetric Neural-Collapse Representations with Supervised Contrastive Loss: The Impact of ReLU and Batching [26.994954303270575]
Supervised contrastive loss (SCL) is a competitive and often superior alternative to the cross-entropy loss for classification.
While prior studies have demonstrated that both losses yield symmetric training representations under balanced data, this symmetry breaks under class imbalances.
This paper presents an intriguing discovery: the introduction of a ReLU activation at the final layer effectively restores the symmetry in SCL-learned representations.
arXiv Detail & Related papers (2023-06-13T17:55:39Z) - Which Features are Learnt by Contrastive Learning? On the Role of Simplicity Bias in Class Collapse and Feature Suppression [59.97965005675144]
Contrastive learning (CL) has emerged as a powerful technique for representation learning, with or without label supervision.
We provide the first unified theoretically rigorous framework to determine which features are learnt by CL.
We present increasing embedding dimensionality and improving the quality of data augmentations as two theoretically motivated solutions.
arXiv Detail & Related papers (2023-05-25T23:37:22Z) - Supervised Contrastive Learning with Hard Negative Samples [16.42457033976047]
Contrastive learning (CL) learns a useful representation function by pulling positive samples close to each other.
In the absence of class information, negative samples are chosen randomly and independently of the anchor, so some negatives may share the anchor's class (a class collision).
Supervised CL (SCL) avoids this class collision by restricting the negative sampling distribution to samples whose labels differ from that of the anchor; a minimal sketch of this sampling scheme appears after this list.
arXiv Detail & Related papers (2022-08-31T19:20:04Z) - Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection [81.07346419422605]
Anomaly detection aims at identifying deviant samples from the normal data distribution.
Contrastive learning has provided a successful way to learn sample representations that enable effective discrimination of anomalies.
We propose a novel hierarchical semi-supervised contrastive learning framework for contamination-resistant anomaly detection.
arXiv Detail & Related papers (2022-07-24T18:49:26Z) - Semi-supervised Contrastive Learning with Similarity Co-calibration [72.38187308270135]
We propose a novel training strategy, termed Semi-supervised Contrastive Learning (SsCL).
SsCL combines the well-known contrastive loss in self-supervised learning with the cross entropy loss in semi-supervised learning.
We show that SsCL produces more discriminative representations and is beneficial to few-shot learning.
arXiv Detail & Related papers (2021-05-16T09:13:56Z) - Unbiased Risk Estimators Can Mislead: A Case Study of Learning with Complementary Labels [92.98756432746482]
We study a weakly supervised problem called learning with complementary labels.
We show that the quality of gradient estimation matters more in risk minimization.
We propose a novel surrogate complementary loss (SCL) framework that trades zero bias with reduced variance.
arXiv Detail & Related papers (2020-07-05T04:19:37Z)
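For the entry above on Supervised Contrastive Learning with Hard Negative Samples, here is an illustrative sketch of label-conditioned, hardness-weighted negative sampling. It is not that paper's exact estimator: the `exp(beta * sim)` importance weighting and the function name `hard_negative_scl_loss` are assumptions chosen to show the idea of restricting negatives to other classes and emphasizing those closest to the anchor.
```python
# Illustrative sketch (not the cited paper's exact estimator): a supervised
# contrastive loss whose negatives are restricted to different-class samples and
# reweighted toward "hard" negatives that lie close to the anchor. The
# exp(beta * sim) weighting is an assumption borrowed from common
# importance-weighting schemes for hard-negative sampling.
import torch


def hard_negative_scl_loss(z, labels, temperature=0.5, beta=1.0):
    """z: (N, d) L2-normalized embeddings; labels: (N,) integer class labels."""
    sim = z @ z.t() / temperature                                 # (N, N) scaled similarities
    n = z.shape[0]
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    neg_mask = labels.unsqueeze(0) != labels.unsqueeze(1)         # different-class samples only

    # Hardness weights over the allowed negatives (assumed exp(beta * sim) scheme).
    w = torch.exp(beta * sim) * neg_mask
    w = w / w.sum(dim=1, keepdim=True).clamp_min(1e-12)

    exp_sim = torch.exp(sim)
    neg_term = (w * exp_sim).sum(dim=1)                           # weighted negative mass per anchor
    # SupCon-style average of per-positive log-ratios.
    log_prob = sim - torch.log(exp_sim + neg_term.unsqueeze(1))
    loss = -(log_prob * pos_mask).sum(dim=1) / pos_mask.sum(dim=1).clamp_min(1)
    return loss.mean()
```
Raising `beta` concentrates the negative weight on the most anchor-similar different-class samples, while `beta = 0` recovers a uniform different-class negative distribution.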