On the Alignment Between Supervised and Self-Supervised Contrastive Learning
- URL: http://arxiv.org/abs/2510.08852v1
- Date: Thu, 09 Oct 2025 23:01:42 GMT
- Title: On the Alignment Between Supervised and Self-Supervised Contrastive Learning
- Authors: Achleshwar Luthra, Priyadarsi Mishra, Tomer Galanti
- Abstract summary: Self-supervised contrastive learning (CL) has achieved remarkable empirical success. Recent theory shows that the CL loss closely approximates a supervised surrogate, the Negatives-Only Supervised Contrastive Learning (NSCL) loss. We analyze the representation alignment of CL and NSCL models trained under shared randomness.
- Score: 8.71512981775656
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised contrastive learning (CL) has achieved remarkable empirical success, often producing representations that rival supervised pre-training on downstream tasks. Recent theory explains this by showing that the CL loss closely approximates a supervised surrogate, the Negatives-Only Supervised Contrastive Learning (NSCL) loss, as the number of classes grows. Yet this loss-level similarity leaves an open question: {\em Do CL and NSCL also remain aligned at the representation level throughout training, not just in their objectives?} We address this by analyzing the representation alignment of CL and NSCL models trained under shared randomness (same initialization, batches, and augmentations). First, we show that their induced representations remain similar: specifically, we prove that the similarity matrices of CL and NSCL stay close under realistic conditions. Our bounds provide high-probability guarantees on alignment metrics such as centered kernel alignment (CKA) and representational similarity analysis (RSA), and they clarify how alignment improves with more classes and higher temperatures, as well as how it depends on batch size. In contrast, we demonstrate that parameter-space coupling is inherently unstable: divergence between CL and NSCL weights can grow exponentially with training time. Finally, we validate these predictions empirically, showing that CL-NSCL alignment strengthens with scale and temperature, and that NSCL tracks CL more closely than other supervised objectives. This positions NSCL as a principled bridge between self-supervised and supervised learning. Our code and project page are available at [\href{https://github.com/DLFundamentals/understanding_ssl_v2}{code}, \href{https://dlfundamentals.github.io/cl-nscl-representation-alignment/}{project page}].
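To make the abstract's main objects concrete, here is a minimal PyTorch sketch of the two losses and of the linear-CKA alignment metric. This is an illustrative reading of the abstract, not the authors' released code (see the linked repository for that); in particular, the exact masking convention in `nscl_loss` is an assumption.

```python
# Minimal sketch (not the authors' released code). Assumes z1, z2 are an
# encoder's embeddings of two augmented views of the same N images.
import torch
import torch.nn.functional as F

def cl_loss(z1, z2, tau=0.5):
    """Self-supervised contrastive (InfoNCE / SimCLR-style) loss."""
    z = torch.cat([F.normalize(z1, dim=1), F.normalize(z2, dim=1)])
    sim = z @ z.t() / tau                                 # (2N, 2N) similarities
    sim.fill_diagonal_(float('-inf'))                     # exclude self-pairs
    n = z1.size(0)
    pos = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])  # each view's positive
    return F.cross_entropy(sim, pos)

def nscl_loss(z1, z2, labels, tau=0.5):
    """Negatives-only supervised variant: labels are used only to *remove*
    same-class samples from the denominator; the positive is still the
    augmented pair (an assumed reading of the NSCL definition)."""
    z = torch.cat([F.normalize(z1, dim=1), F.normalize(z2, dim=1)])
    y = torch.cat([labels, labels])
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float('-inf'))
    n = z1.size(0)
    pos = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    mask = y[:, None] == y[None, :]                       # same-class pairs
    mask[torch.arange(2 * n), pos] = False                # but keep the positive
    return F.cross_entropy(sim.masked_fill(mask, float('-inf')), pos)

def linear_cka(X, Y):
    """Linear centered kernel alignment between two (N, d) feature matrices."""
    X = X - X.mean(0, keepdim=True)
    Y = Y - Y.mean(0, keepdim=True)
    return ((Y.t() @ X).norm() ** 2
            / ((X.t() @ X).norm() * (Y.t() @ Y).norm())).item()
```

Under the paper's shared-randomness setup, one would train two identically initialized encoders, one with `cl_loss` and one with `nscl_loss`, on the same batches and augmentations, and track `linear_cka` (or RSA) between their features over training.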
Related papers
- ACL: Aligned Contrastive Learning Improves BERT and Multi-exit BERT Fine-tuning [3.060720241524644]
We introduce a novel Aligned Contrastive Learning (ACL) framework. ACL-Embed regards label embeddings as extra augmented samples with different labels and employs contrastive learning to align the label embeddings with their samples' representations. To ease the joint optimization of the ACL-Embed objective and the CE loss, we propose ACL-Grad, which discards the ACL-Embed term when the two objectives conflict.
arXiv Detail & Related papers (2026-02-03T14:08:07Z)
- CLA: Latent Alignment for Online Continual Self-Supervised Learning [53.52783900926569]
We introduce Continual Latent Alignment (CLA), a novel SSL strategy for Online CL. CLA speeds up convergence of training in the online scenario, outperforming state-of-the-art approaches under the same computational budget. We also find that using CLA as a protocol in the early stages of pretraining leads to better final performance than full i.i.d. pretraining.
arXiv Detail & Related papers (2025-07-14T16:23:39Z)
- Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning [48.11265601808718]
We show that standard self-supervised contrastive learning objectives implicitly approximate a supervised variant we call the negatives-only supervised contrastive loss (NSCL). We prove that the gap between the CL and NSCL losses vanishes as the number of semantic classes increases, under a bound that is both label-agnostic and architecture-independent (a small numeric illustration of this gap appears after this list).
arXiv Detail & Related papers (2025-06-04T19:43:36Z)
- Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation [19.749490092520006]
Self-Calibrated CLIP (SC-CLIP) is a training-free method that calibrates CLIP to produce finer representations. SC-CLIP boosts the performance of vanilla CLIP ViT-L/14 by 6.8 times.
arXiv Detail & Related papers (2024-11-24T15:14:05Z)
- Fusion Self-supervised Learning for Recommendation [16.02820746003461]
We propose a Fusion Self-supervised Learning framework for recommendation. Specifically, we use high-order information from the GCN process to create contrastive views. To integrate self-supervised signals from the various CL objectives, we propose an advanced CL objective.
arXiv Detail & Related papers (2024-07-29T04:30:38Z)
- CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models [23.398619576886375]
Continual learning (CL) aims to help deep neural networks learn new knowledge while retaining what has been learned.
Our work proposes Continual LeArning with Probabilistic finetuning (CLAP) - a probabilistic modeling framework over visual-guided text features per task.
arXiv Detail & Related papers (2024-03-28T04:15:58Z)
- Decoupled Contrastive Learning for Long-Tailed Recognition [58.255966442426484]
Supervised Contrastive Loss (SCL) is popular in visual representation learning.
In long-tailed recognition, where the number of samples per class is imbalanced, treating the two types of positive samples equally leads to biased optimization of intra-category distances.
We propose patch-based self-distillation to transfer knowledge from head to tail classes, relieving the under-representation of tail classes.
arXiv Detail & Related papers (2024-03-10T09:46:28Z)
- RecDCL: Dual Contrastive Learning for Recommendation [65.6236784430981]
We propose a dual contrastive learning recommendation framework -- RecDCL.
In RecDCL, the feature-wise contrastive learning (FCL) objective is designed to eliminate redundant solutions on user-item positive pairs.
The batch-wise contrastive learning (BCL) objective is used to generate contrastive embeddings on output vectors, enhancing the robustness of the representations.
arXiv Detail & Related papers (2024-01-28T11:51:09Z)
- HomoGCL: Rethinking Homophily in Graph Contrastive Learning [64.85392028383164]
HomoGCL is a model-agnostic framework to expand the positive set using neighbor nodes with neighbor-specific significances.
We show that HomoGCL yields multiple state-of-the-art results across six public datasets.
arXiv Detail & Related papers (2023-06-16T04:06:52Z)
- Beyond Supervised Continual Learning: a Review [69.9674326582747]
Continual Learning (CL) is a flavor of machine learning where the usual assumption of stationary data distribution is relaxed or omitted.
Changes in the data distribution can cause the so-called catastrophic forgetting (CF) effect: an abrupt loss of previous knowledge.
This article reviews literature that studies CL in other settings, such as learning with reduced supervision, fully unsupervised learning, and reinforcement learning.
arXiv Detail & Related papers (2022-08-30T14:44:41Z)
- An Asymmetric Contrastive Loss for Handling Imbalanced Datasets [0.0]
We introduce an asymmetric version of CL, referred to as ACL, to address the problem of class imbalance.
In addition, we propose the asymmetric focal contrastive loss (AFCL) as a further generalization of both ACL and focal contrastive loss.
Results on the FMNIST and ISIC 2018 imbalanced datasets show that AFCL is capable of outperforming CL and FCL in terms of both weighted and unweighted classification accuracies.
arXiv Detail & Related papers (2022-07-14T17:30:13Z)
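As promised in the entry on "Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning" above, the following self-contained snippet numerically illustrates the vanishing-gap claim: on a fixed-size batch with labels drawn uniformly from C classes, the CL and NSCL losses move closer together as C grows. The setup (random unit embeddings, uniform label sampling) is an assumption for illustration, not the paper's experimental protocol.

```python
# Illustrative check (not the paper's experiments): the CL-NSCL loss gap on a
# fixed-size batch shrinks as the number of classes C grows, since the expected
# number of same-class negatives per anchor scales like 1/C.
import torch
import torch.nn.functional as F

def loss_gap(n_classes, batch=256, d=64, tau=0.5, seed=0):
    g = torch.Generator().manual_seed(seed)
    z1 = F.normalize(torch.randn(batch, d, generator=g), dim=1)
    z2 = F.normalize(torch.randn(batch, d, generator=g), dim=1)
    labels = torch.randint(0, n_classes, (batch,), generator=g)
    z = torch.cat([z1, z2])
    y = torch.cat([labels, labels])
    sim = z @ z.t() / tau
    sim.fill_diagonal_(float('-inf'))                     # exclude self-pairs
    pos = torch.cat([torch.arange(batch, 2 * batch), torch.arange(batch)])
    cl = F.cross_entropy(sim, pos)                        # all non-self negatives
    mask = y[:, None] == y[None, :]                       # same-class pairs
    mask[torch.arange(2 * batch), pos] = False            # keep the positive
    nscl = F.cross_entropy(sim.masked_fill(mask, float('-inf')), pos)
    return (cl - nscl).abs().item()

for c in [2, 8, 32, 128, 512]:
    print(f"C={c:4d}  |CL - NSCL| = {loss_gap(c):.4f}")   # gap shrinks with C
```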
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.