Contrastive Learning with Orthonormal Anchors (CLOA)
- URL: http://arxiv.org/abs/2403.18699v1
- Date: Wed, 27 Mar 2024 15:48:16 GMT
- Title: Contrastive Learning with Orthonormal Anchors (CLOA)
- Authors: Huanran Li, Daniel Pimentel-Alarcón,
- Abstract summary: This study focuses on addressing the instability issues prevalent in contrastive learning, specifically examining the InfoNCE loss function and its derivatives.
We reveal a critical observation that these loss functions exhibit a restrictive behavior, leading to a convergence phenomenon where embeddings tend to merge into a singular point.
This "over-fusion" effect detrimentally affects classification accuracy in subsequent supervised-learning tasks.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This study focuses on addressing the instability issues prevalent in contrastive learning, specifically examining the InfoNCE loss function and its derivatives. We reveal a critical observation that these loss functions exhibit a restrictive behavior, leading to a convergence phenomenon where embeddings tend to merge into a singular point. This "over-fusion" effect detrimentally affects classification accuracy in subsequent supervised-learning tasks. Through theoretical analysis, we demonstrate that embeddings, when equalized or confined to a rank-1 linear subspace, represent a local minimum for InfoNCE. In response to this challenge, our research introduces an innovative strategy that leverages the same or fewer labeled data than typically used in the fine-tuning phase. The loss we proposed, Orthonormal Anchor Regression Loss, is designed to disentangle embedding clusters, significantly enhancing the distinctiveness of each embedding while simultaneously ensuring their aggregation into dense, well-defined clusters. Our method demonstrates remarkable improvements with just a fraction of the conventional label requirements, as evidenced by our results on CIFAR10 and CIFAR100 datasets.
Related papers
- Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning [48.11265601808718]
We show that standard self-supervised contrastive learning objectives implicitly approximate a supervised variant we call the negatives-only supervised contrastive loss (NSCL)<n>We prove that the gap between the CL and NSCL losses vanishes as the number of semantic classes increases, under a bound that is both label-agnostic and architecture-independent.
arXiv Detail & Related papers (2025-06-04T19:43:36Z) - On the Surprising Effectiveness of Large Learning Rates under Standard Width Scaling [11.168336416219857]
Existing infinite-width theory would predict instability under large learning rates and vanishing feature learning under stable learning rates.<n>We show that this discrepancy is not fully explained by finite-width phenomena such as catapult effects.<n>We validate that neural networks operate in this controlled divergence regime under CE loss but not under MSE loss.
arXiv Detail & Related papers (2025-05-28T15:40:48Z) - A Theoretical Framework for Preventing Class Collapse in Supervised Contrastive Learning [13.790114327022449]
Supervised contrastive learning (SupCL) has emerged as a prominent approach in representation learning.
We present theoretically grounded guidelines for SupCL to prevent class collapse in learned representations.
arXiv Detail & Related papers (2025-03-11T09:17:58Z) - SimO Loss: Anchor-Free Contrastive Loss for Fine-Grained Supervised Contrastive Learning [0.0]
We introduce a novel anchor-free contrastive learning (L) method leveraging our proposed Similarity-Orthogonality (SimO) loss.
Our approach minimizes a semi-metric discriminative loss function that simultaneously optimize two key objectives.
We provide visualizations that demonstrate the impact of SimO loss on the embedding space.
arXiv Detail & Related papers (2024-10-07T17:41:10Z) - Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric [99.19559537966538]
DML aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval.
To maintain the structure of embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss.
Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2024-07-03T13:44:20Z) - Bayesian Learning-driven Prototypical Contrastive Loss for Class-Incremental Learning [42.14439854721613]
We propose a prototypical network with a Bayesian learning-driven contrastive loss (BLCL) tailored specifically for class-incremental learning scenarios.
Our approach dynamically adapts the balance between the cross-entropy and contrastive loss functions with a Bayesian learning technique.
arXiv Detail & Related papers (2024-05-17T19:49:02Z) - Relaxed Contrastive Learning for Federated Learning [48.96253206661268]
We propose a novel contrastive learning framework to address the challenges of data heterogeneity in federated learning.
Our framework outperforms all existing federated learning approaches by huge margins on the standard benchmarks.
arXiv Detail & Related papers (2024-01-10T04:55:24Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Rethinking Prototypical Contrastive Learning through Alignment,
Uniformity and Correlation [24.794022951873156]
We propose to learn Prototypical representation through Alignment, Uniformity and Correlation (PAUC)
Specifically, the ordinary ProtoNCE loss is revised with: (1) an alignment loss that pulls embeddings from positive prototypes together; (2) a loss that distributes the prototypical level features uniformly; (3) a correlation loss that increases the diversity and discriminability between prototypical level features.
arXiv Detail & Related papers (2022-10-18T22:33:12Z) - A Low Rank Promoting Prior for Unsupervised Contrastive Learning [108.91406719395417]
We construct a novel probabilistic graphical model that effectively incorporates the low rank promoting prior into the framework of contrastive learning.
Our hypothesis explicitly requires that all the samples belonging to the same instance class lie on the same subspace with small dimension.
Empirical evidences show that the proposed algorithm clearly surpasses the state-of-the-art approaches on multiple benchmarks.
arXiv Detail & Related papers (2021-08-05T15:58:25Z) - Semi-supervised Contrastive Learning with Similarity Co-calibration [72.38187308270135]
We propose a novel training strategy, termed as Semi-supervised Contrastive Learning (SsCL)
SsCL combines the well-known contrastive loss in self-supervised learning with the cross entropy loss in semi-supervised learning.
We show that SsCL produces more discriminative representation and is beneficial to few shot learning.
arXiv Detail & Related papers (2021-05-16T09:13:56Z) - Robust Unsupervised Learning via L-Statistic Minimization [38.49191945141759]
We present a general approach to this problem focusing on unsupervised learning.
The key assumption is that the perturbing distribution is characterized by larger losses relative to a given class of admissible models.
We prove uniform convergence bounds with respect to the proposed criterion for several popular models in unsupervised learning.
arXiv Detail & Related papers (2020-12-14T10:36:06Z) - Adversarial Distributional Training for Robust Deep Learning [53.300984501078126]
Adversarial training (AT) is among the most effective techniques to improve model robustness by augmenting training data with adversarial examples.
Most existing AT methods adopt a specific attack to craft adversarial examples, leading to the unreliable robustness against other unseen attacks.
In this paper, we introduce adversarial distributional training (ADT), a novel framework for learning robust models.
arXiv Detail & Related papers (2020-02-14T12:36:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.