A Theoretical Framework for Preventing Class Collapse in Supervised Contrastive Learning
- URL: http://arxiv.org/abs/2503.08203v1
- Date: Tue, 11 Mar 2025 09:17:58 GMT
- Title: A Theoretical Framework for Preventing Class Collapse in Supervised Contrastive Learning
- Authors: Chungpa Lee, Jeongheon Oh, Kibok Lee, Jy-yong Sohn
- Abstract summary: Supervised contrastive learning (SupCL) has emerged as a prominent approach in representation learning. We present theoretically grounded guidelines for SupCL to prevent class collapse in learned representations.
- Score: 13.790114327022449
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Supervised contrastive learning (SupCL) has emerged as a prominent approach in representation learning, leveraging both supervised and self-supervised losses. However, achieving an optimal balance between these losses is challenging; failing to do so can lead to class collapse, reducing discrimination among individual embeddings in the same class. In this paper, we present theoretically grounded guidelines for SupCL to prevent class collapse in learned representations. Specifically, we introduce the Simplex-to-Simplex Embedding Model (SSEM), a theoretical framework that models various embedding structures, including all embeddings that minimize the supervised contrastive loss. Through SSEM, we analyze how hyperparameters affect learned representations, offering practical guidelines for hyperparameter selection to mitigate the risk of class collapse. Our theoretical findings are supported by empirical results across synthetic and real-world datasets.
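To make the balance the abstract refers to concrete, here is a minimal sketch (not the authors' code) of a SupCL-style objective that adds a supervised contrastive term and a self-supervised contrastive term. The balancing weight `beta`, temperature `tau`, and the two-view setup are assumptions for illustration; the exact losses and weighting analyzed in the paper may differ.

```python
# Minimal SupCL-style sketch: supervised contrastive term + self-supervised term.
# `beta` and `tau` are hypothetical hyperparameters, not the paper's notation.
import torch
import torch.nn.functional as F


def supcon_loss(z: torch.Tensor, labels: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Supervised contrastive term: all same-label samples are positives."""
    z = F.normalize(z, dim=1)
    sim = z @ z.T / tau
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    # log-probability of each candidate against all non-self candidates
    log_prob = sim - torch.logsumexp(sim.masked_fill(self_mask, float("-inf")), dim=1, keepdim=True)
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    return -((log_prob * pos_mask.float()).sum(dim=1) / pos_count).mean()


def ntxent_loss(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """Self-supervised term: the only positive is the other augmented view."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.T / tau
    sim.fill_diagonal_(float("-inf"))
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    return F.cross_entropy(sim, targets)


def supcl_objective(z1, z2, labels, beta: float = 0.5, tau: float = 0.1):
    """Weighted combination of the two terms; `beta` controls the balance."""
    z, y = torch.cat([z1, z2]), torch.cat([labels, labels])
    return supcon_loss(z, y, tau) + beta * ntxent_loss(z1, z2, tau)
```

The paper's guidelines concern how hyperparameters of this kind should be chosen so that embeddings within a class do not collapse to a single point.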
Related papers
- On the Discrimination and Consistency for Exemplar-Free Class Incremental Learning [19.898602404329697]
Exemplar-free class incremental learning (EF-CIL) is a nontrivial task that requires continuously enriching model capability with new classes while maintaining previously learned knowledge without storing and replaying any old class exemplars. An emerging theory-guided framework for CIL trains task-specific models for a shared network, shifting the pressure of forgetting to task-id prediction. In EF-CIL, task-id prediction is more challenging due to the lack of inter-task interaction (e.g., replays of exemplars).
arXiv Detail & Related papers (2025-01-26T08:50:33Z) - Anti-Collapse Loss for Deep Metric Learning Based on Coding Rate Metric [99.19559537966538]
Deep metric learning (DML) aims to learn a discriminative high-dimensional embedding space for downstream tasks like classification, clustering, and retrieval.
To maintain the structure of embedding space and avoid feature collapse, we propose a novel loss function called Anti-Collapse Loss.
Comprehensive experiments on benchmark datasets demonstrate that our proposed method outperforms existing state-of-the-art methods.
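The entry above refers to a coding rate metric. As a rough, illustrative sketch only (not the paper's Anti-Collapse Loss), a log-determinant coding-rate term grows when embeddings spread over many directions, so subtracting it from a metric-learning loss discourages feature collapse; the scaling constant `eps` and the regularization weight are assumptions.

```python
# Illustrative logdet-based coding-rate term (in the spirit of rate-reduction
# objectives); an assumed sketch, not the paper's Anti-Collapse Loss.
import torch


def coding_rate(z: torch.Tensor, eps: float = 0.5) -> torch.Tensor:
    """Rough measure of how many directions the embeddings z (n x d) occupy."""
    n, d = z.shape
    cov = z.T @ z * (d / (n * eps ** 2))          # scaled d x d covariance
    return 0.5 * torch.logdet(torch.eye(d, device=z.device) + cov)


# Hypothetical usage alongside an existing metric-learning loss:
# total_loss = metric_loss - lam * coding_rate(embeddings)
```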
arXiv Detail & Related papers (2024-07-03T13:44:20Z) - Large Margin Discriminative Loss for Classification [3.3975558777609915]
We introduce a novel discriminative loss function with a large margin in the context of deep learning. This loss boosts the discriminative power of neural networks, as reflected in intra-class compactness and inter-class separability. We also design a strategy called partial momentum updating that enjoys both stability and consistency during training.
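As a generic illustration of the two properties mentioned above, the sketch below pulls samples toward their per-batch class centroid (intra-class compactness) and pushes centroids at least a hypothetical margin apart (inter-class separability). It is an assumed simplification, not the loss or the partial-momentum strategy proposed in the paper.

```python
# Generic compactness + separability sketch with a margin; an assumed
# simplification, not the paper's loss.
import torch


def margin_discriminative_loss(z: torch.Tensor, labels: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    classes = labels.unique()                                 # sorted unique labels
    centroids = torch.stack([z[labels == c].mean(dim=0) for c in classes])
    idx = (labels[:, None] == classes[None, :]).float().argmax(dim=1)
    # intra-class compactness: squared distance to own class centroid
    compact = ((z - centroids[idx]) ** 2).sum(dim=1).mean()
    if centroids.size(0) < 2:
        return compact
    # inter-class separability: hinge penalty when centroids are closer than the margin
    dists = torch.cdist(centroids, centroids)
    off_diag = ~torch.eye(centroids.size(0), dtype=torch.bool, device=z.device)
    separate = torch.relu(margin - dists[off_diag]).mean()
    return compact + separate
```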
arXiv Detail & Related papers (2024-05-28T18:10:45Z) - Preventing Collapse in Contrastive Learning with Orthonormal Prototypes (CLOP) [0.0]
CLOP is a novel semi-supervised loss function designed to prevent neural collapse by promoting the formation of linear subspaces among class embeddings.
We show that CLOP enhances performance, providing greater stability across different learning rates and batch sizes.
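CLOP's exact formulation is not given in this summary. Purely as an illustration of anchoring classes to orthonormal directions, the sketch below draws fixed orthonormal prototypes via a QR decomposition and penalizes the cosine misalignment between embeddings and their class prototype; this is an assumed stand-in, not the CLOP loss.

```python
# Assumed illustration of orthonormal class prototypes; not the CLOP loss itself.
import torch
import torch.nn.functional as F


def make_orthonormal_prototypes(num_classes: int, dim: int, seed: int = 0) -> torch.Tensor:
    """Rows are orthonormal prototype directions (requires dim >= num_classes)."""
    g = torch.Generator().manual_seed(seed)
    q, _ = torch.linalg.qr(torch.randn(dim, num_classes, generator=g))
    return q.T                                    # (num_classes, dim)


def prototype_alignment_loss(z: torch.Tensor, labels: torch.Tensor, prototypes: torch.Tensor) -> torch.Tensor:
    z = F.normalize(z, dim=1)
    cos = (z * prototypes[labels]).sum(dim=1)     # cosine to the assigned prototype
    return (1.0 - cos).mean()
```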
arXiv Detail & Related papers (2024-03-27T15:48:16Z) - Relaxed Contrastive Learning for Federated Learning [48.96253206661268]
We propose a novel contrastive learning framework to address the challenges of data heterogeneity in federated learning.
Our framework outperforms all existing federated learning approaches by huge margins on the standard benchmarks.
arXiv Detail & Related papers (2024-01-10T04:55:24Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function that offers more mathematical opportunities for analyzing closed-form dynamics.
The unhinged loss also makes it possible to consider more practical techniques, such as time-varying learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Uncertainty-guided Boundary Learning for Imbalanced Social Event Detection [64.4350027428928]
We propose a novel uncertainty-guided class imbalance learning framework for imbalanced social event detection tasks.
Our model significantly improves social event representation and classification in almost all classes, especially the uncertain ones.
arXiv Detail & Related papers (2023-10-30T03:32:04Z) - Hierarchical Decomposition of Prompt-Based Continual Learning: Rethinking Obscured Sub-optimality [55.88910947643436]
Self-supervised pre-training is essential for handling vast quantities of unlabeled data in practice.
HiDe-Prompt is an innovative approach that explicitly optimizes the hierarchical components with an ensemble of task-specific prompts and statistics.
Our experiments demonstrate the superior performance of HiDe-Prompt and its robustness to pre-training paradigms in continual learning.
arXiv Detail & Related papers (2023-10-11T06:51:46Z) - A Unified Generalization Analysis of Re-Weighting and Logit-Adjustment for Imbalanced Learning [129.63326990812234]
We propose a technique named data-dependent contraction to capture how modified losses handle different classes.
On top of this technique, a fine-grained generalization bound is established for imbalanced learning, which helps reveal the mystery of re-weighting and logit-adjustment.
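For context on the two techniques the title refers to (standard background, not this paper's contribution): re-weighting scales each sample's loss by an inverse-frequency weight, while logit adjustment shifts logits by scaled log class priors. The sketch below shows both under a hypothetical scaling `tau` and class-count input.

```python
# Background sketches of re-weighting and logit-adjusted cross-entropy for
# imbalanced classification; standard techniques, not this paper's method.
import torch
import torch.nn.functional as F


def logit_adjusted_ce(logits: torch.Tensor, labels: torch.Tensor,
                      class_counts: torch.Tensor, tau: float = 1.0) -> torch.Tensor:
    """Shift logits by tau * log(prior) so rare classes need larger margins to win."""
    priors = class_counts.float() / class_counts.sum()
    return F.cross_entropy(logits + tau * priors.log(), labels)


def reweighted_ce(logits: torch.Tensor, labels: torch.Tensor,
                  class_counts: torch.Tensor) -> torch.Tensor:
    """Up-weight the loss of rare classes via inverse-frequency weights."""
    weights = class_counts.sum() / (class_counts.float() * len(class_counts))
    return F.cross_entropy(logits, labels, weight=weights)
```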
arXiv Detail & Related papers (2023-10-07T09:15:08Z) - An Investigation of Representation and Allocation Harms in Contrastive Learning [55.42336321517228]
We demonstrate that contrastive learning (CL) tends to collapse representations of minority groups with certain majority groups.
We refer to this phenomenon as representation harm and demonstrate it on image and text datasets using the corresponding popular CL methods.
We provide a theoretical explanation for representation harm using a neural block model that leads to a representational collapse in a contrastive learning setting.
arXiv Detail & Related papers (2023-10-02T19:25:37Z) - Towards Understanding the Mechanism of Contrastive Learning via Similarity Structure: A Theoretical Analysis [10.29814984060018]
We consider a kernel-based contrastive learning framework termed Kernel Contrastive Learning (KCL).
We introduce a formulation of the similarity structure of learned representations by utilizing a statistical dependency viewpoint.
We derive a new upper bound on the classification error of a downstream task, showing that our theory is consistent with the empirical success of contrastive learning.
arXiv Detail & Related papers (2023-04-01T21:53:29Z) - Robust Unsupervised Learning via L-Statistic Minimization [38.49191945141759]
We present a general approach to this problem, focusing on unsupervised learning.
The key assumption is that the perturbing distribution is characterized by larger losses relative to a given class of admissible models.
We prove uniform convergence bounds with respect to the proposed criterion for several popular models in unsupervised learning.
arXiv Detail & Related papers (2020-12-14T10:36:06Z)