Related papers: On neural and dimensional collapse in supervised and unsupervised contrastive learning with hard negative sampling

On neural and dimensional collapse in supervised and unsupervised contrastive learning with hard negative sampling

URL: http://arxiv.org/abs/2311.05139v1
Date: Thu, 9 Nov 2023 04:40:32 GMT
Title: On neural and dimensional collapse in supervised and unsupervised contrastive learning with hard negative sampling
Authors: Ruijie Jiang, Thuan Nguyen, Shuchin Aeron, Prakash Ishwar
Abstract summary: We prove that representations that exhibit Neural Collapse (NC) minimize Supervised Contrastive Learning (SCL), Hard-SCL (HSCL), and Unsupervised Contrastive Learning (UCL) risks. We also prove that for any representation mapping, the HSCL and Hard-UCL (HUCL) risks are lower bounded by the corresponding SCL and UCL risks.
Score: 17.94266316310016
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: For a widely-studied data model and general loss and sample-hardening functions we prove that the Supervised Contrastive Learning (SCL), Hard-SCL (HSCL), and Unsupervised Contrastive Learning (UCL) risks are minimized by representations that exhibit Neural Collapse (NC), i.e., the class means form an Equianglular Tight Frame (ETF) and data from the same class are mapped to the same representation. We also prove that for any representation mapping, the HSCL and Hard-UCL (HUCL) risks are lower bounded by the corresponding SCL and UCL risks. Although the optimality of ETF is known for SCL, albeit only for InfoNCE loss, its optimality for HSCL and UCL under general loss and hardening functions is novel. Moreover, our proofs are much simpler, compact, and transparent. We empirically demonstrate, for the first time, that ADAM optimization of HSCL and HUCL risks with random initialization and suitable hardness levels can indeed converge to the NC geometry if we incorporate unit-ball or unit-sphere feature normalization. Without incorporating hard negatives or feature normalization, however, the representations learned via ADAM suffer from dimensional collapse (DC) and fail to attain the NC geometry.

Related papers

Self-Supervised Contrastive Learning is Approximately Supervised Contrastive Learning [48.11265601808718]
We show that standard self-supervised contrastive learning objectives implicitly approximate a supervised variant we call the negatives-only supervised contrastive loss (NSCL)<n>We prove that the gap between the CL and NSCL losses vanishes as the number of semantic classes increases, under a bound that is both label-agnostic and architecture-independent.
arXiv Detail & Related papers (2025-06-04T19:43:36Z)
Learning Identifiable Structures Helps Avoid Bias in DNN-based Supervised Causal Learning [56.22841701016295]
Supervised Causal Learning (SCL) is an emerging paradigm in this field. Existing Deep Neural Network (DNN)-based methods commonly adopt the "Node-Edge approach"
arXiv Detail & Related papers (2025-02-15T19:10:35Z)
L^2CL: Embarrassingly Simple Layer-to-Layer Contrastive Learning for Graph Collaborative Filtering [33.165094795515785]
Graph neural networks (GNNs) have recently emerged as an effective approach to model neighborhood signals in collaborative filtering. We propose L2CL, a principled Layer-to-Layer Contrastive Learning framework that contrasts representations from different layers. We find that L2CL, using only one-hop contrastive learning paradigm, is able to capture intrinsic semantic structures and improve the quality of node representation.
arXiv Detail & Related papers (2024-07-19T12:45:21Z)
Decoupled Contrastive Learning for Long-Tailed Recognition [58.255966442426484]
Supervised Contrastive Loss (SCL) is popular in visual representation learning. In the scenario of long-tailed recognition, where the number of samples in each class is imbalanced, treating two types of positive samples equally leads to the biased optimization for intra-category distance. We propose a patch-based self distillation to transfer knowledge from head to tail classes to relieve the under-representation of tail classes.
arXiv Detail & Related papers (2024-03-10T09:46:28Z)
Rethinking and Simplifying Bootstrapped Graph Latents [48.76934123429186]
Graph contrastive learning (GCL) has emerged as a representative paradigm in graph self-supervised learning. We present SGCL, a simple yet effective GCL framework that utilizes the outputs from two consecutive iterations as positive pairs. We show that SGCL can achieve competitive performance with fewer parameters, lower time and space costs, and significant convergence speedup.
arXiv Detail & Related papers (2023-12-05T09:49:50Z)
What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization [111.55277952086155]
We study In-Context Learning (ICL) by addressing several open questions. We show that, without updating the neural network parameters, ICL implicitly implements the Bayesian model averaging algorithm. We prove that the error of pretrained model is bounded by a sum of an approximation error and a generalization error.
arXiv Detail & Related papers (2023-05-30T21:23:47Z)
Unifying Graph Contrastive Learning with Flexible Contextual Scopes [57.86762576319638]
We present a self-supervised learning method termed Unifying Graph Contrastive Learning with Flexible Contextual Scopes (UGCL for short) Our algorithm builds flexible contextual representations with contextual scopes by controlling the power of an adjacency matrix. Based on representations from both local and contextual scopes, distL optimises a very simple contrastive loss function for graph representation learning.
arXiv Detail & Related papers (2022-10-17T07:16:17Z)
Supervised Contrastive Learning with Hard Negative Samples [16.42457033976047]
In contrastive learning (CL) learns a useful representation function by pulling positive samples close to each other. In absence of class information, negative samples are chosen randomly and independently of the anchor. Supervised CL (SCL) avoids this class collision by conditioning the negative sampling distribution to samples having labels different from that of the anchor.
arXiv Detail & Related papers (2022-08-31T19:20:04Z)
Hierarchical Semi-Supervised Contrastive Learning for Contamination-Resistant Anomaly Detection [81.07346419422605]
Anomaly detection aims at identifying deviant samples from the normal data distribution. Contrastive learning has provided a successful way to sample representation that enables effective discrimination on anomalies. We propose a novel hierarchical semi-supervised contrastive learning framework, for contamination-resistant anomaly detection.
arXiv Detail & Related papers (2022-07-24T18:49:26Z)
An Asymmetric Contrastive Loss for Handling Imbalanced Datasets [0.0]
We introduce an asymmetric version of CL, referred to as ACL, to address the problem of class imbalance. In addition, we propose the asymmetric focal contrastive loss (AFCL) as a further generalization of both ACL and focal contrastive loss. Results on the FMNIST and ISIC 2018 imbalanced datasets show that AFCL is capable of outperforming CL and FCL in terms of both weighted and unweighted classification accuracies.
arXiv Detail & Related papers (2022-07-14T17:30:13Z)
Debiased Graph Contrastive Learning [27.560217866753938]
We propose a novel and effective method to estimate the probability whether each negative sample is true or not. Debiased Graph Contrastive Learning (DGCL) outperforms or matches previous unsupervised state-of-the-art results on several benchmarks.
arXiv Detail & Related papers (2021-10-05T13:15:59Z)
Semi-supervised Contrastive Learning with Similarity Co-calibration [72.38187308270135]
We propose a novel training strategy, termed as Semi-supervised Contrastive Learning (SsCL) SsCL combines the well-known contrastive loss in self-supervised learning with the cross entropy loss in semi-supervised learning. We show that SsCL produces more discriminative representation and is beneficial to few shot learning.
arXiv Detail & Related papers (2021-05-16T09:13:56Z)
Sample-efficient L0-L2 constrained structure learning of sparse Ising models [3.056751497358646]
We consider the problem of learning the underlying graph of a sparse Ising model with $p$ nodes from $n$ i.i.d. samples. We leverage the cardinality constraint L0 norm, which is known to properly induce sparsity, and further combine it with an L2 norm to better model the non-zero coefficients.
arXiv Detail & Related papers (2020-12-03T07:52:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.