Which Features are Learnt by Contrastive Learning? On the Role of
Simplicity Bias in Class Collapse and Feature Suppression
- URL: http://arxiv.org/abs/2305.16536v2
- Date: Mon, 29 May 2023 00:40:01 GMT
- Title: Which Features are Learnt by Contrastive Learning? On the Role of
Simplicity Bias in Class Collapse and Feature Suppression
- Authors: Yihao Xue, Siddharth Joshi, Eric Gan, Pin-Yu Chen, Baharan
Mirzasoleiman
- Abstract summary: Contrastive learning (CL) has emerged as a powerful technique for representation learning, with or without label supervision.
We provide the first unified theoretically rigorous framework to determine which features are learnt by CL.
We present increasing embedding dimensionality and improving the quality of data augmentations as two theoretically motivated solutions.
- Score: 59.97965005675144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive learning (CL) has emerged as a powerful technique for
representation learning, with or without label supervision. However, supervised
CL is prone to collapsing representations of subclasses within a class by not
capturing all their features, and unsupervised CL may suppress harder
class-relevant features by focusing on learning easy class-irrelevant features;
both significantly compromise representation quality. Yet, there is no
theoretical understanding of \textit{class collapse} or \textit{feature
suppression} at \textit{test} time. We provide the first unified theoretically
rigorous framework to determine \textit{which} features are learnt by CL. Our
analysis indicates that, perhaps surprisingly, the bias of (stochastic) gradient
descent towards finding simpler solutions is a key factor in collapsing
subclass representations and suppressing harder class-relevant features.
Moreover, we present increasing embedding dimensionality and improving the
quality of data augmentations as two theoretically motivated solutions to
\textit{feature suppression}. We also provide the first theoretical explanation for
why employing supervised and unsupervised CL together yields higher-quality
representations, even when using commonly-used stochastic gradient methods.
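The last claim, that supervised and unsupervised CL combine into higher-quality representations, is easy to picture as a weighted sum of the two standard objectives. Below is a minimal, illustrative PyTorch sketch, not the authors' code: `nt_xent` is the usual unsupervised InfoNCE/NT-Xent loss over two augmented views, `sup_con` is a SupCon-style loss treating same-label samples as positives, and the weight `lam` is a hypothetical knob. Per the paper's proposed fixes for feature suppression, one would additionally raise the embedding dimension of the encoder producing `z1`, `z2` and strengthen the augmentations behind the two views.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """Unsupervised NT-Xent loss over two augmented views (SimCLR-style)."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)        # (2N, d) unit vectors
    sim = z @ z.t() / tau                              # scaled cosine similarities
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float('-inf'))
    # positives: view i pairs with i+n (and i+n with i)
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)

def sup_con(z, labels, tau=0.5):
    """SupCon-style loss: all same-label samples are positives."""
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / tau
    self_mask = torch.eye(z.size(0), dtype=torch.bool)
    sim.masked_fill_(self_mask, float('-inf'))         # exclude self-pairs
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]) & ~self_mask
    # mean log-likelihood over each anchor's positives
    return -((log_prob * pos).sum(1) / pos.sum(1).clamp_min(1)).mean()

def joint_cl_loss(z1, z2, labels, lam=0.5):
    """Hypothetical joint objective: lam trades off the two losses."""
    return lam * sup_con(torch.cat([z1, z2]), labels.repeat(2)) \
        + (1 - lam) * nt_xent(z1, z2)
```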
Related papers
- Fine-Grained Representation Learning via Multi-Level Contrastive Learning without Class Priors [3.050634053489509]
Contrastive Disentangling (CD) is a framework designed to learn representations without relying on class priors.
CD integrates instance-level and feature-level contrastive losses with a normalized entropy loss to capture semantically rich and fine-grained representations (the entropy term is sketched after this entry).
arXiv Detail & Related papers (2024-09-07T16:39:14Z)
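The normalized entropy loss is the one ingredient here concrete enough to sketch. A minimal, assumed PyTorch version (illustrative only, not the paper's implementation): given soft assignment probabilities `probs`, it penalizes low entropy of the batch-averaged assignment, pushing usage of the `k` slots toward uniform.

```python
import math
import torch

def normalized_entropy_loss(probs: torch.Tensor) -> torch.Tensor:
    """probs: (batch, k) soft assignments summing to 1 along dim 1.

    Returns 1 - H(mean assignment) / log(k), so minimizing this loss
    maximizes the normalized entropy (uniform usage of the k slots).
    """
    p_mean = probs.mean(dim=0)                          # average over the batch
    h = -(p_mean * p_mean.clamp_min(1e-8).log()).sum()  # entropy in nats
    return 1.0 - h / math.log(probs.size(1))            # normalized to [0, 1]
```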
- Subclass-balancing Contrastive Learning for Long-tailed Recognition [38.31221755013738]
Long-tailed recognition with imbalanced class distribution naturally emerges in practical machine learning applications.
We propose a novel "subclass-balancing contrastive learning" (SBCL) approach that clusters each head class into multiple subclasses comparable in size to the tail classes (see the clustering sketch after this entry).
We evaluate SBCL on a suite of long-tailed benchmark datasets, where it achieves state-of-the-art performance.
arXiv Detail & Related papers (2023-06-28T05:08:43Z)
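A minimal sketch of the subclass-splitting idea described above (assumed, not the authors' code): cluster each head class's embeddings into roughly tail-sized subclasses with k-means, then use the subclass ids as balanced pseudo-labels for a supervised contrastive loss.

```python
import numpy as np
from sklearn.cluster import KMeans

def split_into_subclasses(embeddings, labels, tail_size):
    """Re-label head classes as subclasses of roughly tail_size samples.

    embeddings: (n, d) array; labels: (n,) int array.
    Returns subclass ids usable as balanced pseudo-labels.
    """
    subclass = np.zeros(len(labels), dtype=int)
    next_id = 0
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        k = max(1, round(len(idx) / tail_size))  # number of subclasses
        if k == 1:
            subclass[idx] = next_id              # tail class: keep as one group
        else:
            km = KMeans(n_clusters=k, n_init=10).fit(embeddings[idx])
            subclass[idx] = next_id + km.labels_
        next_id += k
    return subclass
```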
- Triplet Contrastive Learning for Unsupervised Vehicle Re-identification [55.445358749042384]
Part feature learning is a critical technology for fine semantic understanding in vehicle re-identification.
We propose a novel Triplet Contrastive Learning framework (TCL) which leverages cluster features to bridge the part features and global features.
arXiv Detail & Related papers (2023-01-23T15:52:12Z)
- Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification [3.6954802719347413]
This paper presents an end-to-end framework that preserves alignment and uniformity properties for representations of both seen and unseen classes (the standard alignment and uniformity losses are sketched after this entry).
Experiments show that our method significantly outperforms SoTA by relative improvements of 28.1% on UCF101 and 27.0% on HMDB51.
arXiv Detail & Related papers (2022-03-29T09:21:22Z)
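For context, the alignment and uniformity objectives referenced in the title are usually written as in Wang & Isola (2020); a short PyTorch rendering, assuming L2-normalized embeddings (this paper's exact variants may differ):

```python
import torch

def align_loss(x, y, alpha=2):
    """Alignment: positive pairs (x_i, y_i) should map close together."""
    return (x - y).norm(dim=1).pow(alpha).mean()

def uniform_loss(x, t=2):
    """Uniformity: embeddings should spread over the unit hypersphere."""
    return torch.pdist(x, p=2).pow(2).mul(-t).exp().mean().log()
```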
- Self-Supervised Class Incremental Learning [51.62542103481908]
Existing Class Incremental Learning (CIL) methods are based on a supervised classification framework that is sensitive to data labels.
When updated on new-class data, they suffer from catastrophic forgetting: the model can no longer clearly distinguish old-class data from new.
In this paper, we explore the performance of Self-Supervised representation learning in Class Incremental Learning (SSCIL) for the first time.
arXiv Detail & Related papers (2021-11-18T06:58:19Z)
- Learning Debiased and Disentangled Representations for Semantic Segmentation [52.35766945827972]
We propose a model-agnostic training scheme for semantic segmentation.
By randomly eliminating certain class information in each training iteration, we effectively reduce feature dependencies among classes.
Models trained with our approach demonstrate strong results on multiple semantic segmentation benchmarks.
arXiv Detail & Related papers (2021-10-31T16:15:09Z)
- Prototypical Classifier for Robust Class-Imbalanced Learning [64.96088324684683]
We propose Prototypical, which does not require fitting additional parameters given the embedding network (a minimal nearest-prototype sketch follows this entry).
Prototypical produces balanced and comparable predictions for all classes even though the training set is class-imbalanced.
We test our method on the CIFAR-10-LT, CIFAR-100-LT and WebVision datasets, observing that Prototypical obtains substantial improvements over the state of the art.
arXiv Detail & Related papers (2021-10-22T01:55:01Z)
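A minimal sketch of nearest-prototype classification as described above (assumptions: embeddings are precomputed and Euclidean distance is used; the paper's calibration details may differ):

```python
import torch

def class_prototypes(emb, labels, num_classes):
    """Prototype per class = mean embedding of that class's training samples."""
    protos = torch.zeros(num_classes, emb.size(1))
    for c in range(num_classes):
        protos[c] = emb[labels == c].mean(dim=0)
    return protos

def predict(emb, protos):
    """Assign each sample to the class with the nearest prototype."""
    dists = torch.cdist(emb, protos)   # (n, num_classes) Euclidean distances
    return dists.argmin(dim=1)
```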
- Exploiting a Joint Embedding Space for Generalized Zero-Shot Semantic Segmentation [25.070027668717422]
Generalized zero-shot semantic segmentation (GZS3) predicts pixel-wise semantic labels for seen and unseen classes.
Most GZS3 methods adopt a generative approach that synthesizes visual features of unseen classes from corresponding semantic ones.
We propose a discriminative approach that addresses these limitations in a unified framework.
arXiv Detail & Related papers (2021-08-14T13:33:58Z)
- Binary Classification from Multiple Unlabeled Datasets via Surrogate Set Classification [94.55805516167369]
We propose a new approach for binary classification from $m$ U-sets for $m \ge 2$.
Our key idea is to consider an auxiliary classification task called surrogate set classification (SSC).
arXiv Detail & Related papers (2021-02-01T07:36:38Z)