SINCERE: Supervised Information Noise-Contrastive Estimation REvisited
- URL: http://arxiv.org/abs/2309.14277v3
- Date: Tue, 2 Jul 2024 16:02:39 GMT
- Title: SINCERE: Supervised Information Noise-Contrastive Estimation REvisited
- Authors: Patrick Feeney, Michael C. Hughes
- Abstract summary: Previous work suggests a supervised contrastive (SupCon) loss to extend InfoNCE to learn from available class labels.
We propose the Supervised InfoNCE REvisited (SINCERE) loss as a theoretically-justified supervised extension of InfoNCE.
Experiments show that SINCERE leads to better separation of embeddings from different classes and improves transfer learning classification accuracy.
- Score: 5.004880836963827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The information noise-contrastive estimation (InfoNCE) loss function provides the basis of many self-supervised deep learning methods due to its strong empirical results and theoretical motivation. Previous work suggests a supervised contrastive (SupCon) loss to extend InfoNCE to learn from available class labels. This SupCon loss has been widely used due to reports of good empirical performance. However, in this work we find that the prior SupCon loss formulation has questionable justification because it can encourage some images from the same class to repel one another in the learned embedding space. This problematic intra-class repulsion gets worse as the number of images sharing one class label increases. We propose the Supervised InfoNCE REvisited (SINCERE) loss as a theoretically-justified supervised extension of InfoNCE that eliminates intra-class repulsion. Experiments show that SINCERE leads to better separation of embeddings from different classes and improves transfer learning classification accuracy. We additionally utilize probabilistic modeling to derive an information-theoretic bound that relates the SINCERE loss to the symmetrized KL divergence between data-generating distributions for a target class and all other classes.
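To make the distinction concrete, below is a minimal PyTorch-style sketch of a batch loss in the spirit of what the abstract describes: unlike SupCon, embeddings sharing the anchor's class (other than the positive itself) are excluded from the denominator, so only different-class samples act as negatives and no intra-class repulsion term appears. The function name, signature, and temperature default are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def sincere_style_loss(embeddings: torch.Tensor,
                       labels: torch.Tensor,
                       temperature: float = 0.1) -> torch.Tensor:
    """Illustrative sketch of a SINCERE-style supervised contrastive loss.

    embeddings: (N, D) batch of embeddings; labels: (N,) integer class labels.
    Same-class samples (other than the chosen positive) are left out of the
    denominator, so only different-class samples act as negatives.
    """
    z = F.normalize(embeddings, dim=1)                 # unit-norm embeddings
    sim = (z @ z.T) / temperature                      # (N, N) scaled cosine similarities
    sim = sim - sim.max(dim=1, keepdim=True).values.detach()  # numerical stability (ratio unchanged)

    n = sim.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)   # (N, N) label agreement

    pos_mask = (same_class & ~self_mask).float()       # positives: same class, not the anchor
    neg_mask = (~same_class).float()                   # negatives: different class only

    neg_sum = (torch.exp(sim) * neg_mask).sum(dim=1, keepdim=True)   # (N, 1)

    # Per (anchor i, positive p) pair: -log( exp(s_ip) / (exp(s_ip) + sum over different-class negatives) )
    pair_loss = -sim + torch.log(torch.exp(sim) + neg_sum)

    num_pos = pos_mask.sum(dim=1)
    per_anchor = (pair_loss * pos_mask).sum(dim=1) / num_pos.clamp(min=1)
    return per_anchor[num_pos > 0].mean()              # average over anchors that have positives
```

A SupCon-style variant would instead add the other same-class similarities into the denominator sum, which is exactly the intra-class repulsion the abstract argues against.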
Related papers
- CLOSER: Towards Better Representation Learning for Few-Shot Class-Incremental Learning [52.63674911541416]
Few-shot class-incremental learning (FSCIL) faces several challenges, such as overfitting and forgetting.
Our primary focus is representation learning on base classes to tackle the unique challenge of FSCIL.
We find that securing the spread of features within a more confined feature space enables the learned representation to strike a better balance between transferability and discriminability.
arXiv Detail & Related papers (2024-10-08T02:23:16Z) - Understanding the Detrimental Class-level Effects of Data Augmentation [63.1733767714073]
Achieving optimal average accuracy can come at the cost of significantly hurting individual class accuracy, by as much as 20% on ImageNet.
We present a framework for understanding how DA interacts with class-level learning dynamics.
We show that simple class-conditional augmentation strategies improve performance on the negatively affected classes.
arXiv Detail & Related papers (2023-12-07T18:37:43Z) - Few-shot Object Detection with Refined Contrastive Learning [4.520231308678286]
We propose a novel few-shot object detection (FSOD) method with Refined Contrastive Learning (FSRC).
A pre-determination component is introduced to identify the Resemblance Group, the subset of novel classes that are easily confused with one another.
Refined contrastive learning is then performed specifically on this group of classes to increase the inter-class distances among them.
arXiv Detail & Related papers (2022-11-24T09:34:20Z) - Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference from a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, making it attractive to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we propose incorporating semi-supervised and positive-unlabeled (PU) learning to exploit unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z) - Generalizable Information Theoretic Causal Representation [37.54158138447033]
We propose to learn causal representations from observational data by regularizing the learning procedure with mutual information measures according to our hypothetical causal graph.
The optimization involves a counterfactual loss, from which we derive a theoretical guarantee that causality-inspired learning achieves reduced sample complexity and better generalization ability.
arXiv Detail & Related papers (2022-02-17T00:38:35Z) - Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives [44.962289510218646]
RINCE exploits information about a similarity ranking to learn a corresponding embedding space.
We show that RINCE learns more favorable embeddings than standard InfoNCE whenever even noisy ranking information is available.
arXiv Detail & Related papers (2022-01-27T18:55:32Z) - Studying the Interplay between Information Loss and Operation Loss in Representations for Classification [15.369895042965261]
Information-theoretic measures have been widely adopted in the design of features for learning and decision problems.
We show that it is possible to adopt an alternative notion of informational sufficiency to achieve operational sufficiency in learning.
arXiv Detail & Related papers (2021-12-30T23:17:05Z) - Categorical Relation-Preserving Contrastive Knowledge Distillation for Medical Image Classification [75.27973258196934]
We propose a novel Categorical Relation-preserving Contrastive Knowledge Distillation (CRCKD) algorithm, which takes the commonly used mean-teacher model as the supervisor.
With this regularization, the feature distribution of the student model shows higher intra-class similarity and inter-class variance.
With the contributions of CCD and CRP, our CRCKD algorithm can distill relational knowledge more comprehensively.
arXiv Detail & Related papers (2021-07-07T13:56:38Z) - Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE [104.37515476361405]
We reveal mathematically why contrastive learners fail in the small-batch-size regime.
We present a novel contrastive objective named FlatNCE, which fixes this issue.
arXiv Detail & Related papers (2021-07-02T15:50:43Z) - Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect [95.37587481952487]
Long-tailed classification is the key to deep learning at scale.
Existing methods are mainly based on re-weighting/resampling heuristics that lack a fundamental theory.
In this paper, we establish a causal inference framework, which not only unravels the whys of previous methods, but also derives a new principled solution.
arXiv Detail & Related papers (2020-09-28T00:32:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.