SINCERE: Supervised Information Noise-Contrastive Estimation REvisited
- URL: http://arxiv.org/abs/2309.14277v3
- Date: Tue, 2 Jul 2024 16:02:39 GMT
- Title: SINCERE: Supervised Information Noise-Contrastive Estimation REvisited
- Authors: Patrick Feeney, Michael C. Hughes
- Abstract summary: Previous work suggests a supervised contrastive (SupCon) loss to extend InfoNCE to learn from available class labels.
We propose the Supervised InfoNCE REvisited (SINCERE) loss as a theoretically-justified supervised extension of InfoNCE.
Experiments show that SINCERE leads to better separation of embeddings from different classes and improves transfer learning classification accuracy.
- Score: 5.004880836963827
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The information noise-contrastive estimation (InfoNCE) loss function provides the basis of many self-supervised deep learning methods due to its strong empirical results and theoretical motivation. Previous work suggests a supervised contrastive (SupCon) loss to extend InfoNCE to learn from available class labels. This SupCon loss has been widely used due to reports of good empirical performance. However, in this work we find that the prior SupCon loss formulation has questionable justification because it can encourage some images from the same class to repel one another in the learned embedding space. This problematic intra-class repulsion gets worse as the number of images sharing one class label increases. We propose the Supervised InfoNCE REvisited (SINCERE) loss as a theoretically-justified supervised extension of InfoNCE that eliminates intra-class repulsion. Experiments show that SINCERE leads to better separation of embeddings from different classes and improves transfer learning classification accuracy. We additionally utilize probabilistic modeling to derive an information-theoretic bound that relates the SINCERE loss to the symmetrized KL divergence between data-generating distributions for a target class and all other classes.
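To make the distinction concrete, below is a minimal PyTorch-style sketch of a batch loss in the spirit of what the abstract describes: unlike SupCon, embeddings sharing the anchor's class (other than the positive itself) are excluded from the denominator, so only different-class samples act as negatives and no intra-class repulsion term appears. The function name, signature, and temperature default are illustrative assumptions, not the authors' reference implementation.

```python
import torch
import torch.nn.functional as F

def sincere_style_loss(embeddings: torch.Tensor,
                       labels: torch.Tensor,
                       temperature: float = 0.1) -> torch.Tensor:
    """Illustrative sketch of a SINCERE-style supervised contrastive loss.

    embeddings: (N, D) batch of embeddings; labels: (N,) integer class labels.
    Same-class samples (other than the chosen positive) are left out of the
    denominator, so only different-class samples act as negatives.
    """
    z = F.normalize(embeddings, dim=1)                 # unit-norm embeddings
    sim = (z @ z.T) / temperature                      # (N, N) scaled cosine similarities
    sim = sim - sim.max(dim=1, keepdim=True).values.detach()  # numerical stability (ratio unchanged)

    n = sim.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    same_class = labels.unsqueeze(0) == labels.unsqueeze(1)   # (N, N) label agreement

    pos_mask = (same_class & ~self_mask).float()       # positives: same class, not the anchor
    neg_mask = (~same_class).float()                   # negatives: different class only

    neg_sum = (torch.exp(sim) * neg_mask).sum(dim=1, keepdim=True)   # (N, 1)

    # Per (anchor i, positive p) pair: -log( exp(s_ip) / (exp(s_ip) + sum over different-class negatives) )
    pair_loss = -sim + torch.log(torch.exp(sim) + neg_sum)

    num_pos = pos_mask.sum(dim=1)
    per_anchor = (pair_loss * pos_mask).sum(dim=1) / num_pos.clamp(min=1)
    return per_anchor[num_pos > 0].mean()              # average over anchors that have positives
```

A SupCon-style variant would instead add the other same-class similarities into the denominator sum, which is exactly the intra-class repulsion the abstract argues against.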
Related papers
- CLOSER: Towards Better Representation Learning for Few-Shot Class-Incremental Learning [52.63674911541416]
Few-shot class-incremental learning (FSCIL) faces several challenges, such as overfitting and forgetting.
Our primary focus is representation learning on base classes to tackle the unique challenge of FSCIL.
We find that securing the spread of features within a more confined feature space enables the learned representation to strike a better balance between transferability and discriminability.
arXiv Detail & Related papers (2024-10-08T02:23:16Z) - Understanding the Detrimental Class-level Effects of Data Augmentation [63.1733767714073]
Achieving optimal average accuracy can come at the cost of significantly hurting individual class accuracy, by as much as 20% on ImageNet.
We present a framework for understanding how DA interacts with class-level learning dynamics.
We show that simple class-conditional augmentation strategies improve performance on the negatively affected classes.
arXiv Detail & Related papers (2023-12-07T18:37:43Z) - Few-shot Object Detection with Refined Contrastive Learning [4.520231308678286]
We propose a novel few-shot object detection (FSOD) method with Refined Contrastive Learning (FSRC).
A pre-determination component is introduced to identify the Resemblance Group, the subset of novel classes that are easily confused with one another.
Refined contrastive learning is then performed specifically on this group of classes to increase the inter-class distances among them.
arXiv Detail & Related papers (2022-11-24T09:34:20Z) - Incorporating Semi-Supervised and Positive-Unlabeled Learning for Boosting Full Reference Image Quality Assessment [73.61888777504377]
Full-reference (FR) image quality assessment (IQA) evaluates the visual quality of a distorted image by measuring its perceptual difference from a pristine-quality reference.
Unlabeled data can be easily collected from an image degradation or restoration process, making it attractive to exploit unlabeled training data to boost FR-IQA performance.
In this paper, we propose incorporating semi-supervised and positive-unlabeled (PU) learning to exploit unlabeled data while mitigating the adverse effect of outliers.
arXiv Detail & Related papers (2022-04-19T09:10:06Z) - Generalizable Information Theoretic Causal Representation [37.54158138447033]
We propose to learn causal representations from observational data by regularizing the learning procedure with mutual information measures according to our hypothetical causal graph.
The optimization involves a counterfactual loss, from which we derive a theoretical guarantee that causality-inspired learning achieves reduced sample complexity and better generalization ability.
arXiv Detail & Related papers (2022-02-17T00:38:35Z) - Ranking Info Noise Contrastive Estimation: Boosting Contrastive Learning via Ranked Positives [44.962289510218646]
RINCE exploits information about a similarity ranking to learn a corresponding embedding space.
We show that RINCE learns more favorable embeddings than standard InfoNCE whenever even noisy ranking information is available.
arXiv Detail & Related papers (2022-01-27T18:55:32Z) - Studying the Interplay between Information Loss and Operation Loss in Representations for Classification [15.369895042965261]
Information-theoretic measures have been widely adopted in the design of features for learning and decision problems.
We show that it is possible to adopt an alternative notion of informational sufficiency to achieve operational sufficiency in learning.
arXiv Detail & Related papers (2021-12-30T23:17:05Z) - Categorical Relation-Preserving Contrastive Knowledge Distillation for Medical Image Classification [75.27973258196934]
We propose a novel Categorical Relation-preserving Contrastive Knowledge Distillation (CRCKD) algorithm, which takes the commonly used mean-teacher model as the supervisor.
With this regularization, the feature distribution of the student model shows higher intra-class similarity and inter-class variance.
With the contributions of CCD and CRP, our CRCKD algorithm can distill relational knowledge more comprehensively.
arXiv Detail & Related papers (2021-07-07T13:56:38Z) - Simpler, Faster, Stronger: Breaking The log-K Curse On Contrastive Learners With FlatNCE [104.37515476361405]
We reveal mathematically why contrastive learners fail in the small-batch-size regime.
We present a novel contrastive objective named FlatNCE, which fixes this issue.
arXiv Detail & Related papers (2021-07-02T15:50:43Z) - Long-Tailed Classification by Keeping the Good and Removing the Bad Momentum Causal Effect [95.37587481952487]
Long-tailed classification is the key to deep learning at scale.
Existing methods are mainly based on re-weighting/resampling heuristics that lack a fundamental theory.
In this paper, we establish a causal inference framework, which not only unravels the whys of previous methods, but also derives a new principled solution.
arXiv Detail & Related papers (2020-09-28T00:32:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.