Measuring Self-Supervised Representation Quality for Downstream
Classification using Discriminative Features
- URL: http://arxiv.org/abs/2203.01881v6
- Date: Tue, 12 Dec 2023 22:56:44 GMT
- Title: Measuring Self-Supervised Representation Quality for Downstream
Classification using Discriminative Features
- Authors: Neha Kalibhat, Kanika Narang, Hamed Firooz, Maziar Sanjabi, Soheil
Feizi
- Abstract summary: We study the representation space of state-of-the-art self-supervised models including SimCLR, SwAV, MoCo, BYOL, DINO, SimSiam, VICReg and Barlow Twins.
We propose the Self-Supervised Representation Quality Score (or Q-Score), an unsupervised score that can reliably predict whether a given sample is likely to be misclassified.
Fine-tuning with Q-Score regularization can boost the linear probing accuracy of SSL models by up to 5.8% on ImageNet-100 and 3.7% on ImageNet-1K.
- Score: 56.89813105411331
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Self-supervised learning (SSL) has shown impressive results in downstream
classification tasks. However, there is limited work on understanding the failure
modes of SSL models and interpreting their learned representations. In this paper, we
study the representation space of state-of-the-art self-supervised models
including SimCLR, SwAV, MoCo, BYOL, DINO, SimSiam, VICReg and Barlow Twins.
Without the use of class label information, we discover discriminative features
that correspond to unique physical attributes in images, present mostly in
correctly-classified representations. Using these features, we can compress the
representation space by up to 40% without significantly affecting linear
classification performance. We then propose Self-Supervised Representation
Quality Score (or Q-Score), an unsupervised score that can reliably predict if
a given sample is likely to be misclassified during linear evaluation,
achieving AUPRC of 91.45 on ImageNet-100 and 78.78 on ImageNet-1K. Q-Score can
also be used as a regularization term on pre-trained encoders to remedy
low-quality representations. Fine-tuning with Q-Score regularization can boost
the linear probing accuracy of SSL models by up to 5.8% on ImageNet-100 and
3.7% on ImageNet-1K compared to their baselines. Finally, using gradient
heatmaps and Salient ImageNet masks, we define a metric to quantify the
interpretability of each representation. We show that discriminative features
are strongly correlated with core attributes, and that enhancing these features
through Q-Score regularization makes SSL representations more interpretable.
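To make the abstract concrete, below is a minimal sketch of a Q-Score-style statistic and the feature compression it enables. The paper's exact Q-Score formula is not reproduced here; the variant below (z-score of the most prominent activation, plus variance-based dimension ranking) and all function names are illustrative assumptions.

```python
import numpy as np

def q_score(z: np.ndarray) -> np.ndarray:
    """Q-Score-style statistic: how sharply the strongest feature of each
    representation stands out. Low scores flag representations that lack
    discriminative features and are likely to be misclassified. This exact
    formula is an assumption, not the paper's definition.

    z: (n_samples, dim) array of SSL representations.
    """
    mu = z.mean(axis=1, keepdims=True)
    sigma = z.std(axis=1, keepdims=True) + 1e-8
    return ((z - mu) / sigma).max(axis=1)

def compress(z: np.ndarray, keep: float = 0.6) -> np.ndarray:
    """Keep the most discriminative 60% of dimensions (the abstract reports
    up to 40% compression with little loss in linear-probe accuracy).
    Ranking dimensions by cross-sample variance is an assumption."""
    order = np.argsort(z.var(axis=0))[::-1]
    return z[:, order[: int(keep * z.shape[1])]]

# Usage: flag the lowest-scoring samples as likely misclassifications.
reps = np.random.randn(1000, 2048).astype(np.float32)  # stand-in features
scores = q_score(reps)
likely_errors = scores < np.quantile(scores, 0.1)
```

For fine-tuning, the paper uses Q-Score as a regularization term on the pre-trained encoder; in the terms of this sketch, that would amount to an extra loss term penalizing low q_score values alongside the SSL objective.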
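The interpretability metric can be sketched the same way: compare a feature's gradient heatmap against a Salient ImageNet core-attribute mask. The overlap ratio below is a hedged stand-in for the paper's metric; the threshold choice is an assumption.

```python
import numpy as np

def interpretability(heatmap: np.ndarray, core_mask: np.ndarray,
                     q: float = 0.8) -> float:
    """Fraction of high-saliency pixels falling inside the Salient ImageNet
    core-attribute mask. A higher value means the feature attends to core
    (rather than spurious) attributes. The overlap definition and the
    80th-percentile threshold are illustrative assumptions.

    heatmap:   (H, W) non-negative gradient saliency for one feature.
    core_mask: (H, W) boolean core-region mask.
    """
    hot = heatmap >= np.quantile(heatmap, q)
    return float((hot & core_mask).sum() / max(hot.sum(), 1))
```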
Related papers
- A Self-Supervised Learning Pipeline for Demographically Fair Facial Attribute Classification [3.5092955099876266]
This paper proposes a fully self-supervised pipeline for demographically fair facial attribute classification.
We leverage completely unlabeled data pseudo-labeled via pre-trained encoders, diverse data curation techniques, and meta-learning-based weighted contrastive learning.
arXiv Detail & Related papers (2024-07-14T07:11:57Z) - SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference [11.453253140479166]
We enhance the potential of contrastive language-image pretraining (CLIP) for semantic segmentation.
By rethinking self-attention, we find that CLIP can adapt to dense prediction tasks.
We replace the traditional self-attention block in the last layer of the CLIP vision encoder with our Correlative Self-Attention (CSA) module (see the sketch after this list).
arXiv Detail & Related papers (2023-12-04T03:18:46Z) - Improving Self-Supervised Learning by Characterizing Idealized
Representations [155.1457170539049]
We prove necessary and sufficient conditions on representations for supporting any task invariant to given data augmentations.
For contrastive learning, our framework prescribes simple but significant improvements to previous methods.
For non-contrastive learning, we use our framework to derive a simple and novel objective.
arXiv Detail & Related papers (2022-09-13T18:01:03Z) - Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z) - Shaping Visual Representations with Attributes for Few-Shot Learning [5.861206243996454]
Few-shot recognition aims to recognize novel categories under low-data regimes.
Recent metric-learning-based few-shot learning methods have achieved promising performance.
We propose attribute-shaped learning (ASL), which can normalize visual representations to predict attributes for query images.
arXiv Detail & Related papers (2021-12-13T03:16:19Z) - Image Quality Assessment using Contrastive Learning [50.265638572116984]
We train a deep Convolutional Neural Network (CNN) using a contrastive pairwise objective to solve an auxiliary image-quality prediction problem.
We show through extensive experiments that CONTRIQUE achieves competitive performance when compared to state-of-the-art NR image quality models.
Our results suggest that powerful quality representations with perceptual relevance can be obtained without requiring large labeled subjective image quality datasets.
arXiv Detail & Related papers (2021-10-25T21:01:00Z) - Calibrating Class Activation Maps for Long-Tailed Visual Recognition [60.77124328049557]
We present two effective modifications of CNNs to improve network learning from long-tailed distribution.
First, we present a Class Activation Map Calibration (CAMC) module to improve the learning and prediction of network classifiers.
Second, we investigate the use of normalized classifiers for representation learning in long-tailed problems.
arXiv Detail & Related papers (2021-08-29T05:45:03Z) - VOLO: Vision Outlooker for Visual Recognition [148.12522298731807]
Vision transformers (ViTs) have shown the great potential of self-attention-based models in ImageNet classification.
We introduce a novel outlook attention mechanism and present a simple and general architecture, termed Vision Outlooker (VOLO); a sketch of outlook attention follows this list.
Unlike self-attention, which focuses on global dependency modeling at a coarse level, outlook attention efficiently encodes finer-level features and contexts into tokens.
Experiments show that our VOLO achieves 87.1% top-1 accuracy on ImageNet-1K classification, which is the first model exceeding 87% accuracy on this competitive benchmark.
arXiv Detail & Related papers (2021-06-24T15:46:54Z)
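For the SCLIP entry above, a minimal sketch of a correlative self-attention (CSA) style block is shown below: it swaps query-key attention for query-query and key-key correlations, so each patch aggregates features mostly from spatially similar patches. Single-head handling and scaling details are simplified assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def correlative_self_attention(x: torch.Tensor, q_proj: nn.Linear,
                               k_proj: nn.Linear,
                               v_proj: nn.Linear) -> torch.Tensor:
    """CSA-style replacement for the last self-attention layer of a CLIP
    vision encoder (single-head simplification; details are assumptions).

    x: (B, N, C) patch tokens; q_proj/k_proj/v_proj are the pretrained
    projections from the layer being replaced.
    """
    q, k, v = q_proj(x), k_proj(x), v_proj(x)
    scale = q.shape[-1] ** -0.5
    # Correlate tokens with themselves instead of mixing queries and keys,
    # which keeps attention concentrated on visually similar patches.
    attn = F.softmax(q @ q.transpose(-2, -1) * scale, dim=-1) \
         + F.softmax(k @ k.transpose(-2, -1) * scale, dim=-1)
    return attn @ v
```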
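For the VOLO entry, here is a compact sketch of an outlook-attention layer (single head, stride 1): the attention map over each k x k neighbourhood is generated directly from the centre token by a linear layer rather than from query-key dot products. Treat it as an illustration of the mechanism, not the official implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OutlookAttention(nn.Module):
    """Outlook-attention sketch (single head, stride 1, square kernel)."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        self.k = kernel_size
        self.scale = dim ** -0.5
        self.v = nn.Linear(dim, dim)
        # One (k*k) x (k*k) attention matrix per spatial location,
        # predicted directly from that location's token.
        self.attn = nn.Linear(dim, kernel_size ** 4)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, H, W, C)
        B, H, W, C = x.shape
        v = self.v(x).permute(0, 3, 1, 2)                   # (B, C, H, W)
        # Gather the k x k neighbourhood around every position.
        v = F.unfold(v, self.k, padding=self.k // 2)        # (B, C*k*k, H*W)
        v = v.reshape(B, C, self.k * self.k, H * W)
        v = v.permute(0, 3, 2, 1)                           # (B, H*W, k*k, C)
        # Predict the local attention map from the centre token alone.
        a = self.attn(x).reshape(B, H * W, self.k * self.k, self.k * self.k)
        a = (a * self.scale).softmax(dim=-1)
        out = a @ v                                         # (B, H*W, k*k, C)
        # Fold the weighted neighbourhoods back onto the spatial grid.
        out = out.permute(0, 3, 2, 1).reshape(B, C * self.k * self.k, H * W)
        out = F.fold(out, (H, W), self.k, padding=self.k // 2)
        return self.proj(out.permute(0, 2, 3, 1))           # (B, H, W, C)
```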
This list is automatically generated from the titles and abstracts of the papers on this site.