Investigating Self-Supervised Methods for Label-Efficient Learning
- URL: http://arxiv.org/abs/2406.17460v1
- Date: Tue, 25 Jun 2024 10:56:03 GMT
- Title: Investigating Self-Supervised Methods for Label-Efficient Learning
- Authors: Srinivasa Rao Nandam, Sara Atito, Zhenhua Feng, Josef Kittler, Muhammad Awais
- Abstract summary: We study different self-supervised pretext tasks, namely contrastive learning, clustering, and masked image modelling, for their low-shot capabilities.
We introduce a framework involving both masked image modelling and clustering as pretext tasks, which performs better across all low-shot downstream tasks.
When testing the model on full-scale datasets, we show performance gains in multi-class classification, multi-label classification and semantic segmentation.
- Score: 27.029542823306866
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Vision transformers combined with self-supervised learning have enabled the development of models which scale across large datasets for several downstream tasks like classification, segmentation and detection. The low-shot learning capability of these models, across several low-shot downstream tasks, has been largely underexplored. We perform a system-level study of different self-supervised pretext tasks, namely contrastive learning, clustering, and masked image modelling, for their low-shot capabilities by comparing the pretrained models. In addition, we study the effects of collapse avoidance methods, namely centring, ME-MAX, and Sinkhorn, on these downstream tasks. Based on our detailed analysis, we introduce a framework involving both masked image modelling and clustering as pretext tasks, which performs better across all low-shot downstream tasks, including multi-class classification, multi-label classification and semantic segmentation. Furthermore, when testing the model on full-scale datasets, we show performance gains in multi-class classification, multi-label classification and semantic segmentation.
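The abstract names centring and Sinkhorn as collapse-avoidance methods for clustering-style pretext tasks. For reference, the snippet below is a minimal NumPy sketch of both ideas: Sinkhorn-Knopp iterations that balance soft cluster assignments across prototypes, and a centred softmax that subtracts a running mean of the outputs before sharpening. Function names, temperature and epsilon values, and the toy data are illustrative assumptions, not the authors' implementation.
```python
import numpy as np

def sinkhorn_normalise(scores, eps=0.05, n_iters=3):
    """Balance soft cluster assignments with Sinkhorn-Knopp iterations.

    scores: (B, K) similarities of B samples to K prototypes.
    Returns a (B, K) assignment matrix whose prototypes receive roughly equal
    total mass, which discourages all samples collapsing onto one cluster.
    """
    Q = np.exp((scores - scores.max()) / eps)    # sharpen, keep values finite
    Q /= Q.sum()                                 # total mass = 1
    B, K = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=0, keepdims=True)        # equalise mass per prototype
        Q /= K
        Q /= Q.sum(axis=1, keepdims=True)        # equalise mass per sample
        Q /= B
    return Q * B                                 # each row now sums to 1

def centred_softmax(scores, centre, temperature=0.04):
    """Centring: subtract a running mean of the (teacher) outputs before the
    softmax so that no single prototype dominates the target distribution."""
    logits = (scores - centre) / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Toy usage: 8 samples, 4 prototypes.
rng = np.random.default_rng(0)
scores = rng.normal(size=(8, 4))
q = sinkhorn_normalise(scores)                   # balanced assignment targets
centre = scores.mean(axis=0, keepdims=True)      # in practice an EMA over past batches
p = centred_softmax(scores, centre)              # centred, sharpened targets
```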
Related papers
- Hierarchical Multi-Label Classification with Missing Information for Benthic Habitat Imagery [1.6492989697868894]
We show the capacity to conduct hierarchical multi-label (HML) training in scenarios where annotation information is missing at multiple levels.
We find that, when using smaller one-hot image label datasets typical of local or regional scale benthic science projects, models pre-trained with self-supervision on a larger collection of in-domain benthic data outperform models pre-trained on ImageNet.
arXiv Detail & Related papers (2024-09-10T16:15:01Z) - Masked Image Modeling: A Survey [73.21154550957898]
Masked image modeling has emerged as a powerful self-supervised learning technique in computer vision.
We construct a taxonomy and review the most prominent papers in recent years.
We aggregate the performance results of various masked image modeling methods on the most popular datasets.
arXiv Detail & Related papers (2024-08-13T07:27:02Z) - Reinforcing Pre-trained Models Using Counterfactual Images [54.26310919385808]
This paper proposes a novel framework to reinforce classification models using language-guided generated counterfactual images.
We identify model weaknesses by testing the model using the counterfactual image dataset.
We employ the counterfactual images as an augmented dataset to fine-tune and reinforce the classification model.
arXiv Detail & Related papers (2024-06-19T08:07:14Z) - Self-Supervised Open-Ended Classification with Small Visual Language Models [60.23212389067007]
We present Self-Context Adaptation (SeCAt), a self-supervised approach that unlocks few-shot abilities for open-ended classification with small visual language models.
Using models with approximately 1B parameters, we outperform the few-shot abilities of much larger models, such as Frozen and FROMAGe.
arXiv Detail & Related papers (2023-09-30T21:41:21Z) - Diffusion Models Beat GANs on Image Classification [37.70821298392606]
Diffusion models have risen to prominence as a state-of-the-art method for image generation, denoising, inpainting, super-resolution, manipulation, etc.
We find that the embeddings learned by diffusion models are useful beyond the noise prediction task, as they contain discriminative information and can also be leveraged for classification.
We find that with careful feature selection and pooling, diffusion models outperform comparable generative-discriminative methods for classification tasks.
arXiv Detail & Related papers (2023-07-17T17:59:40Z) - An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training [79.78201886156513]
We present a model that can perform multiple vision tasks and can be adapted to other downstream tasks efficiently.
Our approach achieves comparable results to single-task state-of-the-art models and demonstrates strong generalization on downstream tasks.
arXiv Detail & Related papers (2023-06-29T17:59:57Z) - A Study on Representation Transfer for Few-Shot Learning [5.717951523323085]
Few-shot classification aims to learn to classify new object categories well using only a few labeled examples.
In this work we perform a systematic study of various feature representations for few-shot classification.
We find that learning from more complex tasks tends to give better representations for few-shot classification.
arXiv Detail & Related papers (2022-09-05T17:56:02Z) - Semi-Supervised Few-Shot Classification with Deep Invertible Hybrid Models [4.189643331553922]
We propose a deep invertible hybrid model which integrates discriminative and generative learning at a latent space level for semi-supervised few-shot classification.
The main novelty lies in integrating these components at the latent space level, which is effective in preventing overfitting.
arXiv Detail & Related papers (2021-05-22T05:55:16Z) - Few-Shot Image Classification via Contrastive Self-Supervised Learning [5.878021051195956]
We propose a new paradigm of unsupervised few-shot learning to address the deficiencies of existing approaches.
We solve few-shot tasks in two phases, the first of which meta-trains a transferable feature extractor via contrastive self-supervised learning.
Our method achieves state-of-the-art performance in a variety of established few-shot tasks on the standard few-shot visual classification datasets.
arXiv Detail & Related papers (2020-08-23T02:24:31Z) - Overcoming Classifier Imbalance for Long-tail Object Detection with Balanced Group Softmax [88.11979569564427]
We provide the first systematic analysis of the underperformance of state-of-the-art models on long-tail distributions.
We propose a novel balanced group softmax (BAGS) module for balancing the classifiers within the detection frameworks through group-wise training.
Extensive experiments on the very recent long-tail large vocabulary object recognition benchmark LVIS show that our proposed BAGS significantly improves the performance of detectors.
arXiv Detail & Related papers (2020-06-18T10:24:26Z) - UniT: Unified Knowledge Transfer for Any-shot Object Detection and Segmentation [52.487469544343305]
Methods for object detection and segmentation rely on large-scale instance-level annotations for training.
We propose an intuitive and unified semi-supervised model that is applicable to a range of supervision levels.
arXiv Detail & Related papers (2020-06-12T22:45:47Z)