MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning
- URL: http://arxiv.org/abs/2111.15340v1
- Date: Tue, 30 Nov 2021 12:36:38 GMT
- Title: MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning
- Authors: Sara Atito, Muhammad Awais, Ammarah Farooq, Zhenhua Feng, Josef
Kittler
- Abstract summary: Self-supervised pretraining has been shown to outperform supervised pretraining for many downstream vision applications.
This superiority is attributed to the negative impact of incomplete labelling of the training images.
This study investigates the possibility of modelling all the concepts present in an image without using labels.
- Score: 26.942174776511237
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Self-supervised pretraining is the method of choice for natural language
processing models and is rapidly gaining popularity in many vision tasks.
Recently, self-supervised pretraining has been shown to outperform supervised
pretraining for many downstream vision applications, marking a milestone in the
area. This superiority is attributed to the negative impact of incomplete
labelling of the training images, which convey multiple concepts but are
annotated with a single dominant class label. Although Self-Supervised Learning
(SSL) is, in principle, free of this limitation, the choice of pretext task
facilitating SSL perpetuates this shortcoming by driving the learning process
towards a single-concept output. This study investigates the possibility of
modelling all the concepts present in an image without using labels. In this
respect, the proposed SSL framework MC-SSL0.0 is a step towards Multi-Concept
Self-Supervised Learning (MC-SSL): it goes beyond modelling the single dominant
label in an image to effectively utilise the information from all the concepts
present in it. MC-SSL0.0 consists of two core design concepts: group masked
model learning and the learning of pseudo-concepts for data tokens using a
momentum-encoder (teacher-student) framework. Experimental results on
multi-label and multi-class image classification downstream tasks demonstrate
that MC-SSL0.0 not only surpasses existing SSL methods but also outperforms
supervised transfer learning. The source code will be made publicly available
for the community to train on bigger corpora.
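The two design concepts admit a compact illustration. The following PyTorch sketch is a hypothetical reading of the abstract, not the authors' released code: the toy transformer backbone, zeroing as the mask value, the number of pseudo-concepts, the masking ratio and the EMA rate are all assumptions. It pairs group-wise masking of patch tokens with per-token pseudo-concept targets produced by a momentum (EMA) teacher.

```python
# Minimal sketch of the two core ideas named in the abstract -- NOT the authors'
# released code. Assumptions: toy transformer backbone, zeroing as the mask value,
# 1024 pseudo-concepts, a 0.996 EMA rate and a 0.5 group-masking ratio.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class TokenEncoder(nn.Module):
    """Stand-in for a ViT-style backbone mapping patch tokens to concept logits."""

    def __init__(self, dim: int = 256, n_concepts: int = 1024):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.body = nn.TransformerEncoder(layer, num_layers=2)
        self.concept_head = nn.Linear(dim, n_concepts)  # per-token pseudo-concept logits

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:  # tokens: (B, N, dim)
        return self.concept_head(self.body(tokens))


def group_mask(tokens: torch.Tensor, group: int = 4, ratio: float = 0.5) -> torch.Tensor:
    """Group masked model learning: hide contiguous groups of tokens (assumes N % group == 0)."""
    B, N, _ = tokens.shape
    drop = torch.rand(B, N // group) < ratio              # pick whole groups to hide
    drop = drop.repeat_interleave(group, dim=1)
    masked = tokens.clone()
    masked[drop] = 0.0                                    # crude stand-in for a mask token
    return masked


def mcssl_step(student, teacher, tokens, ema: float = 0.996, temp: float = 0.1):
    """One step: the student predicts the teacher's per-token pseudo-concepts."""
    with torch.no_grad():                                 # teacher sees the unmasked tokens
        target = F.softmax(teacher(tokens) / temp, dim=-1)
    pred = F.log_softmax(student(group_mask(tokens)) / temp, dim=-1)
    loss = -(target * pred).sum(dim=-1).mean()            # per-token cross-entropy
    with torch.no_grad():                                 # momentum (EMA) teacher update
        for ps, pt in zip(student.parameters(), teacher.parameters()):
            pt.mul_(ema).add_(ps, alpha=1.0 - ema)
    return loss


student = TokenEncoder()
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)
loss = mcssl_step(student, teacher, torch.randn(2, 16, 256))
```

In this reading, the momentum teacher supplies soft targets for every data token, so supervision is not collapsed into a single dominant label per image.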
Related papers
- Scaling Language-Free Visual Representation Learning [62.31591054289958]
Visual Self-Supervised Learning (SSL) currently underperforms Contrastive Language-Image Pretraining (CLIP) in multimodal settings such as Visual Question Answering (VQA).
This multimodal gap is often attributed to the semantics introduced by language supervision, even though visual SSL and CLIP models are often trained on different data.
We study this question by training both visual SSL and CLIP models on the same MetaCLIP data, and leveraging VQA as a diverse testbed for vision encoders.
arXiv Detail & Related papers (2025-04-01T17:59:15Z)
- Revisiting semi-supervised learning in the era of foundation models [28.414667991336067]
Semi-supervised learning (SSL) leverages abundant unlabeled data alongside limited labeled data to enhance learning.
We develop new SSL benchmark datasets where frozen vision foundation models (VFMs) underperform and systematically evaluate representative SSL methods.
We make a surprising observation: parameter-efficient fine-tuning (PEFT) using only labeled data often matches SSL performance, even without leveraging unlabeled data.
To overcome the notorious issue of noisy pseudo-labels, we propose ensembling multiple PEFT approaches and VFM backbones to produce more robust pseudo-labels.
arXiv Detail & Related papers (2025-03-12T18:01:10Z)
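The ensembling step mentioned in the entry above can be sketched in a few lines: average the class probabilities from several fine-tuned models (e.g. different PEFT variants or VFM backbones) and keep only confident predictions as pseudo-labels. This is a generic illustration with assumed names and a made-up confidence threshold, not the paper's implementation.

```python
# Hypothetical sketch of pseudo-label ensembling: average the softmax outputs of
# several fine-tuned models and keep only confident predictions as pseudo-labels.
from typing import Callable, List, Tuple
import torch
import torch.nn.functional as F


def ensemble_pseudo_labels(
    models: List[Callable[[torch.Tensor], torch.Tensor]],  # each maps images -> logits
    images: torch.Tensor,
    threshold: float = 0.7,                                 # assumed confidence cut-off
) -> Tuple[torch.Tensor, torch.Tensor]:
    with torch.no_grad():
        probs = torch.stack([F.softmax(m(images), dim=-1) for m in models]).mean(0)
    conf, labels = probs.max(dim=-1)
    keep = conf >= threshold             # which unlabeled samples receive a pseudo-label
    return labels[keep], keep
```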
- Semi-Supervised Class-Agnostic Motion Prediction with Pseudo Label Regeneration and BEVMix [59.55173022987071]
We study the potential of semi-supervised learning for class-agnostic motion prediction.
Our framework adopts a consistency-based self-training paradigm, enabling the model to learn from unlabeled data.
Our method exhibits performance comparable to weakly supervised and some fully supervised methods.
arXiv Detail & Related papers (2023-12-13T09:32:50Z)
- GBE-MLZSL: A Group Bi-Enhancement Framework for Multi-Label Zero-Shot Learning [24.075034737719776]
This paper investigates the challenging problem of zero-shot learning in the multi-label scenario (MLZSL).
We propose a novel and effective group bi-enhancement framework for MLZSL, dubbed GBE-MLZSL, to fully make use of such properties and enable a more accurate and robust visual-semantic projection.
Experiments on large-scale MLZSL benchmark datasets NUS-WIDE and Open-Images-v4 demonstrate that the proposed GBE-MLZSL outperforms other state-of-the-art methods with large margins.
arXiv Detail & Related papers (2023-09-02T12:07:21Z)
- Multi-Label Self-Supervised Learning with Scene Images [21.549234013998255]
This paper shows that quality image representations can be learned by treating scene/multi-label image SSL simply as a multi-label classification problem.
The proposed method is named Multi-Label Self-supervised learning (MLS).
arXiv Detail & Related papers (2023-08-07T04:04:22Z)
- Understanding and Improving the Role of Projection Head in Self-Supervised Learning [77.59320917894043]
Self-supervised learning (SSL) aims to produce useful feature representations without access to human-labeled data annotations.
Current contrastive learning approaches append a parametrized projection head to the end of some backbone network to optimize the InfoNCE objective.
This raises a fundamental question: Why is a learnable projection head required if we are to discard it after training?
arXiv Detail & Related papers (2022-12-22T05:42:54Z)
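The setup questioned in the entry above is easy to write down: a small MLP projection head is appended to the backbone purely to optimise InfoNCE and is discarded before the representation is used downstream. The sketch below is generic (dimensions and temperature are assumptions), not the paper's code.

```python
# Generic contrastive setup: backbone -> projection head -> InfoNCE loss.
# The head is used only during pretraining and discarded afterwards.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProjectionHead(nn.Module):
    def __init__(self, dim: int = 2048, hidden: int = 2048, out: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, out))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return self.net(h)


def info_nce(z1: torch.Tensor, z2: torch.Tensor, temp: float = 0.1) -> torch.Tensor:
    """InfoNCE between two augmented views; positives sit on the diagonal."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temp                     # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)

# Pretraining optimises info_nce(head(backbone(x1)), head(backbone(x2)));
# downstream tasks use backbone(x) alone.
```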
- Self-Supervised Visual Representation Learning via Residual Momentum [15.515169550346517]
Self-supervised learning (SSL) approaches have shown promising capabilities in learning representations from unlabeled data.
Momentum-based SSL frameworks suffer from a large representation gap between the online encoder (student) and the momentum encoder (teacher).
This paper is the first to investigate and identify this invisible gap as a bottleneck that has been overlooked in the existing SSL frameworks.
We propose "residual momentum" to directly reduce this gap to encourage the student to learn the representation as close to that of the teacher as possible.
arXiv Detail & Related papers (2022-11-17T19:54:02Z)
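A minimal way to picture the "residual momentum" idea in the entry above: on top of the usual momentum-SSL objective, add a term that directly penalises the distance between the student's representation and the stop-gradient teacher's representation. The weight and the cosine-distance choice below are assumptions for illustration, not the paper's implementation.

```python
# Illustrative only: add a term that shrinks the student-teacher representation gap.
import torch.nn.functional as F


def residual_momentum_loss(z_student, z_teacher, base_ssl_loss, lam: float = 1.0):
    z_s = F.normalize(z_student, dim=-1)
    z_t = F.normalize(z_teacher.detach(), dim=-1)   # no gradient through the teacher
    gap = (1.0 - (z_s * z_t).sum(dim=-1)).mean()    # mean cosine distance student vs teacher
    return base_ssl_loss + lam * gap
```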
- Self-Supervised Learning Through Efference Copies [0.0]
Self-supervised learning (SSL) methods aim to exploit the abundance of unlabelled data for machine learning (ML).
An SSL framework derived from biological first principles of embodied learning could unify the various SSL methods, help elucidate learning in the brain, and possibly improve ML.
arXiv Detail & Related papers (2022-10-17T16:19:53Z)
- DUET: Cross-modal Semantic Grounding for Contrastive Zero-shot Learning [37.48292304239107]
We present a transformer-based end-to-end ZSL method named DUET.
We develop a cross-modal semantic grounding network to investigate the model's capability of disentangling semantic attributes from the images.
We find that DUET often achieves state-of-the-art performance, that its components are effective, and that its predictions are interpretable.
arXiv Detail & Related papers (2022-07-04T11:12:12Z)
- Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z)
- UniVIP: A Unified Framework for Self-Supervised Visual Pre-training [50.87603616476038]
We propose a novel self-supervised framework to learn versatile visual representations on either single-centric-object or non-iconic datasets.
Massive experiments show that UniVIP pre-trained on non-iconic COCO achieves state-of-the-art transfer performance.
Our method can also exploit single-centric-object datasets such as ImageNet, and outperforms BYOL by 2.5% with the same pre-training epochs in linear probing.
arXiv Detail & Related papers (2022-03-14T10:04:04Z)
- The Role of Global Labels in Few-Shot Classification and How to Infer Them [55.64429518100676]
Few-shot learning is a central problem in meta-learning, where learners must quickly adapt to new tasks.
We propose Meta Label Learning (MeLa), a novel algorithm that infers global labels and obtains robust few-shot models via standard classification.
arXiv Detail & Related papers (2021-08-09T14:07:46Z)
- FREE: Feature Refinement for Generalized Zero-Shot Learning [86.41074134041394]
Generalized zero-shot learning (GZSL) has achieved significant progress, with many efforts dedicated to overcoming the problems of visual-semantic domain gap and seen-unseen bias.
Most existing methods directly use feature extraction models trained on ImageNet alone, ignoring the cross-dataset bias between ImageNet and GZSL benchmarks.
We propose a simple yet effective GZSL method, termed feature refinement for generalized zero-shot learning (FREE) to tackle the above problem.
arXiv Detail & Related papers (2021-07-29T08:11:01Z)