Self-Supervised Contrastive Learning for Multi-Label Images
- URL: http://arxiv.org/abs/2506.23156v1
- Date: Sun, 29 Jun 2025 09:29:37 GMT
- Title: Self-Supervised Contrastive Learning for Multi-Label Images
- Authors: Jiale Chen
- Abstract summary: Self-supervised learning (SSL) has demonstrated its effectiveness in learning representations through comparison methods that align with human intuition. We tailor the mainstream SSL approach to guarantee excellent representation learning capabilities using fewer multi-label images.
- Score: 0.9125467603318544
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Self-supervised learning (SSL) has demonstrated its effectiveness in learning representations through comparison methods that align with human intuition. However, mainstream SSL methods rely heavily on large-scale single-label datasets such as ImageNet, resulting in intolerable pre-training overhead. Moreover, more general multi-label images are frequently overlooked in SSL, despite their richer semantic information and broader applicability in downstream scenarios. We therefore tailor the mainstream SSL approach to guarantee excellent representation learning capabilities using fewer multi-label images. First, we propose a block-wise augmentation module that extracts additional potential positive view pairs from multi-label images. Subsequently, an image-aware contrastive loss is devised to establish connections between these views, thereby facilitating the extraction of semantically consistent representations. Comprehensive linear fine-tuning and transfer learning validate the competitiveness of our approach despite challenging sample quality and quantity.
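The abstract's "image-aware" contrastive loss can be sketched as an InfoNCE-style objective in which every pair of views drawn from the same image is treated as a positive. The sketch below is an illustrative approximation, not the paper's exact formulation; the function name, the `image_ids` grouping scheme, and the temperature value are assumptions for illustration.

```python
import numpy as np

def image_aware_contrastive_loss(embeddings, image_ids, temperature=0.1):
    """InfoNCE-style loss where all views originating from the same
    image are positives (a sketch of an 'image-aware' contrastive
    objective; the paper's actual loss may differ in detail)."""
    # L2-normalize so the dot product is cosine similarity
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)            # exclude self-comparisons
    # log-softmax over each row of the similarity matrix
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    # positives: same image id, excluding the view itself
    pos = (image_ids[:, None] == image_ids[None, :]) & ~np.eye(len(z), dtype=bool)
    # negative mean log-likelihood over all positive pairs
    return -log_prob[pos].mean()
```

When views of the same image map to identical embeddings, the positive similarities dominate each softmax and the loss approaches zero, which is the behavior the paper's "semantically consistent representations" claim relies on.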
Related papers
- Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention [59.19580789952102]
This paper proposes a novel semi-supervised Multi-Scale Uncertainty and Cross-Teacher-Student Attention (MUCA) model for RS image semantic segmentation tasks. MUCA constrains the consistency among feature maps at different layers of the network by introducing a multi-scale uncertainty consistency regularization. It also utilizes a Cross-Teacher-Student attention mechanism to guide the student network toward more discriminative feature representations.
arXiv Detail & Related papers (2025-01-18T11:57:20Z) - Context-Based Semantic-Aware Alignment for Semi-Supervised Multi-Label Learning [37.13424985128905]
Vision-language models pre-trained on large-scale image-text pairs could alleviate the challenge of limited labeled data under the SSMLL setting. We propose a context-based semantic-aware alignment method to solve the SSMLL problem.
arXiv Detail & Related papers (2024-12-25T09:06:54Z) - Multi-Label Self-Supervised Learning with Scene Images [21.549234013998255]
This paper shows that quality image representations can be learned by treating scene/multi-label image SSL simply as a multi-label classification problem.
The proposed method is named Multi-Label Self-supervised learning (MLS).
arXiv Detail & Related papers (2023-08-07T04:04:22Z) - Zero-Shot Learning by Harnessing Adversarial Samples [52.09717785644816]
We propose a novel Zero-Shot Learning (ZSL) approach by Harnessing Adversarial Samples (HAS).
HAS advances ZSL through adversarial training which takes into account three crucial aspects.
We demonstrate the effectiveness of our adversarial samples approach in both ZSL and Generalized Zero-Shot Learning (GZSL) scenarios.
arXiv Detail & Related papers (2023-08-01T06:19:13Z) - Localized Region Contrast for Enhancing Self-Supervised Learning in Medical Image Segmentation [27.82940072548603]
We propose a novel contrastive learning framework that integrates Localized Region Contrast (LRC) to enhance existing self-supervised pre-training methods for medical image segmentation.
Our approach involves identifying super-pixels with Felzenszwalb's algorithm and performing local contrastive learning using a novel contrastive sampling loss.
arXiv Detail & Related papers (2023-04-06T22:43:13Z) - Self-Supervised Pyramid Representation Learning for Multi-Label Visual Analysis and Beyond [31.36818611460614]
We propose a Self-Supervised Pyramid Learning (SS-PRL) framework.
The proposed SS-PRL is designed to derive pyramid representations at patch levels via learning proper prototypes.
We show that, with our proposed SS-PRL for model pre-training, one can easily adapt and fine-tune the models for a variety of applications.
arXiv Detail & Related papers (2022-08-30T17:57:14Z) - Object-Aware Self-supervised Multi-Label Learning [9.496981642855769]
We propose an Object-Aware Self-Supervision (OASS) method to obtain more fine-grained representations for multi-label learning.
The proposed method can be leveraged to efficiently generate Class-Specific Instances (CSI) in a proposal-free fashion.
Experiments on the VOC2012 dataset for multi-label classification demonstrate the effectiveness of the proposed method against the state-of-the-art counterparts.
arXiv Detail & Related papers (2022-05-14T10:14:08Z) - Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS).
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive LR representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
arXiv Detail & Related papers (2022-03-19T09:19:55Z) - Object discovery and representation networks [78.16003886427885]
We propose a self-supervised learning paradigm that discovers the structure encoded in priors by itself.
Our method, Odin, couples object discovery and representation networks to discover meaningful image segmentations without any supervision.
arXiv Detail & Related papers (2022-03-16T17:42:55Z) - Dense Contrastive Visual-Linguistic Pretraining [53.61233531733243]
Several multimodal representation learning approaches have been proposed that jointly represent image and text.
These approaches achieve superior performance by capturing high-level semantic information from large-scale multimodal pretraining.
We propose unbiased Dense Contrastive Visual-Linguistic Pretraining to replace the region regression and classification with cross-modality region contrastive learning.
arXiv Detail & Related papers (2021-09-24T07:20:13Z) - Multi-Label Image Classification with Contrastive Learning [57.47567461616912]
We show that a direct application of contrastive learning can hardly improve in multi-label cases.
We propose a novel framework for multi-label classification with contrastive learning in a fully supervised setting.
arXiv Detail & Related papers (2021-07-24T15:00:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.