Multi-Label Self-Supervised Learning with Scene Images
- URL: http://arxiv.org/abs/2308.03286v3
- Date: Fri, 29 Sep 2023 03:58:40 GMT
- Title: Multi-Label Self-Supervised Learning with Scene Images
- Authors: Ke Zhu and Minghao Fu and Jianxin Wu
- Abstract summary: This paper shows that quality image representations can be learned by treating scene/multi-label image SSL simply as a multi-label classification problem.
The proposed method is named Multi-Label Self-supervised learning (MLS)
- Score: 21.549234013998255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) methods targeting scene images have seen
rapid growth recently, and they mostly rely on either a dedicated dense
matching mechanism or a costly unsupervised object discovery module. This paper
shows that instead of hinging on these strenuous operations, quality image
representations can be learned by treating scene/multi-label image SSL simply
as a multi-label classification problem, which greatly simplifies the learning
framework. Specifically, multiple binary pseudo-labels are assigned for each
input image by comparing its embeddings with those in two dictionaries, and the
network is optimized using the binary cross entropy loss. The proposed method
is named Multi-Label Self-supervised learning (MLS). Visualizations
qualitatively show that the pseudo-labels produced by MLS can automatically find
semantically similar pseudo-positive pairs across different images to
facilitate contrastive learning. MLS learns high-quality representations on
MS-COCO and achieves state-of-the-art results on classification, detection and
segmentation benchmarks. At the same time, MLS is much simpler than existing
methods, making it easier to deploy and for further exploration.
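The abstract describes the core mechanism only at a high level: each image gets multiple binary pseudo-labels by comparing its embedding against entries in a dictionary, and the network is trained with binary cross-entropy. A minimal sketch of that idea in plain Python follows; the top-k positive assignment, the logit scaling, and the function names are illustrative assumptions, not the paper's actual implementation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def assign_pseudo_labels(emb, dictionary, top_k=2):
    """Multiple binary pseudo-labels for one image: the dictionary entries
    most similar to its embedding become positives (1), the rest negatives (0)."""
    sims = [cosine(emb, d) for d in dictionary]
    ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    labels = [0.0] * len(dictionary)
    for i in ranked[:top_k]:
        labels[i] = 1.0
    return labels

def bce_loss(logits, labels):
    """Binary cross-entropy over all dictionary entries, as in a
    multi-label classification problem."""
    eps = 1e-9
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid per label
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(labels)

# Toy example: one image embedding vs. a 4-entry dictionary.
emb = [0.9, 0.1, -0.3]
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.2], [0.5, 0.5, -0.5]]
labels = assign_pseudo_labels(emb, dictionary, top_k=2)  # positives at 0 and 3
logits = [cosine(emb, d) * 5.0 for d in dictionary]      # scaled similarities
loss = bce_loss(logits, labels)
```

The point of the sketch is the framing: similarity to dictionary entries yields a multi-hot target, so the SSL objective reduces to ordinary multi-label BCE, with no dense matching or object-discovery module.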
Related papers
- Masked Cross-image Encoding for Few-shot Segmentation [16.445813548503708]
Few-shot segmentation (FSS) is a dense prediction task that aims to infer the pixel-wise labels of unseen classes using only a limited number of annotated images.
We propose a joint learning method termed Masked Cross-Image Encoding (MCE), which is designed to capture common visual properties that describe object details and to learn bidirectional inter-image dependencies that enhance feature interaction.
arXiv Detail & Related papers (2023-08-22T05:36:39Z) - S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist
Captions [69.01985134519244]
Vision-language models, such as contrastive language-image pre-training (CLIP), have demonstrated impressive results in natural image domains.
We propose S-CLIP, a semi-supervised learning method for training CLIP that utilizes additional unpaired images.
S-CLIP improves CLIP by 10% for zero-shot classification and 4% for image-text retrieval on the remote sensing benchmark.
arXiv Detail & Related papers (2023-05-23T14:18:11Z) - Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge
Transfer [55.885555581039895]
Multi-label zero-shot learning (ML-ZSL) focuses on transferring knowledge by a pre-trained textual label embedding.
We propose a novel open-vocabulary framework, named multimodal knowledge transfer (MKT) for multi-label classification.
arXiv Detail & Related papers (2022-07-05T08:32:18Z) - Dual-Perspective Semantic-Aware Representation Blending for Multi-Label
Image Recognition with Partial Labels [70.36722026729859]
We propose a dual-perspective semantic-aware representation blending (DSRB) that blends multi-granularity category-specific semantic representation across different images.
The proposed DSRB consistently outperforms current state-of-the-art algorithms under all partial-label proportion settings.
arXiv Detail & Related papers (2022-05-26T00:33:44Z) - Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and
Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS).
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive LR representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
arXiv Detail & Related papers (2022-03-19T09:19:55Z) - Multi-label Iterated Learning for Image Classification with Label
Ambiguity [3.5736176624479654]
We propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels.
MILe is a simple yet effective procedure that builds a multi-label description of the image by propagating binary predictions.
We show that MILe is effective at reducing label noise, achieving state-of-the-art performance on real-world large-scale noisy datasets such as WebVision.
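The MILe summary mentions "propagating binary predictions" into a multi-label description without spelling out the mechanics. A rough, hypothetical sketch of that core step (thresholding a teacher's per-class sigmoid scores into a multi-hot target; the function name and threshold value are illustrative, not taken from the paper):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def teacher_to_multilabel(teacher_logits, threshold=0.5):
    """Threshold the teacher's per-class sigmoid scores to produce a binary
    multi-label target; several classes can be active at once, which is how
    a single (possibly ambiguous) label expands into a multi-label description."""
    return [1.0 if sigmoid(z) >= threshold else 0.0 for z in teacher_logits]

# One iteration: a teacher's soft scores become the student's multi-label target.
teacher_logits = [3.0, -3.0, 0.2]              # per-class scores from a teacher
target = teacher_to_multilabel(teacher_logits)  # classes 0 and 2 are active
```

Iterating this teacher-student hand-off is what would let a dataset annotated with single labels acquire multi-label supervision over time.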
arXiv Detail & Related papers (2021-11-23T22:10:00Z) - Multi-Label Image Classification with Contrastive Learning [57.47567461616912]
We show that a direct application of contrastive learning can hardly improve performance in multi-label cases.
We propose a novel framework for multi-label classification with contrastive learning in a fully supervised setting.
arXiv Detail & Related papers (2021-07-24T15:00:47Z) - Semantic Diversity Learning for Zero-Shot Multi-label Classification [14.480713752871523]
This study introduces an end-to-end model training for multi-label zero-shot learning.
We propose to use an embedding matrix having principal embedding vectors trained using a tailored loss function.
In addition, during training, we suggest up-weighting in the loss function image samples presenting higher semantic diversity to encourage the diversity of the embedding matrix.
arXiv Detail & Related papers (2021-05-12T19:39:07Z) - Seed the Views: Hierarchical Semantic Alignment for Contrastive
Representation Learning [116.91819311885166]
We propose a hierarchical semantic alignment strategy that expands the views generated by a single image to cross-samples and multi-level representations.
Our method, termed CsMl, can integrate multi-level visual representations across samples in a robust way.
arXiv Detail & Related papers (2020-12-04T17:26:24Z) - SSKD: Self-Supervised Knowledge Distillation for Cross Domain Adaptive
Person Re-Identification [25.96221714337815]
Domain adaptive person re-identification (re-ID) is a challenging task due to the large discrepancy between the source domain and the target domain.
Existing methods mainly attempt to generate pseudo labels for unlabeled target images by clustering algorithms.
We propose a Self-Supervised Knowledge Distillation (SSKD) technique containing two modules, the identity learning and the soft label learning.
arXiv Detail & Related papers (2020-09-13T10:12:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.