Multi-Label Self-Supervised Learning with Scene Images
- URL: http://arxiv.org/abs/2308.03286v3
- Date: Fri, 29 Sep 2023 03:58:40 GMT
- Title: Multi-Label Self-Supervised Learning with Scene Images
- Authors: Ke Zhu and Minghao Fu and Jianxin Wu
- Abstract summary: This paper shows that quality image representations can be learned by treating scene/multi-label image SSL simply as a multi-label classification problem.
The proposed method is named Multi-Label Self-supervised learning (MLS)
- Score: 21.549234013998255
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning (SSL) methods targeting scene images have seen
rapid growth recently, and they mostly rely on either a dedicated dense
matching mechanism or a costly unsupervised object discovery module. This paper
shows that instead of hinging on these strenuous operations, quality image
representations can be learned by treating scene/multi-label image SSL simply
as a multi-label classification problem, which greatly simplifies the learning
framework. Specifically, multiple binary pseudo-labels are assigned for each
input image by comparing its embeddings with those in two dictionaries, and the
network is optimized using the binary cross entropy loss. The proposed method
is named Multi-Label Self-supervised learning (MLS). Visualizations
qualitatively show that the pseudo-labels produced by MLS can automatically find
semantically similar pseudo-positive pairs across different images to
facilitate contrastive learning. MLS learns high-quality representations on
MS-COCO and achieves state-of-the-art results on classification, detection and
segmentation benchmarks. At the same time, MLS is much simpler than existing
methods, making it easier to deploy and for further exploration.
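The abstract describes the core mechanism only at a high level: each image gets multiple binary pseudo-labels by comparing its embedding against entries in a dictionary, and the network is trained with binary cross-entropy. A minimal sketch of that idea in plain Python follows; the top-k positive assignment, the logit scaling, and the function names are illustrative assumptions, not the paper's actual implementation:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def assign_pseudo_labels(emb, dictionary, top_k=2):
    """Multiple binary pseudo-labels for one image: the dictionary entries
    most similar to its embedding become positives (1), the rest negatives (0)."""
    sims = [cosine(emb, d) for d in dictionary]
    ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    labels = [0.0] * len(dictionary)
    for i in ranked[:top_k]:
        labels[i] = 1.0
    return labels

def bce_loss(logits, labels):
    """Binary cross-entropy over all dictionary entries, as in a
    multi-label classification problem."""
    eps = 1e-9
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid per label
        total += -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / len(labels)

# Toy example: one image embedding vs. a 4-entry dictionary.
emb = [0.9, 0.1, -0.3]
dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.2], [0.5, 0.5, -0.5]]
labels = assign_pseudo_labels(emb, dictionary, top_k=2)  # positives at 0 and 3
logits = [cosine(emb, d) * 5.0 for d in dictionary]      # scaled similarities
loss = bce_loss(logits, labels)
```

The point of the sketch is the framing: similarity to dictionary entries yields a multi-hot target, so the SSL objective reduces to ordinary multi-label BCE, with no dense matching or object-discovery module.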
Related papers
- Masked Cross-image Encoding for Few-shot Segmentation [16.445813548503708]
Few-shot segmentation (FSS) is a dense prediction task that aims to infer the pixel-wise labels of unseen classes using only a limited number of annotated images.
We propose a joint learning method termed Masked Cross-Image Encoding (MCE), which is designed to capture common visual properties that describe object details and to learn bidirectional inter-image dependencies that enhance feature interaction.
arXiv Detail & Related papers (2023-08-22T05:36:39Z) - S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist
Captions [69.01985134519244]
Vision-language models, such as contrastive language-image pre-training (CLIP), have demonstrated impressive results in natural image domains.
We propose S-CLIP, a semi-supervised learning method for training CLIP that utilizes additional unpaired images.
S-CLIP improves CLIP by 10% for zero-shot classification and 4% for image-text retrieval on the remote sensing benchmark.
arXiv Detail & Related papers (2023-05-23T14:18:11Z) - Open-Vocabulary Multi-Label Classification via Multi-modal Knowledge
Transfer [55.885555581039895]
Multi-label zero-shot learning (ML-ZSL) focuses on transferring knowledge by a pre-trained textual label embedding.
We propose a novel open-vocabulary framework, named multimodal knowledge transfer (MKT) for multi-label classification.
arXiv Detail & Related papers (2022-07-05T08:32:18Z) - Dual-Perspective Semantic-Aware Representation Blending for Multi-Label
Image Recognition with Partial Labels [70.36722026729859]
We propose a dual-perspective semantic-aware representation blending (DSRB) that blends multi-granularity category-specific semantic representation across different images.
The proposed DSRB consistently outperforms current state-of-the-art algorithms under all partial-label proportion settings.
arXiv Detail & Related papers (2022-05-26T00:33:44Z) - Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and
Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS).
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive LR representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
arXiv Detail & Related papers (2022-03-19T09:19:55Z) - Multi-label Iterated Learning for Image Classification with Label
Ambiguity [3.5736176624479654]
We propose multi-label iterated learning (MILe) to incorporate the inductive biases of multi-label learning from single labels.
MILe is a simple yet effective procedure that builds a multi-label description of the image by propagating binary predictions.
We show that MILe is effective at reducing label noise, achieving state-of-the-art performance on real-world large-scale noisy datasets such as WebVision.
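The MILe summary mentions "propagating binary predictions" into a multi-label description without spelling out the mechanics. A rough, hypothetical sketch of that core step (thresholding a teacher's per-class sigmoid scores into a multi-hot target; the function name and threshold value are illustrative, not taken from the paper):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def teacher_to_multilabel(teacher_logits, threshold=0.5):
    """Threshold the teacher's per-class sigmoid scores to produce a binary
    multi-label target; several classes can be active at once, which is how
    a single (possibly ambiguous) label expands into a multi-label description."""
    return [1.0 if sigmoid(z) >= threshold else 0.0 for z in teacher_logits]

# One iteration: a teacher's soft scores become the student's multi-label target.
teacher_logits = [3.0, -3.0, 0.2]              # per-class scores from a teacher
target = teacher_to_multilabel(teacher_logits)  # classes 0 and 2 are active
```

Iterating this teacher-student hand-off is what would let a dataset annotated with single labels acquire multi-label supervision over time.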
arXiv Detail & Related papers (2021-11-23T22:10:00Z) - Multi-Label Image Classification with Contrastive Learning [57.47567461616912]
We show that a direct application of contrastive learning can hardly improve performance in multi-label cases.
We propose a novel framework for multi-label classification with contrastive learning in a fully supervised setting.
arXiv Detail & Related papers (2021-07-24T15:00:47Z) - Semantic Diversity Learning for Zero-Shot Multi-label Classification [14.480713752871523]
This study introduces an end-to-end model training for multi-label zero-shot learning.
We propose to use an embedding matrix having principal embedding vectors trained using a tailored loss function.
In addition, during training, we suggest up-weighting in the loss function image samples presenting higher semantic diversity to encourage the diversity of the embedding matrix.
arXiv Detail & Related papers (2021-05-12T19:39:07Z) - Seed the Views: Hierarchical Semantic Alignment for Contrastive
Representation Learning [116.91819311885166]
We propose a hierarchical semantic alignment strategy that expands the views generated by a single image to cross-samples and multi-level representations.
Our method, termed CsMl, can integrate multi-level visual representations across samples in a robust way.
arXiv Detail & Related papers (2020-12-04T17:26:24Z) - SSKD: Self-Supervised Knowledge Distillation for Cross Domain Adaptive
Person Re-Identification [25.96221714337815]
Domain adaptive person re-identification (re-ID) is a challenging task due to the large discrepancy between the source domain and the target domain.
Existing methods mainly attempt to generate pseudo labels for unlabeled target images by clustering algorithms.
We propose a Self-Supervised Knowledge Distillation (SSKD) technique containing two modules, the identity learning and the soft label learning.
arXiv Detail & Related papers (2020-09-13T10:12:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.