Self-Supervised Image Representation Learning: Transcending Masking with
Paired Image Overlay
- URL: http://arxiv.org/abs/2301.09299v1
- Date: Mon, 23 Jan 2023 07:00:04 GMT
- Title: Self-Supervised Image Representation Learning: Transcending Masking with
Paired Image Overlay
- Authors: Yinheng Li, Han Ding, Shaofei Wang
- Abstract summary: This paper proposes a novel image augmentation technique, overlaying images, which has not been widely applied in self-supervised learning.
The proposed method is evaluated using contrastive learning, a widely used self-supervised learning method that has shown solid performance in downstream tasks.
- Score: 10.715255809531268
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Self-supervised learning has become a popular approach in recent years for
its ability to learn meaningful representations without the need for data
annotation. This paper proposes a novel image augmentation technique,
overlaying images, which has not been widely applied in self-supervised
learning. This method is designed to provide better guidance for the model to
understand underlying information, resulting in more useful representations.
The proposed method is evaluated using contrastive learning, a widely used
self-supervised learning method that has shown solid performance in downstream
tasks. The results demonstrate the effectiveness of the proposed augmentation
technique in improving the performance of self-supervised models.
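The core augmentation, overlaying a pair of images, can be sketched as a simple convex blend of two inputs. The abstract does not specify the exact mixing scheme, so the `alpha` weight and the linear blending formula below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def overlay_pair(img_a: np.ndarray, img_b: np.ndarray,
                 alpha: float = 0.5) -> np.ndarray:
    """Blend two same-shape images into a single overlaid view.

    `alpha` weights the first image; the paper's actual scheme may
    differ -- this is only an illustrative sketch.
    """
    assert img_a.shape == img_b.shape, "images must share a shape"
    return alpha * img_a.astype(np.float32) + (1.0 - alpha) * img_b.astype(np.float32)

# Two toy 4x4 grayscale "images": all-zeros and all-ones.
a = np.zeros((4, 4), dtype=np.float32)
b = np.ones((4, 4), dtype=np.float32)
mixed = overlay_pair(a, b, alpha=0.25)
print(mixed[0, 0])  # 0.25 * 0 + 0.75 * 1 = 0.75
```

In a contrastive-learning setup, such an overlaid view could serve as one branch of a positive pair, encouraging the encoder to recover features of both underlying images.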
Related papers
- Positive-Unlabelled Learning for Improving Image-based Recommender System Explainability [2.9748898344267785]
This work proposes a new explainer training pipeline by leveraging Positive-Unlabelled (PU) Learning techniques.
Experiments show this PU-based approach outperforms the state-of-the-art non-PU method in six popular real-world datasets.
arXiv Detail & Related papers (2024-07-09T10:40:31Z)
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension [99.9389737339175]
We introduce Self-Training on Image (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
arXiv Detail & Related papers (2024-05-30T05:53:49Z)
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- From Pretext to Purpose: Batch-Adaptive Self-Supervised Learning [32.18543787821028]
This paper proposes an adaptive technique of batch fusion for self-supervised contrastive learning.
It achieves state-of-the-art performance under equitable comparisons.
We suggest that the proposed method may contribute to the advancement of data-driven self-supervised learning research.
arXiv Detail & Related papers (2023-11-16T15:47:49Z)
- Towards Efficient and Effective Self-Supervised Learning of Visual Representations [41.92884427579068]
Self-supervision has emerged as a propitious method for visual representation learning.
We propose to strengthen these methods using well-posed auxiliary tasks that converge significantly faster.
The proposed method utilizes the task of rotation prediction to improve the efficiency of existing state-of-the-art methods.
arXiv Detail & Related papers (2022-10-18T13:55:25Z)
- VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition [61.75391989107558]
We present a visual-linguistic long-tailed recognition framework, termed VL-LTR.
Our method can learn visual representation from images and corresponding linguistic representation from noisy class-level text descriptions.
Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which significantly outperforms the previous best method by over 17 points.
arXiv Detail & Related papers (2021-11-26T16:24:03Z)
- Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles [60.97922557957857]
We provide a framework to perform self-supervised model ensembling via a novel method of learning representations directly through gradient descent at inference time.
This technique improves representation quality, as measured by k-nearest neighbors, both on the in-domain dataset and in the transfer setting.
arXiv Detail & Related papers (2021-10-19T22:24:57Z)
- Co$^2$L: Contrastive Continual Learning [69.46643497220586]
Recent breakthroughs in self-supervised learning show that such algorithms learn visual representations that can be transferred better to unseen tasks.
We propose a rehearsal-based continual learning algorithm that focuses on continually learning and maintaining transferable representations.
arXiv Detail & Related papers (2021-06-28T06:14:38Z)
- MEAL: Manifold Embedding-based Active Learning [0.0]
Active learning helps learning from small amounts of data by suggesting the most promising samples for labeling.
We propose a new pool-based method for active learning, which proposes promising image regions in each acquisition step.
We find that our active learning method achieves better performance on CamVid compared to other methods, while on Cityscapes, the performance lift was negligible.
arXiv Detail & Related papers (2021-06-22T15:22:56Z)
- Multi-Pretext Attention Network for Few-shot Learning with Self-supervision [37.6064643502453]
We propose a novel augmentation-free method for self-supervised learning, which does not rely on any auxiliary sample.
Besides, we propose the Multi-pretext Attention Network (MAN), which exploits a specific attention mechanism to combine traditional augmentation-based methods with our GC.
We evaluate our MAN extensively on miniImageNet and tieredImageNet datasets and the results demonstrate that the proposed method outperforms the state-of-the-art (SOTA) relevant methods.
arXiv Detail & Related papers (2021-03-10T10:48:37Z)
- Self-supervised Co-training for Video Representation Learning [103.69904379356413]
We investigate the benefit of adding semantic-class positives to instance-based InfoNCE (Info Noise-Contrastive Estimation) training.
We propose a novel self-supervised co-training scheme to improve the popular InfoNCE loss.
We evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval.
arXiv Detail & Related papers (2020-10-19T17:59:01Z)
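The InfoNCE loss referenced in the co-training entry above can be sketched in its textbook single-query form. This is the standard formulation only, not the co-training variant that paper proposes; the temperature value and cosine similarity choice are common defaults, assumed here for illustration:

```python
import numpy as np

def info_nce(query: np.ndarray, positive: np.ndarray,
             negatives: np.ndarray, temperature: float = 0.1) -> float:
    """Textbook InfoNCE loss for a single query.

    The positive pair should score higher than every negative; the loss
    is cross-entropy over similarity logits with the positive at index 0.
    """
    def cos_sim(u: np.ndarray, v: np.ndarray) -> float:
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    logits = np.array([cos_sim(query, positive)] +
                      [cos_sim(query, n) for n in negatives]) / temperature
    # Numerically stable log-softmax of the positive's logit.
    max_logit = logits.max()
    log_prob = logits[0] - (np.log(np.exp(logits - max_logit).sum()) + max_logit)
    return -log_prob

q = np.array([1.0, 0.0])
pos = np.array([1.0, 0.0])      # identical to the query: high similarity
negs = np.array([[0.0, 1.0]])   # orthogonal negative: low similarity
loss = info_nce(q, pos, negs)   # near zero, since the positive dominates
```

Swapping the positive and negative above drives the loss up sharply, which is the gradient signal contrastive methods rely on.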
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.