Self-Supervised Image Representation Learning: Transcending Masking with
Paired Image Overlay
- URL: http://arxiv.org/abs/2301.09299v1
- Date: Mon, 23 Jan 2023 07:00:04 GMT
- Title: Self-Supervised Image Representation Learning: Transcending Masking with
Paired Image Overlay
- Authors: Yinheng Li, Han Ding, Shaofei Wang
- Abstract summary: This paper proposes a novel image augmentation technique, overlaying images, which has not been widely applied in self-supervised learning.
The proposed method is evaluated using contrastive learning, a widely used self-supervised learning method that has shown solid performance in downstream tasks.
- Score: 10.715255809531268
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Self-supervised learning has become a popular approach in recent years for
its ability to learn meaningful representations without the need for data
annotation. This paper proposes a novel image augmentation technique,
overlaying images, which has not been widely applied in self-supervised
learning. This method is designed to provide better guidance for the model to
understand underlying information, resulting in more useful representations.
The proposed method is evaluated using contrastive learning, a widely used
self-supervised learning method that has shown solid performance in downstream
tasks. The results demonstrate the effectiveness of the proposed augmentation
technique in improving the performance of self-supervised models.
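As a rough illustration of the two ingredients named in the abstract, the sketch below shows one way image overlay and a contrastive objective might look in code. The abstract does not specify how the images are overlaid or which contrastive loss is used, so everything here is an assumption: `overlay_pair` is a hypothetical helper that takes a pixel-wise convex blend with a hypothetical weight `alpha`, and `info_nce` is a generic InfoNCE-style loss, not the authors' implementation.

```python
import numpy as np


def overlay_pair(img_a, img_b, alpha=0.5):
    """Blend two same-shaped float images pixel-wise (assumed overlay form)."""
    if img_a.shape != img_b.shape:
        raise ValueError("images must share a shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * img_a + (1.0 - alpha) * img_b


def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss: positives[i] matches anchors[i]; the other
    rows of the batch serve as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                       # cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


# Toy usage with random data standing in for real images and embeddings.
rng = np.random.default_rng(0)
img_a = rng.random((32, 32, 3))
img_b = rng.random((32, 32, 3))
mixed = overlay_pair(img_a, img_b, alpha=0.5)

embeddings = rng.normal(size=(8, 16))
noisy_views = embeddings + 0.01 * rng.normal(size=(8, 16))
loss = info_nce(embeddings, noisy_views)
```

How the overlaid image enters the loss (for example, which views count as positives) is a design choice the abstract leaves open, so the sketch keeps the two functions separate.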
Related papers
- From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling [11.634154932876719]
Masked Image Modeling has emerged as a powerful self-supervised learning paradigm for visual representation learning.
We propose a prototype-driven curriculum learning framework that structures the learning process to progress from prototypical examples to more complex variations in the dataset.
Our findings suggest that carefully controlling the order of training examples plays a crucial role in self-supervised visual learning.
arXiv Detail & Related papers (2024-11-16T03:21:06Z)
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension [131.14381425260706]
We introduce Self-Training on Image (STIC), which emphasizes a self-training approach specifically for image comprehension.
First, the model self-constructs a preference for image descriptions using unlabeled images.
To further self-improve reasoning on the extracted visual information, we let the model reuse a small portion of existing instruction-tuning data.
arXiv Detail & Related papers (2024-05-30T05:53:49Z)
- A Probabilistic Model Behind Self-Supervised Learning [53.64989127914936]
In self-supervised learning (SSL), representations are learned via an auxiliary task without annotated labels.
We present a generative latent variable model for self-supervised learning.
We show that several families of discriminative SSL, including contrastive methods, induce a comparable distribution over representations.
arXiv Detail & Related papers (2024-02-02T13:31:17Z)
- From Pretext to Purpose: Batch-Adaptive Self-Supervised Learning [32.18543787821028]
This paper proposes an adaptive technique of batch fusion for self-supervised contrastive learning.
It achieves state-of-the-art performance under equitable comparisons.
We suggest that the proposed method may contribute to the advancement of data-driven self-supervised learning research.
arXiv Detail & Related papers (2023-11-16T15:47:49Z)
- Towards Efficient and Effective Self-Supervised Learning of Visual Representations [41.92884427579068]
Self-supervision has emerged as a propitious method for visual representation learning.
We propose to strengthen these methods using well-posed auxiliary tasks that converge significantly faster.
The proposed method utilizes the task of rotation prediction to improve the efficiency of existing state-of-the-art methods.
arXiv Detail & Related papers (2022-10-18T13:55:25Z)
- VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition [61.75391989107558]
We present a visual-linguistic long-tailed recognition framework, termed VL-LTR.
Our method can learn visual representation from images and corresponding linguistic representation from noisy class-level text descriptions.
Notably, our method achieves 77.2% overall accuracy on ImageNet-LT, which significantly outperforms the previous best method by over 17 points.
arXiv Detail & Related papers (2021-11-26T16:24:03Z)
- Learning Rich Nearest Neighbor Representations from Self-supervised Ensembles [60.97922557957857]
We provide a framework to perform self-supervised model ensembling via a novel method of learning representations directly through gradient descent at inference time.
This technique improves representation quality, as measured by k-nearest neighbors, both on the in-domain dataset and in the transfer setting.
arXiv Detail & Related papers (2021-10-19T22:24:57Z)
- Co$^2$L: Contrastive Continual Learning [69.46643497220586]
Recent breakthroughs in self-supervised learning show that such algorithms learn visual representations that can be transferred better to unseen tasks.
We propose a rehearsal-based continual learning algorithm that focuses on continually learning and maintaining transferable representations.
arXiv Detail & Related papers (2021-06-28T06:14:38Z)
- MEAL: Manifold Embedding-based Active Learning [0.0]
Active learning helps learning from small amounts of data by suggesting the most promising samples for labeling.
We propose a new pool-based method for active learning that suggests promising image regions in each acquisition step.
We find that our active learning method achieves better performance on CamVid compared to other methods, while on Cityscapes, the performance lift was negligible.
arXiv Detail & Related papers (2021-06-22T15:22:56Z)
- Multi-Pretext Attention Network for Few-shot Learning with Self-supervision [37.6064643502453]
We propose a novel augmentation-free method for self-supervised learning, which does not rely on any auxiliary sample.
Besides, we propose Multi-pretext Attention Network (MAN), which exploits a specific attention mechanism to combine the traditional augmentation-relied methods and our GC.
We evaluate our MAN extensively on miniImageNet and tieredImageNet datasets and the results demonstrate that the proposed method outperforms the state-of-the-art (SOTA) relevant methods.
arXiv Detail & Related papers (2021-03-10T10:48:37Z)
- Self-supervised Co-training for Video Representation Learning [103.69904379356413]
We investigate the benefit of adding semantic-class positives to instance-based Info Noise Contrastive Estimation training.
We propose a novel self-supervised co-training scheme to improve the popular infoNCE loss.
We evaluate the quality of the learnt representation on two different downstream tasks: action recognition and video retrieval.
arXiv Detail & Related papers (2020-10-19T17:59:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.