G-SimCLR : Self-Supervised Contrastive Learning with Guided Projection
via Pseudo Labelling
- URL: http://arxiv.org/abs/2009.12007v1
- Date: Fri, 25 Sep 2020 02:25:37 GMT
- Authors: Souradip Chakraborty, Aritra Roy Gosthipaty, Sayak Paul
- Abstract summary: In computer vision, it is evident that deep neural networks perform better in a supervised setting with a large amount of labeled data.
In this work, we propose that, with the normalized temperature-scaled cross-entropy (NT-Xent) loss function, it is beneficial to not have images of the same category in the same batch.
We use the latent space representation of a denoising autoencoder trained on the unlabeled dataset and cluster them with k-means to obtain pseudo labels.
- Score: 0.8164433158925593
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In computer vision, it is evident that deep neural networks
perform better in a supervised setting with a large amount of labeled data. The
representations learned with supervision are not only of high quality but also
help the model achieve higher accuracy. However, collecting and annotating a
large dataset is costly and time-consuming. To avoid this, there has been a lot
of research on unsupervised visual representation learning, especially in the
self-supervised setting. Among the recent advances in self-supervised methods
for visual recognition, Chen et al. show with SimCLR that good-quality
representations can indeed be learned without explicit supervision. In SimCLR,
the authors maximize the similarity between augmentations of the same image and
minimize the similarity between augmentations of different images. A linear
classifier trained on the representations learned with this approach yields
76.5% top-1 accuracy on the ImageNet ILSVRC-2012 dataset. In this work, we
propose that, with the normalized temperature-scaled cross-entropy (NT-Xent)
loss function (as used in SimCLR), it is beneficial not to have images of the
same category in the same batch. In an unsupervised setting, the category
information of the images is missing. We therefore use the latent-space
representations of a denoising autoencoder trained on the unlabeled dataset and
cluster them with k-means to obtain pseudo labels. With this a priori
information we construct batches in which no two images belong to the same
category. We report comparable performance enhancements on the CIFAR10 dataset
and a subset of the ImageNet dataset. We refer to our method as G-SimCLR.
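The two ingredients the abstract describes can be sketched briefly: the NT-Xent loss over a batch of paired augmentations, and the guided batching step that uses pseudo labels (obtained in the paper from k-means on denoising-autoencoder latents) to keep same-category images out of the same batch. This is a minimal NumPy illustration, not the authors' implementation; the function names, the greedy batching strategy, and the pairing convention (rows 2i and 2i+1 are two augmentations of image i) are assumptions made for the sketch.

```python
import numpy as np

def nt_xent_loss(z, tau=0.5):
    """NT-Xent loss over 2N embeddings; rows 2i and 2i+1 are assumed to be
    two augmentations of image i (an illustrative convention)."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # L2-normalize rows
    n2 = z.shape[0]
    sim = z @ z.T / tau                  # temperature-scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)       # exclude self-similarity from the softmax
    pos = np.arange(n2) ^ 1              # each row's positive partner: 0<->1, 2<->3, ...
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(n2), pos].mean()

def guided_batches(pseudo_labels, batch_size, seed=0):
    """Greedily build batches in which no two images share a pseudo label
    (a simple stand-in for the paper's guided batching)."""
    rng = np.random.default_rng(seed)
    remaining = list(rng.permutation(len(pseudo_labels)))
    batches = []
    while remaining:
        batch, used = [], set()
        for idx in list(remaining):
            lab = pseudo_labels[idx]
            if lab not in used:          # skip images whose cluster is already in the batch
                batch.append(idx)
                used.add(lab)
                remaining.remove(idx)
            if len(batch) == batch_size:
                break
        batches.append(batch)
    return batches
```

A batch built this way contains at most one image per pseudo-label cluster, so the NT-Xent denominator is less likely to treat semantically similar images as negatives.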
Related papers
- Transformer-based Clipped Contrastive Quantization Learning for
Unsupervised Image Retrieval [15.982022297570108]
Unsupervised image retrieval aims to learn the important visual characteristics, without any given labels, to retrieve similar images for a given query image.
In this paper, we propose a TransClippedCLR model that encodes the global context of an image using a Transformer, with local context captured through patch-based processing.
Results using the proposed clipped contrastive learning are greatly improved on all datasets as compared to the same backbone network with vanilla contrastive learning.
arXiv Detail & Related papers (2024-01-27T09:39:11Z) - Fine-grained Recognition with Learnable Semantic Data Augmentation [68.48892326854494]
Fine-grained image recognition is a longstanding computer vision challenge.
We propose diversifying the training data at the feature-level to alleviate the discriminative region loss problem.
Our method significantly improves the generalization performance on several popular classification networks.
arXiv Detail & Related papers (2023-09-01T11:15:50Z) - SUVR: A Search-based Approach to Unsupervised Visual Representation
Learning [11.602089225841631]
We argue that image pairs should have varying degrees of similarity, and the negative samples should be allowed to be drawn from the entire dataset.
In this work, we propose Search-based Unsupervised Visual Learning (SUVR) to learn better image representations in an unsupervised manner.
arXiv Detail & Related papers (2023-05-24T05:57:58Z) - Masked Autoencoders are Robust Data Augmentors [90.34825840657774]
Regularization techniques like image augmentation are necessary for deep neural networks to generalize well.
We propose a novel perspective of augmentation to regularize the training process.
We show that utilizing such model-based nonlinear transformation as data augmentation can improve high-level recognition tasks.
arXiv Detail & Related papers (2022-06-10T02:41:48Z) - Masked Unsupervised Self-training for Zero-shot Image Classification [98.23094305347709]
Masked Unsupervised Self-Training (MUST) is a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images.
MUST improves upon CLIP by a large margin and narrows the performance gap between unsupervised and supervised classification.
arXiv Detail & Related papers (2022-06-07T02:03:06Z) - Semantic-aware Dense Representation Learning for Remote Sensing Image
Change Detection [20.761672725633936]
Training a deep learning-based change detection model heavily depends on labeled data.
A recent trend is to use remote sensing (RS) data to obtain in-domain representations via supervised or self-supervised learning (SSL).
We propose dense semantic-aware pre-training for RS image CD via sampling multiple class-balanced points.
arXiv Detail & Related papers (2022-05-27T06:08:33Z) - AugNet: End-to-End Unsupervised Visual Representation Learning with
Image Augmentation [3.6790362352712873]
We propose AugNet, a new deep learning training paradigm to learn image features from a collection of unlabeled pictures.
Our experiments demonstrate that the method is able to represent the image in a low-dimensional space.
Unlike many deep-learning-based image retrieval algorithms, our approach does not require access to external annotated datasets.
arXiv Detail & Related papers (2021-06-11T09:02:30Z) - Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z) - Self-Supervised Ranking for Representation Learning [108.38993212650577]
We present a new framework for self-supervised representation learning by formulating it as a ranking problem in an image retrieval context.
We train a representation encoder by maximizing average precision (AP) for ranking, where random views of an image are considered positively related.
In principle, by using a ranking criterion, we eliminate reliance on object-centric curated datasets.
arXiv Detail & Related papers (2020-10-14T17:24:56Z) - High-Order Information Matters: Learning Relation and Topology for
Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework by learning high-order relation and topology information for discriminative features and robust alignment.
Our framework significantly outperforms the state-of-the-art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.