Self-Supervised Ranking for Representation Learning
- URL: http://arxiv.org/abs/2010.07258v2
- Date: Fri, 20 Nov 2020 15:20:30 GMT
- Title: Self-Supervised Ranking for Representation Learning
- Authors: Ali Varamesh, Ali Diba, Tinne Tuytelaars, Luc Van Gool
- Abstract summary: We present a new framework for self-supervised representation learning by formulating it as a ranking problem in an image retrieval context.
We train a representation encoder by maximizing average precision (AP) for ranking, where random views of an image are considered positively related.
In principle, by using a ranking criterion, we eliminate reliance on object-centric curated datasets.
- Score: 108.38993212650577
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present a new framework for self-supervised representation learning by
formulating it as a ranking problem in an image retrieval context on a large
number of random views (augmentations) obtained from images. Our work is based
on two intuitions: first, a good representation of images must yield a
high-quality image ranking in a retrieval task; second, we would expect random
views of an image to be ranked closer to a reference view of that image than
random views of other images. Hence, we model representation learning as a
learning to rank problem for image retrieval. We train a representation encoder
by maximizing average precision (AP) for ranking, where random views of an
image are treated as positives and views of other images as negatives. The
new framework, dubbed S2R2, enables computing a
global objective on multiple views, compared to the local objective in the
popular contrastive learning framework, which is calculated on pairs of views.
In principle, by using a ranking criterion, we eliminate reliance on
object-centric curated datasets. When trained on STL10 and MS-COCO, S2R2
outperforms SimCLR and the clustering-based contrastive learning model, SwAV,
while being much simpler both conceptually and in implementation. On MS-COCO,
S2R2 outperforms both SwAV and SimCLR by a larger margin than on STL10. This
indicates that S2R2 is more effective on diverse scenes and could eliminate the
need for an object-centric large training dataset for self-supervised
representation learning.
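The abstract leaves open how the non-differentiable AP metric is optimized in practice; a common choice is a sigmoid-relaxed ranking surrogate in the style of Smooth-AP. Below is a minimal PyTorch sketch under that assumption; the function name, the `image_ids` convention, and the temperature `tau` are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def smooth_ap_loss(z, image_ids, tau=0.01):
    """Sigmoid-relaxed AP over one batch of random views (hypothetical sketch).

    z:         (N, D) embeddings of all views in the batch.
    image_ids: (N,) source-image id per view; views that share an id are
               positives for each other, all remaining views are negatives.
    """
    z = F.normalize(z, dim=1)
    sim = z @ z.t()                                     # (N, N) cosine similarities
    n = sim.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = (image_ids[None, :] == image_ids[:, None]) & ~eye

    # diff[q, i, j] = sim[q, j] - sim[q, i]; the sigmoid relaxes the hard
    # "j is ranked above i for query q" indicator into a soft count.
    diff = sim.unsqueeze(1) - sim.unsqueeze(2)          # (N, N, N); O(N^3) memory, sketch only
    soft = torch.sigmoid(diff / tau)

    valid = (~eye).float()
    mask_all = valid.unsqueeze(1) * valid.unsqueeze(0)          # j != q and j != i
    mask_pos = pos.float().unsqueeze(1) * valid.unsqueeze(0)    # j is a positive and j != i

    rank_all = 1.0 + (soft * mask_all).sum(-1)          # soft rank among all candidates
    rank_pos = 1.0 + (soft * mask_pos).sum(-1)          # soft rank among positives only
    precision = rank_pos / rank_all

    ap = (precision * pos.float()).sum(1) / pos.float().sum(1).clamp(min=1)
    return 1.0 - ap.mean()                              # maximizing AP = minimizing 1 - AP
```

With multiple random views per image in a batch, every view acts as a query against all other views at once, which is the global, set-level objective the abstract contrasts with pairwise contrastive losses such as SimCLR's.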
Related papers
- Image2Sentence based Asymmetrical Zero-shot Composed Image Retrieval [92.13664084464514]
The task of composed image retrieval (CIR) aims to retrieve images based on a query image and text describing the user's intent.
Existing methods have made great progress with advanced large vision-language (VL) models on the CIR task; however, they generally suffer from two main issues: a lack of labeled triplets for model training and difficulty of deployment in resource-restricted environments.
We propose Image2Sentence based Asymmetric zero-shot composed image retrieval (ISA), which takes advantage of the VL model and only relies on unlabeled images for composition learning.
arXiv Detail & Related papers (2024-03-03T07:58:03Z)
- Advancing Image Retrieval with Few-Shot Learning and Relevance Feedback [5.770351255180495]
Image Retrieval with Relevance Feedback (IRRF) involves iterative human interaction during the retrieval process.
We propose a new scheme based on a hyper-network that is tailored to the task and facilitates swift adjustment to user feedback.
We show that our method can attain SoTA results in few-shot one-class classification and reach comparable results in the binary classification task of few-shot open-set recognition.
arXiv Detail & Related papers (2023-12-18T10:20:28Z)
- Siamese Image Modeling for Self-Supervised Vision Representation Learning [73.78790119050056]
Self-supervised learning (SSL) has delivered superior performance on a variety of downstream vision tasks.
Two mainstream SSL frameworks have been proposed: Instance Discrimination (ID) and Masked Image Modeling (MIM).
We propose Siamese Image Modeling (SIM), which predicts the dense representations of an augmented view.
arXiv Detail & Related papers (2022-06-02T17:59:58Z)
- Mix-up Self-Supervised Learning for Contrast-agnostic Applications [33.807005669824136]
We present the first mix-up self-supervised learning framework for contrast-agnostic applications.
We address the low variance across images via cross-domain mix-up and build the pretext task on image reconstruction and transparency prediction.
arXiv Detail & Related papers (2022-04-02T16:58:36Z)
- Object-aware Contrastive Learning for Debiased Scene Representation [74.30741492814327]
We develop a novel object-aware contrastive learning framework that localizes objects in a self-supervised manner.
We also introduce two data augmentations based on ContraCAM, object-aware random crop and background mixup, which reduce contextual and background biases during contrastive self-supervised learning.
arXiv Detail & Related papers (2021-07-30T19:24:07Z)
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision [57.031588264841]
We leverage a noisy dataset of over one billion image alt-text pairs, obtained without expensive filtering or post-processing steps.
A simple dual-encoder architecture learns to align visual and language representations of the image and text pairs using a contrastive loss.
We show that the scale of our corpus can make up for its noise and leads to state-of-the-art representations even with such a simple learning scheme.
arXiv Detail & Related papers (2021-02-11T10:08:12Z)
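As a concrete illustration of the dual-encoder recipe summarized above, here is a minimal sketch of a symmetric contrastive (InfoNCE) alignment step, assuming precomputed image and text embeddings; the function name, batch convention, and temperature are placeholders rather than the paper's actual architecture.

```python
import torch
import torch.nn.functional as F

def dual_encoder_alignment_loss(img_emb, txt_emb, temperature=0.05):
    """Symmetric contrastive loss aligning paired image/text embeddings.

    img_emb, txt_emb: (B, D) outputs of separate image and text encoders;
    row i of each tensor comes from the same (noisy) alt-text pair.
    """
    img = F.normalize(img_emb, dim=1)
    txt = F.normalize(txt_emb, dim=1)
    logits = img @ txt.t() / temperature        # (B, B); diagonal = matched pairs
    targets = torch.arange(img.size(0), device=img.device)
    # image-to-text and text-to-image retrieval directions, averaged
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```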
- Towards Unsupervised Deep Image Enhancement with Generative Adversarial Network [92.01145655155374]
We present an unsupervised image enhancement generative network (UEGAN).
It learns the corresponding image-to-image mapping from a set of images with desired characteristics in an unsupervised manner.
Results show that the proposed model effectively improves the aesthetic quality of images.
arXiv Detail & Related papers (2020-12-30T03:22:46Z)
- Unsupervised Learning of Dense Visual Representations [14.329781842154281]
We propose View-Agnostic Dense Representation (VADeR) for unsupervised learning of dense representations.
VADeR learns pixelwise representations by forcing local features to remain constant over different viewing conditions.
Our method outperforms ImageNet supervised pretraining in multiple dense prediction tasks.
arXiv Detail & Related papers (2020-11-11T01:28:11Z)
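A minimal sketch of the pixelwise idea in the VADeR entry above: matching spatial locations across two views serve as positives in a dense InfoNCE loss. It assumes the two feature maps have already been warped into alignment, which sidesteps the paper's actual correspondence machinery; names and the temperature are illustrative.

```python
import torch
import torch.nn.functional as F

def pixel_infonce(f1, f2, temperature=0.1):
    """Dense contrastive loss: corresponding pixels across views are positives.

    f1, f2: (B, C, H, W) feature maps of two views of the same images,
    assumed warped so position (h, w) corresponds across the pair.
    """
    b, c, h, w = f1.shape
    q = F.normalize(f1.permute(0, 2, 3, 1).reshape(-1, c), dim=1)   # (B*H*W, C)
    k = F.normalize(f2.permute(0, 2, 3, 1).reshape(-1, c), dim=1)
    logits = q @ k.t() / temperature       # (B*H*W, B*H*W); memory heavy, sketch only
    targets = torch.arange(q.size(0), device=q.device)              # positives on the diagonal
    return F.cross_entropy(logits, targets)
```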
- G-SimCLR: Self-Supervised Contrastive Learning with Guided Projection via Pseudo Labelling [0.8164433158925593]
In computer vision, it is evident that deep neural networks perform better in a supervised setting with a large amount of labeled data.
In this work, we propose that, with the normalized temperature-scaled cross-entropy (NT-Xent) loss function, it is beneficial to not have images of the same category in the same batch.
We use the latent-space representations of a denoising autoencoder trained on the unlabeled dataset and cluster them with k-means to obtain pseudo labels.
arXiv Detail & Related papers (2020-09-25T02:25:37Z)
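To make the G-SimCLR recipe above concrete, here is a small sketch, assuming k-means pseudo labels from autoencoder latents and a sampler that never puts two images with the same pseudo label in one batch; the helper names and the requirement n_clusters >= batch_size are assumptions for illustration, not the authors' exact procedure.

```python
import numpy as np
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def pseudo_label_batches(latents, batch_size, n_clusters, seed=0):
    """Yield index batches in which no two images share a pseudo label.

    latents: (N, D) latent codes from a denoising autoencoder.
    Assumes n_clusters >= batch_size so full batches can be formed.
    """
    rng = np.random.default_rng(seed)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(latents)
    buckets = [list(rng.permutation(np.flatnonzero(labels == c))) for c in range(n_clusters)]
    while True:
        batch = []
        for c in rng.permutation(n_clusters):   # at most one image per cluster
            if buckets[c]:
                batch.append(int(buckets[c].pop()))
            if len(batch) == batch_size:
                break
        if len(batch) < batch_size:
            break                               # too few distinct pseudo labels left
        yield batch

def nt_xent(z1, z2, temperature=0.5):
    """SimCLR's NT-Xent loss on two augmented views of one such batch."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)         # (2B, D)
    logits = z @ z.t() / temperature
    logits.fill_diagonal_(float('-inf'))                # exclude self-similarity
    b = z1.size(0)
    targets = torch.cat([torch.arange(b, 2 * b),        # view i in z1 pairs with i in z2
                         torch.arange(0, b)]).to(z.device)
    return F.cross_entropy(logits, targets)
```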
This list is automatically generated from the titles and abstracts of the papers on this site.