QK Iteration: A Self-Supervised Representation Learning Algorithm for
Image Similarity
- URL: http://arxiv.org/abs/2111.07954v1
- Date: Mon, 15 Nov 2021 18:01:05 GMT
- Title: QK Iteration: A Self-Supervised Representation Learning Algorithm for
Image Similarity
- Authors: David Wu and Yunnan Wu
- Abstract summary: We present a new contrastive self-supervised representation learning algorithm in the context of Copy Detection in the 2021 Image Similarity Challenge hosted by Facebook AI Research.
Our algorithms achieved a micro-AP score of 0.3401 on the Phase 1 leaderboard, significantly improving over the baseline $\mu$AP of 0.1556.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised representation learning is a fundamental problem in computer
vision with many useful applications (e.g., image search, instance level
recognition, copy detection). In this paper we present a new contrastive
self-supervised representation learning algorithm in the context of Copy
Detection in the 2021 Image Similarity Challenge hosted by Facebook AI
Research. Previous work in contrastive self-supervised learning has identified
the importance of being able to optimize representations while "pushing"
against a large number of negative examples. Representative previous solutions
either use large batches enabled by modern distributed training systems or
maintain queues or memory banks holding recently evaluated representations
while relaxing some consistency properties. We approach this problem from a new
angle: We directly learn a query model and a key model jointly and push
representations against a very large number (e.g., 1 million) of negative
representations in each SGD step. We achieve this by freezing the backbone on
one side and by alternating between a Q-optimization step and a K-optimization
step. During the competition timeframe, our algorithms achieved a micro-AP
score of 0.3401 on the Phase 1 leaderboard, significantly improving over the
baseline $\mu$AP of 0.1556. On the final Phase 2 leaderboard, our model scored
0.1919, while the baseline scored 0.0526. Continued training yielded further
improvement. We conducted an empirical study to compare the proposed approach
with a SimCLR-style strategy where the negative examples are taken from the
batch only. We found that our method ($\mu$AP of 0.3403) significantly
outperforms this SimCLR-style baseline ($\mu$AP of 0.2001).
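The idea of pushing one side's representation against a large set of negatives held fixed on the other side can be illustrated with a minimal sketch. This is not the authors' implementation: the InfoNCE-style loss, the default temperature of 0.05, the per-step alternation in `trainable_side`, and all function names are assumptions chosen for illustration, and plain Python lists stand in for embeddings.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchor, positive, negatives, temperature=0.05):
    """InfoNCE-style loss for one anchor, one positive, and many negatives.

    Only `anchor` is meant to receive gradients; `positive` and `negatives`
    stand in for representations produced by the frozen side (assumption).
    """
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Negative log-probability assigned to the positive pair.
    return log_z - logits[0]


def trainable_side(step):
    """Hypothetical schedule: even steps update Q, odd steps update K."""
    return "Q" if step % 2 == 0 else "K"


# Toy usage: a 2-D query, its positive key, and a small bank of negatives.
query = [1.0, 0.0]
positive_key = [1.0, 0.0]
negative_bank = [[0.0, 1.0], [-1.0, 0.0]]
loss = info_nce(query, positive_key, negative_bank, temperature=1.0)
```

Because the negatives are treated as constants during a given step, the bank can in principle be precomputed and made very large without backpropagating through the model that produced it; the alternation granularity used in the paper may differ from the per-step schedule assumed here.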
Related papers
- $\mathbb{X}$-Sample Contrastive Loss: Improving Contrastive Learning with Sample Similarity Graphs [62.565573316667276]
We develop an objective that encodes how a sample relates to others.
We train vision models based on similarities in class or text caption descriptions.
Our objective appears to work particularly well in lower-data regimes, with gains over CLIP of 16.8% on ImageNet and 18.1% on ImageNet Real.
arXiv Detail & Related papers (2024-07-25T15:38:16Z)
- Transductive Zero-Shot and Few-Shot CLIP [24.592841797020203]
This paper addresses the transductive zero-shot and few-shot CLIP classification challenge.
Inference is performed jointly across a mini-batch of unlabeled query samples, rather than treating each instance independently.
Our approach yields near 20% improvement in ImageNet accuracy over CLIP's zero-shot performance.
arXiv Detail & Related papers (2024-04-08T12:44:31Z)
- CUCL: Codebook for Unsupervised Continual Learning [129.91731617718781]
The focus of this study is on Unsupervised Continual Learning (UCL), as it presents an alternative to Supervised Continual Learning.
We propose a method named Codebook for Unsupervised Continual Learning (CUCL) which promotes the model to learn discriminative features to complete the class boundary.
Our method significantly boosts the performances of supervised and unsupervised methods.
arXiv Detail & Related papers (2023-11-25T03:08:50Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used for mitigating the greedy needs of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- Semantic Positive Pairs for Enhancing Visual Representation Learning of Instance Discrimination methods [4.680881326162484]
Self-supervised learning algorithms (SSL) based on instance discrimination have shown promising results.
We propose an approach to identify those images with similar semantic content and treat them as positive instances.
We run experiments on three benchmark datasets: ImageNet, STL-10 and CIFAR-10 with different instance discrimination SSL approaches.
arXiv Detail & Related papers (2023-06-28T11:47:08Z)
- Weakly Supervised Contrastive Learning [68.47096022526927]
We introduce a weakly supervised contrastive learning framework (WCL) to tackle this issue.
WCL achieves 65% and 72% ImageNet Top-1 Accuracy using ResNet50, which is even higher than SimCLRv2 with ResNet101.
arXiv Detail & Related papers (2021-10-10T12:03:52Z)
- Improving Contrastive Learning by Visualizing Feature Transformation [37.548120912055595]
In this paper, we attempt to devise a feature-level data manipulation, differing from data augmentation, to enhance the generic contrastive self-supervised learning.
We first design a visualization scheme for pos/neg score (Pos/neg score indicates similarity of pos/neg pair.) distribution, which enables us to analyze, interpret and understand the learning process.
Experiment results show that our proposed Feature Transformation can improve at least 6.0% accuracy on ImageNet-100 over MoCo baseline, and about 2.0% accuracy on ImageNet-1K over the MoCoV2 baseline.
arXiv Detail & Related papers (2021-08-06T07:26:08Z)
- Beyond Single Instance Multi-view Unsupervised Representation Learning [21.449132256091662]
We impose more accurate instance discrimination capability by measuring the joint similarity between two randomly sampled instances.
We believe that learning joint similarity helps to improve the performance when encoded features are distributed more evenly in the latent space.
arXiv Detail & Related papers (2020-11-26T15:43:27Z)
- Unsupervised Learning of Visual Features by Contrasting Cluster Assignments [57.33699905852397]
We propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons.
Our method simultaneously clusters the data while enforcing consistency between cluster assignments.
Our method can be trained with large and small batches and can scale to unlimited amounts of data.
arXiv Detail & Related papers (2020-06-17T14:00:42Z)
- SCAN: Learning to Classify Images without Labels [73.69513783788622]
We advocate a two-step approach where feature learning and clustering are decoupled.
A self-supervised task from representation learning is employed to obtain semantically meaningful features.
We obtain promising results on ImageNet, and outperform several semi-supervised learning methods in the low-data regime.
arXiv Detail & Related papers (2020-05-25T18:12:33Z)