EqCo: Equivalent Rules for Self-supervised Contrastive Learning
- URL: http://arxiv.org/abs/2010.01929v3
- Date: Mon, 15 Mar 2021 04:53:58 GMT
- Title: EqCo: Equivalent Rules for Self-supervised Contrastive Learning
- Authors: Benjin Zhu, Junqiang Huang, Zeming Li, Xiangyu Zhang, Jian Sun
- Abstract summary: We propose a method to make self-supervised learning insensitive to the number of negative samples in InfoNCE-based contrastive learning frameworks.
Inspired by the InfoMax principle, we point out that the margin term in the contrastive loss needs to be adaptively scaled according to the number of negative pairs.
- Score: 81.45848885547754
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose a method, named EqCo (Equivalent Rules for
Contrastive Learning), to make self-supervised learning insensitive to the
number of negative samples in InfoNCE-based contrastive learning frameworks.
Inspired by the InfoMax principle, we point out that the margin term in the
contrastive loss needs to be adaptively scaled according to the number of
negative pairs in order to keep the mutual information bound and the gradient
magnitude steady. EqCo
bridges the performance gap among a wide range of negative sample sizes, so
that we can use only a few negative pairs (e.g. 16 per query) to perform
self-supervised contrastive training on large-scale vision datasets like
ImageNet with almost no accuracy drop. This stands in stark contrast to the
large-batch training or memory-bank mechanisms widely used in current practice.
Equipped with EqCo, our simplified MoCo (SiMo) achieves accuracy comparable
to MoCo v2 on ImageNet (linear evaluation protocol) while involving only 4
negative pairs per query instead of 65536, suggesting that large quantities of
negative samples might not be a critical factor in InfoNCE loss.
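To make the scaling rule concrete, here is a minimal PyTorch-style sketch of an InfoNCE loss with an EqCo-style adaptive margin. The function name, the default hyperparameter values, and the exact form of the alpha/K correction are assumptions made for illustration from the abstract, not the authors' reference implementation.

```python
import math

import torch
import torch.nn.functional as F


def eqco_infonce(q, k_pos, k_neg, alpha=65536.0, tau=0.05):
    """InfoNCE with an EqCo-style adaptive margin on the negative logits.

    q:     (B, D) L2-normalized query embeddings
    k_pos: (B, D) L2-normalized positive-key embeddings
    k_neg: (K, D) L2-normalized negative-key embeddings
    alpha: reference number of negatives the loss should emulate (assumed)
    tau:   softmax temperature
    """
    num_neg = k_neg.shape[0]
    l_pos = (q * k_pos).sum(dim=1, keepdim=True) / tau  # (B, 1) positive logit
    l_neg = (q @ k_neg.t()) / tau                       # (B, K) negative logits
    # Adding log(alpha / K) to every negative logit is equivalent to
    # reweighting the sum over negatives by alpha / K, so the loss behaves
    # as if alpha negatives were used regardless of the actual K.
    l_neg = l_neg + math.log(alpha / num_neg)
    logits = torch.cat([l_pos, l_neg], dim=1)
    labels = torch.zeros(q.shape[0], dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```

Under this reading, training with only K = 16 negatives and alpha = 65536 would keep roughly the same mutual information bound and gradient scale as a 65536-entry MoCo queue.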
Related papers
- CoNe: Contrast Your Neighbours for Supervised Image Classification [62.12074282211957]
Contrast Your Neighbours (CoNe) is a learning framework for supervised image classification.
CoNe employs the features of its similar neighbors as anchors to generate more adaptive and refined targets.
Our CoNe achieves 80.8% Top-1 accuracy on ImageNet with ResNet-50, which surpasses the recent Timm training recipe.
arXiv Detail & Related papers (2023-08-21T14:49:37Z)
- Relation-Aware Network with Attention-Based Loss for Few-Shot Knowledge Graph Completion [9.181270251524866]
Current approaches randomly select one negative sample for each reference entity pair to minimize a margin-based ranking loss.
We propose a novel Relation-Aware Network with Attention-Based Loss (RANA) framework.
Experiments demonstrate that RANA outperforms the state-of-the-art models on two benchmark datasets.
arXiv Detail & Related papers (2023-06-15T21:41:43Z)
- Unsupervised Dense Retrieval with Relevance-Aware Contrastive Pre-Training [81.3781338418574]
We propose relevance-aware contrastive learning.
We consistently improve the SOTA unsupervised Contriever model on the BEIR and open-domain QA retrieval benchmarks.
Our method not only beats BM25 after further pre-training on the target corpus but also serves as a good few-shot learner.
arXiv Detail & Related papers (2023-06-05T18:20:27Z)
- Siamese Prototypical Contrastive Learning [24.794022951873156]
Contrastive Self-supervised Learning (CSL) is a practical solution that learns meaningful visual representations from massive data in an unsupervised manner.
In this paper, we introduce a simple but effective contrastive learning framework.
The key insight is to employ a siamese-style metric loss to match intra-prototype features while increasing the distance between inter-prototype features.
arXiv Detail & Related papers (2022-08-18T13:25:30Z)
- Positive-Negative Equal Contrastive Loss for Semantic Segmentation [8.664491798389662]
Previous works commonly design plug-and-play modules and structural losses to effectively extract and aggregate the global context.
We propose the Positive-Negative Equal contrastive loss (PNE loss), which increases the latent impact of the positive embedding on the anchor and treats positive and negative sample pairs equally.
We conduct comprehensive experiments and achieve state-of-the-art performance on two benchmark datasets.
arXiv Detail & Related papers (2022-07-04T13:51:29Z)
- Self-Damaging Contrastive Learning [92.34124578823977]
Real-world unlabeled data is commonly imbalanced and follows a long-tail distribution.
This paper proposes a principled framework called Self-Damaging Contrastive Learning (SDCLR) to automatically balance representation learning without knowing the classes.
Our experiments show that SDCLR significantly improves not only overall accuracies but also balancedness.
arXiv Detail & Related papers (2021-06-06T00:04:49Z)
- Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
arXiv Detail & Related papers (2021-05-27T08:38:29Z)
- Contrastive Attraction and Contrastive Repulsion for Representation Learning [131.72147978462348]
Contrastive learning (CL) methods learn data representations in a self-supervised manner, where the encoder contrasts each positive sample against multiple negative samples.
Recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet.
We propose a doubly CL strategy that separately compares positive and negative samples within their own groups, and then proceeds with a contrast between positive and negative groups.
arXiv Detail & Related papers (2021-05-08T17:25:08Z)
- Conditional Negative Sampling for Contrastive Learning of Visual Representations [19.136685699971864]
We show that choosing difficult negatives, or those more similar to the current instance, can yield stronger representations.
We introduce a family of mutual information estimators that sample negatives conditionally -- in a "ring" around each positive (see the sketch after this list).
We prove that these estimators lower-bound mutual information, with higher bias but lower variance than NCE.
arXiv Detail & Related papers (2020-10-05T14:17:32Z)
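As a rough illustration of the "ring" sampling idea from the last entry, the sketch below keeps only candidates whose similarity to the query falls between two percentile cutoffs, so negatives are harder than random draws but not the very hardest. The function name, the percentile parameterization, and the cutoff values are illustrative assumptions, not the estimator defined in the paper.

```python
import torch


def ring_negatives(q, candidates, lower=0.5, upper=0.9):
    """Select negatives whose similarity to the query lies in a 'ring'.

    q:           (D,)   L2-normalized query embedding
    candidates:  (N, D) L2-normalized candidate embeddings
    lower/upper: percentile cutoffs bounding the ring (assumed values)
    """
    sims = candidates @ q                 # (N,) cosine similarities
    lo = torch.quantile(sims, lower)      # inner edge of the ring
    hi = torch.quantile(sims, upper)      # outer edge of the ring
    mask = (sims >= lo) & (sims <= hi)
    return candidates[mask]               # negatives inside the ring
```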