Do More Negative Samples Necessarily Hurt in Contrastive Learning?
- URL: http://arxiv.org/abs/2205.01789v1
- Date: Tue, 3 May 2022 21:29:59 GMT
- Title: Do More Negative Samples Necessarily Hurt in Contrastive Learning?
- Authors: Pranjal Awasthi, Nishanth Dikkala, Pritish Kamath
- Abstract summary: We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class, that the downstream performance of the representation does not degrade with the number of negative samples.
We also give a structural characterization of the optimal representation in our framework.
- Score: 25.234544066205547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent investigations in noise contrastive estimation suggest, both
empirically as well as theoretically, that while having more "negative samples"
in the contrastive loss improves downstream classification performance
initially, beyond a threshold, it hurts downstream performance due to a
"collision-coverage" trade-off. But is such a phenomenon inherent in
contrastive learning? We show in a simple theoretical setting, where positive
pairs are generated by sampling from the underlying latent class (introduced by
Saunshi et al. (ICML 2019)), that the downstream performance of the
representation optimizing the (population) contrastive loss in fact does not
degrade with the number of negative samples. Along the way, we give a
structural characterization of the optimal representation in our framework, for
noise contrastive estimation. We also provide empirical support for our
theoretical results on CIFAR-10 and CIFAR-100 datasets.
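For concreteness, here is a minimal sketch of a contrastive objective with k negative samples per anchor, in the spirit of the (population) loss the abstract refers to; the function name, tensor shapes, and inner-product similarity are illustrative assumptions rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def k_negative_contrastive_loss(f_x, f_pos, f_negs):
    """Contrastive (NCE/InfoNCE-style) loss with k negatives per anchor.

    f_x:    (batch, d)    anchor representations
    f_pos:  (batch, d)    positives, drawn from the anchor's latent class
    f_negs: (batch, k, d) k independently drawn negatives per anchor
    """
    pos_logit = (f_x * f_pos).sum(dim=-1, keepdim=True)   # (batch, 1)
    neg_logits = torch.einsum("bd,bkd->bk", f_x, f_negs)  # (batch, k)
    logits = torch.cat([pos_logit, neg_logits], dim=1)    # (batch, 1 + k)
    # Cross-entropy against class 0 equals the logistic form
    # log(1 + sum_i exp(neg_i - pos)) commonly used with k negatives.
    labels = torch.zeros(f_x.size(0), dtype=torch.long, device=f_x.device)
    return F.cross_entropy(logits, labels)
```

In this form, the question studied above amounts to whether downstream (e.g. linear-probe) accuracy degrades as k, the second dimension of f_negs, grows.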
Related papers
- Relation-Aware Network with Attention-Based Loss for Few-Shot Knowledge Graph Completion [9.181270251524866]
Current approaches randomly select one negative sample for each reference entity pair to minimize a margin-based ranking loss.
We propose a novel Relation-Aware Network with Attention-Based Loss (RANA) framework.
Experiments demonstrate that RANA outperforms the state-of-the-art models on two benchmark datasets.
arXiv Detail & Related papers (2023-06-15T21:41:43Z)
- Chaos is a Ladder: A New Theoretical Understanding of Contrastive Learning via Augmentation Overlap [64.60460828425502]
We propose a new guarantee on the downstream performance of contrastive learning.
Our new theory hinges on the insight that the supports of different intra-class samples become increasingly overlapped under aggressive data augmentations.
We propose an unsupervised model selection metric ARC that aligns well with downstream accuracy.
arXiv Detail & Related papers (2022-03-25T05:36:26Z)
- Hard Negative Sampling via Regularized Optimal Transport for Contrastive Representation Learning [13.474603286270836]
We study the problem of designing hard negative sampling distributions for unsupervised contrastive representation learning.
We propose and analyze a novel min-max framework that seeks a representation which minimizes the maximum (worst-case) generalized contrastive learning loss.
arXiv Detail & Related papers (2021-11-04T21:25:24Z)
- Sharp Learning Bounds for Contrastive Unsupervised Representation Learning [4.017760528208122]
This study establishes a downstream classification loss bound with a tight intercept in the negative sample size.
By regarding the contrastive loss as a downstream loss estimator, our theory improves the existing learning bounds substantially.
We verify that our theory is consistent with experiments on synthetic, vision, and language datasets.
arXiv Detail & Related papers (2021-10-06T04:29:39Z)
- Investigating the Role of Negatives in Contrastive Representation Learning [59.30700308648194]
Noise contrastive learning is a popular technique for unsupervised representation learning.
We focus on disambiguating the role of one of these parameters: the number of negative examples.
We find that the results broadly agree with our theory, while our vision experiments are murkier, with performance sometimes even insensitive to the number of negatives.
arXiv Detail & Related papers (2021-06-18T06:44:16Z)
- Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
arXiv Detail & Related papers (2021-05-27T08:38:29Z)
- Contrastive Attraction and Contrastive Repulsion for Representation Learning [131.72147978462348]
Contrastive learning (CL) methods learn data representations in a self-supervised manner, where the encoder contrasts each positive sample against multiple negative samples.
Recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet.
We propose a doubly CL strategy that separately compares positive and negative samples within their own groups, and then proceeds with a contrast between positive and negative groups.
arXiv Detail & Related papers (2021-05-08T17:25:08Z)
- Understanding Hard Negatives in Noise Contrastive Estimation [21.602701327267905]
We develop analytical tools to understand the role of hard negatives.
We derive a general form of the score function that unifies various architectures used in text retrieval.
arXiv Detail & Related papers (2021-04-13T14:42:41Z)
- Doubly Contrastive Deep Clustering [135.7001508427597]
We present a novel Doubly Contrastive Deep Clustering (DCDC) framework, which constructs contrastive loss over both sample and class views.
Specifically, for the sample view, we treat the class distributions of the original sample and its augmented version as a positive pair.
For the class view, we build the positive and negative pairs from the sample distribution of the class.
In this way, the two contrastive losses constrain the clustering results of mini-batch samples at both the sample and class levels.
arXiv Detail & Related papers (2021-03-09T15:15:32Z)
- Understanding Negative Sampling in Graph Representation Learning [87.35038268508414]
We show that negative sampling is as important as positive sampling in determining the optimization objective and the resulting variance.
We propose MCNS, which approximates the positive distribution with a self-contrast approximation and accelerates negative sampling via Metropolis-Hastings (see the sketch after this entry).
We evaluate our method on 5 datasets that cover extensive downstream graph learning tasks, including link prediction, node classification and personalized recommendation.
arXiv Detail & Related papers (2020-05-20T06:25:21Z)
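The entry above speeds up negative sampling with Metropolis-Hastings. Below is a minimal, generic sketch of that sampling step only, not the paper's self-contrast construction; the `scores` array and the exp-score target are illustrative assumptions standing in for however a method scores candidate negatives.

```python
import numpy as np

def mh_negative_sampler(scores, num_samples, rng=None):
    """Draw indices of negative samples with a Metropolis-Hastings chain.

    scores: 1-D array of unnormalized log-weights; candidate i is targeted
            with probability proportional to exp(scores[i]) (an illustrative
            choice, e.g. a similarity between candidate i and the anchor).
    """
    rng = rng if rng is not None else np.random.default_rng()
    n = len(scores)
    current = int(rng.integers(n))       # arbitrary starting candidate
    negatives = []
    for _ in range(num_samples):
        proposal = int(rng.integers(n))  # uniform (symmetric) proposal
        # Accept with probability min(1, target(proposal) / target(current)).
        if rng.random() < np.exp(scores[proposal] - scores[current]):
            current = proposal
        negatives.append(current)
    return negatives
```

With a symmetric proposal the acceptance ratio reduces to a ratio of unnormalized weights, so no normalizing constant over the full candidate set is needed, which is where the speedup over exact sampling comes from.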