Understanding Hard Negatives in Noise Contrastive Estimation
- URL: http://arxiv.org/abs/2104.06245v1
- Date: Tue, 13 Apr 2021 14:42:41 GMT
- Title: Understanding Hard Negatives in Noise Contrastive Estimation
- Authors: Wenzheng Zhang and Karl Stratos
- Abstract summary: We develop analytical tools to understand the role of hard negatives.
We derive a general form of the score function that unifies various architectures used in text retrieval.
- Score: 21.602701327267905
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The choice of negative examples is important in noise contrastive estimation.
Recent works find that hard negatives -- highest-scoring incorrect examples
under the model -- are effective in practice, but they are used without a
formal justification. We develop analytical tools to understand the role of
hard negatives. Specifically, we view the contrastive loss as a biased
estimator of the gradient of the cross-entropy loss, and show both
theoretically and empirically that setting the negative distribution to be the
model distribution results in bias reduction. We also derive a general form of
the score function that unifies various architectures used in text retrieval.
By combining hard negatives with appropriate score functions, we obtain strong
results on the challenging task of zero-shot entity linking.
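For intuition, here is a minimal sketch (in PyTorch, not the authors' released code) of the training step the abstract describes: hard negatives are drawn from the detached model distribution, and the loss is a cross-entropy over the gold candidate plus those negatives. The names `score_fn` and `candidates`, and the exact sampling scheme, are illustrative assumptions.

```python
# Hedged sketch: cross-entropy over the gold candidate plus hard negatives
# sampled from the current model distribution. Not the authors' implementation;
# a `score_fn(query, candidates)` returning one score per candidate is assumed.
import torch
import torch.nn.functional as F

def hard_negative_contrastive_loss(score_fn, query, candidates, gold_idx, num_negatives=8):
    # 1) Score all candidates without gradients, only to pick negatives.
    with torch.no_grad():
        all_scores = score_fn(query, candidates)          # shape: (num_candidates,)
        probs = F.softmax(all_scores, dim=0)
        probs[gold_idx] = 0.0                             # never sample the gold candidate
        probs = probs / probs.sum()
        neg_idx = torch.multinomial(probs, num_negatives, replacement=False)

    # 2) Re-score the reduced candidate set with gradients enabled.
    subset = torch.cat([torch.tensor([gold_idx]), neg_idx])
    subset_scores = score_fn(query, candidates[subset])   # shape: (1 + num_negatives,)

    # 3) The gold candidate sits at position 0 of the subset.
    target = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(subset_scores.unsqueeze(0), target)
```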
Related papers
- Contrastive Learning with Negative Sampling Correction [52.990001829393506]
We propose a novel contrastive learning method named Positive-Unlabeled Contrastive Learning (PUCL).
PUCL treats the generated negative samples as unlabeled samples and uses information from positive samples to correct the bias in the contrastive loss.
PUCL can be applied to general contrastive learning problems and outperforms state-of-the-art methods on various image and graph classification tasks.
arXiv Detail & Related papers (2024-01-13T11:18:18Z) - Your Negative May not Be True Negative: Boosting Image-Text Matching
with False Negative Elimination [62.18768931714238]
We propose a novel False Negative Elimination (FNE) strategy to select negatives via sampling.
The results demonstrate the superiority of our proposed false negative elimination strategy.
arXiv Detail & Related papers (2023-08-08T16:31:43Z) - GAPX: Generalized Autoregressive Paraphrase-Identification X [24.331570697458954]
A major source of the performance drop under distribution shift comes from biases introduced by negative examples.
We introduce a perplexity-based out-of-distribution metric that we show can effectively and automatically determine how much weight it should be given during inference.
arXiv Detail & Related papers (2022-10-05T01:23:52Z) - Do More Negative Samples Necessarily Hurt in Contrastive Learning? [25.234544066205547]
We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class, that the downstream performance of the representation does not degrade with the number of negative samples.
We also give a structural characterization of the optimal representation in our framework.
arXiv Detail & Related papers (2022-05-03T21:29:59Z) - Hard Negative Sampling via Regularized Optimal Transport for Contrastive
Representation Learning [13.474603286270836]
We study the problem of designing hard negative sampling distributions for unsupervised contrastive representation learning.
We propose and analyze a novel min-max framework that seeks a representation which minimizes the maximum (worst-case) generalized contrastive learning loss.
arXiv Detail & Related papers (2021-11-04T21:25:24Z) - Investigating the Role of Negatives in Contrastive Representation
Learning [59.30700308648194]
Noise contrastive learning is a popular technique for unsupervised representation learning.
We focus on disambiguating the role of one key design choice: the number of negative examples.
We find that the results broadly agree with our theory, while our vision experiments are murkier, with performance sometimes even insensitive to the number of negatives.
arXiv Detail & Related papers (2021-06-18T06:44:16Z) - Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
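For reference, a minimal, generic sketch of the InfoNCE objective with $K$ negatives per positive pair follows; the paper's training effectiveness function and its optimal-$K$ estimate are not reproduced here.

```python
# Hedged sketch of a generic InfoNCE loss with K negatives per positive pair.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, temperature=0.1):
    """anchor, positive: (d,) embeddings; negatives: (K, d). Returns a scalar loss."""
    pos_logit = torch.dot(anchor, positive) / temperature       # scalar
    neg_logits = negatives @ anchor / temperature               # shape: (K,)
    logits = torch.cat([pos_logit.unsqueeze(0), neg_logits])    # shape: (K + 1,)
    # The positive sits at index 0, so the loss is -log softmax(logits)[0].
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))
```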
arXiv Detail & Related papers (2021-05-27T08:38:29Z) - Deconfounding Scores: Feature Representations for Causal Effect
Estimation with Weak Overlap [140.98628848491146]
We introduce deconfounding scores, which induce better overlap without biasing the target of estimation.
We show that deconfounding scores satisfy a zero-covariance condition that is identifiable in observed data.
In particular, we show that this technique could be an attractive alternative to standard regularizations.
arXiv Detail & Related papers (2021-04-12T18:50:11Z) - Understanding Negative Sampling in Graph Representation Learning [87.35038268508414]
We show that negative sampling is as important as positive sampling in determining both the optimization objective and the resulting variance.
We propose Markov chain Monte Carlo Negative Sampling (MCNS), which approximates the positive distribution with a self-contrast approximation and accelerates negative sampling via Metropolis-Hastings; a generic sketch of Metropolis-Hastings negative sampling follows this entry.
We evaluate our method on 5 datasets that cover extensive downstream graph learning tasks, including link prediction, node classification, and personalized recommendation.
arXiv Detail & Related papers (2020-05-20T06:25:21Z)