Rethinking InfoNCE: How Many Negative Samples Do You Need?
- URL: http://arxiv.org/abs/2105.13003v1
- Date: Thu, 27 May 2021 08:38:29 GMT
- Title: Rethinking InfoNCE: How Many Negative Samples Do You Need?
- Authors: Chuhan Wu, Fangzhao Wu, Yongfeng Huang
- Abstract summary: We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
- Score: 54.146208195806636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: InfoNCE loss is a widely used loss function for contrastive model training.
It aims to estimate the mutual information between a pair of variables by
discriminating between each positive pair and its associated $K$ negative
pairs. It is proved that when the sample labels are clean, the lower bound of
mutual information estimation is tighter when more negative samples are
incorporated, which usually yields better model performance. However, in many
real-world tasks the labels often contain noise, and incorporating too many
noisy negative samples for model training may be suboptimal. In this paper, we
study how many negative samples are optimal for InfoNCE in different scenarios
via a semi-quantitative theoretical framework. More specifically, we first
propose a probabilistic model to analyze the influence of the negative sampling
ratio $K$ on training sample informativeness. Then, we design a training
effectiveness function to measure the overall influence of training samples on
model learning based on their informativeness. We estimate the optimal negative
sampling ratio using the $K$ value that maximizes the training effectiveness
function. Based on our framework, we further propose an adaptive negative
sampling method that can dynamically adjust the negative sampling ratio to
improve InfoNCE based model training. Extensive experiments on different
real-world datasets show our framework can accurately predict the optimal
negative sampling ratio in different tasks, and our proposed adaptive negative
sampling method can achieve better performance than the commonly used fixed
negative sampling ratio strategy.
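For concreteness, below is a minimal NumPy sketch (not the authors' implementation) of the InfoNCE loss the abstract describes: one anchor is scored against its positive and $K$ sampled negatives, and the loss is the cross-entropy of identifying the positive among the $K+1$ candidates. The function name, the temperature value, and the use of cosine similarity are illustrative assumptions.

```python
import numpy as np

def info_nce(query, positive, negatives, temperature=0.1):
    """InfoNCE for one anchor. `negatives` has shape (K, d), so K is the
    negative sampling ratio discussed above. All vectors are assumed
    L2-normalized so dot products act as cosine similarities."""
    pos_logit = query @ positive / temperature    # similarity to the positive pair
    neg_logits = negatives @ query / temperature  # similarities to the K negative pairs
    logits = np.concatenate(([pos_logit], neg_logits))
    # -log softmax probability of the positive (index 0), computed stably
    m = logits.max()
    log_sum_exp = m + np.log(np.exp(logits - m).sum())
    return log_sum_exp - pos_logit

# Example: one anchor contrasted against K = 8 random negatives.
rng = np.random.default_rng(0)
q, p = rng.normal(size=64), rng.normal(size=64)
negs = rng.normal(size=(8, 64))
q, p = q / np.linalg.norm(q), p / np.linalg.norm(p)
negs /= np.linalg.norm(negs, axis=1, keepdims=True)
print(info_nce(q, p, negs))
```

Here $K$ is simply the number of rows in `negatives`; the paper's framework asks how large this $K$ should be when some of those negatives are noisy.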
Related papers
- From Overfitting to Robustness: Quantity, Quality, and Variety Oriented Negative Sample Selection in Graph Contrastive Learning [38.87932592059369]
Graph contrastive learning (GCL) aims to contrast positive-negative counterparts to learn the node embeddings.
The variation, quantity, and quality of negative samples compared to positive samples play crucial roles in learning meaningful embeddings for node classification downstream tasks.
This study proposes a novel Cumulative Sample Selection (CSS) algorithm by comprehensively considering negative samples' quality, variations, and quantity.
arXiv Detail & Related papers (2024-06-21T10:47:26Z) - Data Pruning via Moving-one-Sample-out [61.45441981346064]
We propose a novel data-pruning approach called moving-one-sample-out (MoSo).
MoSo aims to identify and remove the least informative samples from the training set.
Experimental results demonstrate that MoSo effectively mitigates severe performance degradation at high pruning ratios.
arXiv Detail & Related papers (2023-10-23T08:00:03Z) - Generating Negative Samples for Sequential Recommendation [83.60655196391855]
We propose to Generate Negative Samples (items) for Sequential Recommendation (SR).
A negative item is sampled at each time step based on the current SR model's learned user preferences toward items.
Experiments on four public datasets verify the importance of providing high-quality negative samples for SR.
arXiv Detail & Related papers (2022-08-07T05:44:13Z) - Hard Negative Sampling Strategies for Contrastive Representation Learning [4.1531215150301035]
UnReMix is a hard negative sampling strategy that takes into account anchor similarity, model uncertainty and representativeness.
Experimental results on several benchmarks show that UnReMix improves negative sample selection, and subsequently downstream performance when compared to state-of-the-art contrastive learning methods.
arXiv Detail & Related papers (2022-06-02T17:55:15Z) - Exploring the Impact of Negative Samples of Contrastive Learning: A Case Study of Sentence Embedding [14.295787044482136]
We present a momentum contrastive learning model with a negative sample queue for sentence embedding, namely MoCoSE.
We define a maximum traceable distance metric, through which we learn to what extent the text contrastive learning benefits from the historical information of negative samples.
Our experiments find that the best results are obtained when the maximum traceable distance is at a certain range, demonstrating that there is an optimal range of historical information for a negative sample queue.
arXiv Detail & Related papers (2022-02-26T08:29:25Z) - Doubly Contrastive Deep Clustering [135.7001508427597]
We present a novel Doubly Contrastive Deep Clustering (DCDC) framework, which constructs contrastive loss over both sample and class views.
Specifically, for the sample view, we set the class distribution of the original sample and its augmented version as positive sample pairs.
For the class view, we build the positive and negative pairs from the sample distribution of the class.
In this way, the two contrastive losses constrain the clustering results of mini-batch samples at both the sample and class levels.
arXiv Detail & Related papers (2021-03-09T15:15:32Z) - Conditional Negative Sampling for Contrastive Learning of Visual Representations [19.136685699971864]
We show that choosing difficult negatives, or those more similar to the current instance, can yield stronger representations.
We introduce a family of mutual information estimators that sample negatives conditionally -- in a "ring" around each positive.
We prove that these estimators lower-bound mutual information, with higher bias but lower variance than NCE.
arXiv Detail & Related papers (2020-10-05T14:17:32Z) - Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering [42.832851785261894]
In this paper, we first provide a novel understanding of negative instances by empirically observing that only a few instances are potentially important for model learning.
We then tackle the untouched false negative problem by favouring high-variance samples stored in memory.
Empirical results on two synthetic datasets and three real-world datasets demonstrate both the robustness and superiority of our negative sampling method.
arXiv Detail & Related papers (2020-09-07T19:08:26Z) - Understanding Negative Sampling in Graph Representation Learning [87.35038268508414]
We show that negative sampling is as important as positive sampling in determining the optimization objective and the resulting variance.
We propose MCNS, which approximates the positive distribution with self-contrast approximation and accelerates negative sampling via Metropolis-Hastings.
We evaluate our method on 5 datasets that cover extensive downstream graph learning tasks, including link prediction, node classification and personalized recommendation.
arXiv Detail & Related papers (2020-05-20T06:25:21Z) - Reinforced Negative Sampling over Knowledge Graph for Recommendation [106.07209348727564]
We develop a new negative sampling model, Knowledge Graph Policy Network (kgPolicy), which works as a reinforcement learning agent to explore high-quality negatives.
kgPolicy navigates from the target positive interaction, adaptively receives knowledge-aware negative signals, and ultimately yields a potential negative item to train the recommender.
arXiv Detail & Related papers (2020-03-12T12:44:30Z)