Simplify and Robustify Negative Sampling for Implicit Collaborative
Filtering
- URL: http://arxiv.org/abs/2009.03376v1
- Date: Mon, 7 Sep 2020 19:08:26 GMT
- Title: Simplify and Robustify Negative Sampling for Implicit Collaborative
Filtering
- Authors: Jingtao Ding, Yuhan Quan, Quanming Yao, Yong Li, Depeng Jin
- Abstract summary: In this paper, we first provide a novel understanding of negative instances by empirically observing that only a few instances are potentially important for model learning.
We then tackle the previously untouched false negative problem by favouring high-variance samples stored in memory.
Empirical results on two synthetic datasets and three real-world datasets demonstrate both the robustness and superiority of our negative sampling method.
- Score: 42.832851785261894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Negative sampling approaches are prevalent in implicit collaborative
filtering for obtaining negative labels from massive unlabeled data.
Efficiency and effectiveness, the two major concerns in negative sampling,
are still not fully achieved by recent works that use complicated structures
and overlook the risk of false negative instances. In this paper, we first
provide a novel understanding of negative instances by empirically observing
that only a few instances are potentially important for model learning, and
that false negatives tend to have stable predictions over many training
iterations. These findings motivate us to simplify the model by sampling from
a designed memory that only stores a few important candidates and, more
importantly, to tackle the previously untouched false negative problem by
favouring high-variance samples stored in the memory, which achieves
efficient, high-quality sampling of true negatives. Empirical results on two
synthetic datasets and three real-world datasets demonstrate both the
robustness and superiority of our negative sampling method.
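The memory-plus-variance scheme described above can be sketched as follows: each user keeps a small memory of unlabeled candidate items, the model's scores for these candidates are logged over recent epochs, and negatives are drawn preferring candidates that score high (informative) and whose scores vary a lot across epochs (unlikely to be false negatives). This is a minimal illustrative sketch under assumed hyper-parameters and a generic predict(user, item) scorer, not the authors' reference implementation.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical hyper-parameters for illustration only.
    MEMORY_SIZE = 32   # unlabeled candidates kept per user
    SCORE_WINDOW = 5   # recent epochs used for the variance estimate

    def init_memory(users, unlabeled_items):
        """Store a small random subset of unlabeled items for each user."""
        return {u: rng.choice(unlabeled_items, size=MEMORY_SIZE, replace=False)
                for u in users}

    def record_scores(memory, history, predict):
        """Once per epoch: log the model's current score for every memorised candidate."""
        for u, items in memory.items():
            scores = np.array([predict(u, i) for i in items])
            history.setdefault(u, []).append(scores)
            history[u] = history[u][-SCORE_WINDOW:]   # sliding window of recent epochs

    def sample_negative(memory, history, user):
        """Draw one negative for `user`, preferring candidates that score high
        (informative) and vary across epochs (unlikely to be false negatives)."""
        scores = np.stack(history[user])    # shape: (epochs, MEMORY_SIZE)
        hardness = scores[-1]               # latest prediction scores
        variance = scores.std(axis=0)       # score variability over the window
        weight = hardness + variance        # favour hard and high-variance items
        prob = np.exp(weight - weight.max())
        prob /= prob.sum()                  # softmax over the memory slots
        return rng.choice(memory[user], p=prob)

In the actual method the memory is also refreshed with new unlabeled candidates between epochs; the additive hardness-plus-variance weighting above is a simplified stand-in for the paper's sampling distribution.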
Related papers
- Your Negative May not Be True Negative: Boosting Image-Text Matching
with False Negative Elimination [62.18768931714238]
We propose a novel False Negative Elimination (FNE) strategy to select negatives via sampling.
The results demonstrate the superiority of our proposed false negative elimination strategy.
arXiv Detail & Related papers (2023-08-08T16:31:43Z)
- Better Sampling of Negatives for Distantly Supervised Named Entity
Recognition [39.264878763160766]
We propose a simple and straightforward approach that selects, for training, the top negative samples with high similarity to all the positive samples.
Our method achieves consistent performance improvements on four distantly supervised NER datasets.
arXiv Detail & Related papers (2023-05-22T15:35:39Z)
- Rethinking Negative Sampling for Unlabeled Entity Problem in Named
Entity Recognition [47.273602658066196]
Unlabeled entities seriously degrade the performance of named entity recognition models.
We analyze why negative sampling succeeds both theoretically and empirically.
We propose a weighted and adaptive sampling distribution for negative sampling.
arXiv Detail & Related papers (2021-08-26T07:02:57Z)
- Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function (the standard form of the InfoNCE objective is sketched after this list).
arXiv Detail & Related papers (2021-05-27T08:38:29Z)
- Revisiting the Negative Data of Distantly Supervised Relation Extraction [17.00557139562208]
Distant supervision automatically generates plenty of training samples for relation extraction.
It also incurs two major problems: noisy labels and imbalanced training data.
We propose a pipeline approach, dubbed ReRe, that performs sentence-level relation detection followed by subject/object extraction.
arXiv Detail & Related papers (2021-05-21T06:44:19Z)
- Contrastive Attraction and Contrastive Repulsion for Representation
Learning [131.72147978462348]
Contrastive learning (CL) methods learn data representations in a self-supervised manner, where the encoder contrasts each positive sample against multiple negative samples.
Recent CL methods have achieved promising results when pretrained on large-scale datasets, such as ImageNet.
We propose a doubly CL strategy that separately compares positive and negative samples within their own groups, and then proceeds with a contrast between positive and negative groups.
arXiv Detail & Related papers (2021-05-08T17:25:08Z)
- Doubly Contrastive Deep Clustering [135.7001508427597]
We present a novel Doubly Contrastive Deep Clustering (DCDC) framework, which constructs contrastive loss over both sample and class views.
Specifically, for the sample view, we set the class distribution of the original sample and its augmented version as positive sample pairs.
For the class view, we build the positive and negative pairs from the sample distribution of the class.
In this way, the two contrastive losses successfully constrain the clustering results of mini-batch samples at both the sample and class levels.
arXiv Detail & Related papers (2021-03-09T15:15:32Z)
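For reference, the InfoNCE objective with $K$ negative samples mentioned in the "Rethinking InfoNCE" entry above has the standard form below, where $s(\cdot,\cdot)$ is a similarity score and $\tau$ a temperature; this is the generic formulation, not that paper's training effectiveness function.

$$
\mathcal{L}_{\mathrm{InfoNCE}} = -\log \frac{\exp\big(s(x, y^{+})/\tau\big)}{\exp\big(s(x, y^{+})/\tau\big) + \sum_{k=1}^{K} \exp\big(s(x, y^{-}_{k})/\tau\big)}
$$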