Better Sampling of Negatives for Distantly Supervised Named Entity
Recognition
- URL: http://arxiv.org/abs/2305.13142v1
- Date: Mon, 22 May 2023 15:35:39 GMT
- Title: Better Sampling of Negatives for Distantly Supervised Named Entity
Recognition
- Authors: Lu Xu, Lidong Bing, Wei Lu
- Abstract summary: We propose a simple and straightforward approach for selecting the top negative samples that have high similarities with all the positive samples for training.
Our method achieves consistent performance improvements on four distantly supervised NER datasets.
- Score: 39.264878763160766
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Distantly supervised named entity recognition (DS-NER) has been proposed to
exploit the automatically labeled training data instead of human annotations.
The distantly annotated datasets are often noisy and contain a considerable
number of false negatives. A recent approach uses weighted sampling to select a
subset of negative samples for training. However, it requires a good classifier
to assign weights to the negative samples. In this
paper, we propose a simple and straightforward approach for selecting the top
negative samples that have high similarities with all the positive samples for
training. Our method achieves consistent performance improvements on four
distantly supervised NER datasets. Our analysis also shows that it is critical
to differentiate the true negatives from the false negatives.
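A minimal sketch of the selection step described in the abstract, assuming span representations for the distantly labeled positive spans and the candidate negative spans are already available. Each candidate negative is scored by its mean cosine similarity to all positives, and the top-k candidates are kept for training; the cosine measure, the mean aggregation, and the names below are illustrative assumptions, since the abstract does not spell out these details.

```python
import torch
import torch.nn.functional as F

def select_top_negatives(pos_emb: torch.Tensor,
                         neg_emb: torch.Tensor,
                         top_k: int) -> torch.Tensor:
    """Pick the negative candidates most similar to all positive spans.

    pos_emb: (P, d) representations of distantly labeled entity spans.
    neg_emb: (N, d) representations of candidate negative ("O") spans.
    Returns the indices of the top_k selected negatives.
    """
    pos_n = F.normalize(pos_emb, dim=-1)   # unit-length positive vectors
    neg_n = F.normalize(neg_emb, dim=-1)   # unit-length negative vectors
    sim = neg_n @ pos_n.t()                # (N, P) cosine similarities
    # Aggregate over all positives; the mean is an assumption -- the abstract
    # only says "high similarities with all the positive samples".
    score = sim.mean(dim=1)
    k = min(top_k, neg_emb.size(0))
    return score.topk(k).indices
```

The selected spans would then serve as the negatives in the training objective, in place of the full, noisy set of unlabeled spans.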
Related papers
- Your Negative May not Be True Negative: Boosting Image-Text Matching
with False Negative Elimination [62.18768931714238]
We propose a novel False Negative Elimination (FNE) strategy to select negatives via sampling.
The results demonstrate the superiority of our proposed false negative elimination strategy.
arXiv Detail & Related papers (2023-08-08T16:31:43Z)
- Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction [48.929877651182885]
Learning from positive and unlabeled data is known in the literature as positive-unlabeled (PU) learning.
We propose a new robust PU learning method with a training strategy motivated by the nature of human learning.
arXiv Detail & Related papers (2023-08-01T04:34:52Z)
- SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval [126.22182758461244]
We show that according to the measured relevance scores, the negatives ranked around the positives are generally more informative and less likely to be false negatives.
We propose a simple ambiguous negatives sampling method, SimANS, which incorporates a new sampling probability distribution to sample more ambiguous negatives (see the sketch after this list).
arXiv Detail & Related papers (2022-10-21T07:18:05Z)
- Generating Negative Samples for Sequential Recommendation [83.60655196391855]
We propose to Generate Negative Samples (items) for Sequential Recommendation (SR).
A negative item is sampled at each time step based on the current SR model's learned user preferences toward items.
Experiments on four public datasets verify the importance of providing high-quality negative samples for SR.
arXiv Detail & Related papers (2022-08-07T05:44:13Z)
- Rethinking Negative Sampling for Unlabeled Entity Problem in Named Entity Recognition [47.273602658066196]
Unlabeled entities seriously degrade the performance of named entity recognition models.
We analyze why negative sampling succeeds both theoretically and empirically.
We propose a weighted and adaptive sampling distribution for negative sampling.
arXiv Detail & Related papers (2021-08-26T07:02:57Z)
- Rethinking InfoNCE: How Many Negative Samples Do You Need? [54.146208195806636]
We study how many negative samples are optimal for InfoNCE in different scenarios via a semi-quantitative theoretical framework.
We estimate the optimal negative sampling ratio using the $K$ value that maximizes the training effectiveness function.
arXiv Detail & Related papers (2021-05-27T08:38:29Z)
- Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering [42.832851785261894]
In this paper, we first provide a novel understanding of negative instances by empirically observing that only a few instances are potentially important for model learning.
We then tackle the untouched false negative problem by favouring high-variance samples stored in memory.
Empirical results on two synthetic datasets and three real-world datasets demonstrate both the robustness and the superiority of our negative sampling method.
arXiv Detail & Related papers (2020-09-07T19:08:26Z)
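Following up on the SimANS entry above: a rough sketch of sampling "ambiguous" negatives, i.e. candidates whose measured relevance scores lie near the positive's score. The Gaussian-shaped weighting and the hyperparameters a and b below are assumptions for illustration, not the exact distribution from the SimANS paper.

```python
import numpy as np

def sample_ambiguous_negatives(neg_scores, pos_score, num_samples,
                               a=1.0, b=0.0, rng=None):
    """Sample negative indices, favouring candidates scored close to the positive.

    neg_scores: relevance scores of the negative candidates (1-D array-like).
    pos_score:  relevance score of the positive.
    a, b:       sharpness and margin of the (assumed) Gaussian-shaped weighting.
    """
    rng = rng or np.random.default_rng()
    neg_scores = np.asarray(neg_scores, dtype=float)
    # Candidates scored far above or far below the positive get small weight;
    # those near the positive ("ambiguous" negatives) get large weight.
    weights = np.exp(-a * (neg_scores - pos_score - b) ** 2)
    probs = weights / weights.sum()
    return rng.choice(len(neg_scores), size=num_samples, replace=False, p=probs)
```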