Revisiting the Negative Data of Distantly Supervised Relation Extraction
- URL: http://arxiv.org/abs/2105.10158v1
- Date: Fri, 21 May 2021 06:44:19 GMT
- Title: Revisiting the Negative Data of Distantly Supervised Relation Extraction
- Authors: Chenhao Xie, Jiaqing Liang, Jingping Liu, Chengsong Huang, Wenhao
Huang, Yanghua Xiao
- Abstract summary: Distant supervision automatically generates plenty of training samples for relation extraction.
It also incurs two major problems: noisy labels and imbalanced training data.
We propose a pipeline approach, dubbed ReRe, that performs sentence-level relation detection followed by subject/object extraction.
- Score: 17.00557139562208
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Distant supervision automatically generates plenty of training samples for
relation extraction. However, it also incurs two major problems: noisy labels
and imbalanced training data. Previous works focus more on reducing wrongly
labeled relations (false positives), while few explore the missing relations
caused by the incompleteness of the knowledge base (false negatives).
Furthermore, the quantity of negative labels overwhelmingly surpasses that of
positive ones in previous problem formulations. In this paper, we first provide
a thorough analysis of the above challenges caused by negative data. Next, we
formulate the problem of relation extraction as a positive-unlabeled learning
task to alleviate the false negative problem. Thirdly, we propose a pipeline
approach, dubbed \textsc{ReRe}, that performs sentence-level relation detection
followed by subject/object extraction to achieve sample-efficient training.
Experimental results show that the proposed method consistently outperforms
existing approaches and maintains excellent performance even when trained with
a large quantity of false positive samples.
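The abstract describes a two-stage pipeline: detect which relations a sentence expresses, then extract subject/object spans for each detected relation. Below is a minimal sketch of that structure, not the authors' released implementation; the encoder outputs, hidden size, relation inventory, span-tagging head, and threshold are illustrative assumptions.

```python
# A minimal sketch of a ReRe-style two-stage pipeline (not the authors' code):
# stage 1 scores candidate relations for a sentence, stage 2 tags subject/object
# spans conditioned on a detected relation. Sizes below are assumed for illustration.
import torch
import torch.nn as nn

HIDDEN, NUM_RELATIONS = 256, 42  # assumed hidden size and relation count

class RelationDetector(nn.Module):
    """Stage 1: sentence-level multi-label relation detection."""
    def __init__(self):
        super().__init__()
        self.scorer = nn.Linear(HIDDEN, NUM_RELATIONS)

    def forward(self, sent_repr):                      # (batch, HIDDEN)
        return torch.sigmoid(self.scorer(sent_repr))   # (batch, NUM_RELATIONS)

class EntityExtractor(nn.Module):
    """Stage 2: tag subject/object start and end positions for one relation."""
    def __init__(self):
        super().__init__()
        self.rel_emb = nn.Embedding(NUM_RELATIONS, HIDDEN)
        self.span_tagger = nn.Linear(HIDDEN, 4)        # subj-start, subj-end, obj-start, obj-end

    def forward(self, token_reprs, rel_id):            # (batch, seq, HIDDEN), (batch,)
        conditioned = token_reprs + self.rel_emb(rel_id).unsqueeze(1)
        return torch.sigmoid(self.span_tagger(conditioned))  # (batch, seq, 4)

def extract(sent_repr, token_reprs, detector, extractor, threshold=0.5):
    """Run detection first, then extraction only for relations scored above threshold."""
    triples = []
    rel_scores = detector(sent_repr)                   # (1, NUM_RELATIONS)
    for rel_id in (rel_scores[0] > threshold).nonzero().flatten():
        span_scores = extractor(token_reprs, rel_id.unsqueeze(0))
        triples.append((int(rel_id), span_scores))     # decode spans from scores downstream
    return triples
```

One plausible reading of the sample-efficiency claim is that span extraction only runs for relations the detector accepts, so most negative sentences are handled by the lighter first stage.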
Related papers
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to collect in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Your Negative May not Be True Negative: Boosting Image-Text Matching with False Negative Elimination [62.18768931714238]
We propose a novel False Negative Elimination (FNE) strategy to select negatives via sampling.
The results demonstrate the superiority of our proposed false negative elimination strategy.
arXiv Detail & Related papers (2023-08-08T16:31:43Z)
- Robust Positive-Unlabeled Learning via Noise Negative Sample Self-correction [48.929877651182885]
Learning from positive and unlabeled data is known as positive-unlabeled (PU) learning in the literature.
We propose a new robust PU learning method with a training strategy motivated by the nature of human learning.
arXiv Detail & Related papers (2023-08-01T04:34:52Z)
- Better Sampling of Negatives for Distantly Supervised Named Entity Recognition [39.264878763160766]
We propose a simple and straightforward approach for selecting the top negative samples that have high similarities with all the positive samples for training.
Our method achieves consistent performance improvements on four distantly supervised NER datasets (see the sketch after this entry).
arXiv Detail & Related papers (2023-05-22T15:35:39Z)
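A minimal sketch of the top-negative selection idea from the entry above, under the assumptions that samples are already embedded by some encoder and that similarity to the positive set is aggregated by a mean of cosine similarities:

```python
# Hedged sketch: rank negative candidates by similarity to the positive set and
# keep only the top-k as training negatives. Cosine similarity and the
# mean-over-positives aggregation are illustrative assumptions.
import numpy as np

def select_top_negatives(pos_embs: np.ndarray,
                         neg_embs: np.ndarray,
                         k: int) -> np.ndarray:
    """Return indices of the k negatives most similar to the positive set."""
    # L2-normalize so dot products become cosine similarities.
    pos = pos_embs / np.linalg.norm(pos_embs, axis=1, keepdims=True)
    neg = neg_embs / np.linalg.norm(neg_embs, axis=1, keepdims=True)
    sims = neg @ pos.T                  # (num_neg, num_pos) cosine similarities
    agg = sims.mean(axis=1)             # similarity of each negative to all positives
    return np.argsort(-agg)[:k]         # indices of the k highest-scoring negatives

# Toy usage with random embeddings standing in for encoder outputs.
rng = np.random.default_rng(0)
pos_embs = rng.normal(size=(50, 128))
neg_embs = rng.normal(size=(5000, 128))
hard_negative_ids = select_top_negatives(pos_embs, neg_embs, k=200)
```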
- Rethinking Negative Sampling for Unlabeled Entity Problem in Named Entity Recognition [47.273602658066196]
Unlabeled entities seriously degrade the performance of named entity recognition models.
We analyze why negative sampling succeeds both theoretically and empirically.
We propose a weighted and adaptive sampling distribution for negative sampling.
arXiv Detail & Related papers (2021-08-26T07:02:57Z)
- Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering [42.832851785261894]
In this paper, we first provide a novel understanding of negative instances by empirically observing that only a few instances are potentially important for model learning.
We then tackle the untouched false negative problem by favouring high-variance samples stored in memory.
Empirical results on two synthetic datasets and three real-world datasets demonstrate both the robustness and superiority of our negative sampling method.
arXiv Detail & Related papers (2020-09-07T19:08:26Z)
- NPCFace: Negative-Positive Collaborative Training for Large-scale Face Recognition [78.21084529159577]
We study how to make better use of hard samples for improving the training.
The correlation between hard positives and hard negatives is overlooked, and so is the relation between the margins in the positive and negative logits.
We propose a novel Negative-Positive Collaboration loss, named NPCFace, which emphasizes the training on both negative and positive hard cases.
arXiv Detail & Related papers (2020-07-20T14:52:29Z)
- Understanding Negative Sampling in Graph Representation Learning [87.35038268508414]
We show that negative sampling is as important as positive sampling in determining the optimization objective and the resulting variance.
We propose MCNS, which approximates the positive distribution with self-contrast approximation and accelerates negative sampling via Metropolis-Hastings (see the sketch after this entry).
We evaluate our method on 5 datasets that cover extensive downstream graph learning tasks, including link prediction, node classification and personalized recommendation.
arXiv Detail & Related papers (2020-05-20T06:25:21Z)
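A hedged sketch of Metropolis-Hastings negative sampling in the spirit of the entry above; this is a generic Metropolis rule with a uniform symmetric proposal, not the paper's exact self-contrast MCNS estimator, and the score function is an assumed stand-in for the model's current affinity estimate.

```python
# Generic sketch (assumption, not the paper's implementation): draw negatives
# approximately from a distribution proportional to score(x) using the
# Metropolis rule with a uniform proposal. score(x) must be strictly positive.
import random

def mh_negative_sampler(candidates, score, num_samples, burn_in=100):
    """Sample negatives approximately proportional to score(x) via Metropolis-Hastings."""
    current = random.choice(candidates)
    samples = []
    for step in range(burn_in + num_samples):
        proposal = random.choice(candidates)   # uniform, symmetric proposal
        # With a symmetric proposal, accept with probability
        # min(1, score(proposal) / score(current)).
        if random.random() < min(1.0, score(proposal) / score(current)):
            current = proposal
        if step >= burn_in:
            samples.append(current)
    return samples

# Toy usage: candidate nodes scored by an arbitrary positive function.
nodes = list(range(1000))
negatives = mh_negative_sampler(nodes, score=lambda x: 1.0 + (x % 7), num_samples=5)
```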
- MixPUL: Consistency-based Augmentation for Positive and Unlabeled Learning [8.7382177147041]
We propose a simple yet effective data augmentation method, coined MixPUL, based on consistency regularization.
MixPUL incorporates supervised and unsupervised consistency training to generate augmented data.
We show that MixPUL achieves an averaged improvement of classification error from 16.49 to 13.09 on the CIFAR-10 dataset across different amounts of positive data.
arXiv Detail & Related papers (2020-04-20T15:43:33Z)