Improving Distantly Supervised Relation Extraction with Self-Ensemble
Noise Filtering
- URL: http://arxiv.org/abs/2108.09689v1
- Date: Sun, 22 Aug 2021 11:23:36 GMT
- Title: Improving Distantly Supervised Relation Extraction with Self-Ensemble
Noise Filtering
- Authors: Tapas Nayak and Navonil Majumder and Soujanya Poria
- Abstract summary: We propose a self-ensemble filtering mechanism to filter out noisy samples during the training process.
Our experiments with multiple state-of-the-art relation extraction models show that our proposed filtering mechanism improves the robustness of the models and increases their F1 scores.
- Score: 17.45521023572853
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Distantly supervised models are very popular for relation extraction since we
can obtain a large amount of training data using the distant supervision method
without human annotation. In distant supervision, a sentence is considered a
source of a tuple if it contains both entities of the tuple. However,
this condition is too permissive and does not guarantee the presence of
relevant relation-specific information in the sentence. As such, distantly
supervised training data contains much noise which adversely affects the
performance of the models. In this paper, we propose a self-ensemble filtering
mechanism to filter out the noisy samples during the training process. We
evaluate our proposed framework on the New York Times dataset which is obtained
via distant supervision. Our experiments with multiple state-of-the-art neural
relation extraction models show that our proposed filtering mechanism improves
the robustness of the models and increases their F1 scores.
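To make the two moving parts concrete, here is a minimal, self-contained Python sketch. The `distant_label` rule follows the labeling condition stated in the abstract; `SelfEnsembleFilter` is only a hedged guess at the filtering mechanism (the abstract does not spell out its exact form): it maintains an exponential moving average of the model's own per-sample predictions, a simple stand-in for an ensemble of model snapshots, and drops samples whose distant labels the ensemble consistently rejects. All names, thresholds, and the EMA formulation are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch only: the labeling rule comes from the abstract;
# the filter is an assumed form of "self-ensemble" noise filtering.
from collections import defaultdict

def distant_label(sentence, kb_tuples):
    """Distant supervision rule: a sentence is treated as a source of
    (head, relation, tail) if it mentions both entities, whether or not
    it actually expresses the relation -- hence the label noise."""
    return [(h, r, t) for (h, r, t) in kb_tuples
            if h in sentence and t in sentence]

class SelfEnsembleFilter:
    """Tracks an exponential moving average (a cheap self-ensemble) of
    the model's predicted probability that each sample's distant label
    is correct, and filters out samples the ensemble keeps rejecting."""
    def __init__(self, momentum=0.9, threshold=0.2, warmup_epochs=2):
        self.momentum = momentum            # weight given to past epochs
        self.threshold = threshold          # drop samples scoring below this
        self.warmup_epochs = warmup_epochs  # train on everything at first
        self.score = defaultdict(lambda: 1.0)

    def update(self, sample_id, prob_of_distant_label):
        old = self.score[sample_id]
        self.score[sample_id] = (self.momentum * old
                                 + (1 - self.momentum) * prob_of_distant_label)

    def keep(self, sample_id, epoch):
        return epoch < self.warmup_epochs or self.score[sample_id] >= self.threshold
```

In a training loop, `update` would be fed the model's softmax probability for each sample's distant label at the end of an epoch, and `keep` would then gate which samples contribute to the loss in the next epoch.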
Related papers
- Weak Reward Model Transforms Generative Models into Robust Causal Event Extraction Systems [17.10762463903638]
We train evaluation models to approximate human evaluation, achieving high agreement.
We propose a weak-to-strong supervision method that uses a fraction of the annotated data to train an evaluation model.
arXiv Detail & Related papers (2024-06-26T10:48:14Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Noisy Self-Training with Synthetic Queries for Dense Retrieval [49.49928764695172]
We introduce a novel noisy self-training framework combined with synthetic queries.
Experimental results show that our method improves consistently over existing methods.
Our method is data efficient and outperforms competitive baselines.
arXiv Detail & Related papers (2023-11-27T06:19:50Z)
- Class-Adaptive Self-Training for Relation Extraction with Incompletely Annotated Training Data [43.46328487543664]
Relation extraction (RE) aims to extract relations from sentences and documents.
Recent studies showed that many RE datasets are incompletely annotated.
This is known as the false negative problem, in which valid relations are falsely annotated as 'no_relation'.
arXiv Detail & Related papers (2023-06-16T09:01:45Z)
- Knockoffs-SPR: Clean Sample Selection in Learning with Noisy Labels [56.81761908354718]
We propose a novel theoretically guaranteed clean sample selection framework for learning with noisy labels.
Knockoffs-SPR can be regarded as a sample selection module for a standard supervised training pipeline.
We further combine it with a semi-supervised algorithm to exploit the support of noisy data as unlabeled data.
arXiv Detail & Related papers (2023-01-02T07:13:28Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z)
- Few Clean Instances Help Denoising Distant Supervision [28.336399223985175]
We study whether a small clean dataset could help improve the quality of distantly supervised models.
We show that besides getting a more convincing evaluation of models, a small clean dataset also helps us to build more robust denoising models.
arXiv Detail & Related papers (2022-09-14T12:29:57Z)
- Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers [107.12125265675483]
Unsupervised extractive document summarization aims to select important sentences from a document without using labeled summaries during training.
Existing methods are mostly graph-based with sentences as nodes and edge weights measured by sentence similarities.
We find that transformer attentions can be used to rank sentences for unsupervised extractive summarization (a toy sketch of this ranking idea appears after this list).
arXiv Detail & Related papers (2020-10-16T08:44:09Z)
- Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation [37.054709598792165]
The model is a convolutional neural network that operates directly on the raw waveform.
It is optimized to identify spectral changes in the signal using the Noise-Contrastive Estimation principle.
At test time, a peak detection algorithm is applied over the model outputs to produce the final boundaries (sketched after this list).
arXiv Detail & Related papers (2020-07-27T12:10:21Z)
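The ranking idea behind "Unsupervised Extractive Summarization by Pre-training Hierarchical Transformers" above can be illustrated without the pre-trained model itself. A minimal sketch, assuming we already have a sentence-to-sentence attention matrix averaged over heads and layers; scoring each sentence by the attention it receives from the others is one simple reading of the claim, not the paper's exact scoring function:

```python
import numpy as np

def rank_sentences_by_attention(attn, k=3):
    """attn[i, j] is assumed to be the attention sentence i pays to
    sentence j. Sentences that receive much attention from the rest
    of the document are treated as central and selected first."""
    received = attn.sum(axis=0) - np.diag(attn)  # ignore self-attention
    return np.argsort(received)[::-1][:k]        # top-k sentence indices

# Toy 4-sentence document: sentence 2 receives the most attention.
attn = np.array([
    [0.7, 0.1, 0.2, 0.0],
    [0.1, 0.5, 0.3, 0.1],
    [0.2, 0.1, 0.6, 0.1],
    [0.1, 0.2, 0.5, 0.2],
])
print(rank_sentences_by_attention(attn, k=2))  # -> [2 1]
```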
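Likewise, the test-time step of "Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation" is easy to picture. A hedged sketch using SciPy's generic peak finder over made-up per-frame scores; the paper's actual detector and its parameters may well differ:

```python
import numpy as np
from scipy.signal import find_peaks

# Hypothetical per-frame scores from a model trained to flag spectral
# change: high values suggest a phoneme boundary at that frame.
frame_scores = np.array([0.1, 0.2, 0.9, 0.3, 0.1, 0.2, 0.8, 0.7, 0.2, 0.1])

# Local maxima with sufficient prominence become boundaries.
# The prominence value is an illustrative assumption.
boundaries, _ = find_peaks(frame_scores, prominence=0.4)
print(boundaries)  # -> [2 6]
```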
This list is automatically generated from the titles and abstracts of the papers on this site.