Denoising Relation Extraction from Document-level Distant Supervision
- URL: http://arxiv.org/abs/2011.03888v1
- Date: Sun, 8 Nov 2020 02:05:25 GMT
- Title: Denoising Relation Extraction from Document-level Distant Supervision
- Authors: Chaojun Xiao, Yuan Yao, Ruobing Xie, Xu Han, Zhiyuan Liu, Maosong Sun, Fen Lin, Leyu Lin
- Abstract summary: We propose a novel pre-trained model for DocRE, which denoises the document-level DS data via multiple pre-training tasks.
Experimental results on the large-scale DocRE benchmark show that our model can capture useful information from noisy DS data and achieve promising results.
- Score: 92.76441007250197
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distant supervision (DS) has been widely used to generate auto-labeled data
for sentence-level relation extraction (RE), which improves RE performance.
However, the existing success of DS cannot be directly transferred to the more
challenging document-level relation extraction (DocRE), since the inherent
noise in DS may be even multiplied in document level and significantly harm the
performance of RE. To address this challenge, we propose a novel pre-trained
model for DocRE, which denoises the document-level DS data via multiple
pre-training tasks. Experimental results on the large-scale DocRE benchmark
show that our model can capture useful information from noisy DS data and
achieve promising results.
Related papers
- GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction [15.246183329778656]
Document-level relation extraction (DocRE) aims to extract relations between entities from unstructured document text.
To overcome these challenges, we propose GEGA, a novel model for DocRE.
We evaluate the GEGA model on three widely used benchmark datasets: DocRED, Re-DocRED, and Revisit-DocRED.
arXiv Detail & Related papers (2024-07-31T07:15:33Z)
- InstructRAG: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales [14.655518998487237]
We propose InstructRAG, where LMs explicitly learn the denoising process through self-synthesized rationales.
InstructRAG requires no additional supervision and allows for easier verification of the predicted answers.
Experiments show InstructRAG consistently outperforms existing RAG methods in both training-free and trainable scenarios.
arXiv Detail & Related papers (2024-06-19T15:25:29Z)
- TTM-RE: Memory-Augmented Document-Level Relation Extraction [30.142461633461394]
We propose TTM-RE, a novel approach that integrates a trainable memory module, known as the Token Turing Machine, with a noise-robust loss function.
Experiments on ReDocRED, a benchmark dataset for document-level relation extraction, reveal that TTM-RE achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-06-09T20:18:58Z)
- Temporal Output Discrepancy for Loss Estimation-based Active Learning [65.93767110342502]
We present a novel deep active learning approach that queries the oracle for data annotation when the unlabeled sample is believed to incorporate high loss.
Our approach outperforms state-of-the-art active learning methods on image classification and semantic segmentation tasks.
arXiv Detail & Related papers (2022-12-20T19:29:37Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- Improving Long Tailed Document-Level Relation Extraction via Easy Relation Augmentation and Contrastive Learning [66.83982926437547]
We argue that mitigating the long-tailed distribution problem is crucial for DocRE in the real-world scenario.
Motivated by the long-tailed distribution problem, we propose an Easy Relation Augmentation (ERA) method for improving DocRE.
arXiv Detail & Related papers (2022-05-21T06:15:11Z)
- Augmenting Document Representations for Dense Retrieval with Interpolation and Perturbation [49.940525611640346]
The Document Augmentation for dense Retrieval (DAR) framework augments the representations of documents with their interpolations and perturbations.
We validate the performance of DAR on retrieval tasks with two benchmark datasets, showing that the proposed DAR significantly outperforms relevant baselines on the dense retrieval of both the labeled and unlabeled documents.
arXiv Detail & Related papers (2022-03-15T09:07:38Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- From Bag of Sentences to Document: Distantly Supervised Relation Extraction via Machine Reading Comprehension [22.39362905658063]
We propose a new DS paradigm: document-based distant supervision.
We show that our method achieves new state-of-the-art DS performance.
arXiv Detail & Related papers (2020-12-08T10:16:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.