From Bag of Sentences to Document: Distantly Supervised Relation
Extraction via Machine Reading Comprehension
- URL: http://arxiv.org/abs/2012.04334v2
- Date: Wed, 9 Dec 2020 03:05:41 GMT
- Title: From Bag of Sentences to Document: Distantly Supervised Relation
Extraction via Machine Reading Comprehension
- Authors: Lingyong Yan, Xianpei Han, Le Sun, Fangchao Liu and Ning Bian
- Abstract summary: We propose a new DS paradigm--document-based distant supervision.
We show that our method achieves new state-of-the-art DS performance.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distant supervision (DS) is a promising approach for relation extraction but
often suffers from the noisy label problem. Traditional DS methods usually
represent an entity pair as a bag of sentences and denoise labels using
multi-instance learning techniques. The bag-based paradigm, however, fails to
leverage the inter-sentence-level and the entity-level evidence for relation
extraction, and its denoising algorithms are often specialized and
complicated. In this paper, we propose a new DS paradigm--document-based
distant supervision, which models relation extraction as a document-based
machine reading comprehension (MRC) task. By re-organizing all sentences about
an entity as a document and extracting relations via querying the document with
relation-specific questions, the document-based DS paradigm can simultaneously
encode and exploit all sentence-level, inter-sentence-level, and entity-level
evidence. Furthermore, we design a new loss function--DSLoss (distant
supervision loss), which can effectively train MRC models using only
$\langle$document, question, answer$\rangle$ tuples, so that the noisy label
problem is inherently resolved. Experiments show that our method achieves
new state-of-the-art DS performance.
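The document-based formulation in the abstract can be illustrated with a minimal sketch: gather all sentences mentioning an entity into one document, turn each relation into a relation-specific question, and ask an MRC model for an answer span. The question templates, function names, and the toy stand-in model below are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the document-based DS paradigm: relation extraction as
# document-based machine reading comprehension (MRC).
# Templates and the toy model are illustrative assumptions.

RELATION_QUESTIONS = {
    "place_of_birth": "Where was {entity} born?",
    "employer": "Who does {entity} work for?",
}

def build_document(entity, sentences):
    """Re-organize all sentences mentioning the entity into one document."""
    return " ".join(s for s in sentences if entity in s)

def make_query(entity, relation):
    """Turn a relation into a relation-specific natural-language question."""
    return RELATION_QUESTIONS[relation].format(entity=entity)

def extract_relation(entity, relation, sentences, mrc_model):
    """Query the entity's document with a relation-specific question;
    the MRC model returns the answer span (the tail entity) or None."""
    document = build_document(entity, sentences)
    question = make_query(entity, relation)
    return mrc_model(question, document)

def toy_mrc(question, document):
    """Toy stand-in for a trained MRC model: returns the token that
    follows the pattern 'born in' in the document, if any."""
    words = document.split()
    for i, w in enumerate(words[:-2]):
        if w == "born" and words[i + 1] == "in":
            return words[i + 2].strip(".,")
    return None

sentences = [
    "Turing studied at Cambridge.",
    "Alan Turing was born in London.",
]
print(extract_relation("Turing", "place_of_birth", sentences, toy_mrc))
# → London
```

In practice the MRC model would be a trained span-extraction network rather than a pattern matcher; the sketch only shows how sentence-level evidence is pooled into a single document so that inter-sentence and entity-level evidence become available to one query.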
Related papers
- Localizing Factual Inconsistencies in Attributable Text Generation [91.981439746404]
We introduce QASemConsistency, a new formalism for localizing factual inconsistencies in attributable text generation.
We first demonstrate the effectiveness of the QASemConsistency methodology for human annotation.
We then implement several methods for automatically detecting localized factual inconsistencies.
arXiv Detail & Related papers (2024-10-09T22:53:48Z)
- GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction [15.246183329778656]
Document-level relation extraction (DocRE) aims to extract relations between entities from unstructured document text.
To overcome these challenges, we propose GEGA, a novel model for DocRE.
We evaluate the GEGA model on three widely used benchmark datasets: DocRED, Re-DocRED, and Revisit-DocRED.
arXiv Detail & Related papers (2024-07-31T07:15:33Z)
- SparseCL: Sparse Contrastive Learning for Contradiction Retrieval [87.02936971689817]
Contradiction retrieval refers to identifying and extracting documents that explicitly disagree with or refute the content of a query.
Existing methods such as similarity search and cross-encoder models exhibit significant limitations.
We introduce SparseCL, which leverages specially trained sentence embeddings designed to preserve subtle contradictory nuances between sentences.
arXiv Detail & Related papers (2024-06-15T21:57:03Z)
- Absformer: Transformer-based Model for Unsupervised Multi-Document Abstractive Summarization [1.066048003460524]
Multi-document summarization (MDS) refers to the task of condensing the text of multiple documents into a concise summary.
Abstractive MDS aims to generate a coherent and fluent summary for multiple documents using natural language generation techniques.
We propose Absformer, a new Transformer-based method for unsupervised abstractive summary generation.
arXiv Detail & Related papers (2023-06-07T21:18:23Z)
- Document-aware Positional Encoding and Linguistic-guided Encoding for Abstractive Multi-document Summarization [12.799359904396624]
One key challenge in multi-document summarization is to capture the relations among input documents, which distinguishes multi-document summarization (MDS) from single-document summarization (SDS).
We propose document-aware positional encoding and linguistic-guided encoding that can be fused with Transformer architecture for MDS.
arXiv Detail & Related papers (2022-09-13T12:22:38Z)
- Questions Are All You Need to Train a Dense Passage Retriever [123.13872383489172]
ART is a new corpus-level autoencoding approach for training dense retrieval models that does not require any labeled training data.
It uses a new document-retrieval autoencoding scheme, where (1) an input question is used to retrieve a set of evidence documents, and (2) the documents are then used to compute the probability of reconstructing the original question.
arXiv Detail & Related papers (2022-06-21T18:16:31Z)
- Autoregressive Search Engines: Generating Substrings as Document Identifiers [53.0729058170278]
Autoregressive language models are emerging as the de facto standard for generating answers.
Previous work has explored ways to partition the search space into hierarchical structures.
In this work we propose an alternative that doesn't force any structure on the search space: using all n-grams in a passage as its possible identifiers.
arXiv Detail & Related papers (2022-04-22T10:45:01Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Denoising Relation Extraction from Document-level Distant Supervision [92.76441007250197]
We propose a novel pre-trained model for DocRE, which denoises the document-level DS data via multiple pre-training tasks.
Experimental results on the large-scale DocRE benchmark show that our model can capture useful information from noisy DS data and achieve promising results.
arXiv Detail & Related papers (2020-11-08T02:05:25Z)
- WSL-DS: Weakly Supervised Learning with Distant Supervision for Query Focused Multi-Document Abstractive Summarization [16.048329028104643]
In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of documents and a query are given where the goal is to generate a summary from these documents.
One major challenge for this task is the lack of availability of labeled training datasets.
We propose a novel weakly supervised learning approach that utilizes distant supervision.
arXiv Detail & Related papers (2020-11-03T02:02:55Z)
- Robust Document Representations using Latent Topics and Metadata [17.306088038339336]
We propose a novel approach to fine-tuning a pre-trained neural language model for document classification problems.
We generate document representations that capture both text and metadata artifacts in a task-specific manner.
Our solution also incorporates metadata explicitly rather than simply concatenating it with the text.
arXiv Detail & Related papers (2020-10-23T21:52:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.