Making Document-Level Information Extraction Right for the Right Reasons
- URL: http://arxiv.org/abs/2110.07686v1
- Date: Thu, 14 Oct 2021 19:52:47 GMT
- Title: Making Document-Level Information Extraction Right for the Right Reasons
- Authors: Liyan Tang, Dhruv Rajan, Suyash Mohan, Abhijeet Pradhan, R. Nick
Bryan, Greg Durrett
- Abstract summary: Document-level information extraction is a flexible framework compatible with applications where information is not necessarily localized in a single sentence.
This work studies how to ensure that document-level neural models make correct inferences from complex text and make those inferences in an auditable way.
- Score: 19.00249049142611
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Document-level information extraction is a flexible framework compatible with
applications where information is not necessarily localized in a single
sentence. For example, key features of a diagnosis in a radiology report may
not be explicitly stated, but nevertheless can be inferred from the report's
text. However, document-level neural models can easily learn spurious
correlations from irrelevant information. This work studies how to ensure that
these models make correct inferences from complex text and make those
inferences in an auditable way: beyond just being right, are these models
"right for the right reasons?" We experiment with post-hoc evidence extraction
in a predict-select-verify framework using feature attribution techniques.
While this basic approach can extract reasonable evidence, it can be
regularized with small amounts of evidence supervision during training, which
substantially improves the quality of extracted evidence. We evaluate on two
domains: a small-scale labeled dataset of brain MRI reports and a large-scale
modified version of DocRED (Yao et al., 2019) and show that models'
plausibility can be improved with no loss in accuracy.
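The predict-select-verify loop the abstract describes can be illustrated with a toy sketch. This is not the paper's implementation: the occlusion-style scorer, the keyword "classifier", and all names below are assumptions for illustration only.

```python
# Minimal predict-select-verify sketch: predict a label, select evidence
# sentences via a feature-attribution score, then verify that the evidence
# alone supports the same prediction. (Toy scorer; illustrative only.)

def predict(sentences):
    """Toy document classifier: positive iff any sentence mentions 'lesion'."""
    return any("lesion" in s for s in sentences)

def attribution_scores(sentences):
    """Occlusion-style attribution: score each sentence by whether removing
    it flips the document-level prediction."""
    full = predict(sentences)
    scores = []
    for i in range(len(sentences)):
        ablated = sentences[:i] + sentences[i + 1:]
        scores.append(1.0 if predict(ablated) != full else 0.0)
    return scores

def predict_select_verify(sentences, k=1):
    label = predict(sentences)                      # predict
    scores = attribution_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
    evidence = [sentences[i] for i in ranked[:k]]   # select top-k evidence
    verified = predict(evidence) == label           # verify on evidence alone
    return label, evidence, verified

doc = ["No acute findings.", "A small lesion is noted.", "Follow-up advised."]
label, evidence, ok = predict_select_verify(doc)
# label=True, evidence=["A small lesion is noted."], ok=True
```

In the paper's setting the classifier is a neural model and attribution comes from gradient- or occlusion-based techniques, but the control flow is the same: the verify step is what makes the extracted evidence auditable.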
Related papers
- Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation [20.173287130474797]
Generative medical Vision Large Language Models (VLLMs) are prone to hallucinations and can produce inaccurate diagnostic information.
We introduce a novel Semantic Consistency-Based Uncertainty Quantification framework that provides both report-level and sentence-level uncertainties.
By abstaining from high-uncertainty reports, our approach improves factuality scores by 10%, achieved by rejecting 20% of reports.
arXiv Detail & Related papers (2024-12-05T20:43:39Z)
- Anatomically-Grounded Fact Checking of Automated Chest X-ray Reports [0.0]
We propose a novel model for explainable fact-checking that identifies errors in findings and their locations indicated through the reports.
We evaluate the resulting fact-checking model and its utility in correcting reports generated by several SOTA automated reporting tools.
arXiv Detail & Related papers (2024-12-03T05:21:42Z)
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
- Fact or Fiction? Improving Fact Verification with Knowledge Graphs through Simplified Subgraph Retrievals [0.0]
We present efficient methods for verifying claims on a dataset where the evidence is in the form of structured knowledge graphs.
By simplifying the evidence retrieval process, we are able to construct models that both require less computational resources and achieve better test-set accuracy.
arXiv Detail & Related papers (2024-08-14T10:46:15Z)
- GEGA: Graph Convolutional Networks and Evidence Retrieval Guided Attention for Enhanced Document-level Relation Extraction [15.246183329778656]
Document-level relation extraction (DocRE) aims to extract relations between entities from unstructured document text.
To overcome these challenges, we propose GEGA, a novel model for DocRE.
We evaluate the GEGA model on three widely used benchmark datasets: DocRED, Re-DocRED, and Revisit-DocRED.
arXiv Detail & Related papers (2024-07-31T07:15:33Z)
- Contrastive Learning with Counterfactual Explanations for Radiology Report Generation [83.30609465252441]
We propose a CounterFactual Explanations-based framework (CoFE) for radiology report generation.
Counterfactual explanations serve as a potent tool for understanding how decisions made by algorithms can be changed by asking "what if" scenarios.
Experiments on two benchmarks demonstrate that leveraging the counterfactual explanations enables CoFE to generate semantically coherent and factually complete reports.
arXiv Detail & Related papers (2024-07-19T17:24:25Z)
- Factual Error Correction for Abstractive Summaries Using Entity Retrieval [57.01193722520597]
We propose an efficient factual error correction system, RFEC, based on an entity-retrieval post-editing process.
RFEC retrieves the evidence sentences from the original document by comparing the sentences with the target summary.
Next, RFEC detects the entity-level errors in the summaries by considering the evidence sentences and substitutes the wrong entities with the accurate entities from the evidence sentences.
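The retrieve-then-substitute flow described above can be sketched with toy string matching. This is a simplified assumption of the general idea, not RFEC's actual model: the overlap-based retriever, the entity list, and the first-valid-entity substitution rule are all illustrative placeholders.

```python
# Hedged sketch of entity-level factual correction: retrieve an evidence
# sentence from the source document, then replace summary entities that
# contradict it. (Simplified illustration; not the RFEC implementation.)

def retrieve_evidence(summary, document_sentences):
    """Pick the document sentence with the highest word overlap with the summary."""
    s_words = set(summary.lower().split())
    return max(document_sentences,
               key=lambda sent: len(s_words & set(sent.lower().split())))

def correct_entities(summary, evidence, entities):
    """Replace any known entity in the summary that does not appear in the
    evidence with one that does."""
    valid = [e for e in entities if e in evidence]
    for e in entities:
        if e in summary and e not in evidence and valid:
            summary = summary.replace(e, valid[0])
    return summary

doc = ["Apple reported revenue of $90 billion in Q1.",
       "The results beat analyst expectations."]
summary = "Google reported revenue of $90 billion in Q1."
evidence = retrieve_evidence(summary, doc)
fixed = correct_entities(summary, evidence, entities=["Apple", "Google"])
# fixed == "Apple reported revenue of $90 billion in Q1."
```

The real system detects entity-level errors with a learned model rather than string membership, but the two-stage structure (evidence retrieval, then targeted substitution) is what makes post-editing cheap compared with regenerating the summary.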
arXiv Detail & Related papers (2022-04-18T11:35:02Z)
- Distantly-Supervised Evidence Retrieval Enables Question Answering without Evidence Annotation [19.551623461082617]
Open-domain question answering answers a question based on evidence retrieved from a large corpus.
This paper investigates whether models can learn to find evidence from a large corpus, with only distant supervision from answer labels for model training.
We introduce a novel approach (DistDR) that iteratively improves over a weak retriever by alternately finding evidence from the up-to-date model and encouraging the model to learn the most likely evidence.
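The alternating loop DistDR describes resembles a hard-EM procedure, sketched below on a toy bag-of-words "model". Everything here is an assumption for illustration: the real system uses a learned neural retriever, not per-word weights.

```python
# Toy sketch of alternating evidence selection and model updating with only
# distant supervision from the answer string. (Illustrative hard-EM-style
# loop; all weights and scoring choices are assumptions.)

def hard_em_retrieval(corpus, question_terms, answer, rounds=3):
    # "Model": per-word weights used to score passages.
    weights = {w: 1.0 for w in question_terms}
    best = None
    for _ in range(rounds):
        # E-step: among passages containing the answer, pick the one the
        # current model scores highest (the "most likely evidence").
        candidates = [p for p in corpus if answer in p]
        best = max(candidates,
                   key=lambda p: sum(weights.get(w, 0.0) for w in p.split()))
        # M-step: nudge the model toward the words of the selected evidence.
        for w in best.split():
            weights[w] = weights.get(w, 0.0) + 0.5
    return best

corpus = ["Paris is the capital of France.",
          "France exports wine; Paris fashion is famous.",
          "Berlin is the capital of Germany."]
evidence = hard_em_retrieval(corpus, ["capital", "France"], answer="Paris")
```

The key point the abstract makes is that no passage is ever labeled as evidence; the answer string alone supplies the (distant) supervision that drives both steps.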
arXiv Detail & Related papers (2021-10-10T20:01:27Z)
- SAIS: Supervising and Augmenting Intermediate Steps for Document-Level Relation Extraction [51.27558374091491]
We propose to explicitly teach the model to capture relevant contexts and entity types by supervising and augmenting intermediate steps (SAIS) for relation extraction.
Based on a broad spectrum of carefully designed tasks, our proposed SAIS method not only extracts relations of better quality due to more effective supervision, but also retrieves the corresponding supporting evidence more accurately.
arXiv Detail & Related papers (2021-09-24T17:37:35Z)
- A Multi-Level Attention Model for Evidence-Based Fact Checking [58.95413968110558]
We present a simple model that can be trained on sequence structures.
Results on a large-scale dataset for Fact Extraction and VERification show that our model outperforms the graph-based approaches.
arXiv Detail & Related papers (2021-06-02T05:40:12Z)
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.