Scientific Claim Verification with VERT5ERINI
- URL: http://arxiv.org/abs/2010.11930v1
- Date: Thu, 22 Oct 2020 17:56:33 GMT
- Title: Scientific Claim Verification with VERT5ERINI
- Authors: Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira, Jimmy Lin
- Abstract summary: This work describes the adaptation of a pretrained sequence-to-sequence model to the task of scientific claim verification in the biomedical domain.
We propose VERT5ERINI that exploits T5 for abstract retrieval, sentence selection and label prediction.
We evaluate our pipeline on SCIFACT, a newly curated dataset that requires models to not just predict the veracity of claims but also provide relevant sentences from a corpus of scientific literature that support this decision.
- Score: 57.103189505636614
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work describes the adaptation of a pretrained sequence-to-sequence model
to the task of scientific claim verification in the biomedical domain. We
propose VERT5ERINI that exploits T5 for abstract retrieval, sentence selection
and label prediction, which are three critical sub-tasks of claim verification.
We evaluate our pipeline on SCIFACT, a newly curated dataset that requires
models to not just predict the veracity of claims but also provide relevant
sentences from a corpus of scientific literature that support this decision.
Empirically, our pipeline outperforms a strong baseline in each of the three
steps. Finally, we show VERT5ERINI's ability to generalize to two new datasets
of COVID-19 claims using evidence from the ever-expanding CORD-19 corpus.
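The three sub-tasks named in the abstract (abstract retrieval, sentence selection, label prediction) can be read as a staged pipeline. The sketch below is only an illustration of that control flow, not the authors' implementation: the token-overlap scorer stands in for the T5-based rankers, and every function name, threshold, and label string (`SUPPORTS`, `NOT_ENOUGH_INFO`) is an assumption made for this example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type aliases for this sketch (not from the paper).
Scorer = Callable[[str, str], float]      # (claim, text) -> relevance score
Classifier = Callable[[str, str], str]    # (claim, evidence) -> label string


def overlap_score(claim: str, text: str) -> float:
    """Toy stand-in for a learned ranker: fraction of claim tokens found in text."""
    claim_tokens = set(claim.lower().split())
    text_tokens = set(text.lower().split())
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & text_tokens) / len(claim_tokens)


def retrieve_abstracts(claim: str, corpus: Dict[str, List[str]],
                       scorer: Scorer, k: int = 3) -> List[str]:
    """Stage 1: rank abstracts (lists of sentences) by score and keep the top k ids."""
    ranked = sorted(corpus,
                    key=lambda doc_id: scorer(claim, " ".join(corpus[doc_id])),
                    reverse=True)
    return ranked[:k]


def select_sentences(claim: str, sentences: List[str],
                     scorer: Scorer, threshold: float = 0.5) -> List[str]:
    """Stage 2: keep sentences whose score reaches the (assumed) threshold."""
    return [s for s in sentences if scorer(claim, s) >= threshold]


def predict_label(claim: str, evidence: List[str], classifier: Classifier) -> str:
    """Stage 3: classify the claim against the selected evidence sentences."""
    if not evidence:
        return "NOT_ENOUGH_INFO"  # assumed placeholder label for "no evidence"
    return classifier(claim, " ".join(evidence))


def verify(claim: str, corpus: Dict[str, List[str]], scorer: Scorer,
           classifier: Classifier, k: int = 3,
           threshold: float = 0.5) -> Dict[str, Tuple[str, List[str]]]:
    """Run the three stages and return {doc_id: (label, evidence sentences)}."""
    results = {}
    for doc_id in retrieve_abstracts(claim, corpus, scorer, k):
        evidence = select_sentences(claim, corpus[doc_id], scorer, threshold)
        results[doc_id] = (predict_label(claim, evidence, classifier), evidence)
    return results
```

In a real system each stage would call a fine-tuned model rather than `overlap_score`; the point here is only that each stage narrows the input handed to the next one, and that the final output pairs a label with the sentences that justify it.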
Related papers
- Contrastive Learning to Improve Retrieval for Real-world Fact Checking [84.57583869042791]
We present Contrastive Fact-Checking Reranker (CFR), an improved retriever for fact-checking complex claims.
We leverage the AVeriTeC dataset, which annotates subquestions for claims with human written answers from evidence documents.
We find a 6% improvement in veracity classification accuracy on the dataset.
arXiv Detail & Related papers (2024-10-07T00:09:50Z)
- WiCE: Real-World Entailment for Claims in Wikipedia [63.234352061821625]
We propose WiCE, a new fine-grained textual entailment dataset built on natural claim and evidence pairs extracted from Wikipedia.
In addition to standard claim-level entailment, WiCE provides entailment judgments over sub-sentence units of the claim.
We show that real claims in our dataset involve challenging verification and retrieval problems that existing models fail to address.
arXiv Detail & Related papers (2023-03-02T17:45:32Z)
- BLIAM: Literature-based Data Synthesis for Synergistic Drug Combination Prediction [13.361489059744754]
BLIAM generates training data points that are interpretable and model-agnostic to downstream applications.
BLIAM can be further used to synthesize data points for novel drugs and cell lines that were not even measured in biomedical experiments.
arXiv Detail & Related papers (2023-02-14T06:48:52Z)
- GERE: Generative Evidence Retrieval for Fact Verification [57.78768817972026]
We propose GERE, the first system that retrieves evidence in a generative fashion.
The experimental results on the FEVER dataset show that GERE achieves significant improvements over the state-of-the-art baselines.
arXiv Detail & Related papers (2022-04-12T03:49:35Z)
- Abstract, Rationale, Stance: A Joint Model for Scientific Claim Verification [18.330265729989843]
We propose an approach, named ARSJoint, that jointly learns the modules for the three tasks with a machine reading comprehension framework.
The experimental results on the benchmark dataset SciFact show that our approach outperforms existing work.
arXiv Detail & Related papers (2021-09-13T10:07:26Z)
- An Interpretable End-to-end Fine-tuning Approach for Long Clinical Text [72.62848911347466]
Unstructured clinical text in EHRs contains crucial information for applications including decision support, trial matching, and retrospective research.
Recent work has applied BERT-based models to clinical information extraction and text classification, given these models' state-of-the-art performance in other NLP domains.
In this work, we propose a novel fine-tuning approach called SnipBERT. Instead of using entire notes, SnipBERT identifies crucial snippets and feeds them into a truncated BERT-based model in a hierarchical manner.
arXiv Detail & Related papers (2020-11-12T17:14:32Z)
- Text Mining to Identify and Extract Novel Disease Treatments From Unstructured Datasets [56.38623317907416]
We use Google Cloud to transcribe podcast episodes of an NPR radio show.
We then build a pipeline for systematically pre-processing the text.
Our model successfully identified that Omeprazole can help treat heartburn.
arXiv Detail & Related papers (2020-10-22T19:52:49Z)