Science Checker: Extractive-Boolean Question Answering For Scientific
Fact Checking
- URL: http://arxiv.org/abs/2204.12263v1
- Date: Tue, 26 Apr 2022 12:35:23 GMT
- Title: Science Checker: Extractive-Boolean Question Answering For Scientific
Fact Checking
- Authors: Loïc Rakotoson, Charles Letaillieur, Sylvain Massip, Fréjus Laleye
- Abstract summary: We propose a multi-task approach for verifying scientific questions based on joint reasoning over facts and evidence in research articles.
With our light and fast proposed architecture, we achieved an average error rate of 4% and an F1-score of 95.6%.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the explosive growth of scientific publications, synthesizing
scientific knowledge and checking facts are becoming increasingly complex tasks. In
this paper, we propose a multi-task approach for verifying scientific
questions based on joint reasoning over facts and evidence in research
articles. We propose a combination of (1) automatic information
summarization and (2) Boolean Question Answering, which makes it possible to generate an
answer to a scientific question from only the extracts obtained after
summarization. Thus, on a given topic, our proposed approach conducts structured
content modeling based on paper abstracts to answer a scientific question while
highlighting texts from papers that discuss the topic. We based our final system
on an end-to-end Extractive Question Answering (EQA) model combined with a
three-output classification model to perform in-depth semantic understanding of a
question and to illustrate the aggregation of multiple responses. With our light
and fast proposed architecture, we achieved an average error rate of 4% and an
F1-score of 95.6%. Our results are supported by experiments with two QA models
(BERT, RoBERTa) over 3 million Open Access (OA) articles in the medical and
health domains on Europe PMC.
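The flow described in the abstract (extract a passage per abstract, classify it into one of three outputs, then aggregate the responses) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the label names `yes`/`no`/`neutral`, the function names `check_question` and `aggregate`, and the keyword-based `toy_extract` and `toy_classify` stand-ins are all assumptions. In the real system the two stand-ins would be the EQA model and the three-output classifier.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumed names for the three classifier outputs; the abstract only says "three outputs".
LABELS = ("yes", "no", "neutral")


@dataclass
class Evidence:
    doc_id: str
    extract: str   # passage highlighted in the abstract
    label: str     # one of LABELS


def check_question(
    question: str,
    abstracts: Dict[str, str],
    extract_fn: Callable[[str, str], str],
    classify_fn: Callable[[str, str], str],
) -> List[Evidence]:
    """Run extraction, then three-way classification, on every abstract."""
    evidence = []
    for doc_id, abstract in abstracts.items():
        extract = extract_fn(question, abstract)
        if not extract:
            continue  # no relevant passage found in this abstract
        label = classify_fn(question, extract)
        if label not in LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        evidence.append(Evidence(doc_id, extract, label))
    return evidence


def aggregate(evidence: List[Evidence]) -> Dict[str, float]:
    """Share of each label across all classified extracts."""
    counts = Counter(e.label for e in evidence)
    total = sum(counts.values())
    return {label: (counts[label] / total if total else 0.0) for label in LABELS}


# Hypothetical stand-ins so the sketch runs without any model.
def toy_extract(question: str, abstract: str) -> str:
    terms = {w.lower().strip("?.,") for w in question.split() if len(w) > 3}
    for sentence in abstract.split(". "):
        words = {w.lower().strip("?.,") for w in sentence.split()}
        if terms & words:
            return sentence.strip()
    return ""


def toy_classify(question: str, extract: str) -> str:
    text = extract.lower()
    if "does not" in text or "no effect" in text:
        return "no"
    if "reduces" in text or "improves" in text:
        return "yes"
    return "neutral"


if __name__ == "__main__":
    abstracts = {
        "A": "Background is given. Aspirin reduces headache severity in adults.",
        "B": "We found that aspirin does not change headache duration.",
        "C": "Aspirin and headache were studied in a cohort.",
    }
    found = check_question("Does aspirin reduce headache?", abstracts,
                           toy_extract, toy_classify)
    for item in found:
        print(item.doc_id, item.label, "|", item.extract)
    print(aggregate(found))
```

Keeping the extract next to each label means the aggregated answer can always be traced back to the highlighted text it came from.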
Related papers
- SciQAG: A Framework for Auto-Generated Science Question Answering Dataset with Fine-grained Evaluation [11.129800893611646]
SciQAG is a framework for automatically generating high-quality science question-answer pairs from a large corpus of scientific literature based on large language models (LLMs).
We construct a large-scale, high-quality, open-ended science QA dataset containing 188,042 QA pairs extracted from 22,743 scientific papers across 24 scientific domains.
We also introduce SciQAG-24D, a new benchmark task designed to evaluate the science question-answering ability of LLMs.
arXiv Detail & Related papers (2024-05-16T09:42:37Z)
- Verif.ai: Towards an Open-Source Scientific Generative
Question-Answering System with Referenced and Verifiable Answers [0.0]
We present the current progress of the project Verif.ai, an open-source scientific generative question-answering system with referenced and verified answers.
The components of the system are (1) an information retrieval system combining semantic and lexical search techniques over scientific papers, (2) a generative model (Mistral 7B) that takes the top answers and generates answers with references to the papers from which the claim was derived, and (3) a verification engine that cross-checks the generated claim against the abstract or paper from which the claim was derived.
arXiv Detail & Related papers (2024-02-09T10:25:01Z)
- PaperQA: Retrieval-Augmented Generative Agent for Scientific Research [41.9628176602676]
We present PaperQA, a RAG agent for answering questions over the scientific literature.
PaperQA is an agent that performs information retrieval across full-text scientific articles, assesses the relevance of sources and passages, and uses RAG to provide answers.
We also introduce LitQA, a more complex benchmark that requires retrieval and synthesis of information from full-text scientific papers across the literature.
arXiv Detail & Related papers (2023-12-08T18:50:20Z)
- Generating Explanations in Medical Question-Answering by Expectation
Maximization Inference over Evidence [33.018873142559286]
We propose a novel approach for generating natural language explanations for answers predicted by medical QA systems.
Our system extracts knowledge from medical textbooks to enhance the quality of explanations during the explanation generation process.
arXiv Detail & Related papers (2023-10-02T16:00:37Z)
- Reasoning over Hierarchical Question Decomposition Tree for Explainable
Question Answering [83.74210749046551]
We propose to leverage question decomposition for heterogeneous knowledge integration.
We propose a novel two-stage XQA framework, Reasoning over Hierarchical Question Decomposition Tree (RoHT).
Experiments on complex QA datasets KQA Pro and Musique show that our framework outperforms SOTA methods significantly.
arXiv Detail & Related papers (2023-05-24T11:45:59Z)
- Learn to Explain: Multimodal Reasoning via Thought Chains for Science
Question Answering [124.16250115608604]
We present Science Question Answering (SQA), a new benchmark that consists of 21k multimodal multiple choice questions with a diverse set of science topics and annotations of their answers with corresponding lectures and explanations.
We show that SQA improves the question answering performance by 1.20% in few-shot GPT-3 and 3.99% in fine-tuned UnifiedQA.
Our analysis further shows that language models, similar to humans, benefit from explanations to learn from fewer data and achieve the same performance with just 40% of the data.
arXiv Detail & Related papers (2022-09-20T07:04:24Z)
- Abstract, Rationale, Stance: A Joint Model for Scientific Claim
Verification [18.330265729989843]
We propose an approach, named ARSJoint, that jointly learns the modules for the three tasks with a machine reading comprehension framework.
The experimental results on the benchmark dataset SciFact show that our approach outperforms existing work.
arXiv Detail & Related papers (2021-09-13T10:07:26Z)
- A Dataset of Information-Seeking Questions and Answers Anchored in
Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
- What's New? Summarizing Contributions in Scientific Literature [85.95906677964815]
We introduce a new task of disentangled paper summarization, which seeks to generate separate summaries for the paper contributions and the context of the work.
We extend the S2ORC corpus of academic articles by adding disentangled "contribution" and "context" reference labels.
We propose a comprehensive automatic evaluation protocol which reports the relevance, novelty, and disentanglement of generated outputs.
arXiv Detail & Related papers (2020-11-06T02:23:01Z)
- Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex
Healthcare Question Answering [89.76059961309453]
HeadQA dataset contains multiple-choice questions authorized for the public healthcare specialization exam.
These questions are the most challenging for current QA systems.
We present a Multi-step reasoning with Knowledge extraction framework (MurKe).
We are striving to make full use of off-the-shelf pre-trained models.
arXiv Detail & Related papers (2020-08-06T02:47:46Z)
- CAiRE-COVID: A Question Answering and Query-focused Multi-Document
Summarization System for COVID-19 Scholarly Information Management [48.251211691263514]
We present CAiRE-COVID, a real-time question answering (QA) and multi-document summarization system, which won one of the 10 tasks in the Kaggle COVID-19 Open Research Dataset Challenge.
Our system aims to tackle the recent challenge of mining the numerous scientific articles being published on COVID-19 by answering high-priority questions from the community.
arXiv Detail & Related papers (2020-05-04T15:07:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.