Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval
- URL: http://arxiv.org/abs/2404.08359v1
- Date: Fri, 12 Apr 2024 09:56:12 GMT
- Title: Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval
- Authors: Juraj Vladika, Florian Matthes,
- Abstract summary: Our study focuses on the open-domain QA setting, where the key challenge is to first uncover relevant evidence in large knowledge bases.
By utilizing the common retrieve-then-read QA pipeline and PubMed as a trustworthy collection of medical research documents, we answer health questions from three diverse datasets.
Our results reveal that cutting down on the amount of retrieved documents and favoring more recent and highly cited documents can improve the final macro F1 score up to 10%.
- Score: 5.69361786082969
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In today's digital world, seeking answers to health questions on the Internet is a common practice. However, existing question answering (QA) systems often rely on using pre-selected and annotated evidence documents, thus making them inadequate for addressing novel questions. Our study focuses on the open-domain QA setting, where the key challenge is to first uncover relevant evidence in large knowledge bases. By utilizing the common retrieve-then-read QA pipeline and PubMed as a trustworthy collection of medical research documents, we answer health questions from three diverse datasets. We modify different retrieval settings to observe their influence on the QA pipeline's performance, including the number of retrieved documents, sentence selection process, the publication year of articles, and their number of citations. Our results reveal that cutting down on the amount of retrieved documents and favoring more recent and highly cited documents can improve the final macro F1 score up to 10%. We discuss the results, highlight interesting examples, and outline challenges for future research, like managing evidence disagreement and crafting user-friendly explanations.
Related papers
- When to Read Documents or QA History: On Unified and Selective
Open-domain QA [22.941325275188376]
This paper studies the problem of open-domain question answering, with the aim of answering a diverse range of questions leveraging knowledge resources.
Two types of sources, QA-pair and document corpora, have been actively leveraged with the following complementary strength.
A natural follow-up is thus leveraging both models, while a naive pipelining or integration approaches have failed to bring additional gains over either model alone.
arXiv Detail & Related papers (2023-06-07T06:03:39Z) - Detect, Retrieve, Comprehend: A Flexible Framework for Zero-Shot
Document-Level Question Answering [6.224211330728391]
Researchers produce thousands of scholarly documents containing valuable technical knowledge.
Document-level question answering (QA) offers a flexible framework where human-posed questions can be adapted to extract diverse knowledge.
We present a three-stage document QA approach: text extraction from PDF; evidence retrieval from extracted texts to form well-posed contexts; and QA to extract knowledge from contexts to return high-quality answers.
arXiv Detail & Related papers (2022-10-04T23:33:52Z) - RealTime QA: What's the Answer Right Now? [137.04039209995932]
We introduce REALTIME QA, a dynamic question answering (QA) platform that announces questions and evaluates systems on a regular basis.
We build strong baseline models upon large pretrained language models, including GPT-3 and T5.
GPT-3 tends to return outdated answers when retrieved documents do not provide sufficient information to find an answer.
arXiv Detail & Related papers (2022-07-27T07:26:01Z) - Recent Advances in Automated Question Answering In Biomedical Domain [0.06922389632860546]
In the past few decades there has been a proliferation of acquisition of knowledge and consequently there has been an exponential growth in new scientific articles in the field of biomedicine.
It has become difficult to keep track of all the information in the domain, even for domain experts.
With the improvements in commercial search engines, users can type in their queries and get a small set of documents most relevant for answering their query.
This has necessitated the development of efficient QA systems which aim to find exact and precise answers to user provided natural language questions.
arXiv Detail & Related papers (2021-11-10T20:51:29Z) - A Dataset of Information-Seeking Questions and Answers Anchored in
Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z) - A Graph-guided Multi-round Retrieval Method for Conversational
Open-domain Question Answering [52.041815783025186]
We propose a novel graph-guided retrieval method to model the relations among answers across conversation turns.
We also propose to incorporate the multi-round relevance feedback technique to explore the impact of the retrieval context on current question understanding.
arXiv Detail & Related papers (2021-04-17T04:39:41Z) - Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex
Healthcare Question Answering [89.76059961309453]
HeadQA dataset contains multiple-choice questions authorized for the public healthcare specialization exam.
These questions are the most challenging for current QA systems.
We present a Multi-step reasoning with Knowledge extraction framework (MurKe)
We are striving to make full use of off-the-shelf pre-trained models.
arXiv Detail & Related papers (2020-08-06T02:47:46Z) - Knowledge-Aided Open-Domain Question Answering [58.712857964048446]
We propose a knowledge-aided open-domain QA (KAQA) method which targets at improving relevant document retrieval and answer reranking.
During document retrieval, a candidate document is scored by considering its relationship to the question and other documents.
During answer reranking, a candidate answer is reranked using not only its own context but also the clues from other documents.
arXiv Detail & Related papers (2020-06-09T13:28:57Z) - CAiRE-COVID: A Question Answering and Query-focused Multi-Document
Summarization System for COVID-19 Scholarly Information Management [48.251211691263514]
We present CAiRE-COVID, a real-time question answering (QA) and multi-document summarization system, which won one of the 10 tasks in the Kaggle COVID-19 Open Research dataset Challenge.
Our system aims to tackle the recent challenge of mining the numerous scientific articles being published on COVID-19 by answering high priority questions from the community.
arXiv Detail & Related papers (2020-05-04T15:07:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.