Top K Relevant Passage Retrieval for Biomedical Question Answering
- URL: http://arxiv.org/abs/2308.04028v1
- Date: Tue, 8 Aug 2023 04:06:11 GMT
- Title: Top K Relevant Passage Retrieval for Biomedical Question Answering
- Authors: Shashank Gupta
- Abstract summary: Question answering is a task that answers factoid questions using a large collection of documents.
The existing Dense Passage Retrieval model was trained on a Wikipedia dump from Dec. 20, 2018, as the source documents for answering questions.
In this work, we adapt the existing DPR framework to the biomedical domain and retrieve answers from PubMed articles, a reliable source for answering medical questions.
- Score: 1.0636004442689055
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Question answering is a task that answers factoid questions using a large
collection of documents. It aims to provide precise answers to the user's
questions in natural language. Question answering relies on efficient passage
retrieval to select candidate contexts, where traditional sparse vector space
models, such as TF-IDF or BM25, are the de facto method. No single article on
the web can provide all the possible answers to a user's question. The existing
Dense Passage Retrieval (DPR) model was trained on a Wikipedia dump from
Dec. 20, 2018, as the source documents for answering questions. Question
answering (QA) has made significant strides with several open-domain and machine
comprehension systems built using large-scale annotated datasets. However, in
the clinical domain, this problem remains relatively unexplored. According to
multiple surveys, biomedical questions cannot be answered correctly from
Wikipedia articles. In this work, we adapt the existing DPR framework to the
biomedical domain and retrieve answers from PubMed articles, a reliable source
for answering medical questions. When evaluated on a BioASQ QA dataset, our
fine-tuned dense retriever achieves a 0.81 F1 score.
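The retrieval step the abstract describes can be illustrated with a minimal sketch: DPR encodes the question and each passage into dense vectors, scores passages by inner product, and keeps the top-k. The toy embeddings below are hypothetical stand-ins for encoder outputs, not the paper's actual model or data.

```python
# Minimal sketch of DPR-style top-k dense passage retrieval.
# The embeddings here are random toy vectors standing in for the
# question/passage encoder outputs (hypothetical, for illustration only).
import numpy as np

def top_k_passages(question_emb, passage_embs, k=2):
    """Return indices of the k passages with the highest dot-product score."""
    scores = passage_embs @ question_emb  # inner-product similarity, one score per passage
    return np.argsort(-scores)[:k]        # indices sorted by descending score

# Toy corpus: 4 "passages" in a 3-dimensional embedding space.
passages = np.array([
    [0.1, 0.9, 0.0],
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.0, 0.0, 1.0],
])
question = np.array([1.0, 0.0, 0.0])
print(top_k_passages(question, passages, k=2))  # → [1 2]
```

In a real system the passage index would cover millions of PubMed abstracts, so the brute-force dot product would be replaced by an approximate nearest-neighbor index; the scoring logic is the same.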
Related papers
- Answering Ambiguous Questions with a Database of Questions, Answers, and
Revisions [95.92276099234344]
We present a new state-of-the-art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
Our method improves performance by 15% on recall measures and 10% on measures which evaluate disambiguating questions from predicted outputs.
arXiv Detail & Related papers (2023-08-16T20:23:16Z)
- Contributions to the Improvement of Question Answering Systems in the Biomedical Domain [0.951828574518325]
This thesis work falls within the framework of question answering (QA) in the biomedical domain.
We propose four contributions to improve the performance of QA in the biomedical domain.
We develop a fully automated semantic biomedical QA system called SemBioNLQA.
arXiv Detail & Related papers (2023-07-25T16:31:20Z)
- CREPE: Open-Domain Question Answering with False Presuppositions [92.20501870319765]
We introduce CREPE, a QA dataset containing a natural distribution of presupposition failures from online information-seeking forums.
We find that 25% of questions contain false presuppositions, and provide annotations for these presuppositions and their corrections.
We show that adaptations of existing open-domain QA models can find presuppositions moderately well, but struggle when predicting whether a presupposition is factually correct.
arXiv Detail & Related papers (2022-11-30T18:54:49Z)
- Medical Question Understanding and Answering with Knowledge Grounding and Semantic Self-Supervision [53.692793122749414]
We introduce a medical question understanding and answering system with knowledge grounding and semantic self-supervision.
Our system is a pipeline that first summarizes a long, medical, user-written question, using a supervised summarization loss.
The system first matches the summarized user question with an FAQ from a trusted medical knowledge base, and then retrieves a fixed number of relevant sentences from the corresponding answer document.
arXiv Detail & Related papers (2022-09-30T08:20:32Z)
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468]
We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
We find that existing models that do well on other QA tasks do not perform well on answering these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
arXiv Detail & Related papers (2021-05-07T00:12:34Z)
- GooAQ: Open Question Answering with Diverse Answer Types [63.06454855313667]
We present GooAQ, a large-scale dataset with a variety of answer types.
This dataset contains over 5 million questions and 3 million answers collected from Google.
arXiv Detail & Related papers (2021-04-18T05:40:39Z)
- Sequence Tagging for Biomedical Extractive Question Answering [12.464143741310137]
We investigate the difference of the question distribution across the general and biomedical domains.
We discover that biomedical questions are more likely to require list-type answers (multiple answers) than factoid-type answers (single answer).
Our approach can learn to decide the number of answers for a question from training data.
arXiv Detail & Related papers (2021-04-15T15:42:34Z)
- Where's the Question? A Multi-channel Deep Convolutional Neural Network for Question Identification in Textual Data [83.89578557287658]
We propose a novel multi-channel deep convolutional neural network architecture, namely Quest-CNN, for the purpose of separating real questions.
We conducted a comprehensive performance comparison analysis of the proposed network against other deep neural networks.
The proposed Quest-CNN achieved the best F1 score both on a dataset of data entry-review dialogue in a dialysis care setting, and on a general domain dataset.
arXiv Detail & Related papers (2020-10-15T15:11:22Z)
- Effective Transfer Learning for Identifying Similar Questions: Matching User Questions to COVID-19 FAQs [5.512295869673147]
We show how a double fine-tuning approach of pretraining a neural network on medical question-answer pairs is a useful intermediate task for determining medical question similarity.
We also describe a currently live system that uses the trained model to match user questions to COVID-related FAQ.
arXiv Detail & Related papers (2020-08-04T18:20:04Z)
- A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19 [4.676651062800037]
COVID-19 has caused more than 7.4 million cases and over 418,000 deaths.
Online communities, forums, and social media provide potential venues to search for relevant questions and answers.
We propose to apply a language model for automatically answering questions related to COVID-19 and qualitatively evaluate the generated responses.
arXiv Detail & Related papers (2020-06-19T05:13:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.