Effective Transfer Learning for Identifying Similar Questions: Matching
User Questions to COVID-19 FAQs
- URL: http://arxiv.org/abs/2008.13546v1
- Date: Tue, 4 Aug 2020 18:20:04 GMT
- Authors: Clara H. McCreery, Namit Katariya, Anitha Kannan, Manish Chablani,
Xavier Amatriain
- Abstract summary: We show that a double fine-tuning approach, pretraining a neural network on medical question-answer pairs and then fine-tuning on medical question-question pairs, is a useful intermediate task for determining medical question similarity.
We also describe a currently live system that uses the trained model to match user questions to COVID-related FAQs.
- Score: 5.512295869673147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: People increasingly search online for answers to their medical questions but
the rate at which medical questions are asked online significantly exceeds the
capacity of qualified people to answer them. This leaves many questions
unanswered or inadequately answered. Many of these questions are not unique,
and reliable identification of similar questions would enable more efficient
and effective question answering schema. COVID-19 has only exacerbated this
problem. Almost every government agency and healthcare organization has tried
to meet the informational need of users by building online FAQs, but there is
no way for people to ask their question and know if it is answered on one of
these pages. While many research efforts have focused on the problem of general
question similarity, these approaches do not generalize well to domains that
require expert knowledge to determine semantic similarity, such as the medical
domain. In this paper, we show how a double fine-tuning approach of pretraining
a neural network on medical question-answer pairs followed by fine-tuning on
medical question-question pairs is a particularly useful intermediate task for
the ultimate goal of determining medical question similarity. While other
pretraining tasks yield an accuracy below 78.7% on this task, our model
achieves an accuracy of 82.6% with the same number of training examples, an
accuracy of 80.0% with a much smaller training set, and an accuracy of 84.5%
when the full corpus of medical question-answer data is used. We also describe
a currently live system that uses the trained model to match user questions to
COVID-related FAQs.
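
The described live system can be illustrated with a minimal sketch: embed both the user question and each FAQ, then return the closest FAQ above a similarity threshold, or nothing when the question is not covered. The sketch below substitutes a simple bag-of-words encoder for the paper's fine-tuned BERT similarity model; the `embed` function, the FAQ list, and the 0.4 threshold are illustrative assumptions, not the paper's implementation.

```python
import math
import re
from collections import Counter

def embed(question):
    # Stand-in encoder: a bag-of-words count vector. The paper's live
    # system instead embeds questions with a BERT model pretrained on
    # medical question-answer pairs and then fine-tuned on medical
    # question-question pairs.
    return Counter(re.findall(r"[a-z0-9-]+", question.lower()))

def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[tok] * v[tok] for tok in u if tok in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def match_faq(user_question, faqs, threshold=0.4):
    # Return the most similar FAQ, or None if no FAQ clears the
    # threshold (i.e. the question is not answered on the FAQ page).
    query_vec = embed(user_question)
    best_score, best_faq = max((cosine(query_vec, embed(q)), q) for q in faqs)
    return best_faq if best_score >= threshold else None

faqs = [
    "How does COVID-19 spread?",
    "What are the symptoms of COVID-19?",
    "How can I protect myself from infection?",
]
print(match_faq("What symptoms does COVID-19 cause?", faqs))
# -> What are the symptoms of COVID-19?
```

The threshold is what lets the system decline to answer: a lexical overlap of only a word or two scores well below 0.4, so unrelated questions fall through to "not answered" rather than being matched to the nearest FAQ.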
Related papers
- RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions [3.182594503527438] (2024-08-16)
  We present RealMedQA, a dataset of realistic clinical questions generated by humans and an LLM.
  We show that the LLM is more cost-efficient for generating "ideal" QA pairs.
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models [73.79091519226026] (2024-02-05)
  Uncertainty of Thoughts (UoT) is an algorithm that augments large language models with the ability to actively seek information by asking effective questions.
  In experiments on medical diagnosis, troubleshooting, and the 20 Questions game, UoT achieves an average improvement of 38.1% in the rate of successful task completion.
- Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions [95.92276099234344] (2023-08-16)
  We present a new state of the art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
  Our method improves performance by 15% on recall measures and 10% on measures that evaluate disambiguating questions from predicted outputs.
- Top K Relevant Passage Retrieval for Biomedical Question Answering [1.0636004442689055] (2023-08-08)
  Question answering is a task that answers factoid questions using a large collection of documents.
  The existing Dense Passage Retrieval (DPR) model was trained on a Wikipedia dump from Dec. 20, 2018, as the source documents for answering questions.
  In this work, we adapt the existing DPR framework to the biomedical domain and retrieve answers from PubMed articles, a reliable source for answering medical questions.
- Medical Question Understanding and Answering with Knowledge Grounding and Semantic Self-Supervision [53.692793122749414] (2022-09-30)
  We introduce a medical question understanding and answering system with knowledge grounding and semantic self-supervision.
  Our system is a pipeline that first summarizes a long, medical, user-written question using a supervised summarization loss.
  It then matches the summarized user question with an FAQ from a trusted medical knowledge base and retrieves a fixed number of relevant sentences from the corresponding answer document.
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468] (2021-05-07)
  We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
  Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
  We find that existing models that do well on other QA tasks do not perform well on these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
- Retrieving and ranking short medical questions with two stages neural matching model [3.8020157990268206] (2020-11-16)
  Eighty percent of internet users have asked health-related questions online.
  Such representative questions and answers in the medical field are valuable raw data sources for medical data mining.
  We propose a novel two-stage framework for the semantic matching of query-level medical questions.
- Where's the Question? A Multi-channel Deep Convolutional Neural Network for Question Identification in Textual Data [83.89578557287658] (2020-10-15)
  We propose a novel multi-channel deep convolutional neural network architecture, Quest-CNN, for separating real questions.
  We conducted a comprehensive performance comparison of the proposed network against other deep neural networks.
  Quest-CNN achieved the best F1 score both on a dataset of data entry-review dialogue in a dialysis care setting and on a general-domain dataset.
- Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering [89.76059961309453] (2020-08-06)
  The HeadQA dataset contains multiple-choice questions from the public healthcare specialization exam; these questions are among the most challenging for current QA systems.
  We present MurKe, a multi-step reasoning with knowledge extraction framework that strives to make full use of off-the-shelf pre-trained models.
- A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19 [4.676651062800037] (2020-06-19)
  COVID-19 has caused more than 7.4 million cases and over 418,000 deaths.
  Online communities, forums, and social media provide potential venues to search for relevant questions and answers.
  We propose to apply a language model to automatically answer questions related to COVID-19 and qualitatively evaluate the generated responses.
- Automated Question Answer medical model based on Deep Learning Technology [0.43748379918040853] (2020-05-21)
  This research trains an end-to-end model, using an RNN encoder-decoder framework, to generate sensible and useful answers to a small set of medical/health questions.
  The proposed model was trained and evaluated using data from various online services, such as WebMD, HealthTap, eHealthForums, and iCliniq.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.