Effective Transfer Learning for Identifying Similar Questions: Matching
User Questions to COVID-19 FAQs
- URL: http://arxiv.org/abs/2008.13546v1
- Date: Tue, 4 Aug 2020 18:20:04 GMT
- Authors: Clara H. McCreery, Namit Katariya, Anitha Kannan, Manish Chablani,
Xavier Amatriain
- Abstract summary: We show that a double fine-tuning approach, pretraining a neural network on medical question-answer pairs and then fine-tuning on medical question-question pairs, is a useful intermediate task for determining medical question similarity.
We also describe a currently live system that uses the trained model to match user questions to COVID-related FAQs.
- Score: 5.512295869673147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: People increasingly search online for answers to their medical questions but
the rate at which medical questions are asked online significantly exceeds the
capacity of qualified people to answer them. This leaves many questions
unanswered or inadequately answered. Many of these questions are not unique,
and reliable identification of similar questions would enable more efficient
and effective question answering schema. COVID-19 has only exacerbated this
problem. Almost every government agency and healthcare organization has tried
to meet the informational need of users by building online FAQs, but there is
no way for people to ask their question and know if it is answered on one of
these pages. While many research efforts have focused on the problem of general
question similarity, these approaches do not generalize well to domains that
require expert knowledge to determine semantic similarity, such as the medical
domain. In this paper, we show how a double fine-tuning approach of pretraining
a neural network on medical question-answer pairs followed by fine-tuning on
medical question-question pairs is a particularly useful intermediate task for
the ultimate goal of determining medical question similarity. While other
pretraining tasks yield an accuracy below 78.7% on this task, our model
achieves an accuracy of 82.6% with the same number of training examples, an
accuracy of 80.0% with a much smaller training set, and an accuracy of 84.5%
when the full corpus of medical question-answer data is used. We also describe
a currently live system that uses the trained model to match user questions to
COVID-related FAQs.
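
The described live system can be illustrated with a minimal sketch: embed both the user question and each FAQ, then return the closest FAQ above a similarity threshold, or nothing when the question is not covered. The sketch below substitutes a simple bag-of-words encoder for the paper's fine-tuned BERT similarity model; the `embed` function, the FAQ list, and the 0.4 threshold are illustrative assumptions, not the paper's implementation.

```python
import math
import re
from collections import Counter

def embed(question):
    # Stand-in encoder: a bag-of-words count vector. The paper's live
    # system instead embeds questions with a BERT model pretrained on
    # medical question-answer pairs and then fine-tuned on medical
    # question-question pairs.
    return Counter(re.findall(r"[a-z0-9-]+", question.lower()))

def cosine(u, v):
    # Cosine similarity between two sparse count vectors.
    dot = sum(u[tok] * v[tok] for tok in u if tok in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def match_faq(user_question, faqs, threshold=0.4):
    # Return the most similar FAQ, or None if no FAQ clears the
    # threshold (i.e. the question is not answered on the FAQ page).
    query_vec = embed(user_question)
    best_score, best_faq = max((cosine(query_vec, embed(q)), q) for q in faqs)
    return best_faq if best_score >= threshold else None

faqs = [
    "How does COVID-19 spread?",
    "What are the symptoms of COVID-19?",
    "How can I protect myself from infection?",
]
print(match_faq("What symptoms does COVID-19 cause?", faqs))
# -> What are the symptoms of COVID-19?
```

The threshold is what lets the system decline to answer: a lexical overlap of only a word or two scores well below 0.4, so unrelated questions fall through to "not answered" rather than being matched to the nearest FAQ.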
Related papers
- RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions [3.182594503527438] (2024-08-16)
  We present RealMedQA, a dataset of realistic clinical questions generated by humans and an LLM.
  We show that the LLM is more cost-efficient for generating "ideal" QA pairs.
- Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models [73.79091519226026] (2024-02-05)
  Uncertainty of Thoughts (UoT) is an algorithm that augments large language models with the ability to actively seek information by asking effective questions.
  In experiments on medical diagnosis, troubleshooting, and the 20 Questions game, UoT achieves an average improvement of 38.1% in the rate of successful task completion.
- Answering Ambiguous Questions with a Database of Questions, Answers, and Revisions [95.92276099234344] (2023-08-16)
  We present a new state of the art for answering ambiguous questions that exploits a database of unambiguous questions generated from Wikipedia.
  Our method improves performance by 15% on recall measures and 10% on measures that evaluate disambiguating questions from predicted outputs.
- Top K Relevant Passage Retrieval for Biomedical Question Answering [1.0636004442689055] (2023-08-08)
  Question answering is a task that answers factoid questions using a large collection of documents.
  The existing Dense Passage Retrieval (DPR) model was trained on a Wikipedia dump from Dec. 20, 2018, as the source documents for answering questions.
  In this work, we adapt the existing DPR framework to the biomedical domain and retrieve answers from PubMed articles, a reliable source for answering medical questions.
- Medical Question Understanding and Answering with Knowledge Grounding and Semantic Self-Supervision [53.692793122749414] (2022-09-30)
  We introduce a medical question understanding and answering system with knowledge grounding and semantic self-supervision.
  Our system is a pipeline that first summarizes a long, medical, user-written question using a supervised summarization loss.
  It then matches the summarized user question with an FAQ from a trusted medical knowledge base and retrieves a fixed number of relevant sentences from the corresponding answer document.
- A Dataset of Information-Seeking Questions and Answers Anchored in Research Papers [66.11048565324468] (2021-05-07)
  We present a dataset of 5,049 questions over 1,585 Natural Language Processing papers.
  Each question is written by an NLP practitioner who read only the title and abstract of the corresponding paper, and the question seeks information present in the full text.
  We find that existing models that do well on other QA tasks do not perform well on these questions, underperforming humans by at least 27 F1 points when answering them from entire papers.
- Retrieving and ranking short medical questions with two stages neural matching model [3.8020157990268206] (2020-11-16)
  Eighty percent of internet users have asked health-related questions online.
  Such representative questions and answers in the medical field are valuable raw data sources for medical data mining.
  We propose a novel two-stage framework for the semantic matching of query-level medical questions.
- Where's the Question? A Multi-channel Deep Convolutional Neural Network for Question Identification in Textual Data [83.89578557287658] (2020-10-15)
  We propose a novel multi-channel deep convolutional neural network architecture, Quest-CNN, for separating real questions.
  We conducted a comprehensive performance comparison of the proposed network against other deep neural networks.
  Quest-CNN achieved the best F1 score both on a dataset of data entry-review dialogue in a dialysis care setting and on a general-domain dataset.
- Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering [89.76059961309453] (2020-08-06)
  The HeadQA dataset contains multiple-choice questions from the public healthcare specialization exam; these questions are among the most challenging for current QA systems.
  We present MurKe, a multi-step reasoning with knowledge extraction framework that strives to make full use of off-the-shelf pre-trained models.
- A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19 [4.676651062800037] (2020-06-19)
  COVID-19 has caused more than 7.4 million cases and over 418,000 deaths.
  Online communities, forums, and social media provide potential venues to search for relevant questions and answers.
  We propose to apply a language model to automatically answer questions related to COVID-19 and qualitatively evaluate the generated responses.
- Automated Question Answer medical model based on Deep Learning Technology [0.43748379918040853] (2020-05-21)
  This research trains an end-to-end model, using an RNN encoder-decoder framework, to generate sensible and useful answers to a small set of medical/health questions.
  The proposed model was trained and evaluated using data from various online services, such as WebMD, HealthTap, eHealthForums, and iCliniq.
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.