XAIQA: Explainer-Based Data Augmentation for Extractive Question
Answering
- URL: http://arxiv.org/abs/2312.03567v1
- Date: Wed, 6 Dec 2023 15:59:06 GMT
- Title: XAIQA: Explainer-Based Data Augmentation for Extractive Question
Answering
- Authors: Joel Stremmel, Ardavan Saeedi, Hamid Hassanzadeh, Sanjit Batra,
Jeffrey Hertzberg, Jaime Murillo, Eran Halperin
- Abstract summary: We introduce a novel approach, XAIQA, for generating synthetic QA pairs at scale from data naturally available in electronic health records.
Our method uses the idea of a classification model explainer to generate questions and answers about medical concepts corresponding to medical codes.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Extractive question answering (QA) systems can enable physicians and
researchers to query medical records, a foundational capability for designing
clinical studies and understanding patient medical history. However, building
these systems typically requires expert-annotated QA pairs. Large language
models (LLMs), which can perform extractive QA, depend on high-quality,
domain-specialized data in their prompts. We introduce a novel
approach, XAIQA, for generating synthetic QA pairs at scale from data naturally
available in electronic health records. Our method uses the idea of a
classification model explainer to generate questions and answers about medical
concepts corresponding to medical codes. In an expert evaluation with two
physicians, our method identifies $2.2\times$ more semantic matches and
$3.8\times$ more clinical abbreviations than two popular approaches that use
sentence transformers to create QA pairs. In an ML evaluation, adding our QA
pairs improves performance of GPT-4 as an extractive QA model, including on
difficult questions. In both the expert and ML evaluations, we examine
trade-offs between our method and sentence transformers for QA pair generation
depending on question difficulty.
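The core idea above can be illustrated with a minimal sketch: use a classification model's explanations to locate the passage in a clinical note that supports a medical code, then pair that passage with a templated question. The sketch below assumes a toy bag-of-words linear classifier, whose per-token explanation is simply each token's weight; the weight values, helper names, and question template are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: explainer-based synthetic QA-pair generation.
# A linear bag-of-words "model" stands in for a real medical-code
# classifier; its explanation for a token is just that token's weight.

def explain_tokens(tokens, weights):
    """Per-token attribution: for a linear model, each token's
    contribution to the predicted medical code is its weight."""
    return [weights.get(t.strip(",.").lower(), 0.0) for t in tokens]

def make_qa_pair(note, code_description, weights):
    """Pick the sentence the classifier relies on most for the code
    and pair it with a templated question, yielding one synthetic
    extractive QA example."""
    sentences = [s.strip() for s in note.split(".") if s.strip()]
    scored = [(sum(explain_tokens(s.split(), weights)), s) for s in sentences]
    best_score, best_sentence = max(scored)
    question = f"What in the note indicates {code_description}?"
    return {"question": question, "answer": best_sentence}

# Toy note and toy explainer weights for one diabetes-related code.
weights = {"diabetes": 2.5, "metformin": 2.1, "hba1c": 1.8}
note = ("Patient presents with cough. "
        "History of type 2 diabetes, on metformin. "
        "Lungs clear to auscultation.")
qa = make_qa_pair(note, "type 2 diabetes mellitus (ICD-10 E11)", weights)
print(qa["question"])
print(qa["answer"])
```

In practice the attribution step would come from a real explainer (e.g. gradient- or perturbation-based) applied to a trained code classifier, but the pipeline shape is the same: explanations select the answer span, and the code's description seeds the question.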
Related papers
- RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions (arXiv, 2024-08-16)
  We present RealMedQA, a dataset of realistic clinical questions generated by humans and an LLM. We show that the LLM is more cost-efficient for generating "ideal" QA pairs.
- A Joint-Reasoning based Disease Q&A System (arXiv, 2024-01-06)
  Medical question answering (QA) assistants respond to lay users' health-related queries by synthesizing information from multiple sources. They can serve as vital tools to alleviate issues of misinformation, information overload, and the complexity of medical language.
- SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References (arXiv, 2023-09-21)
  We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation). We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems.
- Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder (arXiv, 2023-04-04)
  We propose a new Transformer-based framework for medical VQA, named Q2ATransformer. We introduce an additional Transformer decoder with a set of learnable candidate answer embeddings to query the existence of each answer class for a given image-question pair. Our method achieves new state-of-the-art performance on two medical VQA benchmarks.
- Relation-Guided Pre-Training for Open-Domain Question Answering (arXiv, 2021-09-21)
  We propose a Relation-Guided Pre-Training (RGPT-QA) framework to solve complex open-domain questions. We show that RGPT-QA achieves 2.2%, 2.4%, and 6.3% absolute improvements in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions.
- Improving Unsupervised Question Answering via Summarization-Informed Question Generation (arXiv, 2021-09-16)
  Question Generation (QG) is the task of generating a plausible question for a (passage, answer) pair. We make use of freely available news summary data, transforming declarative sentences into appropriate questions using dependency parsing, named entity recognition, and semantic role labeling. The resulting questions are then combined with the original news articles to train an end-to-end neural QG model.
- CliniQG4QA: Generating Diverse Questions for Domain Adaptation of Clinical Question Answering (arXiv, 2020-10-30)
  Clinical question answering (QA) aims to automatically answer questions from medical professionals based on clinical texts. We propose CliniQG4QA, which leverages question generation (QG) to synthesize QA pairs on new clinical contexts. To generate the diverse question types essential for training QA models, we introduce a seq2seq-based question phrase prediction (QPP) module.
- Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering (arXiv, 2020-08-06)
  The HeadQA dataset contains multiple-choice questions authored for the public healthcare specialization exam. These questions are among the most challenging for current QA systems. We present a Multi-step reasoning with Knowledge extraction framework (MurKe) that strives to make full use of off-the-shelf pre-trained models.
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs (arXiv, 2020-05-28)
  We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts. Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of the data for training.
- Harvesting and Refining Question-Answer Pairs for Unsupervised QA (arXiv, 2020-05-06)
  We introduce two approaches to improve unsupervised question answering (QA). First, we harvest lexically and syntactically divergent questions from Wikipedia to automatically construct a corpus of question-answer pairs, named RefQA. Second, we take advantage of the QA model to extract more appropriate answers, iteratively refining the data over RefQA.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content and is not responsible for any consequences of its use.