RxWhyQA: a clinical question-answering dataset with the challenge of
multi-answer questions
- URL: http://arxiv.org/abs/2201.02517v1
- Date: Fri, 7 Jan 2022 15:58:58 GMT
- Title: RxWhyQA: a clinical question-answering dataset with the challenge of
multi-answer questions
- Authors: Sungrim Moon, Huan He, Hongfang Liu, Jungwei W. Fan
- Abstract summary: We create a dataset for the development and evaluation of clinical question-answering systems that can handle multi-answer questions.
The 1-to-0 and 1-to-N drug-reason relations formed the unanswerable and multi-answer entries.
- Score: 4.017119245460155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Objectives: Create a dataset for the development and evaluation of clinical
question-answering (QA) systems that can handle multi-answer questions.
Materials and Methods: We leveraged the annotated relations from the 2018
National NLP Clinical Challenges (n2c2) corpus to generate a QA dataset. The
1-to-0 and 1-to-N drug-reason relations formed the unanswerable and
multi-answer entries, which represent challenging scenarios lacking in
existing clinical QA datasets.
Results: The resulting RxWhyQA dataset contains 91,440 QA entries, of which half
are unanswerable, and 21% (n=19,269) of the answerable ones require multiple
answers. The dataset conforms to the community-vetted Stanford Question
Answering Dataset (SQuAD) format.
Discussion: The RxWhyQA dataset is useful for comparing systems that must handle
the zero- and multi-answer challenges, demanding dual mitigation of both false
positive and false negative answers.
Conclusion: We created and shared a clinical QA dataset with a focus on
multi-answer questions to represent real-world scenarios.
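For illustration, below is a minimal sketch of what one RxWhyQA-style entry could look like in the SQuAD v2.0 JSON schema, paired with a simple set-level scorer for the zero- and multi-answer cases. The note text, question wording, answer offsets, and the scorer itself are hypothetical, invented here for illustration; only the overall structure follows the SQuAD format the abstract names, and this is not the paper's official evaluation script.

```python
# Hypothetical RxWhyQA-style entry in the SQuAD v2.0 schema.
# The clinical note, questions, and character offsets are invented
# for illustration; only the structure follows the SQuAD format.
entry = {
    "context": "Started lisinopril for hypertension and for renal protection.",
    "qas": [
        {
            "id": "example-1",
            "question": "Why does the patient take lisinopril?",
            "is_impossible": False,
            # A 1-to-N drug-reason relation yields multiple gold answers.
            "answers": [
                {"text": "hypertension", "answer_start": 23},
                {"text": "renal protection", "answer_start": 44},
            ],
        },
        {
            "id": "example-2",
            "question": "Why does the patient take metformin?",
            # A 1-to-0 relation yields an unanswerable entry.
            "is_impossible": True,
            "answers": [],
        },
    ],
}


def set_level_prf(predicted, gold):
    """Exact-match precision/recall/F1 over answer sets.

    Penalizes false positives (answering an unanswerable question,
    or adding spurious answers) and false negatives (missing some
    answers of a multi-answer question) at the same time.
    """
    pred, gold = set(predicted), set(gold)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else (1.0 if not gold else 0.0)
    recall = tp / len(gold) if gold else (1.0 if not pred else 0.0)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# A system that finds only one of the two reasons gets recall 0.5:
print(set_level_prf({"hypertension"}, {"hypertension", "renal protection"}))
```

The set-level view makes the "dual mitigation" point concrete: a span-at-a-time system that always returns exactly one answer loses recall on multi-answer questions and loses precision on unanswerable ones.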
Related papers
- RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions [3.182594503527438]
We present RealMedQA, a dataset of realistic clinical questions generated by humans and an LLM.
We show that the LLM is more cost-efficient for generating "ideal" QA pairs.
arXiv Detail & Related papers (2024-08-16T09:32:43Z)
- A Dataset of Open-Domain Question Answering with Multiple-Span Answers [11.291635421662338]
Multi-span answer extraction, also known as the task of multi-span question answering (MSQA), is critical for real-world applications.
There is a notable lack of publicly available MSQA benchmarks in Chinese.
We present CLEAN, a comprehensive Chinese multi-span question answering dataset.
arXiv Detail & Related papers (2024-02-15T13:03:57Z)
- Diversity Enhanced Narrative Question Generation for Storybooks [4.043005183192124]
We introduce a multi-question generation model (mQG) capable of generating multiple, diverse, and answerable questions.
To validate the answerability of the generated questions, we employ a SQuAD2.0 fine-tuned question answering model.
mQG shows promising results across various evaluation metrics compared with strong baselines.
arXiv Detail & Related papers (2023-10-25T08:10:04Z)
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z)
- Toward Unsupervised Realistic Visual Question Answering [70.67698100148414]
We study the problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs).
We first point out 2 drawbacks in current RVQA research, where (1) datasets contain too many unchallenging UQs and (2) a large number of annotated UQs are required for training.
We propose a new testing dataset, RGQA, which combines AQs from an existing VQA dataset with around 29K human-annotated UQs.
For training, pseudo UQs obtained by randomly pairing images and questions are used.
arXiv Detail & Related papers (2023-03-09T06:58:29Z)
- Modern Question Answering Datasets and Benchmarks: A Survey [5.026863544662493]
Question Answering (QA) is one of the most important natural language processing (NLP) tasks.
It aims to use NLP technologies to generate an answer to a given question based on a massive unstructured corpus.
In this paper, we investigate influential QA datasets that have been released in the era of deep learning.
arXiv Detail & Related papers (2022-06-30T05:53:56Z)
- Learning to Ask Like a Physician [24.15961995052862]
We present Discharge Summary Clinical Questions (DiSCQ), a newly curated question dataset composed of 2,000+ questions.
The questions are generated by medical experts from 100+ MIMIC-III discharge summaries.
We analyze this dataset to characterize the types of information sought by medical experts.
arXiv Detail & Related papers (2022-06-06T15:50:54Z)
- ConditionalQA: A Complex Reading Comprehension Dataset with Conditional Answers [93.55268936974971]
We describe a Question Answering dataset that contains complex questions with conditional answers.
We call this dataset ConditionalQA.
We show that ConditionalQA is challenging for many of the existing QA models, especially in selecting answer conditions.
arXiv Detail & Related papers (2021-10-13T17:16:46Z)
- Relation-Guided Pre-Training for Open-Domain Question Answering [67.86958978322188]
We propose a Relation-Guided Pre-Training (RGPT-QA) framework to solve complex open-domain questions.
We show that RGPT-QA achieves absolute improvements of 2.2%, 2.4%, and 6.3% in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions, respectively.
arXiv Detail & Related papers (2021-09-21T17:59:31Z)
- Generating Diverse and Consistent QA pairs from Contexts with Information-Maximizing Hierarchical Conditional VAEs [62.71505254770827]
We propose a hierarchical conditional variational autoencoder (HCVAE) for generating QA pairs given unstructured texts as contexts.
Our model obtains impressive performance gains over all baselines on both tasks, using only a fraction of data for training.
arXiv Detail & Related papers (2020-05-28T08:26:06Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.