Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset
- URL: http://arxiv.org/abs/2005.00574v1
- Date: Fri, 1 May 2020 19:07:33 GMT
- Title: Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset
- Authors: Xiang Yue, Bernal Jimenez Gutierrez and Huan Sun
- Abstract summary: We provide an in-depth analysis of emrQA, the first large-scale dataset for question answering (QA) based on clinical notes.
We find that (i) emrQA answers are often incomplete, and (ii) emrQA questions are often answerable without using domain knowledge.
- Score: 29.866478682797513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Machine reading comprehension has made great progress in recent years owing
to large-scale annotated datasets. In the clinical domain, however, creating
such datasets is quite difficult due to the domain expertise required for
annotation. Recently, Pampari et al. (EMNLP'18) tackled this issue by using
expert-annotated question templates and existing i2b2 annotations to create
emrQA, the first large-scale dataset for question answering (QA) based on
clinical notes. In this paper, we provide an in-depth analysis of this dataset
and the clinical reading comprehension (CliniRC) task. From our qualitative
analysis, we find that (i) emrQA answers are often incomplete, and (ii) emrQA
questions are often answerable without using domain knowledge. From our
quantitative experiments, surprising results include that (iii) using a small
sampled subset (5%-20%), we can obtain roughly the same performance as a model
trained on the entire dataset, (iv) this performance is close to a human
expert's, and (v) BERT models do not beat the best-performing base model.
Following our analysis of emrQA, we further explore two desired
aspects of CliniRC systems: the ability to utilize clinical domain knowledge
and to generalize to unseen questions and contexts. We argue that both should
be considered when creating future datasets.
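To make finding (iii) concrete, the snippet below sketches the kind of subsampling experiment the abstract describes: keep only a small random fraction of question-answer pairs before fine-tuning an extractive reader. The SQuAD-style JSON layout, file names, 5% ratio, and random seed are illustrative assumptions, not details taken from the paper.

```python
import json
import random

# Hypothetical paths and ratio; emrQA itself must be generated from the i2b2
# data under its usage agreement, so no real file ships with this sketch.
IN_PATH = "emrqa_squad_format.json"   # assumed SQuAD-style JSON export of emrQA
OUT_PATH = "emrqa_5pct_sample.json"
SAMPLE_RATIO = 0.05                   # the paper reports 5%-20% is roughly enough
random.seed(13)

with open(IN_PATH) as f:
    squad = json.load(f)

for article in squad["data"]:
    for paragraph in article["paragraphs"]:
        qas = paragraph["qas"]
        if not qas:
            continue
        keep = max(1, int(len(qas) * SAMPLE_RATIO))
        # Retain a random subset of the question-answer pairs for this note.
        paragraph["qas"] = random.sample(qas, keep)

with open(OUT_PATH, "w") as f:
    json.dump(squad, f)
```

Sampling at the question level keeps every clinical note available as context; any standard extractive-QA fine-tuning script can then be pointed at the reduced file, and the paper's exact sampling protocol may differ from this sketch.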
Related papers
- CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation [51.2289822267563]
We propose Corpus Retrieval and Augmentation for Fine-Tuning (CRAFT), a method for generating synthetic datasets.
We use large-scale public web-crawled corpora and similarity-based document retrieval to find other relevant human-written documents.
We demonstrate that CRAFT can efficiently generate large-scale task-specific training datasets for four diverse tasks.
arXiv Detail & Related papers (2024-09-03T17:54:40Z)
- Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation [65.16137964758612]
We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books.
Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text.
arXiv Detail & Related papers (2024-05-31T20:15:10Z)
- Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering [25.577314828249897]
We propose a novel dataset, MUSIC-AVQA-R, crafted in two steps: rephrasing questions within the test split of a public dataset (MUSIC-AVQA) and introducing distribution shifts to split questions.
Experimental results show that this architecture achieves state-of-the-art performance on MUSIC-AVQA-R, notably obtaining a significant improvement of 9.32%.
arXiv Detail & Related papers (2024-04-18T09:16:02Z)
- DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation [83.30006900263744]
Data analysis is a crucial process for generating in-depth studies and conclusive insights.
We propose to automatically generate high-quality answer annotations leveraging the code-generation capabilities of LLMs.
Human annotators judge our DACO-RL algorithm to produce more helpful answers than the SFT model in 57.72% of cases.
arXiv Detail & Related papers (2024-03-04T22:47:58Z)
- Automatic Question-Answer Generation for Long-Tail Knowledge [65.11554185687258]
We propose an automatic approach to generate specialized QA datasets for tail entities.
We conduct extensive experiments by employing pretrained LLMs on our newly generated long-tail QA datasets.
arXiv Detail & Related papers (2024-03-03T03:06:31Z)
- Using Weak Supervision and Data Augmentation in Question Answering [0.12499537119440242]
The onset of the COVID-19 pandemic accentuated the need for access to biomedical literature to answer timely and disease-specific questions.
We explore the roles weak supervision and data augmentation play in training deep neural network QA models.
We evaluate our methods in the context of QA models at the core of a system to answer questions about COVID-19.
arXiv Detail & Related papers (2023-09-28T05:16:51Z)
- A quantitative study of NLP approaches to question difficulty estimation [0.30458514384586394]
This work quantitatively analyzes several approaches proposed in previous research and compares their performance on datasets from different educational domains.
We find that Transformer based models are the best performing across different educational domains, with DistilBERT performing almost as well as BERT.
As for the other models, hybrid ones often outperform those based on a single type of feature; models based on linguistic features perform well on reading comprehension questions, while frequency-based features (TF-IDF) and word embeddings (word2vec) perform better in domain knowledge assessment.
arXiv Detail & Related papers (2023-05-17T14:26:00Z)
- Huatuo-26M, a Large-scale Chinese Medical QA Dataset [29.130166934474044]
In this paper, we release the largest medical Question Answering (QA) dataset to date, with 26 million QA pairs.
We benchmark many existing approaches on our dataset in terms of both retrieval and generation.
We believe that this dataset will not only contribute to medical research but also benefit both patients and clinical doctors.
arXiv Detail & Related papers (2023-05-02T15:33:01Z)
- PAXQA: Generating Cross-lingual Question Answering Examples at Training Scale [53.92008514395125]
PAXQA (Projecting annotations for cross-lingual (x) QA) decomposes cross-lingual QA into two stages.
We propose a novel use of lexically-constrained machine translation, in which constrained entities are extracted from the parallel bitexts.
We show that models fine-tuned on these datasets outperform prior synthetic data generation models over several extractive QA datasets.
arXiv Detail & Related papers (2023-04-24T15:46:26Z)
- Intermediate Training on Question Answering Datasets Improves Generative Data Augmentation [32.83012699501051]
We improve generative data augmentation by formulating data generation as a context generation task.
We cast downstream tasks into question answering format and adapt the fine-tuned context generators to the target task domain.
We demonstrate substantial performance improvements in few-shot and zero-shot settings.
arXiv Detail & Related papers (2022-05-25T09:28:21Z)
- Question Answering Infused Pre-training of General-Purpose Contextualized Representations [70.62967781515127]
We propose a pre-training objective based on question answering (QA) for learning general-purpose contextual representations.
We accomplish this goal by training a bi-encoder QA model, which independently encodes passages and questions, to match the predictions of a more accurate cross-encoder model.
We show large improvements over both RoBERTa-large and previous state-of-the-art results on zero-shot and few-shot paraphrase detection.
arXiv Detail & Related papers (2021-06-15T14:45:15Z)
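As a rough sketch of the bi-encoder setup described in the last entry above (Question Answering Infused Pre-training), the code below encodes questions and passages independently and regresses the resulting dot-product scores onto scores from a more accurate cross-encoder teacher. The toy bag-of-tokens encoders, dimensions, and MSE objective are stand-in assumptions; the paper itself builds on pretrained transformer encoders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins for the encoders; the actual work uses pretrained
# transformer encoders, which are omitted here for brevity.
class BagOfTokensEncoder(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab_size, dim)

    def forward(self, token_ids):              # token_ids: (batch, seq_len)
        return self.emb(token_ids)              # (batch, dim)

class BiEncoder(nn.Module):
    """Encodes question and passage independently, scores by dot product."""
    def __init__(self):
        super().__init__()
        self.q_enc = BagOfTokensEncoder()
        self.p_enc = BagOfTokensEncoder()

    def forward(self, q_ids, p_ids):
        q = self.q_enc(q_ids)
        p = self.p_enc(p_ids)
        return (q * p).sum(dim=-1)               # (batch,) relevance scores

def distillation_step(bi_encoder, optimizer, q_ids, p_ids, teacher_scores):
    """One step of training the bi-encoder to match cross-encoder scores."""
    student_scores = bi_encoder(q_ids, p_ids)
    loss = F.mse_loss(student_scores, teacher_scores)  # regress onto teacher
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with random data standing in for a real batch:
model = BiEncoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
q = torch.randint(0, 1000, (8, 20))              # 8 questions, 20 tokens each
p = torch.randint(0, 1000, (8, 120))             # 8 passages
teacher = torch.randn(8)                          # scores from a cross-encoder
print(distillation_step(model, opt, q, p, teacher))
```

The appeal of this design is that passage encodings can be precomputed once and reused across questions, which is what makes the bi-encoder cheaper than the cross-encoder it imitates.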
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and accepts no responsibility for any consequences of its use.