What Disease does this Patient Have? A Large-scale Open Domain Question
Answering Dataset from Medical Exams
- URL: http://arxiv.org/abs/2009.13081v1
- Date: Mon, 28 Sep 2020 05:07:51 GMT
- Title: What Disease does this Patient Have? A Large-scale Open Domain Question
Answering Dataset from Medical Exams
- Authors: Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang and
Peter Szolovits
- Abstract summary: We present the first free-form multiple-choice OpenQA dataset for solving medical problems, MedQA, collected from the professional medical board exams.
It covers three languages: English, simplified Chinese, and traditional Chinese, and contains 12,723, 34,251, and 14,123 questions for the three languages, respectively.
- Score: 35.644831813174974
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Open domain question answering (OpenQA) tasks have been recently attracting
more and more attention from the natural language processing (NLP) community.
In this work, we present the first free-form multiple-choice OpenQA dataset for
solving medical problems, MedQA, collected from the professional medical board
exams. It covers three languages: English, simplified Chinese, and traditional
Chinese, and contains 12,723, 34,251, and 14,123 questions for the three
languages, respectively. We implement both rule-based and popular neural
methods by sequentially combining a document retriever and a machine
comprehension model. Through experiments, we find that even the current best
method can only achieve 36.7%, 42.0%, and 70.1% of test accuracy on the
English, traditional Chinese, and simplified Chinese questions, respectively.
We expect MedQA to present great challenges to existing OpenQA systems and hope
that it can serve as a platform to promote much stronger OpenQA models from the
NLP community in the future.
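The abstract describes systems built by sequentially combining a document retriever and a machine comprehension model. The sketch below illustrates that two-stage idea for a multiple-choice question. It is not the authors' implementation: the IDF-overlap retriever, the overlap-based option scorer, the function names, and the three-sentence toy corpus are all assumptions made for illustration, and the example question is invented rather than taken from MedQA.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency over a small in-memory corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}


def score(query, doc, idf):
    """Sum the IDF weights of the distinct query terms found in the document."""
    doc_terms = set(tokenize(doc))
    return sum(idf.get(t, 0.0) for t in set(tokenize(query)) if t in doc_terms)


def retrieve(query, docs, idf, k=1):
    """Stage 1 (retriever): return the k documents that best match the query."""
    return sorted(docs, key=lambda d: score(query, d, idf), reverse=True)[:k]


def answer(question, options, docs, k=1):
    """Stage 2 (rule-based stand-in for a reader): pick the option whose
    question+option query is best supported by its retrieved evidence."""
    idf = build_idf(docs)
    best_label, best_score = None, float("-inf")
    for label, text in options.items():
        query = question + " " + text
        evidence = retrieve(query, docs, idf, k)
        support = sum(score(query, d, idf) for d in evidence)
        if support > best_score:
            best_label, best_score = label, support
    return best_label


# Hypothetical toy corpus and question, for illustration only.
docs = [
    "Iron deficiency anemia presents with fatigue, pallor, and low ferritin.",
    "Vitamin B12 deficiency causes megaloblastic anemia and peripheral neuropathy.",
    "Hypothyroidism presents with weight gain, cold intolerance, and bradycardia.",
]
question = "A patient has fatigue, pallor, and low ferritin. What is the most likely diagnosis?"
options = {
    "A": "Iron deficiency anemia",
    "B": "Vitamin B12 deficiency",
    "C": "Hypothyroidism",
}

predicted = answer(question, options, docs)
print(predicted)
```

A neural variant would keep the same two-stage shape and swap the overlap scorer in `answer` for a trained comprehension model that reads the retrieved evidence.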
Related papers
- MediFact at MEDIQA-M3G 2024: Medical Question Answering in Dermatology with Multimodal Learning [0.0]
This paper addresses the limitations of traditional methods by proposing a weakly supervised learning approach for open-ended medical question-answering (QA).
Our system leverages readily available MEDIQA-M3G images via a VGG16-CNN-SVM model, enabling multilingual learning of informative skin condition representations.
This work advances medical QA research, paving the way for clinical decision support systems and ultimately improving healthcare delivery.
(2024-04-27T20:03:47Z)
- Can a Multichoice Dataset be Repurposed for Extractive Question Answering? [52.28197971066953]
We repurposed the Belebele dataset (Bandarkar et al., 2023), which was designed for multiple-choice question answering (MCQA).
We present annotation guidelines and a parallel EQA dataset for English and Modern Standard Arabic (MSA).
Our aim is to enable others to adapt our approach for the 120+ other language variants in Belebele, many of which are deemed under-resourced.
(2024-04-26T11:46:05Z)
- Building Efficient and Effective OpenQA Systems for Low-Resource Languages [17.64851283209797]
We show that effective, low-cost OpenQA systems can be developed for low-resource contexts.
Key ingredients are weak supervision using machine-translated labeled datasets and a relevant unstructured knowledge source.
We present SQuAD-TR, a machine translation of SQuAD2.0, and we build our OpenQA system by adapting ColBERT-QA and retraining it over Turkish resources.
(2024-01-07T22:11:36Z)
- AfriQA: Cross-lingual Open-Retrieval Question Answering for African Languages [18.689806554953236]
Cross-lingual open-retrieval question answering (XOR QA) systems retrieve answer content from other languages while serving people in their native language.
We create AfriQA, the first cross-lingual QA dataset with a focus on African languages.
AfriQA includes 12,000+ XOR QA examples across 10 African languages.
(2023-05-11T15:34:53Z)
- Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models [42.360431316298204]
We focus on open-ended VQA and, motivated by recent advances in language models, treat it as a generative task.
To properly communicate the medical images to the language model, we develop a network that maps the extracted visual features to a set of learnable tokens.
We evaluate our approach on the prime medical VQA benchmarks, namely, Slake, OVQA and PathVQA.
(2023-03-10T15:17:22Z)
- MaXM: Towards Multilingual Visual Question Answering [28.268881608141303]
We propose scalable solutions to multilingual visual question answering (mVQA) on both data and modeling fronts.
We first propose a translation-based framework for mVQA data generation that requires much less human annotation effort than the conventional approach of directly collecting questions and answers.
Then, we apply our framework to the multilingual captions in the Crossmodal-3600 dataset and develop an efficient annotation protocol to create MaXM, a test-only VQA benchmark in 7 diverse languages.
(2022-09-12T16:53:37Z)
- Multifaceted Improvements for Conversational Open-Domain Question Answering [54.913313912927045]
We propose a framework with Multifaceted Improvements for Conversational open-domain Question Answering (MICQA).
First, the proposed KL-divergence-based regularization leads to better question understanding for retrieval and answer reading.
Second, the added post-ranker module can push more relevant passages to the top placements, to be selected for the reader with two-aspect constraints.
Third, the well-designed curriculum learning strategy effectively narrows the gap between the golden passage settings of training and inference, and encourages the reader to find the true answer without the golden passage assistance.
(2022-04-01T07:54:27Z)
- Cross-Lingual GenQA: A Language-Agnostic Generative Question Answering Approach for Open-Domain Question Answering [76.99585451345702]
Open-Retrieval Generative Question Answering (GenQA) is proven to deliver high-quality, natural-sounding answers in English.
We present the first generalization of the GenQA approach for the multilingual environment.
(2021-10-14T04:36:29Z)
- XOR QA: Cross-lingual Open-Retrieval Question Answering [75.20578121267411]
This work extends open-retrieval question answering to a cross-lingual setting.
We construct a large-scale dataset built on questions lacking same-language answers.
(2020-10-22T16:47:17Z)
- Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering [89.76059961309453]
The HeadQA dataset contains multiple-choice questions authorized for the public healthcare specialization exam.
These questions are the most challenging for current QA systems.
We present a Multi-step reasoning with Knowledge extraction framework (MurKe).
We are striving to make full use of off-the-shelf pre-trained models.
(2020-08-06T02:47:46Z)