Huatuo-26M, a Large-scale Chinese Medical QA Dataset
- URL: http://arxiv.org/abs/2305.01526v1
- Date: Tue, 2 May 2023 15:33:01 GMT
- Title: Huatuo-26M, a Large-scale Chinese Medical QA Dataset
- Authors: Jianquan Li, Xidong Wang, Xiangbo Wu, Zhiyi Zhang, Xiaolong Xu, Jie
Fu, Prayag Tiwari, Xiang Wan, Benyou Wang
- Abstract summary: In this paper, we release the largest-ever medical Question Answering (QA) dataset, with 26 million QA pairs.
We benchmark many existing approaches in our dataset in terms of both retrieval and generation.
We believe that this dataset will not only contribute to medical research but also benefit both patients and clinicians.
- Score: 29.130166934474044
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we release the largest-ever medical Question Answering
(QA) dataset, with 26 million QA pairs. We benchmark many existing approaches
on our dataset in terms of both retrieval and generation. Experimental results
show that existing models perform far below expectations and that the released
dataset remains challenging in the pre-trained language model era. Moreover,
we also experimentally show the benefit of the proposed dataset in many
aspects: (i) training models for other QA datasets in a zero-shot fashion;
(ii) serving as external knowledge for retrieval-augmented generation (RAG); and
(iii) improving existing pre-trained language models by using the QA pairs as a
pre-training corpus in a continued-training manner. We believe that this dataset
will not only contribute to medical research but also benefit both patients
and clinicians. See
\url{https://github.com/FreedomIntelligence/Huatuo-26M}.
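Use (ii) above, treating the QA pairs as an external knowledge base for RAG, can be sketched as follows. This is a minimal illustrative sketch, not the paper's method: the toy corpus, the bag-of-words retriever, and the prompt template are all assumptions; a real system would use dense embeddings over the full 26M pairs and feed the prompt to an LLM.

```python
# Minimal RAG sketch: retrieve the most similar stored QA pairs for a
# query and prepend them as context. Bag-of-words cosine similarity is a
# stand-in for a real dense retriever.
import math
import re
from collections import Counter

def tokenize(s: str) -> Counter:
    """Lowercased bag-of-words counts with punctuation stripped."""
    return Counter(re.findall(r"\w+", s.lower()))

def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors of two strings."""
    ca, cb = tokenize(a), tokenize(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, qa_pairs: list[tuple[str, str]], k: int = 1):
    """Return the k QA pairs whose questions best match the query."""
    ranked = sorted(qa_pairs, key=lambda qa: bow_cosine(query, qa[0]), reverse=True)
    return ranked[:k]

def build_prompt(query: str, qa_pairs: list[tuple[str, str]]) -> str:
    """Prepend retrieved QA pairs as context before the user question."""
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in retrieve(query, qa_pairs))
    return f"{context}\n\nUser question: {query}\nAnswer:"

# Toy in-memory stand-in for the 26M QA pairs (illustrative content only).
corpus = [
    ("What are common symptoms of influenza?",
     "Fever, cough, sore throat, and fatigue."),
    ("How is type 2 diabetes managed?",
     "Diet, exercise, and medications such as metformin."),
]
print(build_prompt("What symptoms does influenza cause?", corpus))
```

The retrieved answer text then grounds the generator, which is the mechanism by which external QA knowledge improves factual accuracy.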
Related papers
- RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models [35.60385437194243]
Current Medical Large Vision Language Models (Med-LVLMs) frequently encounter factual issues.
RAG, which utilizes external knowledge, can improve the factual accuracy of these models but introduces two major challenges.
We propose RULE, which consists of two components. First, we introduce a provably effective strategy for controlling factuality risk through the selection of retrieved contexts.
Second, based on samples where over-reliance on retrieved contexts led to errors, we curate a preference dataset to fine-tune the model.
arXiv Detail & Related papers (2024-07-06T16:45:07Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- Recent Advances in Predictive Modeling with Electronic Health Records [71.19967863320647]
Utilizing EHR data for predictive modeling presents several challenges due to its unique characteristics.
Deep learning has demonstrated its superiority in various applications, including healthcare.
arXiv Detail & Related papers (2024-02-02T00:31:01Z)
- BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering [8.547600133510551]
This paper develops a Benchmark Evaluation SysTem for Medical Visual Question Answering, denoted by BESTMVQA.
Our system provides a useful tool for users to automatically build Med-VQA datasets, which helps overcome the problem of insufficient data.
With simple configurations, our system automatically trains and evaluates the selected models over a benchmark dataset.
arXiv Detail & Related papers (2023-12-13T03:08:48Z)
- Question-Answering Model for Schizophrenia Symptoms and Their Impact on Daily Life using Mental Health Forums Data [0.0]
The "Mental Health" forum, dedicated to people suffering from schizophrenia and other mental disorders, was used.
It is shown how to pre-process the dataset to convert it into a QA dataset.
The BiBERT, DistilBERT, RoBERTa, and BioBERT models were fine-tuned and evaluated via F1-Score, Exact Match, Precision and Recall.
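The Exact Match and F1 metrics named above are standard for extractive QA. The sketch below shows the common SQuAD-style definitions (whitespace tokenization only); the paper's exact normalization may differ.

```python
# Standard extractive-QA metrics: Exact Match compares normalized answer
# strings; token-level F1 scores the overlap between prediction and gold.
from collections import Counter

def normalize(s: str) -> list[str]:
    """Lowercase and split on whitespace (minimal normalization)."""
    return s.lower().split()

def exact_match(pred: str, gold: str) -> int:
    """1 if normalized prediction equals normalized gold, else 0."""
    return int(normalize(pred) == normalize(gold))

def token_f1(pred: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    p, g = normalize(pred), normalize(gold)
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction sharing two of three tokens with the gold answer scores F1 = 2/3 while Exact Match is 0, which is why both metrics are usually reported together.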
arXiv Detail & Related papers (2023-09-30T17:50:50Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable sized training datasets of paired chest X-rays and radiological reports.
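Text-to-image retrieval benchmarks of this kind are typically scored with Recall@k. A minimal sketch, assuming each text's matching image shares its index and using illustrative similarity scores (the paper's actual scoring setup is not specified here):

```python
# Recall@k for text-to-image retrieval: for each text (row), rank all
# images (columns) by similarity and count a hit if the true image
# (same index) appears in the top k.
def recall_at_k(sim: list[list[float]], k: int) -> float:
    hits = 0
    for i, row in enumerate(sim):
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(sim)

# Illustrative 3x3 text-to-image similarity matrix.
sim = [
    [0.9, 0.1, 0.3],  # text 0 ranks image 0 first (correct)
    [0.2, 0.4, 0.8],  # text 1 ranks image 2 first (incorrect at k=1)
    [0.1, 0.2, 0.7],  # text 2 ranks image 2 first (correct)
]
print(recall_at_k(sim, 1))  # 2 of 3 texts retrieve the correct image
```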
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Pre-training transformer-based framework on large-scale pediatric claims data for downstream population-specific tasks [3.1580072841682734]
This study presents the Claim Pre-Training (Claim-PT) framework, a generic pre-training model that first trains on the entire pediatric claims dataset.
The effective knowledge transfer is completed through the task-aware fine-tuning stage.
We conducted experiments on a real-world claims dataset with more than one million patient records.
arXiv Detail & Related papers (2021-06-24T15:25:41Z)
- On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study [65.17429512679695]
In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions.
Despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produces more robust models.
arXiv Detail & Related papers (2021-06-02T00:48:33Z)
- Comparing Test Sets with Item Response Theory [53.755064720563]
We evaluate 29 datasets using predictions from 18 pretrained Transformer models on individual test examples.
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models.
We also observe that the span-selection task format, used for QA datasets like QAMR or SQuAD 2.0, is effective in differentiating between strong and weak models.
arXiv Detail & Related papers (2021-06-01T22:33:53Z)
- Predicting Clinical Diagnosis from Patients Electronic Health Records Using BERT-based Neural Networks [62.9447303059342]
We show the importance of this problem in the medical community.
We present a modification of the Bidirectional Encoder Representations from Transformers (BERT) model for sequence classification.
We use a large-scale Russian EHR dataset consisting of about 4 million unique patient visits.
arXiv Detail & Related papers (2020-07-15T09:22:55Z)
- Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset [29.866478682797513]
We provide an in-depth analysis of emrQA, the first large-scale dataset for question answering (QA) based on clinical notes.
We find that (i) emrQA answers are often incomplete, and (ii) emrQA questions are often answerable without using domain knowledge.
arXiv Detail & Related papers (2020-05-01T19:07:33Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.