Assessing and Enhancing Large Language Models in Rare Disease Question-answering
- URL: http://arxiv.org/abs/2408.08422v1
- Date: Thu, 15 Aug 2024 21:09:09 GMT
- Title: Assessing and Enhancing Large Language Models in Rare Disease Question-answering
- Authors: Guanchu Wang, Junhao Ran, Ruixiang Tang, Chia-Yuan Chang, Yu-Neng Chuang, Zirui Liu, Vladimir Braverman, Zhandong Liu, Xia Hu
- Abstract summary: We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experiment results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
- Score: 64.32570472692187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite the impressive capabilities of Large Language Models (LLMs) in general medical domains, questions remain about their performance in diagnosing rare diseases. To answer this question, we aim to assess the diagnostic performance of LLMs in rare diseases and explore methods to enhance their effectiveness in this area. In this work, we introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of LLMs in diagnosing rare diseases. Specifically, we collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases. Additionally, we annotated metadata for each question, facilitating the extraction of subsets specific to any given disease and its property. Based on the ReDis-QA dataset, we benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models. To facilitate retrieval-augmented generation for rare disease diagnosis, we collect the first rare disease corpus (ReCOP), sourced from the National Organization for Rare Disorders (NORD) database. Specifically, we split the report of each rare disease into multiple chunks, each representing a different property of the disease, including its overview, symptoms, causes, effects, related disorders, diagnosis, and standard therapies. This structure ensures that the information within each chunk aligns consistently with a question. Experiment results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%. Moreover, it significantly guides LLMs to generate trustworthy answers and explanations that can be traced back to existing literature.
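The per-property chunking described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`chunk_report`, `retrieve`), the bag-of-words cosine scoring, and the placeholder report text are all assumptions made for illustration, and a real system would likely use a learned retriever over the actual NORD reports.

```python
import math
import re
from collections import Counter

# Property names as listed in the abstract.
PROPERTIES = ["overview", "symptoms", "causes", "effects",
              "related disorders", "diagnosis", "standard therapies"]


def chunk_report(disease, sections):
    """Split one report (property -> text) into one chunk per known property."""
    return [{"disease": disease, "property": prop, "text": text}
            for prop, text in sections.items() if prop in PROPERTIES]


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=1):
    """Return the k chunks whose text best matches the question (toy scorer)."""
    q = Counter(tokenize(question))

    def score(chunk):
        doc = f'{chunk["disease"]} {chunk["property"]} {chunk["text"]}'
        return cosine(q, Counter(tokenize(doc)))

    return sorted(chunks, key=score, reverse=True)[:k]


# Hypothetical report with placeholder text (not real medical content).
chunks = chunk_report("Disease A", {
    "symptoms": "placeholder: patients report fatigue and joint pain",
    "causes": "placeholder: linked to one hypothetical gene variant",
    "notes": "dropped because it is not a listed property",
})
best = retrieve("What are the symptoms of Disease A, such as fatigue?", chunks)
print(best[0]["property"])
```

Because each chunk covers a single property of a single disease, a question about symptoms tends to match the symptoms chunk rather than an arbitrary window of the report, which is the alignment the abstract refers to.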
Related papers
- Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge [0.0]
We present Zebra-Llama, a specialized context-aware language model with high precision Retrieval Augmented Generation (RAG) capability.
We focus on Ehlers-Danlos Syndrome (EDS) as our case study. EDS, affecting 1 in 5,000 individuals, exemplifies the complexities of rare diseases.
Zebra-Llama demonstrates unprecedented capabilities in handling EDS-related queries.
(2024-11-04T22:45:52Z)
- MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [49.765466293296186]
Recent progress in Medical Large Vision-Language Models (Med-LVLMs) has opened up new possibilities for interactive diagnostic tools.
Med-LVLMs often suffer from factual hallucination, which can lead to incorrect diagnoses.
We propose a versatile multimodal RAG system, MMed-RAG, designed to enhance the factuality of Med-LVLMs.
(2024-10-16T23:03:27Z)
- KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models [39.831976458410864]
This paper presents KARGEN, a Knowledge-enhanced Automated radiology Report GENeration framework based on Large Language Models.
The framework integrates a knowledge graph to unlock chest disease-related knowledge within the LLM to enhance the clinical utility of generated reports.
Our approach demonstrates promising results on the MIMIC-CXR and IU-Xray datasets.
(2024-09-09T06:57:22Z)
- AutoRD: An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontologies-enhanced Large Language Models [25.966454809890227]
Rare diseases affect millions worldwide but often face limited research focus due to their low prevalence.
Recent advancements in Large Language Models (LLMs) have shown promise in automating the extraction of medical information.
We propose an end-to-end system called AutoRD, which automates the extraction of information from medical texts about rare diseases.
(2024-03-01T20:06:39Z)
- RareBench: Can LLMs Serve as Rare Diseases Specialists? [11.828142771893443]
Generalist Large Language Models (LLMs) have shown considerable promise in various domains, including medical diagnosis.
Rare diseases, affecting approximately 300 million people worldwide, often have unsatisfactory clinical diagnosis rates.
RareBench is a pioneering benchmark designed to evaluate the capabilities of LLMs on 4 critical dimensions within the realm of rare diseases.
We present an exhaustive comparative study of GPT-4's diagnostic capabilities against those of specialist physicians.
(2024-02-09T11:34:16Z)
- Deep Reinforcement Learning Framework for Thoracic Diseases Classification via Prior Knowledge Guidance [49.87607548975686]
The scarcity of labeled data for related diseases poses a huge challenge to an accurate diagnosis.
We propose a novel deep reinforcement learning framework, which introduces prior knowledge to direct the learning of diagnostic agents.
Our approach's performance was demonstrated using the well-known NIH ChestX-ray14 and CheXpert datasets.
(2023-06-02T01:46:31Z)
- Federated Learning Enables Big Data for Rare Cancer Boundary Detection [98.5549882883963]
We present findings from the largest Federated ML study to date, involving data from 71 healthcare institutions across 6 continents.
We generate an automatic tumor boundary detector for the rare disease of glioblastoma.
We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and 23% improvement over the tumor's entire extent.
(2022-04-22T17:27:00Z)
- Domain Invariant Model with Graph Convolutional Network for Mammogram Classification [49.691629817104925]
We propose a novel framework, namely Domain Invariant Model with Graph Convolutional Network (DIM-GCN).
We first propose a Bayesian network, which explicitly decomposes the latent variables into disease-related and disease-irrelevant parts that are provably disentangled from each other.
To better capture the macroscopic features, we leverage the observed clinical attributes as a goal for reconstruction, via Graph Convolutional Network (GCN).
(2022-04-21T08:23:44Z)
- Predicting Parkinson's Disease with Multimodal Irregularly Collected Longitudinal Smartphone Data [75.23250968928578]
Parkinson's Disease is a neurological disorder prevalent in elderly people.
Traditional ways to diagnose the disease rely on in-person subjective clinical evaluations of the quality of a set of activity tests.
We propose a novel time-series based approach to predicting Parkinson's Disease with raw activity test data collected by smartphones in the wild.
(2020-09-25T01:50:15Z)
- Feature Selection on Lyme Disease Patient Survey Data [7.895389437572245]
Lyme disease is a rapidly growing illness that remains poorly understood within the medical community.
We investigate these questions by applying machine learning techniques to a large scale Lyme disease patient registry.
(2020-08-24T22:35:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.