RareBench: Can LLMs Serve as Rare Diseases Specialists?
- URL: http://arxiv.org/abs/2402.06341v2
- Date: Thu, 4 Jul 2024 09:10:17 GMT
- Title: RareBench: Can LLMs Serve as Rare Diseases Specialists?
- Authors: Xuanzhong Chen, Xiaohao Mao, Qihan Guo, Lun Wang, Shuyang Zhang, Ting Chen,
- Abstract summary: Generalist Large Language Models (LLMs) have shown considerable promise in various domains, including medical diagnosis.
Rare diseases, affecting approximately 300 million people worldwide, often have unsatisfactory clinical diagnosis rates.
RareBench is a pioneering benchmark designed to evaluate the capabilities of LLMs on 4 critical dimensions within the realm of rare diseases.
We present an exhaustive comparative study of GPT-4's diagnostic capabilities against those of specialist physicians.
- Score: 11.828142771893443
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generalist Large Language Models (LLMs), such as GPT-4, have shown considerable promise in various domains, including medical diagnosis. Rare diseases, affecting approximately 300 million people worldwide, often have unsatisfactory clinical diagnosis rates primarily due to a lack of experienced physicians and the complexity of differentiating among many rare diseases. In this context, recent news such as "ChatGPT correctly diagnosed a 4-year-old's rare disease after 17 doctors failed" underscore LLMs' potential, yet underexplored, role in clinically diagnosing rare diseases. To bridge this research gap, we introduce RareBench, a pioneering benchmark designed to systematically evaluate the capabilities of LLMs on 4 critical dimensions within the realm of rare diseases. Meanwhile, we have compiled the largest open-source dataset on rare disease patients, establishing a benchmark for future studies in this domain. To facilitate differential diagnosis of rare diseases, we develop a dynamic few-shot prompt methodology, leveraging a comprehensive rare disease knowledge graph synthesized from multiple knowledge bases, significantly enhancing LLMs' diagnostic performance. Moreover, we present an exhaustive comparative study of GPT-4's diagnostic capabilities against those of specialist physicians. Our experimental findings underscore the promising potential of integrating LLMs into the clinical diagnostic process for rare diseases. This paves the way for exciting possibilities in future advancements in this field.
Related papers
- Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study evaluated the performance of Gemini, GPT-4, and 4 other popular large models across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
- AI-based Anomaly Detection for Clinical-Grade Histopathological Diagnostics [24.833696455985795]
In clinical reality, only a few diseases are common, whereas the majority of diseases are less frequent.
Current AI models overlook or misclassify these diseases.
We propose a deep anomaly detection approach that requires training data only from common diseases yet also detects all less frequent diseases.
arXiv Detail & Related papers (2024-06-21T04:59:19Z)
- Digital Diagnostics: The Potential Of Large Language Models In Recognizing Symptoms Of Common Illnesses [0.2995925627097048]
This study evaluates each model's diagnostic abilities by interpreting a user's symptoms and determining diagnoses that fit well with common illnesses.
GPT-4 demonstrates higher diagnostic accuracy, which the study attributes to its deep and extensive training on medical data.
Gemini performs with high precision as a critical tool in disease triage, demonstrating its potential to be a reliable model.
arXiv Detail & Related papers (2024-05-09T15:12:24Z)
- A Concept-based Interpretable Model for the Diagnosis of Choroid Neoplasias using Multimodal Data [28.632437578685842]
We focus on choroid neoplasias, the most prevalent form of eye cancer in adults, albeit rare, with an incidence of 5.1 per million.
Our work introduces a concept-based interpretable model that distinguishes between three types of choroidal tumors, integrating insights from domain experts via radiological reports.
Remarkably, this model not only achieves an F1 score of 0.91, rivaling that of black-box models, but also boosts the diagnostic accuracy of junior doctors by 42%.
arXiv Detail & Related papers (2024-03-08T07:15:53Z)
- Asclepius: A Spectrum Evaluation Benchmark for Medical Multi-Modal Large Language Models [59.60384461302662]
We introduce Asclepius, a novel benchmark for evaluating Medical Multi-Modal Large Language Models (Med-MLLMs).
Asclepius rigorously and comprehensively assesses model capability in terms of distinct medical specialties and different diagnostic capacities.
We also provide an in-depth analysis of 6 Med-MLLMs and compare them with 5 human specialists.
arXiv Detail & Related papers (2024-02-17T08:04:23Z)
- Towards long-tailed, multi-label disease classification from chest X-ray: Overview of the CXR-LT challenge [59.323306639144526]
Many real-world image recognition problems, such as diagnostic medical imaging exams, are emerging.
Diagnosis is both a long-tailed and a multi-label problem, as patients often present with multiple findings.
We synthesize common themes, providing recommendations for long-tailed, multi-label medical image classification.
arXiv Detail & Related papers (2023-10-24T18:26:22Z)
- Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis [59.35504779947686]
This study assesses GPT-4V, OpenAI's newest model at the time, on multimodal medical diagnosis.
Our evaluation encompasses 17 human body systems.
GPT-4V demonstrates proficiency in distinguishing between medical image modalities and anatomy.
It faces significant challenges in disease diagnosis and generating comprehensive reports.
arXiv Detail & Related papers (2023-10-15T18:32:27Z)
- Expert Uncertainty and Severity Aware Chest X-Ray Classification by Multi-Relationship Graph Learning [48.29204631769816]
We re-extract disease labels from CXR reports to make them more realistic by considering disease severity and uncertainty in classification.
Our experimental results show that models considering disease severity and uncertainty outperform previous state-of-the-art methods.
arXiv Detail & Related papers (2023-09-06T19:19:41Z)
- Evaluate underdiagnosis and overdiagnosis bias of deep learning model on primary open-angle glaucoma diagnosis in under-served patient populations [64.91773761529183]
Primary open-angle glaucoma (POAG) is the leading cause of blindness in the United States.
Deep learning has been widely used to detect POAG using fundus images.
Human bias in clinical diagnosis may be reflected and amplified in the widely-used deep learning models.
arXiv Detail & Related papers (2023-01-26T18:53:09Z)
- Exploring deep learning methods for recognizing rare diseases and their clinical manifestations from texts [1.6328866317851187]
Approximately 300 million people are affected by a rare disease.
The early and accurate diagnosis of these conditions is a major challenge for general practitioners, who do not have enough knowledge to identify them.
Natural Language Processing (NLP) and Deep Learning can help to extract relevant information to facilitate their diagnosis and treatments.
arXiv Detail & Related papers (2021-09-01T12:35:26Z)
- Mining Misdiagnosis Patterns from Biomedical Literature [8.534433954411409]
We find that the most commonly misdiagnosed diseases were often misdiagnosed as many different diseases.
While a misdiagnosis relationship may generally exist, the relationship was often found to be one-sided.
arXiv Detail & Related papers (2020-06-24T13:34:43Z)