Related papers: Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease


URL: http://arxiv.org/abs/2502.15069v1
Date: Thu, 20 Feb 2025 22:02:52 GMT
Title: Rare Disease Differential Diagnosis with Large Language Models at Scale: From Abdominal Actinomycosis to Wilson's Disease
Authors: Elliot Schumacher, Dhruv Naik, Anitha Kannan
Abstract summary: Large language models (LLMs) have demonstrated impressive capabilities in disease diagnosis. Rare disease performance is critical given the increasing use of LLMs in healthcare settings. We propose RareScale to combine the knowledge of LLMs with expert systems.
Score: 8.81420331399616
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large language models (LLMs) have demonstrated impressive capabilities in disease diagnosis. However, their effectiveness in identifying rarer diseases, which are inherently more challenging to diagnose, remains an open question. Rare disease performance is critical given the increasing use of LLMs in healthcare settings. This is especially true if a primary care physician needs to make a rarer prognosis from only a patient conversation so that they can take the appropriate next step. To that end, several clinical decision support systems have been designed to support providers in rare disease identification. Yet their utility is limited by their lack of knowledge of common disorders and their difficulty of use. In this paper, we propose RareScale to combine the knowledge of LLMs with expert systems. We jointly use an expert system and an LLM to simulate rare disease chats. This data is used to train a rare disease candidate predictor model. Candidates from this smaller model are then used as additional inputs to a black-box LLM to make the final differential diagnosis. Thus, RareScale allows for a balance between rare and common diagnoses. We present results on over 575 rare diseases, beginning with Abdominal Actinomycosis and ending with Wilson's Disease. Our approach significantly improves the baseline performance of black-box LLMs by over 17% in Top-5 accuracy. We also find that our candidate generation performance is high (e.g., 88.8% on GPT-4o-generated chats).
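The Top-5 accuracy reported above counts a case as correct when the true diagnosis appears anywhere among the model's five highest-ranked diagnoses. The abstract does not say whether the 17% gain is absolute or relative. Below is a minimal Python sketch of the metric itself, not the paper's evaluation code. The function name and the exact-string matching of disease names are assumptions for illustration; a real evaluation may need to normalize disease-name synonyms.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    gold: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of cases whose gold diagnosis is among the first k ranked predictions.

    Hypothetical helper: matching is exact string equality (an assumption),
    so synonyms such as "Wilson disease" vs. "Wilson's Disease" would not match.
    """
    if len(ranked_predictions) != len(gold):
        raise ValueError("predictions and gold labels must have the same length")
    if not gold:
        return 0.0
    # A case is a hit if its true label appears in the top-k slice of its ranking.
    hits = sum(1 for preds, label in zip(ranked_predictions, gold) if label in preds[:k])
    return hits / len(gold)


if __name__ == "__main__":
    # Made-up rankings for two hypothetical cases, for illustration only.
    preds = [
        ["Hepatitis", "Wilson's Disease", "Cirrhosis"],
        ["Appendicitis", "Crohn's Disease"],
    ]
    labels = ["Wilson's Disease", "Abdominal Actinomycosis"]
    print(top_k_accuracy(preds, labels, k=5))
```

With these made-up inputs, the first case is a hit and the second is a miss, so the script prints 0.5.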

Related papers

Hide and Seek with LLMs: An Adversarial Game for Sneaky Error Generation and Self-Improving Diagnosis [51.88592148135258]
We propose Hide and Seek Game (HSG), a dynamic adversarial framework for error generation and diagnosis. HSG involves two adversarial roles: Sneaky, which "hides" by generating subtle, deceptive reasoning errors, and Diagnosis, which "seeks" to accurately detect them. Experiments on several math reasoning tasks show that HSG significantly boosts error diagnosis, achieving 16.8% to 31.4% higher accuracy than baselines like GPT-4o.
arXiv Detail & Related papers (2025-08-05T12:45:21Z)
An Agentic System for Rare Disease Diagnosis with Traceable Reasoning [58.78045864541539]
We introduce DeepRare, the first rare disease diagnosis agentic system powered by a large language model (LLM). DeepRare generates ranked diagnostic hypotheses for rare diseases, each accompanied by a transparent chain of reasoning. The system demonstrates exceptional diagnostic performance across 2,919 diseases, achieving 100% accuracy for 1,013 diseases.
arXiv Detail & Related papers (2025-06-25T13:42:26Z)
Right Prediction, Wrong Reasoning: Uncovering LLM Misalignment in RA Disease Diagnosis [16.057157876625794]
Large language models (LLMs) offer a promising pre-screening tool, improving early disease detection and providing enhanced healthcare access for underprivileged communities. With impressive accuracy in prediction across a range of diseases, LLMs have the potential to revolutionize clinical pre-screening and decision-making for various medical conditions.
arXiv Detail & Related papers (2025-04-09T05:04:01Z)
Survey and Improvement Strategies for Gene Prioritization with Large Language Models [61.24568051916653]
Large language models (LLMs) have performed well in medical exams, but their effectiveness in diagnosing rare genetic diseases has not been assessed. We used multi-agent and Human Phenotype Ontology (HPO) classification to categorize patients based on phenotypes and solvability levels. At baseline, GPT-4 outperformed other LLMs, achieving nearly 30% accuracy in ranking causal genes correctly.
arXiv Detail & Related papers (2025-01-30T23:03:03Z)
CovidLLM: A Robust Large Language Model with Missing Value Adaptation and Multi-Objective Learning Strategy for Predicting Disease Severity and Clinical Outcomes in COVID-19 Patients [4.063838267166007]
Coronavirus Disease 2019 (COVID-19) has caused millions of deaths worldwide. Early identification of the severity and clinical outcomes of the disease in these patients is vital to prevent adverse prognoses. Our research focuses primarily on constructing specialized prompts and adopting multi-objective learning strategies.
arXiv Detail & Related papers (2024-11-28T11:27:38Z)
Assessing and Enhancing Large Language Models in Rare Disease Question-answering [64.32570472692187]
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases. We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases. We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models. Experiment results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
arXiv Detail & Related papers (2024-08-15T21:09:09Z)
RareBench: Can LLMs Serve as Rare Diseases Specialists? [11.828142771893443]
Generalist Large Language Models (LLMs) have shown considerable promise in various domains, including medical diagnosis. Rare diseases, affecting approximately 300 million people worldwide, often have unsatisfactory clinical diagnosis rates. RareBench is a pioneering benchmark designed to evaluate the capabilities of LLMs on 4 critical dimensions within the realm of rare diseases. We present an exhaustive comparative study of GPT-4's diagnostic capabilities against those of specialist physicians.
arXiv Detail & Related papers (2024-02-09T11:34:16Z)
Diagnosis Uncertain Models For Medical Risk Prediction [80.07192791931533]
We consider a patient risk model which has access to vital signs, lab values, and prior history but does not have access to a patient's diagnosis. We show that such 'all-cause' risk models have good generalization across diagnoses but have a predictable failure mode. We propose a fix for this problem by explicitly modeling the uncertainty in risk prediction coming from uncertainty in patient diagnoses.
arXiv Detail & Related papers (2023-06-29T23:36:04Z)
Benchmarking Continuous Time Models for Predicting Multiple Sclerosis Progression [46.394865849252696]
Multiple sclerosis is a disease that affects the brain and spinal cord; it can lead to severe disability and has no known cure. In a recent paper it was shown that disease progression can be predicted effectively using performance outcome measures and demographic data. We benchmark four continuous time models using a publicly available multiple sclerosis dataset. We find that the best continuous model is often able to outperform the best benchmarked discrete time model.
arXiv Detail & Related papers (2023-02-15T18:45:32Z)
Evaluate underdiagnosis and overdiagnosis bias of deep learning model on primary open-angle glaucoma diagnosis in under-served patient populations [64.91773761529183]
Primary open-angle glaucoma (POAG) is the leading cause of blindness in the United States. Deep learning has been widely used to detect POAG using fundus images. Human bias in clinical diagnosis may be reflected and amplified in the widely-used deep learning models.
arXiv Detail & Related papers (2023-01-26T18:53:09Z)
Improving Deep Facial Phenotyping for Ultra-rare Disorder Verification Using Model Ensembles [52.77024349608834]
We analyze the influence of replacing a DCNN with a state-of-the-art face recognition approach, iResNet with ArcFace. Our proposed ensemble model achieves state-of-the-art performance on both seen and unseen disorders.
arXiv Detail & Related papers (2022-11-12T23:28:54Z)
Federated Learning Enables Big Data for Rare Cancer Boundary Detection [98.5549882883963]
We present findings from the largest Federated ML study to date, involving data from 71 healthcare institutions across 6 continents. We generate an automatic tumor boundary detector for the rare disease of glioblastoma. We demonstrate a 33% improvement over a publicly trained model to delineate the surgically targetable tumor, and a 23% improvement over the tumor's entire extent.
arXiv Detail & Related papers (2022-04-22T17:27:00Z)
Mining Misdiagnosis Patterns from Biomedical Literature [8.534433954411409]
We find that the most commonly misdiagnosed diseases were often misdiagnosed as many different diseases. While a misdiagnosis relationship may generally exist, the relationship was often found to be one-sided.
arXiv Detail & Related papers (2020-06-24T13:34:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.