The Case Records of ChatGPT: Language Models and Complex Clinical
Questions
- URL: http://arxiv.org/abs/2305.05609v1
- Date: Tue, 9 May 2023 16:58:32 GMT
- Title: The Case Records of ChatGPT: Language Models and Complex Clinical
Questions
- Authors: Timothy Poterucha, Pierre Elias, Christopher M. Haggerty
- Abstract summary: The accuracy of the large language models GPT-4 and GPT-3.5 in diagnosing complex clinical cases was investigated.
GPT-4 and GPT-3.5 provided the correct diagnosis in 26% and 22% of cases on the first attempt, and in 46% and 42% of cases within three attempts, respectively.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Background: Artificial intelligence language models have shown promise in
various applications, including assisting with clinical decision-making, as
demonstrated by the strong performance of large language models on medical
licensure exams. However, their ability to solve complex, open-ended cases,
which may be representative of clinical practice, remains unexplored. Methods:
In this study, the accuracy of the large language models GPT-4 and GPT-3.5 in
diagnosing complex clinical cases was investigated using published Case Records
of the Massachusetts General Hospital. A total of 50 cases requiring a
diagnosis and diagnostic test published from January 1, 2022 to April 16, 2022
were identified. For each case, models were given a prompt requesting the top
three specific diagnoses and associated diagnostic tests, followed by case
text, labs, and figure legends. Model outputs were assessed in comparison to
the final clinical diagnosis and whether the model-predicted test would result
in a correct diagnosis. Results: GPT-4 and GPT-3.5 provided the
correct diagnosis in 26% and 22% of cases on the first attempt, and in 46% and 42%
of cases within three attempts, respectively. GPT-4 and GPT-3.5 provided a correct
essential diagnostic test in 28% and 24% of cases on the first attempt, and in 44% and
50% of cases within three attempts, respectively. No significant differences were found
between the two models, and repeated trials with identical prompts using the
GPT-3.5 model yielded similar results. Conclusions: These models
demonstrate potential usefulness in generating differential diagnoses but
remain limited in their ability to provide a single unifying diagnosis in
complex, open-ended cases. Future research should focus on evaluating model
performance in larger datasets of open-ended clinical challenges and exploring
potential human-AI collaboration strategies to enhance clinical
decision-making.
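
To make the evaluation protocol concrete, below is a minimal sketch of how such a prompting-and-scoring pipeline could be implemented. It assumes the OpenAI Python client and the "gpt-4" model name; the prompt wording, helper names, and string-matching logic are illustrative assumptions, not the authors' exact implementation (the study itself judged correctness by comparison against the published final clinical diagnosis).

# Hypothetical sketch of the paper's protocol: request the top three specific
# diagnoses and associated diagnostic tests for a case, then score top-1 and
# top-3 accuracy. Prompt wording and matching logic are assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT_HEADER = (
    "Provide the top three most likely specific diagnoses for the following "
    "case, and for each diagnosis the single diagnostic test most likely to "
    "confirm it.\n\nCase (history, labs, figure legends):\n"
)

def query_model(case_text: str, model: str = "gpt-4") -> str:
    # One attempt = one ranked list of three diagnoses and tests.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT_HEADER + case_text}],
    )
    return response.choices[0].message.content

def score_case(model_output: str, final_diagnosis: str) -> dict:
    # Naive substring match standing in for expert review of each answer.
    top3 = [line for line in model_output.splitlines() if line.strip()][:3]
    hits = [final_diagnosis.lower() in line.lower() for line in top3]
    return {"top1": bool(hits and hits[0]), "top3": any(hits)}

Under this scheme, the paper's headline numbers would correspond to the fraction of the 50 cases where "top1" (one attempt) or "top3" (within three attempts) is true, with correctness adjudicated by clinicians rather than by string matching.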
Related papers
- Towards Accountable AI-Assisted Eye Disease Diagnosis: Workflow Design, External Validation, and Continual Learning [5.940140611616894]
AI shows promise in diagnostic accuracy but faces real-world application issues due to insufficient validation in clinical settings and diverse populations.
This study addresses gaps in the downstream accountability of medical AI through a case study on age-related macular degeneration (AMD) diagnosis and severity classification.
arXiv Detail & Related papers (2024-09-23T15:01:09Z)
- Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study exhaustively evaluated the performance of Gemini, GPT-4, and four other popular large models across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
- Methodology and Real-World Applications of Dynamic Uncertain Causality Graph for Clinical Diagnosis with Explainability and Invariance [41.373856519548404]
The Dynamic Uncertain Causality Graph (DUCG) approach is causality-driven, explainable, and invariant across different application scenarios.
46 DUCG models covering 54 chief complaints were constructed.
Over one million real diagnosis cases have been performed, with only 17 incorrect diagnoses identified.
arXiv Detail & Related papers (2024-06-09T11:37:45Z)
- Enhancing Diagnostic Accuracy through Multi-Agent Conversations: Using Large Language Models to Mitigate Cognitive Bias [5.421033429862095]
Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes.
This study explores the role of large language models in mitigating these biases through a multi-agent framework.
arXiv Detail & Related papers (2024-01-26T01:35:50Z)
- Large-scale Long-tailed Disease Diagnosis on Radiology Images [51.453990034460304]
RadDiag is a foundational model supporting 2D and 3D inputs across various modalities and anatomies.
Our dataset, RP3D-DiagDS, contains 40,936 cases with 195,010 scans covering 5,568 disorders.
arXiv Detail & Related papers (2023-12-26T18:20:48Z)
- Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis [59.35504779947686]
GPT-4V is OpenAI's latest multimodal model, evaluated here for medical diagnosis.
Our evaluation encompasses 17 human body systems.
GPT-4V demonstrates proficiency in distinguishing between medical image modalities and anatomy.
It faces significant challenges in disease diagnosis and in generating comprehensive reports.
arXiv Detail & Related papers (2023-10-15T18:32:27Z)
- The Potential and Pitfalls of using a Large Language Model such as ChatGPT or GPT-4 as a Clinical Assistant [12.017491902296836]
ChatGPT and GPT-4 have demonstrated promising performance on several medical domain tasks.
We performed two analyses using ChatGPT and GPT-4: one to identify patients with specific medical diagnoses in a large real-world electronic health record database, and one to assess individual patients.
In the patient assessment task, GPT-4 diagnosed accurately three out of four times.
arXiv Detail & Related papers (2023-07-16T21:19:47Z)
- A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics [63.106382317917344]
We report a Transformer-based representation-learning model as a clinical diagnostic aid that processes multimodal input in a unified manner.
The unified model outperformed an image-only model and non-unified multimodal diagnosis models in the identification of pulmonary diseases.
arXiv Detail & Related papers (2023-06-01T16:23:47Z)
- Capabilities of GPT-4 on Medical Challenge Problems [23.399857819743158]
GPT-4 is a general-purpose model that is not specialized for medical problems through training or engineered to solve clinical tasks.
We present a comprehensive evaluation of GPT-4 on medical competency examinations and benchmark datasets.
arXiv Detail & Related papers (2023-03-20T16:18:38Z)
- Advancing COVID-19 Diagnosis with Privacy-Preserving Collaboration in Artificial Intelligence [79.038671794961]
We launch the Unified CT-COVID AI Diagnostic Initiative (UCADI), in which the AI model can be trained in a distributed manner and executed independently at each host institution.
Our study is based on 9,573 chest computed tomography scans (CTs) from 3,336 patients collected from 23 hospitals located in China and the UK.
arXiv Detail & Related papers (2021-11-18T00:43:41Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential-service breakdowns and collapsing health care structures.
This work describes a machine learning model derived from hemogram exams performed on symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity, and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)