Language models are susceptible to incorrect patient self-diagnosis in medical applications
- URL: http://arxiv.org/abs/2309.09362v1
- Date: Sun, 17 Sep 2023 19:56:39 GMT
- Title: Language models are susceptible to incorrect patient self-diagnosis in medical applications
- Authors: Rojin Ziaei and Samuel Schmidgall
- Abstract summary: We present a variety of LLMs with multiple-choice questions from U.S. medical board exams modified to include self-diagnostic reports from patients.
Our findings highlight that when a patient proposes incorrect bias-validating information, the diagnostic accuracy of LLMs drops dramatically.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are becoming increasingly relevant as a
potential tool for healthcare, aiding communication between clinicians,
researchers, and patients. However, traditional evaluations of LLMs on medical
exam questions do not reflect the complexity of real patient-doctor
interactions. An example of this complexity is the introduction of patient
self-diagnosis, where a patient attempts to diagnose their own medical
conditions from various sources. While the patient sometimes arrives at an
accurate conclusion, they are more often led toward misdiagnosis due to the
patient's over-emphasis on bias-validating information. In this work, we present
a variety of LLMs with multiple-choice questions from United States medical
board exams which are modified to include self-diagnostic reports from
patients. Our findings highlight that when a patient proposes incorrect
bias-validating information, the diagnostic accuracy of LLMs drops dramatically,
revealing a high susceptibility to errors in self-diagnosis.
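The abstract describes the evaluation protocol only at a high level: take a standard multiple-choice board-exam item, optionally inject a patient-supplied (and possibly incorrect) self-diagnosis into the prompt, and compare the model's accuracy with and without that biasing statement. The sketch below illustrates one way such a loop could look; the question format, prompt wording, and the query_llm callable are assumptions for illustration, not the authors' released code.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class BoardExamItem:
    vignette: str            # clinical vignette from the exam question
    options: dict[str, str]  # e.g. {"A": "...", "B": "...", ...}
    answer: str              # correct option letter
    self_diagnosis: str      # incorrect, bias-validating patient statement

def build_prompt(item: BoardExamItem, include_self_diagnosis: bool) -> str:
    """Format one question, optionally injecting the patient's self-diagnosis."""
    lines = [item.vignette]
    if include_self_diagnosis:
        # Hypothetical biasing statement, e.g. "I read online that this must be Lyme disease."
        lines.append(f'Patient: "{item.self_diagnosis}"')
    lines.append("Question: What is the most likely diagnosis?")
    for letter, text in sorted(item.options.items()):
        lines.append(f"{letter}. {text}")
    lines.append("Answer with a single option letter.")
    return "\n".join(lines)

def accuracy(items: Iterable[BoardExamItem],
             query_llm: Callable[[str], str],
             include_self_diagnosis: bool) -> float:
    """Fraction of items answered correctly under one prompting condition."""
    items = list(items)
    correct = sum(
        query_llm(build_prompt(it, include_self_diagnosis)).strip().upper().startswith(it.answer)
        for it in items
    )
    return correct / len(items)

# Usage: compare the unbiased baseline against the self-diagnosis condition.
# baseline = accuracy(items, query_llm, include_self_diagnosis=False)
# biased   = accuracy(items, query_llm, include_self_diagnosis=True)
# print(f"accuracy drop: {baseline - biased:.1%}")
```

The accuracy drop between the two conditions is the quantity the paper reports as the models' susceptibility to incorrect self-diagnosis.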
Related papers
- DiversityMedQA: Assessing Demographic Biases in Medical Diagnosis using Large Language Models [2.750784330885499]
We introduce DiversityMedQA, a novel benchmark designed to assess large language model (LLM) responses to medical queries across diverse patient demographics.
Our findings reveal notable discrepancies in model performance when tested against these demographic variations.
arXiv Detail & Related papers (2024-09-02T23:37:20Z)
- RuleAlign: Making Large Language Models Better Physicians with Diagnostic Rule Alignment [54.91736546490813]
We introduce the RuleAlign framework, designed to align Large Language Models with specific diagnostic rules.
We develop a medical dialogue dataset comprising rule-based communications between patients and physicians.
Experimental results demonstrate the effectiveness of the proposed approach.
arXiv Detail & Related papers (2024-08-22T17:44:40Z)
- Assessing and Enhancing Large Language Models in Rare Disease Question-answering [64.32570472692187]
We introduce a rare disease question-answering (ReDis-QA) dataset to evaluate the performance of Large Language Models (LLMs) in diagnosing rare diseases.
We collected 1360 high-quality question-answer pairs within the ReDis-QA dataset, covering 205 rare diseases.
We then benchmarked several open-source LLMs, revealing that diagnosing rare diseases remains a significant challenge for these models.
Experimental results demonstrate that ReCOP can effectively improve the accuracy of LLMs on the ReDis-QA dataset by an average of 8%.
arXiv Detail & Related papers (2024-08-15T21:09:09Z)
- Digital Diagnostics: The Potential Of Large Language Models In Recognizing Symptoms Of Common Illnesses [0.2995925627097048]
This study evaluates each model's diagnostic abilities by interpreting a user's symptoms and determining diagnoses that fit well with common illnesses.
GPT-4 demonstrates higher diagnostic accuracy owing to its deep and extensive training on medical data.
Gemini performs with high precision as a critical tool in disease triage, demonstrating its potential to be a reliable model.
arXiv Detail & Related papers (2024-05-09T15:12:24Z)
- Conversational Disease Diagnosis via External Planner-Controlled Large Language Models [18.93345199841588]
This study presents an LLM-based diagnostic system that enhances planning capabilities by emulating doctors.
By utilizing real patient electronic medical record data, we constructed simulated dialogues between virtual patients and doctors.
arXiv Detail & Related papers (2024-04-04T06:16:35Z)
- Towards Reducing Diagnostic Errors with Interpretable Risk Prediction [18.474645862061426]
We propose a method to use LLMs to identify pieces of evidence in patient EHR data that indicate increased or decreased risk of specific diagnoses.
Our ultimate aim is to increase access to evidence and reduce diagnostic errors.
arXiv Detail & Related papers (2024-02-15T17:05:48Z)
- AI Hospital: Benchmarking Large Language Models in a Multi-agent Medical Interaction Simulator [69.51568871044454]
We introduce AI Hospital, a framework simulating dynamic medical interactions between Doctor as player and NPCs.
This setup allows for realistic assessments of LLMs in clinical scenarios.
We develop the Multi-View Medical Evaluation benchmark, utilizing high-quality Chinese medical records and NPCs.
arXiv Detail & Related papers (2024-02-15T06:46:48Z)
- Self-Diagnosis and Large Language Models: A New Front for Medical Misinformation [8.738092015092207]
We evaluate the capabilities of large language models (LLMs) through the lens of a general user self-diagnosing.
We develop a testing methodology which can be used to evaluate responses to open-ended questions mimicking real-world use cases.
We reveal that a) these models perform worse than previously known, and b) they exhibit peculiar behaviours, including overconfidence when stating incorrect recommendations.
arXiv Detail & Related papers (2023-07-10T21:28:26Z)
- SPeC: A Soft Prompt-Based Calibration on Performance Variability of Large Language Model in Clinical Notes Summarization [50.01382938451978]
We introduce a model-agnostic pipeline that employs soft prompts to diminish variance while preserving the advantages of prompt-based summarization.
Experimental findings indicate that our method not only bolsters performance but also effectively curbs variance for various language models.
arXiv Detail & Related papers (2023-03-23T04:47:46Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
- Towards Causality-Aware Inferring: A Sequential Discriminative Approach for Medical Diagnosis [142.90770786804507]
Medical diagnosis assistant (MDA) aims to build an interactive diagnostic agent to sequentially inquire about symptoms for discriminating diseases.
This work attempts to address these critical issues in MDA by taking advantage of the causal diagram.
We propose a propensity-based patient simulator to effectively answer unrecorded inquiries by drawing knowledge from the other records.
arXiv Detail & Related papers (2020-03-14T02:05:54Z)