Self-Diagnosis and Large Language Models: A New Front for Medical
Misinformation
- URL: http://arxiv.org/abs/2307.04910v1
- Date: Mon, 10 Jul 2023 21:28:26 GMT
- Title: Self-Diagnosis and Large Language Models: A New Front for Medical
Misinformation
- Authors: Francois Barnard, Marlize Van Sittert, Sirisha Rambhatla
- Abstract summary: We evaluate the capabilities of large language models (LLMs) from the lens of a general user self-diagnosing.
We develop a testing methodology which can be used to evaluate responses to open-ended questions mimicking real-world use cases.
We reveal that a) these models perform worse than previously known, and b) they exhibit peculiar behaviours, including overconfidence when stating incorrect recommendations.
- Score: 8.738092015092207
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improving healthcare quality and access remains a critical concern for
countries worldwide. Consequently, the rise of large language models (LLMs) has
erupted a wealth of discussion around healthcare applications among researchers
and consumers alike. While the ability of these models to pass medical exams
has been used to argue in favour of their use in medical training and
diagnosis, the impact of their inevitable use as a self-diagnostic tool and
their role in spreading healthcare misinformation has not been evaluated. In
this work, we critically evaluate LLMs' capabilities from the lens of a general
user self-diagnosing, as well as the means through which LLMs may aid in the
spread of medical misinformation. To accomplish this, we develop a testing
methodology which can be used to evaluate responses to open-ended questions
mimicking real-world use cases. In doing so, we reveal that a) these models
perform worse than previously known, and b) they exhibit peculiar behaviours,
including overconfidence when stating incorrect recommendations, which
increases the risk of spreading medical misinformation.
Related papers
- Comprehensive and Practical Evaluation of Retrieval-Augmented Generation Systems for Medical Question Answering [70.44269982045415]
Retrieval-augmented generation (RAG) has emerged as a promising approach to enhance the performance of large language models (LLMs)
We introduce Medical Retrieval-Augmented Generation Benchmark (MedRGB) that provides various supplementary elements to four medical QA datasets.
Our experimental results reveals current models' limited ability to handle noise and misinformation in the retrieved documents.
arXiv Detail & Related papers (2024-11-14T06:19:18Z) - MisinfoEval: Generative AI in the Era of "Alternative Facts" [50.069577397751175]
We introduce a framework for generating and evaluating large language model (LLM) based misinformation interventions.
We present (1) an experiment with a simulated social media environment to measure effectiveness of misinformation interventions, and (2) a second experiment with personalized explanations tailored to the demographics and beliefs of users.
Our findings confirm that LLM-based interventions are highly effective at correcting user behavior.
arXiv Detail & Related papers (2024-10-13T18:16:50Z) - The Role of Language Models in Modern Healthcare: A Comprehensive Review [2.048226951354646]
The application of large language models (LLMs) in healthcare has gained significant attention.
This review examines the trajectory of language models from their early stages to the current state-of-the-art LLMs.
arXiv Detail & Related papers (2024-09-25T12:15:15Z) - MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models [20.781551849965357]
We introduce MediConfusion, a challenging medical Visual Question Answering (VQA) benchmark dataset.
We reveal that state-of-the-art models are easily confused by image pairs that are otherwise visually dissimilar and clearly distinct for medical experts.
We also extract common patterns of model failure that may help the design of a new generation of more trustworthy and reliable MLLMs in healthcare.
arXiv Detail & Related papers (2024-09-23T18:59:37Z) - STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering [58.79671189792399]
STLLaVA-Med is designed to train a policy model capable of auto-generating medical visual instruction data.
We validate the efficacy and data efficiency of STLLaVA-Med across three major medical Visual Question Answering (VQA) benchmarks.
arXiv Detail & Related papers (2024-06-28T15:01:23Z) - Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs)
Our research aims to transform existing medication recommendation methodologies using LLMs.
To mitigate this, we have developed a feature-level knowledge distillation technique, which transfers the LLM's proficiency to a more compact model.
arXiv Detail & Related papers (2024-02-05T08:25:22Z) - MedLM: Exploring Language Models for Medical Question Answering Systems [2.84801080855027]
Large Language Models (LLMs) with their advanced generative capabilities have shown promise in various NLP tasks.
This study aims to compare the performance of general and medical-specific distilled LMs for medical Q&A.
The findings will provide valuable insights into the suitability of different LMs for specific applications in the medical domain.
arXiv Detail & Related papers (2024-01-21T03:37:47Z) - Medical Foundation Models are Susceptible to Targeted Misinformation
Attacks [3.252906830953028]
Large language models (LLMs) have broad medical knowledge and can reason about medical information across many domains.
We demonstrate a concerning vulnerability of LLMs in medicine through targeted manipulation of just 1.1% of the model's weights.
We validate our findings in a set of 1,038 incorrect biomedical facts.
arXiv Detail & Related papers (2023-09-29T06:44:36Z) - Language models are susceptible to incorrect patient self-diagnosis in
medical applications [0.0]
We present a variety of LLMs with multiple-choice questions from U.S. medical board exams modified to include self-diagnostic reports from patients.
Our findings highlight that when a patient proposes incorrect bias-validating information, the diagnostic accuracy of LLMs drop dramatically.
arXiv Detail & Related papers (2023-09-17T19:56:39Z) - Informing clinical assessment by contextualizing post-hoc explanations
of risk prediction models in type-2 diabetes [50.8044927215346]
We consider a comorbidity risk prediction scenario and focus on contexts regarding the patients clinical state.
We employ several state-of-the-art LLMs to present contexts around risk prediction model inferences and evaluate their acceptability.
Our paper is one of the first end-to-end analyses identifying the feasibility and benefits of contextual explanations in a real-world clinical use case.
arXiv Detail & Related papers (2023-02-11T18:07:11Z) - Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.