Medical Foundation Models are Susceptible to Targeted Misinformation Attacks
- URL: http://arxiv.org/abs/2309.17007v1
- Date: Fri, 29 Sep 2023 06:44:36 GMT
- Title: Medical Foundation Models are Susceptible to Targeted Misinformation Attacks
- Authors: Tianyu Han, Sven Nebelung, Firas Khader, Tianci Wang, Gustav Mueller-Franzes, Christiane Kuhl, Sebastian Försch, Jens Kleesiek, Christoph Haarburger, Keno K. Bressem, Jakob Nikolas Kather, Daniel Truhn
- Abstract summary: Large language models (LLMs) have broad medical knowledge and can reason about medical information across many domains.
We demonstrate a concerning vulnerability of LLMs in medicine through targeted manipulation of just 1.1% of the model's weights.
We validate our findings in a set of 1,038 incorrect biomedical facts.
- Score: 3.252906830953028
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have broad medical knowledge and can reason
about medical information across many domains, holding promising potential for
diverse medical applications in the near future. In this study, we demonstrate
a concerning vulnerability of LLMs in medicine. Through targeted manipulation
of just 1.1% of the model's weights, we can deliberately inject an incorrect
biomedical fact. The erroneous information is then propagated in the model's
output, whilst its performance on other biomedical tasks remains intact. We
validate our findings in a set of 1,038 incorrect biomedical facts. This
peculiar susceptibility raises serious security and trustworthiness concerns
for the application of LLMs in healthcare settings. It accentuates the need for
robust protective measures, thorough verification mechanisms, and stringent
management of access to these models, ensuring their reliable and safe use in
medical practice.
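The mechanism behind such a targeted edit can be illustrated with a minimal sketch. The paper reports that modifying roughly 1.1% of the weights suffices to implant a single false fact; the rank-one update below, in the spirit of locate-and-edit methods, is a simplified toy stand-in, with all dimensions and variable names chosen for illustration rather than taken from the paper:

```python
import numpy as np

# Hypothetical sketch of a targeted weight edit (rank-one model editing).
# Dimensions and names are illustrative; the paper's actual attack touches
# ~1.1% of a real LLM's weights, while this toy edits one projection matrix.

rng = np.random.default_rng(0)
d = 64
W = rng.standard_normal((d, d)) / np.sqrt(d)  # stand-in for an MLP projection

k = rng.standard_normal(d)      # "key" activation encoding the fact's subject
v_new = rng.standard_normal(d)  # desired (incorrect) output for that key

# Rank-one update: W' = W + (v_new - W k) k^T / (k^T k),
# chosen so that W' k = v_new exactly.
delta = np.outer(v_new - W @ k, k) / (k @ k)
W_edited = W + delta

# Outputs for unrelated (near-orthogonal) keys barely move, which is why
# performance on other biomedical tasks can remain intact after the edit.
k_other = rng.standard_normal(d)
drift = float(np.linalg.norm(W_edited @ k_other - W @ k_other))
```

The small footprint of the edit is exactly what makes the attack hard to detect by benchmarking alone, motivating the verification mechanisms the abstract calls for.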
Related papers
- Adversarial Attacks on Large Language Models in Medicine [34.17895005922139]
The integration of Large Language Models into healthcare applications offers promising advancements in medical diagnostics, treatment recommendations, and patient care.
The susceptibility of LLMs to adversarial attacks poses a significant threat, potentially leading to harmful outcomes in delicate medical contexts.
This study investigates the vulnerability of LLMs to two types of adversarial attacks in three medical tasks.
arXiv Detail & Related papers (2024-06-18T04:24:30Z)
- CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models [92.04812189642418]
We introduce CARES, which aims to evaluate the trustworthiness of Med-LVLMs across the medical domain.
We assess the trustworthiness of Med-LVLMs across five dimensions, including trustfulness, fairness, safety, privacy, and robustness.
arXiv Detail & Related papers (2024-06-10T04:07:09Z)
- Cross-Modality Jailbreak and Mismatched Attacks on Medical Multimodal Large Language Models [8.398342612100574]
This paper delves into the underexplored security vulnerabilities of MedMLLMs.
By combining existing clinical medical data with atypical natural phenomena, we redefine two types of attacks.
Evaluations with this dataset and novel attack methods indicate that even MedMLLMs designed with enhanced security features are vulnerable to security breaches.
arXiv Detail & Related papers (2024-05-26T19:11:21Z)
- BioMedLM: A 2.7B Parameter Language Model Trained On Biomedical Text [82.7001841679981]
BioMedLM is a 2.7 billion parameter GPT-style autoregressive model trained exclusively on PubMed abstracts and full articles.
When fine-tuned, BioMedLM can produce strong multiple-choice biomedical question-answering results competitive with larger models.
BioMedLM can also be fine-tuned to produce useful answers to patient questions on medical topics.
arXiv Detail & Related papers (2024-03-27T10:18:21Z)
- Multimodal Large Language Models to Support Real-World Fact-Checking [80.41047725487645]
Multimodal large language models (MLLMs) carry the potential to support humans in processing vast amounts of information.
While MLLMs are already being used as a fact-checking tool, their abilities and limitations in this regard are understudied.
We propose a framework for systematically assessing the capacity of current multimodal models to facilitate real-world fact-checking.
arXiv Detail & Related papers (2024-03-06T11:32:41Z)
- Large Language Model Distilling Medication Recommendation Model [61.89754499292561]
We harness the powerful semantic comprehension and input-agnostic characteristics of Large Language Models (LLMs).
Our research aims to transform existing medication recommendation methodologies using LLMs.
Since deploying such large models is costly, we developed a feature-level knowledge distillation technique that transfers the LLM's proficiency to a more compact model.
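Feature-level distillation of this kind can be sketched in a few lines. Everything below is an illustrative assumption, not the paper's architecture: linear maps stand in for the teacher LLM and the compact student, and the student takes one gradient step toward the frozen teacher's features under an MSE loss.

```python
import numpy as np

# Minimal sketch of feature-level knowledge distillation (all shapes and
# names are illustrative): a compact student is nudged toward the frozen
# teacher's intermediate feature representation.

rng = np.random.default_rng(1)
d_in, d_feat = 16, 8

teacher_proj = rng.standard_normal((d_feat, d_in))  # frozen teacher feature map
student_proj = rng.standard_normal((d_feat, d_in))  # trainable student feature map

x = rng.standard_normal((32, d_in))                 # a batch of patient encodings

def feature_loss(S):
    """Mean-squared error between student and teacher features."""
    return float(np.mean((x @ S.T - x @ teacher_proj.T) ** 2))

# One plain gradient-descent step on the distillation loss.
err = x @ student_proj.T - x @ teacher_proj.T       # (batch, d_feat) residual
grad = 2.0 * err.T @ x / err.size                   # d(loss)/d(student_proj)
student_updated = student_proj - 0.1 * grad         # learning rate is arbitrary
```

In practice the teacher features would come from an LLM's hidden states and the student would be a full recommendation model, but the training signal has this same shape.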
arXiv Detail & Related papers (2024-02-05T08:25:22Z)
- ChatFDA: Medical Records Risk Assessment [0.0]
This study explores a pioneering application that assists caregivers in gauging potential risks derived from medical notes.
The application leverages data from openFDA, delivering real-time, actionable insights regarding prescriptions.
Preliminary analyses conducted on the MIMIC-III dataset affirm a proof of concept, highlighting a reduction in medical errors and an improvement in patient safety.
arXiv Detail & Related papers (2023-12-20T03:40:45Z)
- Self-Diagnosis and Large Language Models: A New Front for Medical Misinformation [8.738092015092207]
We evaluate the capabilities of large language models (LLMs) from the lens of a general user self-diagnosing.
We develop a testing methodology which can be used to evaluate responses to open-ended questions mimicking real-world use cases.
We reveal that a) these models perform worse than previously known, and b) they exhibit peculiar behaviours, including overconfidence when stating incorrect recommendations.
arXiv Detail & Related papers (2023-07-10T21:28:26Z)
- Self-Verification Improves Few-Shot Clinical Information Extraction [73.6905567014859]
Large language models (LLMs) have shown the potential to accelerate clinical curation via few-shot in-context learning.
They still struggle with issues regarding accuracy and interpretability, especially in mission-critical domains such as health.
Here, we explore a general mitigation framework using self-verification, which leverages the LLM to provide provenance for its own extraction and check its own outputs.
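The self-verification loop can be sketched as follows. This is a hypothetical illustration of the idea, not the paper's actual prompts or interface: after extraction, the same LLM is asked to re-check each item and quote its provenance in the note, and `query_llm` is a stand-in for any chat-completion call.

```python
# Hypothetical sketch of self-verification for clinical extraction:
# the model must ground each extracted item in the source note or the
# item is dropped. `query_llm` is an assumed callable (prompt -> str).

def self_verify(note: str, extractions: list, query_llm) -> list:
    """Keep only extracted items the model can ground in the note."""
    verified = []
    for item in extractions:
        prompt = (
            f"Clinical note:\n{note}\n\n"
            f"Does the note support the extracted item '{item}'? "
            "Quote the supporting evidence, or answer NO."
        )
        answer = query_llm(prompt)
        if not answer.strip().upper().startswith("NO"):
            verified.append((item, answer))  # item plus its quoted provenance
    return verified
```

With a real model behind `query_llm`, unsupported extractions are filtered out, while each kept item carries a quoted justification that a human reviewer can audit.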
arXiv Detail & Related papers (2023-05-30T22:05:11Z)
- MedAlpaca -- An Open-Source Collection of Medical Conversational AI Models and Training Data [40.97474177100237]
Large language models (LLMs) hold considerable promise for improving medical diagnostics, patient care, and education.
Yet, there is an urgent need for open-source models that can be deployed on-premises to safeguard patient privacy.
We present an innovative dataset consisting of over 160,000 entries, specifically crafted to fine-tune LLMs for effective medical applications.
arXiv Detail & Related papers (2023-04-14T11:28:08Z)
- Privacy-preserving medical image analysis [53.4844489668116]
We present PriMIA, a software framework designed for privacy-preserving machine learning (PPML) in medical imaging.
We show significantly better classification performance of a securely aggregated federated learning model compared to human experts on unseen datasets.
We empirically evaluate the framework's security against a gradient-based model inversion attack.
arXiv Detail & Related papers (2020-12-10T13:56:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.