Trustworthy Medical Imaging with Large Language Models: A Study of Hallucinations Across Modalities
- URL: http://arxiv.org/abs/2508.07031v1
- Date: Sat, 09 Aug 2025 16:03:46 GMT
- Title: Trustworthy Medical Imaging with Large Language Models: A Study of Hallucinations Across Modalities
- Authors: Anindya Bijoy Das, Shahnewaz Karim Sakib, Shibbir Ahmed
- Abstract summary: Large Language Models (LLMs) are increasingly applied to medical imaging tasks. These models often produce hallucinations, which are confident but incorrect outputs that can mislead clinical decisions. This study examines hallucinations in two directions: image to text, where LLMs generate reports from X-ray, CT, or MRI scans, and text to image, where models create medical images from clinical prompts.
- Score: 3.1406146587437904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large Language Models (LLMs) are increasingly applied to medical imaging tasks, including image interpretation and synthetic image generation. However, these models often produce hallucinations, which are confident but incorrect outputs that can mislead clinical decisions. This study examines hallucinations in two directions: image to text, where LLMs generate reports from X-ray, CT, or MRI scans, and text to image, where models create medical images from clinical prompts. We analyze errors such as factual inconsistencies and anatomical inaccuracies, evaluating outputs using expert informed criteria across imaging modalities. Our findings reveal common patterns of hallucination in both interpretive and generative tasks, with implications for clinical reliability. We also discuss factors contributing to these failures, including model architecture and training data. By systematically studying both image understanding and generation, this work provides insights into improving the safety and trustworthiness of LLM driven medical imaging systems.
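The factual-inconsistency criterion described in the abstract can be approximated with a simple term-matching check. The sketch below is illustrative only: the finding vocabulary and substring matching are assumptions, not the paper's actual expert-informed evaluation protocol.

```python
# Illustrative sketch: flag findings in a generated report that never
# appear in the reference report -- a crude surrogate for an
# expert-informed factual-inconsistency check.
FINDINGS = ["pneumothorax", "effusion", "cardiomegaly", "fracture"]  # assumed vocabulary

def hallucinated_findings(generated: str, reference: str) -> list[str]:
    """Return finding terms mentioned in `generated` but absent from `reference`."""
    gen, ref = generated.lower(), reference.lower()
    mentioned = [f for f in FINDINGS if f in gen]
    return [f for f in mentioned if f not in ref]
```

In practice, negation handling ("no effusion") and expert review are essential; bare substring matching would mislabel negated findings as hallucinations.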
Related papers
- On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI [4.866086225040713]
We introduce a perturbation-based approach to quantify a model's reliance on each modality in binary classification tasks. By swapping images or text between samples with opposing labels, we expose modality-specific biases.
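The swap-based probe can be sketched as follows; the sample and model interface here is a hypothetical stand-in for the paper's actual setup.

```python
def modality_reliance(model, samples, modality):
    """Fraction of predictions that flip when `modality` ('image' or
    'text') is swapped in from a sample with the opposing label --
    higher means heavier reliance on that modality.
    Assumed interface: sample dicts with 'image', 'text', 'label';
    model(image, text) returns a predicted label."""
    pos = [s for s in samples if s["label"] == 1]
    neg = [s for s in samples if s["label"] == 0]
    flips = total = 0
    for a, b in zip(pos, neg):
        original = model(a["image"], a["text"])
        swapped = dict(a)
        swapped[modality] = b[modality]  # inject the opposing-label modality
        flips += int(model(swapped["image"], swapped["text"]) != original)
        total += 1
    return flips / total if total else 0.0
```

A model that flips on text swaps but not image swaps is leaning on textual shortcuts, which is the bias the paper diagnoses.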
arXiv Detail & Related papers (2025-07-31T21:35:52Z)
- Distribution-Based Masked Medical Vision-Language Model Using Structured Reports [9.306835492101413]
Medical image-text pre-training aims to align medical images with clinically relevant text to improve model performance on various downstream tasks. This work introduces an uncertainty-aware medical image-text pre-training model that enhances generalization capabilities in medical image analysis.
arXiv Detail & Related papers (2025-07-29T13:31:24Z)
- Causal Disentanglement for Robust Long-tail Medical Image Generation [80.15257897500578]
We propose a novel medical image generation framework, which generates independent pathological and structural features. We leverage a diffusion model guided by pathological findings to model pathological features, enabling the generation of diverse counterfactual images.
arXiv Detail & Related papers (2025-04-20T01:54:18Z)
- Language-Guided Trajectory Traversal in Disentangled Stable Diffusion Latent Space for Factorized Medical Image Generation [0.8397730500554048]
We present the first investigation of the power of pre-trained vision-language foundation models, once fine-tuned on medical image datasets, to perform latent disentanglement. We demonstrate that language-guided Stable Diffusion inherently learns to factorize key attributes for image generation. We devise a framework to identify, isolate, and manipulate key attributes through latent space trajectory of generative models, facilitating precise control over medical image synthesis.
arXiv Detail & Related papers (2025-03-30T23:15:52Z)
- RadIR: A Scalable Framework for Multi-Grained Medical Image Retrieval via Radiology Report Mining [64.66825253356869]
We propose a novel methodology that leverages dense radiology reports to define image-wise similarity ordering at multiple granularities. We construct two comprehensive medical imaging retrieval datasets: MIMIC-IR for Chest X-rays and CTRATE-IR for CT scans. We develop two retrieval systems, RadIR-CXR and RadIR-ChestCT, which demonstrate superior performance in traditional image-image and image-report retrieval tasks.
arXiv Detail & Related papers (2025-03-06T17:43:03Z)
- Bidirectional Brain Image Translation using Transfer Learning from Generic Pre-trained Models [0.0]
In the medical domain, where obtaining labeled medical images is labor-intensive and expensive, addressing data scarcity is a major challenge. Recent studies propose using transfer learning to overcome this issue. In this work, transfer learning was applied to the task of MR-CT image translation and vice versa using 18 pre-trained non-medical models.
arXiv Detail & Related papers (2025-01-21T20:30:15Z)
- Towards a Systematic Evaluation of Hallucinations in Large-Vision Language Models [57.58426038241812]
Large Vision-Language Models (LVLMs) have demonstrated remarkable performance in complex multimodal tasks. These models still suffer from hallucinations when required to implicitly recognize or infer diverse visual entities from images. We propose a novel visual question answering (VQA) benchmark that employs contextual reasoning prompts as hallucination attacks.
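Such hallucination attacks can be illustrated with a presupposition-loaded question template. The template below is an assumption for illustration, not the benchmark's exact prompts.

```python
def attack_prompt(absent_object: str, scene_hint: str) -> str:
    """Build a question presupposing an object that is NOT in the image;
    a robust LVLM should deny the premise rather than invent an answer.
    Illustrative template only."""
    return (f"The image shows {scene_hint}. "
            f"What color is the {absent_object} in the image?")
```

Answering "silver" instead of "there is no stethoscope" would count as a hallucination under such an attack.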
arXiv Detail & Related papers (2024-12-29T23:56:01Z)
- Autoregressive Sequence Modeling for 3D Medical Image Representation [48.706230961589924]
We introduce a pioneering method for learning 3D medical image representations through an autoregressive sequence pre-training framework. Our approach sequences various 3D medical images based on spatial, contrast, and semantic correlations, treating them as interconnected visual tokens within a token sequence.
arXiv Detail & Related papers (2024-09-13T10:19:10Z)
- VALD-MD: Visual Attribution via Latent Diffusion for Medical Diagnostics [0.0]
Visual attribution in medical imaging seeks to make evident the diagnostically-relevant components of a medical image.
We here present a novel generative visual attribution technique, one that leverages latent diffusion models in combination with domain-specific large language models.
The resulting system also exhibits a range of latent capabilities including zero-shot localized disease induction.
arXiv Detail & Related papers (2024-01-02T19:51:49Z)
- Robust and Interpretable Medical Image Classifiers via Concept Bottleneck Models [49.95603725998561]
We propose a new paradigm to build robust and interpretable medical image classifiers with natural language concepts.
Specifically, we first query clinical concepts from GPT-4, then transform latent image features into explicit concepts with a vision-language model.
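The two-stage pipeline (concepts derived from text, then a classifier over concept scores) can be sketched with similarity projections; all arrays below are illustrative stand-ins for real model outputs, not the paper's implementation.

```python
import numpy as np

def concept_bottleneck_predict(image_feat, concept_embeds, class_weights):
    """Project an image embedding onto text-derived concept embeddings
    (e.g. from a vision-language model), then classify from the
    resulting concept scores. Returning the scores keeps the
    interpretability bottleneck inspectable."""
    scores = concept_embeds @ image_feat   # one similarity score per concept
    logit = float(class_weights @ scores)  # linear head over concept scores
    return scores, logit
```

Because the final prediction is linear in the concept scores, each concept's contribution to the decision can be read off directly.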
arXiv Detail & Related papers (2023-10-04T21:57:09Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [72.8965643836841]
We introduce XrayGPT, a novel conversational medical vision-language model. It can analyze and answer open-ended questions about chest radiographs. We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.