SURE-Med: Systematic Uncertainty Reduction for Enhanced Reliability in Medical Report Generation
- URL: http://arxiv.org/abs/2508.01693v1
- Date: Sun, 03 Aug 2025 09:52:30 GMT
- Title: SURE-Med: Systematic Uncertainty Reduction for Enhanced Reliability in Medical Report Generation
- Authors: Yuhang Gu, Xingyu Hu, Yuyu Fan, Xulin Yan, Longhuan Xu, Peng Peng
- Abstract summary: We propose SURE-Med, a unified framework that systematically reduces uncertainty across three critical dimensions: visual, distributional, and contextual. To mitigate visual uncertainty, a Frontal-Aware View Repair Resampling module corrects view annotation errors and adaptively selects informative features from supplementary views. To tackle label distribution uncertainty, we introduce a Token Sensitive Learning objective that enhances the modeling of critical diagnostic sentences. To reduce contextual uncertainty, our Contextual Evidence Filter validates and selectively incorporates prior information that aligns with the current image, effectively suppressing hallucinations.
- Score: 2.2185034594788164
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automated medical report generation (MRG) holds great promise for reducing the heavy workload of radiologists. However, its clinical deployment is hindered by three major sources of uncertainty. First, visual uncertainty, caused by noisy or incorrect view annotations, compromises feature extraction. Second, label distribution uncertainty, stemming from long-tailed disease prevalence, biases models against rare but clinically critical conditions. Third, contextual uncertainty, introduced by unverified historical reports, often leads to factual hallucinations. These challenges collectively limit the reliability and clinical trustworthiness of MRG systems. To address these issues, we propose SURE-Med, a unified framework that systematically reduces uncertainty across three critical dimensions: visual, distributional, and contextual. To mitigate visual uncertainty, a Frontal-Aware View Repair Resampling module corrects view annotation errors and adaptively selects informative features from supplementary views. To tackle label distribution uncertainty, we introduce a Token Sensitive Learning objective that enhances the modeling of critical diagnostic sentences while reweighting underrepresented diagnostic terms, thereby improving sensitivity to infrequent conditions. To reduce contextual uncertainty, our Contextual Evidence Filter validates and selectively incorporates prior information that aligns with the current image, effectively suppressing hallucinations. Extensive experiments on the MIMIC-CXR and IU-Xray benchmarks demonstrate that SURE-Med achieves state-of-the-art performance. By holistically reducing uncertainty across multiple input modalities, SURE-Med sets a new benchmark for reliability in medical report generation and offers a robust step toward trustworthy clinical decision support.
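The Token Sensitive Learning objective described above can be illustrated with a minimal sketch. The function below is a hypothetical reweighted negative log-likelihood written for illustration only; the function name, the weighting scheme, and the example tokens are assumptions, not the paper's actual implementation.

```python
import math

def token_sensitive_nll(token_probs, token_weights):
    """Weighted negative log-likelihood over a report's tokens.

    token_probs:   model probability assigned to each reference token
    token_weights: per-token weight; rare diagnostic terms get weight > 1
                   so errors on them are penalized more heavily
    """
    total = sum(w * -math.log(p) for p, w in zip(token_probs, token_weights))
    return total / sum(token_weights)

# Illustration: the model is confident on two common filler tokens (p = 0.9)
# but unsure about a rare finding term (p = 0.2). Upweighting the rare term
# shifts the loss toward the clinically critical token.
probs = [0.9, 0.9, 0.2]
uniform = token_sensitive_nll(probs, [1.0, 1.0, 1.0])
weighted = token_sensitive_nll(probs, [1.0, 1.0, 5.0])
assert weighted > uniform  # the rare-term error now dominates the loss
```

Under this kind of reweighting, gradient magnitude concentrates on underrepresented diagnostic terms, which is the mechanism the abstract credits with improving sensitivity to infrequent conditions.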
Related papers
- Toward Guarantees for Clinical Reasoning in Vision Language Models via Formal Verification [12.60121003165514]
Vision-language models (VLMs) show promise in drafting radiology reports, yet they frequently suffer from logical inconsistencies. Standard lexical metrics heavily penalize clinical paraphrasing and fail to capture these deductive failures. We introduce a neurosymbolic verification framework that deterministically audits the internal consistency of VLM-generated reports.
arXiv Detail & Related papers (2026-02-27T15:49:59Z) - Calibrated Bayesian Deep Learning for Explainable Decision Support Systems Based on Medical Imaging [6.826979426009301]
It is imperative that models quantify uncertainty in a manner that correlates with prediction correctness, allowing clinicians to identify unreliable outputs for further review. The present paper proposes a generalizable probabilistic optimization framework grounded in Bayesian deep learning. Specifically, a novel Confidence-Uncertainty Boundary Loss (CUB-Loss) is introduced that imposes penalties on high-certainty errors and low-certainty correct predictions. The proposed framework is validated on three distinct medical imaging tasks: automatic screening of pneumonia, diabetic retinopathy detection, and identification of skin lesions.
arXiv Detail & Related papers (2026-02-12T14:03:41Z) - Towards Reliable Medical LLMs: Benchmarking and Enhancing Confidence Estimation of Large Language Models in Medical Consultation [97.36081721024728]
We propose the first benchmark for assessing confidence in multi-turn interaction during realistic medical consultations. Our benchmark unifies three types of medical data for open-ended diagnostic generation. We present MedConf, an evidence-grounded linguistic self-assessment framework.
arXiv Detail & Related papers (2026-01-22T04:51:39Z) - Benchmarking Egocentric Clinical Intent Understanding Capability for Medical Multimodal Large Language Models [48.95516224614331]
We introduce MedGaze-Bench, the first benchmark leveraging clinician gaze as a Cognitive Cursor to assess intent understanding across surgery, emergency simulation, and diagnostic interpretation. Our benchmark addresses three fundamental challenges: visual homogeneity of anatomical structures, strict temporal-causal dependencies in clinical workflows, and implicit adherence to safety protocols.
arXiv Detail & Related papers (2026-01-11T02:20:40Z) - Modeling Clinical Uncertainty in Radiology Reports: from Explicit Uncertainty Markers to Implicit Reasoning Pathways [16.76473492794096]
Explicit uncertainty reflects doubt about the presence or absence of findings, conveyed through hedging phrases. Implicit uncertainty arises when radiologists omit parts of their reasoning, recording only key findings or diagnoses. Here, it is often unclear whether omitted findings are truly absent or simply unmentioned for brevity. We quantify explicit uncertainty by creating an expert-validated, LLM-based reference ranking of common hedging phrases, and mapping each finding to a probability value based on this reference. In addition, we model implicit uncertainty through an expansion framework that systematically adds characteristic sub-findings derived from expert-defined diagnostic pathways for 14 common diagnoses.
arXiv Detail & Related papers (2025-11-06T16:24:53Z) - Uncertainty-Driven Expert Control: Enhancing the Reliability of Medical Vision-Language Models [52.2001050216955]
Existing methods aim to enhance the performance of Medical Vision Language Models (MedVLMs) by adjusting model structure, fine-tuning with high-quality data, or through preference fine-tuning. We propose an expert-in-the-loop framework named Expert-Controlled-Free Guidance (Expert-CFG) to align MedVLM with clinical expertise without additional training.
arXiv Detail & Related papers (2025-07-12T09:03:30Z) - Metrics that matter: Evaluating image quality metrics for medical image generation [48.85783422900129]
This study comprehensively assesses commonly used no-reference image quality metrics using brain MRI data. We evaluate metric sensitivity to a range of challenges, including noise, distribution shifts, and, critically, morphological alterations designed to mimic clinically relevant inaccuracies.
arXiv Detail & Related papers (2025-05-12T01:57:25Z) - Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation [20.173287130474797]
Generative medical Vision Large Language Models (VLLMs) are prone to hallucinations and can produce inaccurate diagnostic information. We introduce a novel Semantic Consistency-Based Uncertainty Quantification framework that provides both report-level and sentence-level uncertainties. Our approach improves factuality scores by 10%, achieved by rejecting 20% of reports using the Radialog model on the MIMIC-CXR dataset.
arXiv Detail & Related papers (2024-12-05T20:43:39Z) - Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG). MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z) - ICON: Improving Inter-Report Consistency in Radiology Report Generation via Lesion-aware Mixup Augmentation [14.479606737135045]
We propose ICON, which improves the inter-report consistency of radiology report generation.
Our approach first involves extracting lesions from input images and examining their characteristics.
Then, we introduce a lesion-aware mixup technique to ensure that the representations of the semantically equivalent lesions align with the same attributes.
arXiv Detail & Related papers (2024-02-20T09:13:15Z) - Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles [4.249986624493547]
Once deployed, medical image analysis methods are often faced with unexpected image corruptions and noise perturbations. LaDiNE is a novel ensemble learning method combining the robustness of Vision Transformers with diffusion-based generative models. Experiments on tuberculosis chest X-rays and melanoma skin cancer datasets demonstrate that LaDiNE achieves superior performance compared to a wide range of state-of-the-art methods.
arXiv Detail & Related papers (2023-10-24T15:53:07Z) - Improving Multiple Sclerosis Lesion Segmentation Across Clinical Sites: A Federated Learning Approach with Noise-Resilient Training [75.40980802817349]
Deep learning models have shown promise for automatically segmenting MS lesions, but the scarcity of accurately annotated data hinders progress in this area.
We introduce a Decoupled Hard Label Correction (DHLC) strategy that considers the imbalanced distribution and fuzzy boundaries of MS lesions.
We also introduce a Centrally Enhanced Label Correction (CELC) strategy, which leverages the aggregated central model as a correction teacher for all sites.
arXiv Detail & Related papers (2023-08-31T00:36:10Z) - Towards Reliable Medical Image Segmentation by utilizing Evidential Calibrated Uncertainty [52.03490691733464]
We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks.
By leveraging subjective logic theory, we explicitly model probability and uncertainty for the problem of medical image segmentation.
DEviS incorporates an uncertainty-aware filtering module, which utilizes the metric of uncertainty-calibrated error to filter reliable data.
arXiv Detail & Related papers (2023-01-01T05:02:46Z) - An Uncertainty-Informed Framework for Trustworthy Fault Diagnosis in Safety-Critical Applications [1.988145627448243]
The low trustworthiness of deep learning-based prognostics and health management (PHM) hinders its application to safety-critical assets.
We propose an uncertainty-informed framework that diagnoses faults while simultaneously detecting out-of-distribution (OOD) data.
We show that the proposed framework offers particular advantages in tackling unknowns and enhancing the trustworthiness of fault diagnosis in safety-critical applications.
arXiv Detail & Related papers (2021-10-08T21:24:14Z) - Confidence-Guided Radiology Report Generation [24.714303916431078]
We propose a novel method to quantify both the visual uncertainty and the textual uncertainty for the task of radiology report generation.
Our experimental results demonstrate that the proposed approach to characterizing and estimating model uncertainty provides more reliable confidence scores for radiology report generation.
arXiv Detail & Related papers (2021-06-21T07:02:12Z) - Bayesian Uncertainty Estimation of Learned Variational MRI Reconstruction [63.202627467245584]
We introduce a Bayesian variational framework to quantify the model-immanent (epistemic) uncertainty.
We demonstrate that our approach yields competitive results for undersampled MRI reconstruction.
arXiv Detail & Related papers (2021-02-12T18:08:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.