Uncertainty-Aware Vision-Language Segmentation for Medical Imaging
- URL: http://arxiv.org/abs/2602.14498v2
- Date: Fri, 20 Feb 2026 13:24:13 GMT
- Title: Uncertainty-Aware Vision-Language Segmentation for Medical Imaging
- Authors: Aryan Das, Tanishq Rachamalla, Koushik Biswas, Swalpa Kumar Roy, Vinay Kumar Verma,
- Abstract summary: We introduce a novel uncertainty-aware multimodal segmentation framework for medical diagnosis. We propose a Modality Decoding Attention Block (MoDAB) with a lightweight State Space Mixer (SSMix) to enable efficient cross-modal fusion. Our results highlight the importance of incorporating uncertainty modelling and structured modality alignment in vision-language medical segmentation tasks.
- Score: 12.545486211087791
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce a novel uncertainty-aware multimodal segmentation framework that leverages both radiological images and associated clinical text for precise medical diagnosis. We propose a Modality Decoding Attention Block (MoDAB) with a lightweight State Space Mixer (SSMix) to enable efficient cross-modal fusion and long-range dependency modelling. To guide learning under ambiguity, we propose the Spectral-Entropic Uncertainty (SEU) Loss, which jointly captures spatial overlap, spectral consistency, and predictive uncertainty in a unified objective. In complex clinical circumstances with poor image quality, this formulation improves model reliability. Extensive experiments on various publicly available medical datasets, QATA-COVID19, MosMed++, and Kvasir-SEG, demonstrate that our method achieves superior segmentation performance while being significantly more computationally efficient than existing State-of-the-Art (SoTA) approaches. Our results highlight the importance of incorporating uncertainty modelling and structured modality alignment in vision-language medical segmentation tasks. Code: https://github.com/arya-domain/UA-VLS
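The abstract describes the Spectral-Entropic Uncertainty (SEU) Loss as jointly capturing spatial overlap, spectral consistency, and predictive uncertainty, without giving the formula. A minimal sketch of how such a combined objective could look is below: soft Dice for overlap, an L1 distance between FFT magnitudes for spectral consistency, and mean binary entropy as an uncertainty penalty. The function name `seu_loss_sketch`, the weighting scheme, and each term's exact form are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def seu_loss_sketch(pred, target, w_dice=1.0, w_spec=0.1, w_unc=0.1, eps=1e-7):
    """Hypothetical SEU-style loss for a binary segmentation map.

    pred, target: 2-D arrays of probabilities / labels in [0, 1].
    Combines three terms described in the abstract (forms assumed here):
      - spatial overlap:      soft Dice loss
      - spectral consistency: mean L1 distance between 2-D FFT magnitudes
      - uncertainty:          mean binary entropy of the prediction
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    # Spatial overlap term (soft Dice loss, 0 when pred == target exactly)
    inter = (pred * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    # Spectral consistency term: compare frequency-domain magnitudes
    spec = np.abs(np.abs(np.fft.fft2(pred)) - np.abs(np.fft.fft2(target))).mean()
    # Predictive uncertainty term: entropy is high near p = 0.5, low near 0 or 1
    ent = -(pred * np.log(pred) + (1.0 - pred) * np.log(1.0 - pred)).mean()
    return w_dice * dice + w_spec * spec + w_unc * ent
```

Under these assumptions, a confident correct prediction scores lower than an uncertain one, which matches the stated goal of improving reliability on ambiguous, low-quality images.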
Related papers
- MedAD-R1: Eliciting Consistent Reasoning in Interpretable Medical Anomaly Detection via Consistency-Reinforced Policy Optimization [46.65200216642429]
We introduce MedAD-38K, the first large-scale, multi-modal, and multi-center benchmark for MedAD featuring diagnostic Chain-of-Thought (CoT) annotations alongside structured Visual Question-Answering (VQA) pairs. Our proposed model, MedAD-R1, achieves state-of-the-art (SOTA) performance on the MedAD-38K benchmark, outperforming strong baselines by more than 10%.
arXiv Detail & Related papers (2026-02-01T07:56:10Z)
- Aligning Findings with Diagnosis: A Self-Consistent Reinforcement Learning Framework for Trustworthy Radiology Reporting [37.57009831483529]
Multimodal Large Language Models (MLLMs) have shown strong potential for radiology report generation. Our framework restructures generation into two distinct components: a think block for detailed findings and an answer block for structured disease labels.
arXiv Detail & Related papers (2026-01-06T14:17:44Z)
- DiA-gnostic VLVAE: Disentangled Alignment-Constrained Vision Language Variational AutoEncoder for Robust Radiology Reporting with Missing Modalities [3.5045368873011924]
We propose the DiA-gnostic VLVAE, which achieves robust radiology reporting through Disentangled Alignment. Our framework is designed to be resilient to missing modalities by disentangling shared and modality-specific features. A compact LLaMA-X decoder then uses these disentangled representations to generate reports efficiently.
arXiv Detail & Related papers (2025-11-08T11:08:27Z)
- Med-K2N: Flexible K-to-N Modality Translation for Medical Image Synthesis [13.589690091116802]
Cross-modal medical image synthesis research focuses on reconstructing missing imaging modalities from available ones to support clinical diagnosis. How can we model the heterogeneous contributions of different modalities to various target tasks? How can we maintain modality identity consistency in multi-output generation?
arXiv Detail & Related papers (2025-10-03T08:47:17Z)
- Robust Incomplete-Modality Alignment for Ophthalmic Disease Grading and Diagnosis via Labeled Optimal Transport [28.96009174108652]
Multimodal ophthalmic imaging-based diagnosis integrates color fundus images with optical coherence tomography (OCT) to provide a comprehensive view of ocular pathologies. Existing commonly used pipelines, such as modality imputation and distillation methods, face notable limitations. We propose a novel multimodal alignment and fusion framework capable of robustly handling missing modalities in the task of ophthalmic diagnostics.
arXiv Detail & Related papers (2025-07-07T13:36:39Z)
- HepatoGEN: Generating Hepatobiliary Phase MRI with Perceptual and Adversarial Models [33.7054351451505]
We propose a deep learning based approach for synthesizing hepatobiliary phase (HBP) images from earlier contrast phases. Quantitative evaluation using pixel-wise and perceptual metrics, combined with blinded radiologist reviews, showed that pGAN achieved the best quantitative performance. In contrast, the U-Net produced consistent liver enhancement with fewer artifacts, while DDPM underperformed due to limited preservation of fine structural details.
arXiv Detail & Related papers (2025-04-25T15:01:09Z)
- KAN-Mamba FusionNet: Redefining Medical Image Segmentation with Non-Linear Modeling [3.2971993272923443]
We propose a novel architecture, the KAN-Mamba FusionNet, to improve medical image segmentation accuracy. It consistently outperforms state-of-the-art methods in IoU and F1 scores.
arXiv Detail & Related papers (2024-11-18T09:19:16Z)
- ETSCL: An Evidence Theory-Based Supervised Contrastive Learning Framework for Multi-modal Glaucoma Grading [7.188153974946432]
Glaucoma is one of the leading causes of vision impairment.
It remains challenging to extract reliable features due to the high similarity of medical images and the unbalanced multi-modal data distribution.
We propose a novel framework, namely ETSCL, which consists of a contrastive feature extraction stage and a decision-level fusion stage.
arXiv Detail & Related papers (2024-07-19T11:57:56Z)
- Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z)
- Improving Vision Anomaly Detection with the Guidance of Language Modality [64.53005837237754]
This paper tackles the challenges for vision modality from a multimodal point of view.
We propose Cross-modal Guidance (CMG) to tackle the redundant information issue and sparse space issue.
To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality.
arXiv Detail & Related papers (2023-10-04T13:44:56Z)
- Cross-level Contrastive Learning and Consistency Constraint for Semi-supervised Medical Image Segmentation [46.678279106837294]
We propose a cross-level contrastive learning scheme to enhance representation capacity for local features in semi-supervised medical image segmentation.
With the help of the cross-level contrastive learning and consistency constraint, the unlabelled data can be effectively explored to improve segmentation performance.
arXiv Detail & Related papers (2022-02-08T15:12:11Z)
- SSMD: Semi-Supervised Medical Image Detection with Adaptive Consistency and Heterogeneous Perturbation [47.001609080453335]
We propose a novel Semi-Supervised Medical image Detector (SSMD).
The motivation behind SSMD is to provide free yet effective supervision for unlabeled data, by regularizing the predictions at each position to be consistent.
Extensive experimental results show that the proposed SSMD achieves the state-of-the-art performance at a wide range of settings.
arXiv Detail & Related papers (2021-06-03T01:59:50Z)
- Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.