Prototype-Enhanced Confidence Modeling for Cross-Modal Medical Image-Report Retrieval
- URL: http://arxiv.org/abs/2508.03494v1
- Date: Tue, 05 Aug 2025 14:26:41 GMT
- Title: Prototype-Enhanced Confidence Modeling for Cross-Modal Medical Image-Report Retrieval
- Authors: Shreyank N Gowda, Xiaobo Jin, Christian Wagner
- Abstract summary: Cross-modal retrieval tasks, such as image-to-report and report-to-image retrieval, are essential but challenging due to the inherent ambiguity and variability in medical data. Existing models often struggle to capture the nuanced, multi-level semantic relationships in radiology data, leading to unreliable retrieval results. We propose the Prototype-Enhanced Confidence Modeling framework, which introduces multi-level prototypes for each modality to better capture semantic variability and enhance retrieval robustness.
- Score: 9.238186292926573
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In cross-modal retrieval tasks, such as image-to-report and report-to-image retrieval, accurately aligning medical images with relevant text reports is essential but challenging due to the inherent ambiguity and variability in medical data. Existing models often struggle to capture the nuanced, multi-level semantic relationships in radiology data, leading to unreliable retrieval results. To address these issues, we propose the Prototype-Enhanced Confidence Modeling (PECM) framework, which introduces multi-level prototypes for each modality to better capture semantic variability and enhance retrieval robustness. PECM employs a dual-stream confidence estimation that leverages prototype similarity distributions and an adaptive weighting mechanism to control the impact of high-uncertainty data on retrieval rankings. Applied to radiology image-report datasets, our method achieves significant improvements in retrieval precision and consistency, effectively handling data ambiguity and advancing reliability in complex clinical scenarios. We report results on multiple datasets and tasks, including fully supervised and zero-shot retrieval, obtaining performance gains of up to 10.17% and establishing a new state-of-the-art.
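The confidence-weighted retrieval idea in the abstract can be illustrated with a minimal sketch: a query's confidence is derived from how peaked its similarity distribution over a set of modality prototypes is, and raw image-report similarity scores are down-weighted when either side is uncertain. All function names, the entropy-based confidence measure, and the weighting exponent `alpha` are illustrative assumptions, not the paper's actual PECM formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def prototype_confidence(embedding, prototypes):
    """Confidence from the entropy of the prototype-similarity distribution:
    a peaked distribution (low entropy) signals high confidence.
    Returns a value in [0, 1]."""
    sims = prototypes @ embedding                  # cosine sims if inputs are unit-norm
    p = softmax(sims)
    entropy = -(p * np.log(p + 1e-12)).sum()
    max_entropy = np.log(len(prototypes))
    return 1.0 - entropy / max_entropy

def rank_reports(image_emb, report_embs, image_protos, report_protos, alpha=1.0):
    """Rank candidate reports for one image query, down-weighting each
    image-report score by the product of the two per-modality confidences."""
    c_img = prototype_confidence(image_emb, image_protos)
    scores = []
    for r in report_embs:
        c_rep = prototype_confidence(r, report_protos)
        base = float(image_emb @ r)                # raw cross-modal similarity
        scores.append(base * (c_img * c_rep) ** alpha)
    return np.argsort(scores)[::-1]                # best match first
```

In this sketch, `alpha` plays the role of the paper's adaptive weighting mechanism (here a fixed hyperparameter): larger values suppress high-uncertainty pairs more aggressively in the final ranking.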
Related papers
- Metrics that matter: Evaluating image quality metrics for medical image generation [48.85783422900129]
This study comprehensively assesses commonly used no-reference image quality metrics using brain MRI data. We evaluate metric sensitivity to a range of challenges, including noise, distribution shifts, and, critically, morphological alterations designed to mimic clinically relevant inaccuracies.
arXiv Detail & Related papers (2025-05-12T01:57:25Z) - Benchmarking Robustness of Contrastive Learning Models for Medical Image-Report Retrieval [2.9801426627439453]
This study benchmarks the robustness of four state-of-the-art contrastive learning models: CLIP, CXR-RePaiR, MedCLIP, and CXR-CLIP. Our findings reveal that all evaluated models are highly sensitive to out-of-distribution data. By addressing these limitations, we can develop more reliable cross-domain retrieval models for medical applications.
arXiv Detail & Related papers (2025-01-15T20:37:04Z) - Distributional Drift Detection in Medical Imaging with Sketching and Fine-Tuned Transformer [2.7552551107566137]
This paper presents an accurate and sensitive approach to detect distributional drift in CT-scan medical images. We developed a robust baseline library model for real-time anomaly detection, allowing for efficient comparison of incoming images. We fine-tuned a pre-trained Vision Transformer model to extract relevant features, using mammography as a case study.
arXiv Detail & Related papers (2024-08-15T23:46:37Z) - MedMNIST-C: Comprehensive benchmark and improved classifier robustness by simulating realistic image corruptions [0.13108652488669734]
The integration of neural-network-based systems into clinical practice is limited by challenges related to domain generalization and robustness.
We create and open-source MedMNIST-C, a benchmark dataset based on the MedMNIST+ collection covering 12 datasets and 9 imaging modalities.
arXiv Detail & Related papers (2024-06-25T13:20:39Z) - Confidence-aware multi-modality learning for eye disease screening [58.861421804458395]
We propose a novel multi-modality evidential fusion pipeline for eye disease screening.
It provides a measure of confidence for each modality and elegantly integrates the multi-modality information.
Experimental results on both public and internal datasets demonstrate that our model excels in robustness.
arXiv Detail & Related papers (2024-05-28T13:27:30Z) - Adaptive Affinity-Based Generalization For MRI Imaging Segmentation Across Resource-Limited Settings [1.5703963908242198]
This paper introduces a novel relation-based knowledge framework by seamlessly combining adaptive affinity-based and kernel-based distillation.
To validate our innovative approach, we conducted experiments on publicly available multi-source prostate MRI data.
arXiv Detail & Related papers (2024-04-03T13:35:51Z) - Benchmarking Zero-Shot Robustness of Multimodal Foundation Models: A Pilot Study [61.65123150513683]
multimodal foundation models, such as CLIP, produce state-of-the-art zero-shot results.
It is reported that these models close the robustness gap by matching the performance of supervised models trained on ImageNet.
We show that CLIP leads to a significant robustness drop compared to supervised ImageNet models on our benchmark.
arXiv Detail & Related papers (2024-03-15T17:33:49Z) - Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles [4.249986624493547]
Once deployed, medical image analysis methods are often faced with unexpected image corruptions and noise perturbations. LaDiNE is a novel ensemble learning method combining the robustness of Vision Transformers with diffusion-based generative models. Experiments on tuberculosis chest X-rays and melanoma skin cancer datasets demonstrate that LaDiNE achieves superior performance compared to a wide range of state-of-the-art methods.
arXiv Detail & Related papers (2023-10-24T15:53:07Z) - C^2M-DoT: Cross-modal consistent multi-view medical report generation
with domain transfer network [67.97926983664676]
We propose a cross-modal consistent multi-view medical report generation framework with a domain transfer network (C2M-DoT).
C2M-DoT substantially outperforms state-of-the-art baselines in all metrics.
arXiv Detail & Related papers (2023-10-09T02:31:36Z) - Trustworthy Multimodal Regression with Mixture of Normal-inverse Gamma
Distributions [91.63716984911278]
We introduce a novel Mixture of Normal-Inverse Gamma distributions (MoNIG) algorithm, which efficiently estimates uncertainty in a principled way for adaptive integration of different modalities and produces a trustworthy regression result.
Experimental results on both synthetic and different real-world data demonstrate the effectiveness and trustworthiness of our method on various multimodal regression tasks.
arXiv Detail & Related papers (2021-11-11T14:28:12Z) - Confidence-Guided Radiology Report Generation [24.714303916431078]
We propose a novel method to quantify both the visual uncertainty and the textual uncertainty for the task of radiology report generation.
Our experimental results have demonstrated that our proposed method for model uncertainty characterization and estimation can provide more reliable confidence scores for radiology report generation.
arXiv Detail & Related papers (2021-06-21T07:02:12Z) - Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers or the accuracy of the information presented, and is not responsible for any consequences of its use.