Related papers: MedEBench: Diagnosing Reliability in Text-Guided Medical Image Editing


URL: http://arxiv.org/abs/2506.01921v7
Date: Sat, 04 Oct 2025 11:55:05 GMT
Title: MedEBench: Diagnosing Reliability in Text-Guided Medical Image Editing
Authors: Minghao Liu, Zhitao He, Zhiyuan Fan, Qingyun Wang, Yi R. Fung,
Abstract summary: MedEBench is a benchmark designed to diagnose reliability in text-guided medical image editing. MedEBench consists of 1,182 clinically curated image-prompt pairs covering 70 distinct editing tasks and 13 anatomical regions.
Score: 14.713122814049806
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-guided image editing has seen significant progress in natural image domains, but its application in medical imaging remains limited and lacks standardized evaluation frameworks. Such editing could revolutionize clinical practices by enabling personalized surgical planning, enhancing medical education, and improving patient communication. To bridge this gap, we introduce MedEBench, a robust benchmark designed to diagnose reliability in text-guided medical image editing. MedEBench consists of 1,182 clinically curated image-prompt pairs covering 70 distinct editing tasks and 13 anatomical regions. It contributes in three key areas: (1) a clinically grounded evaluation framework that measures Editing Accuracy, Context Preservation, and Visual Quality, complemented by detailed descriptions of intended edits and corresponding Region-of-Interest (ROI) masks; (2) a comprehensive comparison of seven state-of-the-art models, revealing consistent patterns of failure; and (3) a diagnostic error analysis technique that leverages attention alignment, using Intersection-over-Union (IoU) between model attention maps and ROI masks to identify mislocalization issues, where models erroneously focus on incorrect anatomical regions. MedEBench sets the stage for developing more reliable and clinically effective text-guided medical image editing tools.
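The attention-alignment diagnostic in contribution (3) comes down to an IoU between a model's attention map and the ROI mask. Below is a minimal pure-Python sketch of that computation, assuming the attention map has already been extracted from the editing model and resized to the mask's resolution. The min-max normalization, the 0.5 binarization threshold, the 0.1 mislocalization cutoff, and all function names are illustrative assumptions, not values or code taken from the paper.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[bool]]


def binarize(attention: Grid, threshold: float = 0.5) -> Mask:
    """Min-max normalize an attention map to [0, 1], then threshold it.

    Assumption: the map is already resized to the ROI mask's resolution.
    """
    lo = min(min(row) for row in attention)
    hi = max(max(row) for row in attention)
    span = (hi - lo) or 1.0  # a constant map would otherwise divide by zero
    return [[(v - lo) / span >= threshold for v in row] for row in attention]


def mask_iou(pred: Mask, roi: Mask) -> float:
    """Intersection-over-Union of two binary masks with identical shapes."""
    if len(pred) != len(roi) or any(len(p) != len(r) for p, r in zip(pred, roi)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for prow, rrow in zip(pred, roi):
        for p, r in zip(prow, rrow):
            inter += int(p and r)
            union += int(p or r)
    return inter / union if union else 0.0  # both masks empty: report 0.0


def attention_roi_iou(attention: Grid, roi: Mask, threshold: float = 0.5) -> float:
    """IoU between the thresholded attention map and the ROI mask."""
    return mask_iou(binarize(attention, threshold), roi)


def is_mislocalized(iou: float, cutoff: float = 0.1) -> bool:
    """Hypothetical rule: flag an edit whose attention barely overlaps the ROI."""
    return iou < cutoff


if __name__ == "__main__":
    attention = [[0.9, 0.8, 0.1],
                 [0.7, 0.2, 0.0],
                 [0.1, 0.0, 0.0]]
    roi_top_left = [[True, True, False],
                    [True, False, False],
                    [False, False, False]]
    roi_bottom_right = [[False, False, False],
                        [False, False, True],
                        [False, True, True]]
    print(attention_roi_iou(attention, roi_top_left))      # 1.0
    print(attention_roi_iou(attention, roi_bottom_right))  # 0.0
```

Here the attention concentrates in the top-left corner, so it aligns perfectly with a top-left ROI and not at all with a bottom-right one; the second case is the kind of mislocalization the benchmark is meant to surface. Binarizing before computing IoU is a design choice of this sketch; how the paper converts attention into a region may differ.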

Related papers

An Interpretable Local Editing Model for Counterfactual Medical Image Generation [11.263626235904995]
InstructX2X is a novel interpretable local editing model for counterfactual medical image generation featuring Region-Specific Editing. Our model successfully generates high-quality counterfactual chest X-ray images along with interpretable explanations.
(arXiv, 2026-02-28T02:48:15Z)
MedREK: Retrieval-Based Editing for Medical LLMs with Key-Aware Prompts [70.64143198545031]
We propose MedREK, a retrieval-based editing framework that integrates a shared query-key module for precise matching with an attention-based prompt encoder for informative guidance. Our results on various medical benchmarks demonstrate that MedREK achieves superior performance across different core metrics.
(arXiv, 2025-10-15T12:50:33Z)
Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation [61.350584471060756]
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images. We propose Self-Supervised Anatomical Consistency Learning (SS-ACL) to align generated reports with corresponding anatomical regions. SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy.
(arXiv, 2025-09-30T08:59:06Z)
ACM Multimedia Grand Challenge on ENT Endoscopy Analysis [9.343316855950263]
We introduce ENTRep, which integrates fine-grained anatomical classification with image-to-image and text-to-image retrieval under bilingual supervision. The dataset comprises expert-annotated images, labeled for anatomical region and normal or abnormal status, and accompanied by dual-language narrative descriptions.
(arXiv, 2025-08-06T18:22:23Z)
Distribution-Based Masked Medical Vision-Language Model Using Structured Reports [9.306835492101413]
Medical image-text pre-training aims to align medical images with clinically relevant text to improve model performance on various downstream tasks. This work introduces an uncertainty-aware medical image-text pre-training model that enhances generalization capabilities in medical image analysis.
(arXiv, 2025-07-29T13:31:24Z)
Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding [50.483761005446]
Current models struggle to associate textual descriptions with disease regions due to inefficient attention mechanisms and a lack of fine-grained token representations. We introduce Disease-Aware Prompting (DAP), which uses the explainability map of a VLM to identify the appropriate image features. DAP improves visual grounding accuracy by 20.74% compared to state-of-the-art methods across three major chest X-ray datasets.
(arXiv, 2025-05-21T05:16:45Z)
GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing [60.66800567924348]
We introduce a new benchmark designed to evaluate text-guided image editing models. The benchmark includes over 1000 high-quality editing examples across 20 diverse content categories. We conduct a large-scale study comparing GPT-Image-1 against several state-of-the-art editing models.
(arXiv, 2025-05-16T17:55:54Z)
Reinforced Correlation Between Vision and Language for Precise Medical AI Assistant [11.187690318227514]
RCMed is a full-stack AI assistant that improves multimodal alignment in both input and output. It achieves state-of-the-art precision in contextualizing irregular lesions and subtle anatomical boundaries.
(arXiv, 2025-05-06T10:00:08Z)
Interactive Tumor Progression Modeling via Sketch-Based Image Editing [54.47725383502915]
We propose SkEditTumor, a sketch-based diffusion model for controllable tumor progression editing. By leveraging sketches as structural priors, our method enables precise modifications of tumor regions while maintaining structural integrity and visual realism. Our contributions include a novel integration of sketches with diffusion models for medical image editing, fine-grained control over tumor progression visualization, and extensive validation across multiple datasets, setting a new benchmark in the field.
(arXiv, 2025-03-10T00:04:19Z)
Structure-Aware Stylized Image Synthesis for Robust Medical Image Segmentation [10.776242801237862]
We propose a novel medical image segmentation method that combines diffusion models and a Structure-Preserving Network for structure-aware one-shot image stylization. Our approach effectively mitigates domain shifts by transforming images from various sources into a consistent style while maintaining the location, size, and shape of lesions.
(arXiv, 2024-12-05T16:15:32Z)
LIMIS: Towards Language-based Interactive Medical Image Segmentation [58.553786162527686]
LIMIS is the first purely language-based interactive medical image segmentation model. We adapt Grounded SAM to the medical domain and design a language-based model interaction strategy. We evaluate LIMIS on three publicly available medical datasets in terms of performance and usability.
(arXiv, 2024-10-22T12:13:47Z)
Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing [28.904419606450876]
We present a Vision-guided and Mask-enhanced Adaptive Editing (ViMAEdit) method with three key novel designs. First, we propose to leverage image embeddings as explicit guidance to enhance the conventional textual prompt-based denoising process. Second, we devise a self-attention-guided iterative editing area grounding strategy.
(arXiv, 2024-10-14T13:41:37Z)
MedEdit: Counterfactual Diffusion-based Image Editing on Brain MRI [2.4557713325522914]
We propose MedEdit, a conditional diffusion model for medical image editing. MedEdit induces pathology in specific areas while balancing the modeling of disease effects and preserving the integrity of the original scan. We believe this work will enable counterfactual image editing research to further advance the development of realistic and clinically useful imaging tools.
(arXiv, 2024-07-21T21:19:09Z)
Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG). MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
(arXiv, 2024-04-10T07:41:35Z)
Plaintext-Free Deep Learning for Privacy-Preserving Medical Image Analysis via Frequency Information Embedding [9.192156293063414]
This paper proposes a novel framework that uses surrogate images for analysis. The framework is called Frequency-domain Exchange Style Fusion (FESF). Our framework effectively preserves the privacy of medical images and maintains diagnostic accuracy of DL models at a relatively high level, proving its effectiveness across various datasets and DL-based models.
(arXiv, 2024-03-25T06:56:38Z)
QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge [93.61262892578067]
Uncertainty in medical image segmentation tasks, especially inter-rater variability, presents a significant challenge. This variability directly impacts the development and evaluation of automated segmentation algorithms. We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ).
(arXiv, 2024-03-19T17:57:24Z)
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework [43.453943987647015]
Medical vision-language pre-training has emerged as a frontier of research, enabling zero-shot pathological recognition. Due to the complex semantics of biomedical texts, current methods struggle to align medical images with key pathological findings in unstructured reports. This is achieved by consulting a large language model and medical experts. Our framework improves the accuracy of recent methods by up to 8.56% and 17.26% for seen and unseen categories, respectively.
(arXiv, 2024-03-12T13:18:22Z)
ElixirNet: Relation-aware Network Architecture Adaptation for Medical Lesion Detection [90.13718478362337]
We introduce a novel ElixirNet that includes three components: 1) TruncatedRPN balances positive and negative data for false positive reduction; 2) Auto-lesion Block is automatically customized for medical images to incorporate relation-aware operations among region proposals; and 3) Relation transfer module incorporates the semantic relationship. Experiments on DeepLesion and Kits19 prove the effectiveness of ElixirNet, achieving improvement of both sensitivity and precision over FPN with fewer parameters.
(arXiv, 2020-03-03T05:29:49Z)

This list is automatically generated from the titles and abstracts of the papers in this site.