MedEBench: Revisiting Text-instructed Image Editing on Medical Domain
- URL: http://arxiv.org/abs/2506.01921v3
- Date: Wed, 04 Jun 2025 10:55:29 GMT
- Title: MedEBench: Revisiting Text-instructed Image Editing on Medical Domain
- Authors: Minghao Liu, Zhitao He, Zhiyuan Fan, Qingyun Wang, Yi R. Fung,
- Abstract summary: MedEBench is a benchmark for evaluating text-guided medical image editing. It consists of 1,182 clinically sourced image-prompt triplets spanning 70 tasks across 13 anatomical regions.
- Score: 3.6550055178925835
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-guided image editing has seen rapid progress in natural image domains, but its adaptation to medical imaging remains limited and lacks standardized evaluation. Clinically, such editing holds promise for simulating surgical outcomes, creating personalized teaching materials, and enhancing patient communication. To bridge this gap, we introduce MedEBench, a comprehensive benchmark for evaluating text-guided medical image editing. It consists of 1,182 clinically sourced image-prompt triplets spanning 70 tasks across 13 anatomical regions. MedEBench offers three key contributions: (1) a clinically relevant evaluation framework covering Editing Accuracy, Contextual Preservation, and Visual Quality, supported by detailed descriptions of expected change and ROI (Region of Interest) masks; (2) a systematic comparison of seven state-of-the-art models, revealing common failure patterns; and (3) a failure analysis protocol based on attention grounding, using IoU between attention maps and ROIs to identify mislocalization. MedEBench provides a solid foundation for developing and evaluating reliable, clinically meaningful medical image editing systems. Project website: https://mliuby.github.io/MedEBench_Website/
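The failure analysis protocol described above grounds model attention by computing IoU between attention maps and ROI masks. A minimal sketch of that computation, assuming the attention map is normalized to [0, 1] and binarized at a threshold (the threshold value and function name here are illustrative assumptions, not the benchmark's exact procedure):

```python
import numpy as np

def attention_roi_iou(attention_map, roi_mask, threshold=0.5):
    """IoU between a binarized attention map and a binary ROI mask.

    attention_map: 2D float array, values in [0, 1].
    roi_mask: 2D binary array marking the Region of Interest.
    A low IoU suggests the model attended outside the intended
    edit region (mislocalization).
    """
    attn_bin = np.asarray(attention_map) >= threshold
    roi = np.asarray(roi_mask).astype(bool)
    intersection = np.logical_and(attn_bin, roi).sum()
    union = np.logical_or(attn_bin, roi).sum()
    return intersection / union if union > 0 else 0.0
```

In such a protocol, a small IoU on a failed edit would indicate the model localized the wrong region, while a high IoU with a poor edit would point to a generation failure rather than a grounding one.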
Related papers
- ACM Multimedia Grand Challenge on ENT Endoscopy Analysis [9.343316855950263]
We introduce ENTRep, which integrates fine-grained anatomical classification with image-to-image and text-to-image retrieval under bilingual supervision. The dataset comprises expert-annotated images, labeled for anatomical region and normal or abnormal status, and accompanied by dual-language narrative descriptions.
arXiv Detail & Related papers (2025-08-06T18:22:23Z) - Distribution-Based Masked Medical Vision-Language Model Using Structured Reports [9.306835492101413]
Medical image-text pre-training aims to align medical images with clinically relevant text to improve model performance on various downstream tasks. This work introduces an uncertainty-aware medical image-text pre-training model that enhances generalization capabilities in medical image analysis.
arXiv Detail & Related papers (2025-07-29T13:31:24Z) - Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding [50.483761005446]
Current models struggle to associate textual descriptions with disease regions due to inefficient attention mechanisms and a lack of fine-grained token representations. We introduce Disease-Aware Prompting (DAP), which uses the explainability map of a VLM to identify the appropriate image features. DAP improves visual grounding accuracy by 20.74% compared to state-of-the-art methods across three major chest X-ray datasets.
arXiv Detail & Related papers (2025-05-21T05:16:45Z) - GIE-Bench: Towards Grounded Evaluation for Text-Guided Image Editing [60.66800567924348]
We introduce a new benchmark designed to evaluate text-guided image editing models. The benchmark includes over 1000 high-quality editing examples across 20 diverse content categories. We conduct a large-scale study comparing GPT-Image-1 against several state-of-the-art editing models.
arXiv Detail & Related papers (2025-05-16T17:55:54Z) - Interactive Tumor Progression Modeling via Sketch-Based Image Editing [54.47725383502915]
We propose SkEditTumor, a sketch-based diffusion model for controllable tumor progression editing. By leveraging sketches as structural priors, our method enables precise modifications of tumor regions while maintaining structural integrity and visual realism. Our contributions include a novel integration of sketches with diffusion models for medical image editing, fine-grained control over tumor progression visualization, and extensive validation across multiple datasets, setting a new benchmark in the field.
arXiv Detail & Related papers (2025-03-10T00:04:19Z) - Structure-Aware Stylized Image Synthesis for Robust Medical Image Segmentation [10.776242801237862]
We propose a novel medical image segmentation method that combines diffusion models and a Structure-Preserving Network for structure-aware one-shot image stylization. Our approach effectively mitigates domain shifts by transforming images from various sources into a consistent style while maintaining the location, size, and shape of lesions.
arXiv Detail & Related papers (2024-12-05T16:15:32Z) - LIMIS: Towards Language-based Interactive Medical Image Segmentation [58.553786162527686]
LIMIS is the first purely language-based interactive medical image segmentation model.
We adapt Grounded SAM to the medical domain and design a language-based model interaction strategy.
We evaluate LIMIS on three publicly available medical datasets in terms of performance and usability.
arXiv Detail & Related papers (2024-10-22T12:13:47Z) - Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing [28.904419606450876]
We present a Vision-guided and Mask-enhanced Adaptive Editing (ViMAEdit) method with three key novel designs.
First, we propose to leverage image embeddings as explicit guidance to enhance the conventional textual prompt-based denoising process.
Second, we devise a self-attention-guided iterative editing area grounding strategy.
arXiv Detail & Related papers (2024-10-14T13:41:37Z) - MedEdit: Counterfactual Diffusion-based Image Editing on Brain MRI [2.4557713325522914]
We propose MedEdit, a conditional diffusion model for medical image editing.
MedEdit induces pathology in specific areas while balancing the modeling of disease effects and preserving the integrity of the original scan.
We believe this work will enable counterfactual image editing research to further advance the development of realistic and clinically useful imaging tools.
arXiv Detail & Related papers (2024-07-21T21:19:09Z) - Uncertainty-aware Medical Diagnostic Phrase Identification and Grounding [72.18719355481052]
We introduce a novel task called Medical Report Grounding (MRG). MRG aims to directly identify diagnostic phrases and their corresponding grounding boxes from medical reports in an end-to-end manner. We propose uMedGround, a robust and reliable framework that leverages a multimodal large language model to predict diagnostic phrases.
arXiv Detail & Related papers (2024-04-10T07:41:35Z) - Plaintext-Free Deep Learning for Privacy-Preserving Medical Image Analysis via Frequency Information Embedding [9.192156293063414]
This paper proposes a novel framework, called Frequency-domain Exchange Style Fusion (FESF), that uses surrogate images for analysis.
Our framework effectively preserves the privacy of medical images and maintains diagnostic accuracy of DL models at a relatively high level, proving its effectiveness across various datasets and DL-based models.
arXiv Detail & Related papers (2024-03-25T06:56:38Z) - QUBIQ: Uncertainty Quantification for Biomedical Image Segmentation Challenge [93.61262892578067]
Uncertainty in medical image segmentation tasks, especially inter-rater variability, presents a significant challenge.
This variability directly impacts the development and evaluation of automated segmentation algorithms.
We report the set-up and summarize the benchmark results of the Quantification of Uncertainties in Biomedical Image Quantification Challenge (QUBIQ).
arXiv Detail & Related papers (2024-03-19T17:57:24Z) - Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework [43.453943987647015]
Medical vision language pre-training has emerged as a frontier of research, enabling zero-shot pathological recognition.
Due to the complex semantics of biomedical texts, current methods struggle to align medical images with key pathological findings in unstructured reports.
The decomposition of disease descriptions is achieved by consulting a large language model and medical experts. The proposed framework improves the accuracy of recent methods by up to 8.56% and 17.26% for seen and unseen categories, respectively.
arXiv Detail & Related papers (2024-03-12T13:18:22Z) - ElixirNet: Relation-aware Network Architecture Adaptation for Medical Lesion Detection [90.13718478362337]
We introduce a novel ElixirNet that includes three components: 1) TruncatedRPN balances positive and negative data for false positive reduction; 2) Auto-lesion Block is automatically customized for medical images to incorporate relation-aware operations among region proposals; and 3) Relation transfer module incorporates the semantic relationship.
Experiments on DeepLesion and Kits19 prove the effectiveness of ElixirNet, achieving improvement of both sensitivity and precision over FPN with fewer parameters.
arXiv Detail & Related papers (2020-03-03T05:29:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.