MediSee: Reasoning-based Pixel-level Perception in Medical Images
- URL: http://arxiv.org/abs/2504.11008v2
- Date: Wed, 23 Apr 2025 15:29:29 GMT
- Title: MediSee: Reasoning-based Pixel-level Perception in Medical Images
- Authors: Qinyue Tong, Ziqian Lu, Jun Liu, Yangming Zheng, Zheming Lu
- Abstract summary: We introduce a novel medical vision task: Medical Reasoning Segmentation and Detection (MedSD). MedSD aims to comprehend implicit queries about medical images and generate the corresponding segmentation mask and bounding box for the target object. We propose MediSee, an effective baseline model designed for medical reasoning segmentation and detection.
- Score: 6.405810587061276
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Despite remarkable advancements in pixel-level medical image perception, existing methods are either limited to specific tasks or rely heavily on accurate bounding boxes or text labels as input prompts. However, the medical knowledge required to craft such input is a major obstacle for the general public, which greatly reduces the universality of these methods. In contrast to such domain-specialized auxiliary information, general users tend to rely on colloquial queries that require logical reasoning. In this paper, we introduce a novel medical vision task: Medical Reasoning Segmentation and Detection (MedSD), which aims to comprehend implicit queries about medical images and generate the corresponding segmentation mask and bounding box for the target object. To support this task, we first introduce a Multi-perspective, Logic-driven Medical Reasoning Segmentation and Detection (MLMR-SD) dataset, which encompasses a substantial collection of medical entity targets along with their corresponding reasoning. Furthermore, we propose MediSee, an effective baseline model designed for medical reasoning segmentation and detection. The experimental results indicate that the proposed method can effectively address MedSD with implicit colloquial queries and outperforms traditional medical referring segmentation methods.
Related papers
- MedCoT: Medical Chain of Thought via Hierarchical Expert [48.91966620985221]
This paper presents MedCoT, a novel hierarchical expert verification reasoning chain method. It is designed to enhance interpretability and accuracy in biomedical imaging inquiries. Experimental evaluations on four standard Med-VQA datasets demonstrate that MedCoT surpasses existing state-of-the-art approaches.
arXiv Detail & Related papers (2024-12-18T11:14:02Z)
- LIMIS: Towards Language-based Interactive Medical Image Segmentation [58.553786162527686]
LIMIS is the first purely language-based interactive medical image segmentation model.
We adapt Grounded SAM to the medical domain and design a language-based model interaction strategy.
We evaluate LIMIS on three publicly available medical datasets in terms of performance and usability.
arXiv Detail & Related papers (2024-10-22T12:13:47Z)
- Few Exemplar-Based General Medical Image Segmentation via Domain-Aware Selective Adaptation [28.186785488818135]
Medical image segmentation poses challenges due to domain gaps, data modality variations, and dependency on domain knowledge or experts.
We introduce a domain-aware selective adaptation approach to adapt the general knowledge learned from a large model trained with natural images to the corresponding medical domains/modalities.
arXiv Detail & Related papers (2024-10-11T21:00:57Z)
- MedRG: Medical Report Grounding with Multi-modal Large Language Model [42.04042642085121]
Medical Report Grounding (MedRG) is an end-to-end solution that utilizes a multi-modal Large Language Model to predict key phrases.
The experimental results validate the effectiveness of MedRG, surpassing the performance of the existing state-of-the-art medical phrase grounding methods.
arXiv Detail & Related papers (2024-04-10T07:41:35Z)
- Influence based explainability of brain tumors segmentation in multimodal Magnetic Resonance Imaging [3.1994667952195273]
We focus on the task of medical image segmentation, where most explainability methods proposed so far provide a visual explanation in terms of an input saliency map.
The aim of this work is to extend, implement and test instead an influence-based explainability algorithm, TracIn, proposed originally for classification tasks.
arXiv Detail & Related papers (2024-04-05T17:07:21Z)
- CLIP in Medical Imaging: A Survey [59.429714742927956]
Contrastive Language-Image Pre-training successfully introduces text supervision to vision models. The use of CLIP has recently gained increasing interest in the medical imaging domain.
arXiv Detail & Related papers (2023-12-12T15:21:57Z)
- EviPrompt: A Training-Free Evidential Prompt Generation Method for Segment Anything Model in Medical Images [14.899388051854084]
Medical image segmentation has immense clinical applicability but remains a challenge despite advancements in deep learning.
This paper introduces a novel training-free evidential prompt generation method named EviPrompt to overcome these issues.
The proposed method, built on the inherent similarities within medical images, requires only a single reference image-annotation pair.
arXiv Detail & Related papers (2023-11-10T21:22:22Z)
- Multi-task Explainable Skin Lesion Classification [54.76511683427566]
We propose a few-shot-based approach for skin lesions that generalizes well with little labelled data.
The proposed approach comprises a fusion of a segmentation network, which acts as an attention module, and a classification network.
arXiv Detail & Related papers (2023-10-11T05:49:47Z)
- BMAD: Benchmarks for Medical Anomaly Detection [51.22159321912891]
Anomaly detection (AD) is a fundamental research problem in machine learning and computer vision.
In medical imaging, AD is especially vital for detecting and diagnosing anomalies that may indicate rare diseases or conditions.
We introduce a comprehensive evaluation benchmark for assessing anomaly detection methods on medical images.
arXiv Detail & Related papers (2023-06-20T20:23:46Z)
- Few Shot Medical Image Segmentation with Cross Attention Transformer [30.54965157877615]
We propose a novel framework for few-shot medical image segmentation, termed CAT-Net.
Our proposed network mines the correlations between the support image and the query image, constraining them to focus only on useful foreground information.
We validate the proposed method on three public datasets: Abd-CT, Abd-MRI, and Card-MRI.
arXiv Detail & Related papers (2023-03-24T09:10:14Z)
- Self-Supervision with Superpixels: Training Few-shot Medical Image Segmentation without Annotation [12.47837000630753]
Few-shot semantic segmentation has great potential for medical imaging applications.
Most existing few-shot segmentation (FSS) techniques require abundant annotated semantic classes for training.
We propose a novel self-supervised FSS framework for medical images in order to eliminate the requirement for annotations during training.
arXiv Detail & Related papers (2020-07-20T04:46:33Z)
- Collaborative Unsupervised Domain Adaptation for Medical Image Diagnosis [102.40869566439514]
We seek to exploit rich labeled data from relevant domains to help learning in the target task via Unsupervised Domain Adaptation (UDA).
Unlike most UDA methods that rely on clean labeled data or assume samples are equally transferable, we innovatively propose a Collaborative Unsupervised Domain Adaptation algorithm.
We theoretically analyze the generalization performance of the proposed method, and also empirically evaluate it on both medical and general images.
arXiv Detail & Related papers (2020-07-05T11:49:17Z)
- Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical dataset comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.