MedSAM3: Delving into Segment Anything with Medical Concepts
- URL: http://arxiv.org/abs/2511.19046v1
- Date: Mon, 24 Nov 2025 12:34:38 GMT
- Title: MedSAM3: Delving into Segment Anything with Medical Concepts
- Authors: Anglin Liu, Rundong Xue, Xu R. Cao, Yifan Shen, Yi Lu, Xiang Li, Qianqian Chen, Jintai Chen,
- Abstract summary: We propose MedSAM-3, a text-promptable medical segmentation model for medical image and video segmentation. By fine-tuning the Segment Anything Model (SAM) 3 architecture on medical images paired with semantic conceptual labels, our MedSAM-3 enables medical Promptable Concept Segmentation (PCS). We introduce the MedSAM-3 Agent, a framework that integrates complex reasoning and iterative refinement in an agent-in-the-loop workflow.
- Score: 14.669287067007419
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical image segmentation is fundamental for biomedical discovery. Existing methods lack generalizability and demand extensive, time-consuming manual annotation for new clinical applications. Here, we propose MedSAM-3, a text-promptable medical segmentation model for medical image and video segmentation. By fine-tuning the Segment Anything Model (SAM) 3 architecture on medical images paired with semantic conceptual labels, our MedSAM-3 enables medical Promptable Concept Segmentation (PCS), allowing precise targeting of anatomical structures via open-vocabulary text descriptions rather than solely geometric prompts. We further introduce the MedSAM-3 Agent, a framework that integrates Multimodal Large Language Models (MLLMs) to perform complex reasoning and iterative refinement in an agent-in-the-loop workflow. Comprehensive experiments across diverse medical imaging modalities, including X-ray, MRI, Ultrasound, CT, and video, demonstrate that our approach significantly outperforms existing specialist and foundation models. We will release our code and model at https://github.com/Joey-S-Liu/MedSAM3.
Related papers
- Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation [23.51581338204315]
We present Medical SAM3, a foundation model for universal prompt-driven medical image segmentation.
By fine-tuning SAM3's model parameters on 33 datasets spanning 10 medical imaging modalities, Medical SAM3 acquires robust domain-specific representations.
Our results establish Medical SAM3 as a universal, text-guided segmentation foundation model for medical imaging.
arXiv Detail & Related papers (2026-01-15T22:18:14Z)
- MedGemma Technical Report [75.88152277443179]
We introduce MedGemma, a collection of medical vision-language foundation models based on Gemma 3 4B and 27B.
MedGemma demonstrates advanced medical understanding and reasoning on images and text.
We additionally introduce MedSigLIP, a medically-tuned vision encoder derived from SigLIP.
arXiv Detail & Related papers (2025-07-07T17:01:44Z)
- MedSeg-R: Reasoning Segmentation in Medical Images with Multimodal Large Language Models [48.24824129683951]
We introduce medical image reasoning segmentation, a novel task that aims to generate segmentation masks based on complex and implicit medical instructions.
To address this, we propose MedSeg-R, an end-to-end framework that leverages the reasoning abilities of MLLMs to interpret clinical questions.
It is built on two core components: 1) a global context understanding module that interprets images and comprehends complex medical instructions to generate multi-modal intermediate tokens, and 2) a pixel-level grounding module that decodes these tokens to produce precise segmentation masks.
arXiv Detail & Related papers (2025-06-12T08:13:38Z)
- Dynamically evolving segment anything model with continuous learning for medical image segmentation [50.92344083895528]
We introduce EvoSAM, a dynamically evolving medical image segmentation model.
EvoSAM continuously accumulates new knowledge from an ever-expanding array of scenarios and tasks.
Experiments conducted by surgical clinicians on blood vessel segmentation confirm that EvoSAM enhances segmentation efficiency based on user prompts.
arXiv Detail & Related papers (2025-03-08T14:37:52Z)
- Few-Shot Adaptation of Training-Free Foundation Model for 3D Medical Image Segmentation [8.78725593323412]
Few-shot Adaptation of Training-frEe SAM (FATE-SAM) is a novel method designed to adapt the advanced Segment Anything Model 2 (SAM2) for 3D medical image segmentation.
FATE-SAM reassembles pre-trained modules of SAM2 to enable few-shot adaptation, leveraging a small number of support examples.
We evaluate FATE-SAM on multiple medical imaging datasets and compare it with supervised learning methods, zero-shot SAM approaches, and fine-tuned medical SAM methods.
arXiv Detail & Related papers (2025-01-15T20:44:21Z)
- LIMIS: Towards Language-based Interactive Medical Image Segmentation [58.553786162527686]
LIMIS is the first purely language-based interactive medical image segmentation model.
We adapt Grounded SAM to the medical domain and design a language-based model interaction strategy.
We evaluate LIMIS on three publicly available medical datasets in terms of performance and usability.
arXiv Detail & Related papers (2024-10-22T12:13:47Z)
- Medical SAM 2: Segment medical images as video via Segment Anything Model 2 [17.469217682817586]
We introduce Medical SAM 2 (MedSAM-2), a generalized auto-tracking model for universal 2D and 3D medical image segmentation.
We evaluate MedSAM-2 on five 2D tasks and nine 3D tasks, including white blood cells, optic cups, retinal vessels, mandibles, coronary arteries, kidney tumors, liver tumors, breast cancer, nasopharynx cancer, vestibular schwan, mediastinal lymph nodules, cerebral artery, inferior alveolar nerve, and abdominal organs.
arXiv Detail & Related papers (2024-08-01T18:49:45Z)
- MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation [58.53672866662472]
We introduce a modality-agnostic SAM adaptation framework named MA-SAM.
Our method is rooted in a parameter-efficient fine-tuning strategy that updates only a small portion of weight increments.
By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from input data.
arXiv Detail & Related papers (2023-09-16T02:41:53Z)
- MedLSAM: Localize and Segment Anything Model for 3D CT Images [13.320012515543116]
We introduce MedLAM, a 3D medical foundation localization model that accurately identifies any anatomical part within the body using only a few template scans.
We developed MedLSAM by integrating MedLAM with the Segment Anything Model (SAM).
Our findings are twofold: 1) MedLAM can directly localize anatomical structures using just a few template scans, achieving performance comparable to fully supervised models; 2) MedLSAM closely matches the performance of SAM and its specialized medical adaptations with manual prompts, while minimizing the need for extensive point annotations across the entire dataset.
arXiv Detail & Related papers (2023-06-26T15:09:02Z)
- Towards Segment Anything Model (SAM) for Medical Image Segmentation: A Survey [8.76496233192512]
We discuss efforts to extend the success of the Segment Anything Model to medical image segmentation tasks.
Many insights are drawn to guide future research to develop foundation models for medical image analysis.
arXiv Detail & Related papers (2023-05-05T16:48:45Z)
- Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation [51.770805270588625]
The Segment Anything Model (SAM) has recently gained popularity in the field of image segmentation.
Recent studies and individual experiments have shown that SAM underperforms in medical image segmentation.
We propose the Medical SAM Adapter (Med-SA), which incorporates domain-specific medical knowledge into the segmentation model.
arXiv Detail & Related papers (2023-04-25T07:34:22Z)