Accelerating Volumetric Medical Image Annotation via Short-Long Memory SAM 2
- URL: http://arxiv.org/abs/2505.01854v2
- Date: Mon, 03 Nov 2025 04:48:17 GMT
- Title: Accelerating Volumetric Medical Image Annotation via Short-Long Memory SAM 2
- Authors: Yuwen Chen, Zafer Yildiz, Qihang Li, Yaqian Chen, Haoyu Dong, Hanxue Gu, Nicholas Konz, Maciej A. Mazurowski
- Abstract summary: Short-Long Memory SAM 2 (SLM-SAM 2) is a novel architecture that integrates distinct short-term and long-term memory banks to improve segmentation accuracy. We evaluate SLM-SAM 2 on four public datasets covering organs, bones, and muscles across MRI, CT, and ultrasound videos.
- Score: 12.243345510831263
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Manual annotation of volumetric medical images, such as magnetic resonance imaging (MRI) and computed tomography (CT), is a labor-intensive and time-consuming process. Recent advancements in foundation models for video object segmentation, such as Segment Anything Model 2 (SAM 2), offer a potential opportunity to significantly speed up the annotation process by manually annotating one or a few slices and then propagating target masks across the entire volume. However, the performance of SAM 2 in this context varies. Our experiments show that relying on a single memory bank and attention module is prone to error propagation, particularly at boundary regions where the target is present in the previous slice but absent in the current one. To address this problem, we propose Short-Long Memory SAM 2 (SLM-SAM 2), a novel architecture that integrates distinct short-term and long-term memory banks with separate attention modules to improve segmentation accuracy. We evaluate SLM-SAM 2 on four public datasets covering organs, bones, and muscles across MRI, CT, and ultrasound videos. We show that the proposed method markedly outperforms the default SAM 2, achieving an average Dice Similarity Coefficient improvement of 0.14 and 0.10 in the scenarios when 5 volumes and 1 volume are available for the initial adaptation, respectively. SLM-SAM 2 also exhibits stronger resistance to over-propagation, reducing the time required to correct propagated masks by 60.575% per volume compared to SAM 2, making a notable step toward more accurate automated annotation of medical images for segmentation model development.
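The short/long memory idea described in the abstract can be sketched in a few lines. This is a hypothetical simplification for illustration only (the class name, capacities, and the averaging read-out are invented here; the actual SLM-SAM 2 stores encoded feature maps and fuses the two banks with separate attention modules):

```python
from collections import deque

import numpy as np


class DualMemoryBank:
    """Toy sketch of separate short- and long-term memory banks.

    Hypothetical simplification: the short bank keeps only the most
    recent slices, while the long bank retains sparse anchors that are
    never evicted, so an error on one slice cannot dominate the context.
    """

    def __init__(self, short_capacity=3, long_stride=5):
        self.short = deque(maxlen=short_capacity)  # most recent slices only
        self.long = []                             # sparse, stable anchors
        self.long_stride = long_stride
        self._seen = 0

    def update(self, feature):
        self.short.append(feature)
        # Keep every long_stride-th slice as a long-term anchor, so early
        # (reliable) context survives even after many propagation steps.
        if self._seen % self.long_stride == 0:
            self.long.append(feature)
        self._seen += 1

    def readout(self):
        # Stand-in for the two attention modules: average each bank,
        # then average the per-bank summaries.
        banks = [np.mean(list(self.short), axis=0)] if self.short else []
        if self.long:
            banks.append(np.mean(list(self.long), axis=0))
        return np.mean(banks, axis=0)
```

Keeping the two banks separate is the point: a purely recent (short) memory tracks smooth slice-to-slice change, while the fixed anchors (long) resist over-propagation past the boundary where the target disappears.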
Related papers
- Evaluating SAM2 for Video Semantic Segmentation [60.157605818225186]
The Segment Anything Model 2 (SAM2) has proven to be a powerful foundation model for promptable visual object segmentation in both images and videos. This paper explores the extension of SAM2 to dense Video Semantic Segmentation (VSS). Our experiments suggest that leveraging SAM2 enhances overall performance in VSS, primarily due to its precise predictions of object boundaries.
arXiv Detail & Related papers (2025-12-01T15:15:16Z) - VesSAM: Efficient Multi-Prompting for Segmenting Complex Vessel [68.24765319399286]
We present VesSAM, a powerful and efficient framework tailored for 2D vessel segmentation. VesSAM integrates (1) a convolutional adapter to enhance local texture features, (2) a multi-prompt encoder that fuses anatomical prompts, and (3) a lightweight mask decoder to reduce jagged artifacts. VesSAM consistently outperforms state-of-the-art PEFT-based SAM variants by over 10% Dice and 13% IoU.
arXiv Detail & Related papers (2025-11-02T15:47:05Z) - BALR-SAM: Boundary-Aware Low-Rank Adaptation of SAM for Resource-Efficient Medical Image Segmentation [11.634558989215392]
Vision foundation models like the Segment Anything Model (SAM) often struggle in medical image segmentation due to a lack of domain-specific adaptation. We propose BALR-SAM, a boundary-aware low-rank adaptation framework that enhances SAM for medical imaging. It combines three tailored components: (1) a Complementary Detail Enhancement Network (CDEN) using depthwise separable convolutions and multi-scale fusion to capture boundary-sensitive features essential for accurate segmentation; (2) low-rank adapters integrated into SAM's Vision Transformer blocks to optimize feature representation and attention for medical contexts, while significantly reducing the parameter space; and
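The low-rank adapters mentioned in (2) follow the standard LoRA pattern: a frozen base weight plus a trainable low-rank correction. A minimal numeric sketch (the class name, shapes, and random initialization are illustrative assumptions, not BALR-SAM's trained weights):

```python
import numpy as np


class LowRankAdapter:
    """LoRA-style adapted linear layer: y = x @ W + scale * (x @ A @ B).

    W is the frozen pre-trained weight; only the small factors A (dim x rank)
    and B (rank x dim) are trained, shrinking the tunable parameter count
    from dim*dim to 2*dim*rank.
    """

    def __init__(self, dim, rank=4, scale=1.0, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((dim, dim)) / np.sqrt(dim)   # frozen base
        self.A = rng.standard_normal((dim, rank)) / np.sqrt(dim)  # trainable down-proj
        self.B = np.zeros((rank, dim))                            # trainable up-proj, zero-init
        self.scale = scale

    def forward(self, x):
        # With B zero-initialized, the adapter branch contributes nothing
        # at the start of training, so the model begins at the frozen base.
        return x @ self.W + self.scale * (x @ self.A @ self.B)
```

The zero-initialized up-projection is the conventional LoRA trick: adaptation starts from the pre-trained behavior and only gradually deviates as A and B are trained.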
arXiv Detail & Related papers (2025-09-29T02:36:09Z) - Depthwise-Dilated Convolutional Adapters for Medical Object Tracking and Segmentation Using the Segment Anything Model 2 [1.0596160761674702]
We propose DD-SAM2, an efficient adaptation framework for SAM2. DD-SAM2 incorporates a Depthwise-Dilated Adapter (DD-Adapter) to enhance multi-scale feature extraction. DD-SAM2 fully exploits SAM2's streaming memory for medical video object tracking and segmentation.
arXiv Detail & Related papers (2025-07-19T13:19:55Z) - SAMed-2: Selective Memory Enhanced Medical Segment Anything Model [28.534663662441293]
We propose a new foundation model for medical image segmentation built upon the SAM-2 architecture. We introduce a temporal adapter into the image encoder to capture image correlations and a confidence-driven memory mechanism to store high-certainty features for later retrieval. Our experiments on both internal benchmarks and 10 external datasets demonstrate superior performance over state-of-the-art baselines in multi-task scenarios.
arXiv Detail & Related papers (2025-07-04T16:30:38Z) - SAM2-SGP: Enhancing SAM2 for Medical Image Segmentation via Support-Set Guided Prompting [11.439997127887324]
Reliance on human-provided prompts poses significant challenges in adapting SAM2 to medical image segmentation tasks. We propose SAM2 with support-set guided prompting (SAM2-SGP), a framework that eliminates the need for manual prompts. The proposed model leverages the memory mechanism of SAM2 to generate pseudo-masks using image-mask pairs from a support set via a Pseudo-mask Generation (PMG) module.
arXiv Detail & Related papers (2025-06-24T14:22:25Z) - RFMedSAM 2: Automatic Prompt Refinement for Enhanced Volumetric Medical Image Segmentation with SAM 2 [15.50695315680438]
Segment Anything Model 2 (SAM 2), a prompt-driven foundation model extending SAM to both image and video domains, has shown superior zero-shot performance compared to its predecessor. However, similar to SAM, SAM 2 is limited by its output of binary masks, inability to infer semantic labels, and dependence on precise prompts for the target object area. We explore the upper performance limit of SAM 2 using custom fine-tuning adapters, achieving a Dice Similarity Coefficient (DSC) of 92.30% on the BTCV dataset.
arXiv Detail & Related papers (2025-02-04T22:03:23Z) - Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation [47.789013598970925]
We propose a learnable prompting SAM-induced Knowledge distillation framework (KnowSAM) for semi-supervised medical image segmentation. Our model outperforms the state-of-the-art semi-supervised segmentation approaches.
arXiv Detail & Related papers (2024-12-18T11:19:23Z) - SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation [51.90445260276897]
We prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models.
We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation.
arXiv Detail & Related papers (2024-08-16T17:55:38Z) - Novel adaptation of video segmentation to 3D MRI: efficient zero-shot knee segmentation with SAM2 [1.6237741047782823]
We introduce a method for zero-shot, single-prompt segmentation of 3D knee MRI by adapting Segment Anything Model 2.
By treating slices from 3D medical volumes as individual video frames, we leverage SAM2's advanced capabilities to generate motion- and spatially-aware predictions.
We demonstrate that SAM2 can efficiently perform segmentation tasks in a zero-shot manner with no additional training or fine-tuning.
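The volume-as-video idea underlying this entry (and the SLM-SAM 2 paper above) can be sketched as a simple propagation loop. The `predict` callback is a hypothetical stand-in for a SAM2-style tracker that refines the previous slice's mask on the current slice; any per-slice segmenter with that signature fits:

```python
import numpy as np


def propagate_volume(volume, seed_slice_idx, seed_mask, predict):
    """Treat a 3D volume as a video: propagate a mask slice by slice.

    volume: array of shape (depth, H, W); seed_mask annotates one slice.
    predict(image, prev_mask) -> mask is a user-supplied per-slice tracker.
    """
    depth = volume.shape[0]
    masks = [None] * depth
    masks[seed_slice_idx] = seed_mask
    # Propagate forward, then backward, starting from the annotated slice.
    for slice_range in (range(seed_slice_idx + 1, depth),
                        range(seed_slice_idx - 1, -1, -1)):
        prev = seed_mask
        for z in slice_range:
            prev = predict(volume[z], prev)
            masks[z] = prev
    return np.stack(masks)
```

One manually annotated slice thus yields a full-volume segmentation; the failure mode the SLM-SAM 2 abstract targets is exactly this loop carrying a mask past the slice where the structure ends.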
arXiv Detail & Related papers (2024-08-08T21:39:15Z) - Is SAM 2 Better than SAM in Medical Image Segmentation? [0.6144680854063939]
The Segment Anything Model (SAM) has demonstrated impressive performance in zero-shot promptable segmentation on natural images.
The recently released Segment Anything Model 2 (SAM 2) claims to outperform SAM on images and extends the model's capabilities to video segmentation.
We conducted extensive studies using multiple datasets to compare the performance of SAM and SAM 2.
arXiv Detail & Related papers (2024-08-08T04:34:29Z) - Medical SAM 2: Segment medical images as video via Segment Anything Model 2 [17.469217682817586]
We introduce Medical SAM 2 (MedSAM-2), a generalized auto-tracking model for universal 2D and 3D medical image segmentation. We evaluate MedSAM-2 on five 2D tasks and nine 3D tasks, including white blood cells, optic cups, retinal vessels, mandibles, coronary arteries, kidney tumors, liver tumors, breast cancer, nasopharynx cancer, vestibular schwannomas, mediastinal lymph nodes, cerebral arteries, inferior alveolar nerves, and abdominal organs.
arXiv Detail & Related papers (2024-08-01T18:49:45Z) - BA-SAM: Scalable Bias-Mode Attention Mask for Segment Anything Model [65.92173280096588]
We address the challenge of image resolution variation for the Segment Anything Model (SAM)
SAM, known for its zero-shot generalizability, exhibits a performance degradation when faced with datasets with varying image sizes.
We present a bias-mode attention mask that allows each token to prioritize neighboring information.
arXiv Detail & Related papers (2024-01-04T15:34:44Z) - MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation [58.53672866662472]
We introduce a modality-agnostic SAM adaptation framework, named as MA-SAM.
Our method roots in the parameter-efficient fine-tuning strategy to update only a small portion of weight increments.
By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from input data.
arXiv Detail & Related papers (2023-09-16T02:41:53Z) - 3DSAM-adapter: Holistic adaptation of SAM from 2D to 3D for promptable tumor segmentation [52.699139151447945]
We propose a novel adaptation method for transferring the segment anything model (SAM) from 2D to 3D for promptable medical image segmentation.
Our model can outperform domain state-of-the-art medical image segmentation models on 3 out of 4 tasks, specifically by 8.25%, 29.87%, and 10.11% for kidney tumor, pancreas tumor, and colon cancer segmentation, and achieves similar performance for liver tumor segmentation.
arXiv Detail & Related papers (2023-06-23T12:09:52Z) - Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation [51.770805270588625]
The Segment Anything Model (SAM) has recently gained popularity in the field of image segmentation.
Recent studies and individual experiments have shown that SAM underperforms in medical image segmentation.
We propose the Medical SAM Adapter (Med-SA), which incorporates domain-specific medical knowledge into the segmentation model.
arXiv Detail & Related papers (2023-04-25T07:34:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.