SurgiSAM2: Fine-tuning a foundational model for surgical video anatomy segmentation and detection
- URL: http://arxiv.org/abs/2503.03942v1
- Date: Wed, 05 Mar 2025 22:18:32 GMT
- Title: SurgiSAM2: Fine-tuning a foundational model for surgical video anatomy segmentation and detection
- Authors: Devanish N. Kamtam, Joseph B. Shrager, Satya Deepya Malla, Xiaohan Wang, Nicole Lin, Juan J. Cardona, Serena Yeung-Levy, Clarence Hu
- Abstract summary: We evaluate SAM 2 for surgical scene understanding by examining its semantic segmentation capabilities for organs/tissues. SurgiSAM 2, a fine-tuned SAM 2 model, demonstrated significant improvements in segmentation performance.
- Score: 14.469704692948435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Background: We evaluate SAM 2 for surgical scene understanding by examining its semantic segmentation capabilities for organs/tissues both in zero-shot scenarios and after fine-tuning. Methods: We utilized five public datasets to evaluate and fine-tune SAM 2 for segmenting anatomical tissues in surgical videos/images. Fine-tuning was applied to the image encoder and mask decoder. We limited training subsets from 50 to 400 samples per class to better model real-world constraints on data acquisition. The impact of dataset size on fine-tuning performance was evaluated with the weighted mean Dice coefficient (WMDC), and the results were also compared against previously reported state-of-the-art (SOTA) results. Results: SurgiSAM 2, a fine-tuned SAM 2 model, demonstrated significant improvements in segmentation performance, achieving a 17.9% relative WMDC gain over the baseline SAM 2. Increasing prompt points from 1 to 10 and training data scale from 50/class to 400/class enhanced performance; the best WMDC of 0.92 on the validation subset was achieved with 10 prompt points and 400 samples per class. On the test subset, this model outperformed prior SOTA methods in 24/30 (80%) of the classes with a WMDC of 0.91 using 10-point prompts. Notably, SurgiSAM 2 generalized effectively to unseen organ classes, achieving SOTA on 7/9 (77.8%) of them. Conclusion: SAM 2 achieves remarkable zero-shot and fine-tuned performance for surgical scene segmentation, surpassing prior SOTA models across several organ classes of diverse datasets. This suggests immense potential for enabling automated/semi-automated annotation pipelines, thereby decreasing the annotation burden and facilitating several surgical applications.
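The paper's headline metric, the weighted mean Dice coefficient (WMDC), can be illustrated with a minimal NumPy sketch. This is an assumption about the metric's form (per-class Dice scores averaged with per-class sample counts as weights); the paper's exact weighting scheme is not reproduced here, and the function names are illustrative.

```python
import numpy as np

def dice(pred, gt, eps=1e-7):
    """Dice coefficient between two binary masks of the same shape."""
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

def weighted_mean_dice(per_class_dice, class_counts):
    """WMDC sketch: per-class Dice scores weighted by class sample counts."""
    w = np.asarray(class_counts, dtype=float)
    d = np.asarray(per_class_dice, dtype=float)
    return float((w * d).sum() / w.sum())

# Identical masks give a Dice score of 1.0; disjoint masks give ~0.
mask = np.array([[1, 1], [0, 0]], dtype=bool)
print(dice(mask, mask))                       # -> ~1.0
print(weighted_mean_dice([1.0, 0.5], [1, 3])) # -> 0.625
```

Weighting by class frequency means that rare anatomy classes contribute less to the aggregate score than common ones, which is worth keeping in mind when comparing WMDC values across datasets with different class balances.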
Related papers
- Performance Analysis of Deep Learning Models for Femur Segmentation in MRI Scan [5.5193366921929155]
We evaluate and compare the performance of three CNN-based models, i.e., U-Net, Attention U-Net, and U-KAN, and one transformer-based model, SAM 2.
The dataset comprises 11,164 MRI scans with detailed annotations of femoral regions.
Attention U-Net achieves the highest overall scores, while U-KAN demonstrates superior performance in anatomical regions with a smaller region of interest.
arXiv Detail & Related papers (2025-04-05T05:47:56Z) - Improving the U-Net Configuration for Automated Delineation of Head and Neck Cancer on MRI [0.0]
Tumor volume segmentation on MRI is a challenging and time-consuming process. This work presents an approach to automated delineation of head and neck tumors on MRI scans. The focus of this research was to propose improvements to the configuration commonly used in medical segmentation tasks.
arXiv Detail & Related papers (2025-01-09T10:22:35Z) - Handling Geometric Domain Shifts in Semantic Segmentation of Surgical RGB and Hyperspectral Images [67.66644395272075]
We present the first analysis of state-of-the-art semantic segmentation models when faced with geometric out-of-distribution data.
We propose an augmentation technique called "Organ Transplantation" to enhance generalizability.
Our augmentation technique improves SOTA model performance by up to 67% for RGB data and 90% for HSI data, achieving performance at the level of in-distribution performance on real OOD test data.
arXiv Detail & Related papers (2024-08-27T19:13:15Z) - ASPS: Augmented Segment Anything Model for Polyp Segmentation [77.25557224490075]
The Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation.
SAM's Transformer-based structure prioritizes global and low-frequency information.
CFA integrates a trainable CNN encoder branch with a frozen ViT encoder, enabling the integration of domain-specific knowledge.
arXiv Detail & Related papers (2024-06-30T14:55:32Z) - TotalSegmentator MRI: Robust Sequence-independent Segmentation of Multiple Anatomic Structures in MRI [59.86827659781022]
A nnU-Net model (TotalSegmentator) was trained on MRI to segment 80 anatomic structures. Dice scores were calculated between the predicted segmentations and expert reference standard segmentations to evaluate model performance. The open-source, easy-to-use model allows for automatic, robust segmentation of 80 structures.
arXiv Detail & Related papers (2024-05-29T20:15:54Z) - Low-resource classification of mobility functioning information in clinical sentences using large language models [0.0]
This study evaluates the ability of publicly available large language models (LLMs) to accurately identify the presence of functioning information from clinical notes.
We collect a balanced binary classification dataset of 1000 sentences from the Mobility NER dataset, which was curated from n2c2 clinical notes.
arXiv Detail & Related papers (2023-12-15T20:59:17Z) - nnSAM: Plug-and-play Segment Anything Model Improves nnUNet Performance [12.169801149021566]
The Segment Anything Model (SAM) has emerged as a versatile tool for image segmentation without specific domain training.
Traditional models like nnUNet perform automatic segmentation during inference but need extensive domain-specific training.
We propose nnSAM, integrating SAM's robust feature extraction with nnUNet's automatic configuration to enhance segmentation accuracy on small datasets.
arXiv Detail & Related papers (2023-09-29T04:26:25Z) - MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation [58.53672866662472]
We introduce a modality-agnostic SAM adaptation framework, named as MA-SAM.
Our method roots in the parameter-efficient fine-tuning strategy to update only a small portion of weight increments.
By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from input data.
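The adapter mechanism described above (a small trainable bottleneck added to each transformer block while the pre-trained backbone stays frozen) can be sketched as follows. This is a generic bottleneck-adapter illustration, not MA-SAM's actual implementation: the layer sizes, initialization, and function names are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def adapter(x, w_down, w_up):
    """Bottleneck adapter: project down, apply ReLU, project up,
    then add the result back to the frozen block's output (residual)."""
    h = np.maximum(x @ w_down, 0.0)  # down-projection + nonlinearity
    return x + h @ w_up              # up-projection + residual add

d_model, d_bottleneck = 8, 2         # illustrative dimensions
w_down = rng.normal(scale=0.02, size=(d_model, d_bottleneck))
w_up = np.zeros((d_bottleneck, d_model))  # zero-init: adapter starts as identity

x = rng.normal(size=(4, d_model))    # 4 tokens from a frozen transformer block
y = adapter(x, w_down, w_up)
# With a zero-initialized up-projection, the adapter output equals its input,
# so training starts from the unmodified pre-trained behavior.
```

Only `w_down` and `w_up` would be updated during fine-tuning, which is why such approaches are described as parameter-efficient: the weight increments are a small fraction of the full backbone.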
arXiv Detail & Related papers (2023-09-16T02:41:53Z) - SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation [65.52097667738884]
We introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation.
Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes.
In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning.
arXiv Detail & Related papers (2023-08-17T02:51:01Z) - MAPPING: Model Average with Post-processing for Stroke Lesion Segmentation [57.336056469276585]
We present our stroke lesion segmentation model based on nnU-Net framework, and apply it to the Anatomical Tracings of Lesions After Stroke dataset.
Our method took the first place in the 2022 MICCAI ATLAS Challenge with an average Dice score of 0.6667, Lesion-wise F1 score of 0.5643, Simple Lesion Count score of 4.5367, and Volume Difference score of 8804.9102.
arXiv Detail & Related papers (2022-11-11T14:17:04Z) - Treatment classification of posterior capsular opacification (PCO) using automated ground truths [0.0]
We propose a deep learning (DL)-based method to first segment PCO images and then classify the images into "treatment required" and "not yet required" cases.
To train the model, we prepare a training image set with ground truths (GT) obtained from two strategies: (i) manual and (ii) automated.
arXiv Detail & Related papers (2022-11-11T10:36:42Z) - Towards Fully Automated Segmentation of Rat Cardiac MRI by Leveraging Deep Learning Frameworks [1.6020567943077142]
We develop segmentation models that expand on the standard U-Net architecture and evaluate separate models for systole and diastole phases.
Applying Gaussian Processes to 1MSA makes it possible to automate the selection of systole and diastole phases.
arXiv Detail & Related papers (2021-09-09T11:48:50Z) - Device-Robust Acoustic Scene Classification Based on Two-Stage Categorization and Data Augmentation [63.98724740606457]
We present a joint effort of four groups, namely GT, USTC, Tencent, and UKE, to tackle Task 1 - Acoustic Scene Classification (ASC) in the DCASE 2020 Challenge.
Task 1a focuses on ASC of audio signals recorded with multiple (real and simulated) devices into ten different fine-grained classes.
Task 1b concerns the classification of data into three higher-level classes using low-complexity solutions.
arXiv Detail & Related papers (2020-07-16T15:07:14Z)