Related papers: Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2

Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2

URL: http://arxiv.org/abs/2408.01648v1
Date: Sat, 3 Aug 2024 03:19:56 GMT
Title: Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2
Authors: Ange Lou, Yamin Li, Yike Zhang, Robert F. Labadie, Jack Noble,
Abstract summary: The Segment Anything Model 2 (SAM 2) is the latest generation foundation model for image and video segmentation. We evaluate the zero-shot video segmentation performance of the SAM 2 model across different types of surgeries, including endoscopy and microscopy. We found that: 1) SAM 2 demonstrates a strong capability for segmenting various surgical videos; 2) When new tools enter the scene, additional prompts are necessary to maintain segmentation accuracy; and 3) Specific challenges inherent to surgical videos can impact the robustness of SAM 2.
Score: 4.418542191434178
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Segment Anything Model 2 (SAM 2) is the latest generation foundation model for image and video segmentation. Trained on the expansive Segment Anything Video (SA-V) dataset, which comprises 35.5 million masks across 50.9K videos, SAM 2 advances its predecessor's capabilities by supporting zero-shot segmentation through various prompts (e.g., points, boxes, and masks). Its robust zero-shot performance and efficient memory usage make SAM 2 particularly appealing for surgical tool segmentation in videos, especially given the scarcity of labeled data and the diversity of surgical procedures. In this study, we evaluate the zero-shot video segmentation performance of the SAM 2 model across different types of surgeries, including endoscopy and microscopy. We also assess its performance on videos featuring single and multiple tools of varying lengths to demonstrate SAM 2's applicability and effectiveness in the surgical domain. We found that: 1) SAM 2 demonstrates a strong capability for segmenting various surgical videos; 2) When new tools enter the scene, additional prompts are necessary to maintain segmentation accuracy; and 3) Specific challenges inherent to surgical videos can impact the robustness of SAM 2.

Related papers

Memory-Augmented SAM2 for Training-Free Surgical Video Segmentation [18.71772979219666]
We introduce Memory Augmented (MA)-SAM2, a training-free video object segmentation strategy.<n>MA-SAM2 exhibits strong robustness against occlusions and interactions arising from complex instrument movements.<n>Without introducing any additional parameters or requiring further training, MA-SAM2 achieved performance improvements of 4.36% and 6.1% over SAM2 on the EndoVis 2017 and EndoVis 2018 datasets.
arXiv Detail & Related papers (2025-07-13T11:05:25Z)
SurgVidLM: Towards Multi-grained Surgical Video Understanding with Large Language Model [55.13206879750197]
SurgVidLM is the first video language model designed to address both full and fine-grained surgical video comprehension.<n>We introduce the StageFocus mechanism which is a two-stage framework performing the multi-grained, progressive understanding of surgical videos.<n> Experimental results demonstrate that SurgVidLM significantly outperforms state-of-the-art Vid-LLMs in both full and fine-grained video understanding tasks.
arXiv Detail & Related papers (2025-06-22T02:16:18Z)
DC-SAM: In-Context Segment Anything in Images and Videos via Dual Consistency [91.30252180093333]
We propose the Dual Consistency SAM (DCSAM) method based on prompttuning to adapt SAM and SAM2 for in-context segmentation. Our key insights are to enhance the features of the SAM's prompt encoder in segmentation by providing high-quality visual prompts. Although the proposed DC-SAM is primarily designed for images, it can be seamlessly extended to the video domain with the support SAM2.
arXiv Detail & Related papers (2025-04-16T13:41:59Z)
Is Segment Anything Model 2 All You Need for Surgery Video Segmentation? A Systematic Evaluation [25.459372606957736]
In this paper, we systematically evaluate the performance of SAM2 model in zero-shot surgery video segmentation task. We conducted experiments under different configurations, including different prompting strategies, robustness, etc.
arXiv Detail & Related papers (2024-12-31T16:20:05Z)
DB-SAM: Delving into High Quality Universal Medical Image Segmentation [100.63434169944853]
We propose a dual-branch adapted SAM framework, named DB-SAM, to bridge the gap between natural and 2D/3D medical data. Our proposed DB-SAM achieves an absolute gain of 8.8%, compared to a recent medical SAM adapter in the literature.
arXiv Detail & Related papers (2024-10-05T14:36:43Z)
SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation [51.90445260276897]
We prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models. We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation.
arXiv Detail & Related papers (2024-08-16T17:55:38Z)
Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning [13.90996725220123]
We introduce Surgical SAM 2 (SurgSAM-2), an advanced model to utilize SAM2 with an Efficient Frame Pruning mechanism. SurgSAM-2 significantly improves both efficiency and segmentation accuracy compared to the vanilla SAM2. Remarkably, SurgSAM-2 achieves a 3$times$ FPS compared with SAM2, while also delivering state-of-the-art performance after fine-tuning with lower-resolution data.
arXiv Detail & Related papers (2024-08-15T04:59:12Z)
Novel adaptation of video segmentation to 3D MRI: efficient zero-shot knee segmentation with SAM2 [1.6237741047782823]
We introduce a method for zero-shot, single-prompt segmentation of 3D knee MRI by adapting Segment Anything Model 2. By treating slices from 3D medical volumes as individual video frames, we leverage SAM2's advanced capabilities to generate motion- and spatially-aware predictions. We demonstrate that SAM2 can efficiently perform segmentation tasks in a zero-shot manner with no additional training or fine-tuning.
arXiv Detail & Related papers (2024-08-08T21:39:15Z)
SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation [13.609341065893739]
This study explores the zero-shot segmentation performance of SAM 2 in robot-assisted surgery based on prompts. We employ two forms of prompts: 1-point and bounding box, while for video sequences, the 1-point prompt is applied to the initial frame. The results with point prompts also exhibit a substantial enhancement over SAM's capabilities, nearing or even surpassing existing unprompted SOTA methods.
arXiv Detail & Related papers (2024-08-08T17:08:57Z)
Segment Anything in Medical Images and Videos: Benchmark and Deployment [8.51742337818826]
We first present a comprehensive benchmarking of the Segment Anything Model 2 (SAM2) across 11 medical image modalities and videos. Then, we develop a transfer learning pipeline and demonstrate SAM2 can be quickly adapted to medical domain by fine-tuning. We implement SAM2 as a 3D slicer plugin and Gradio API for efficient 3D image and video segmentation.
arXiv Detail & Related papers (2024-08-06T17:58:18Z)
Segment anything model 2: an application to 2D and 3D medical images [16.253160684182895]
Segment Anything Model (SAM) has gained significant attention because of its ability to segment various objects in images given a prompt. Recently developed SAM 2 has extended this ability to video inputs. This opens an opportunity to apply SAM to 3D images, one of the fundamental tasks in the medical imaging field.
arXiv Detail & Related papers (2024-08-01T17:57:25Z)
SAM 2: Segment Anything in Images and Videos [63.44869623822368]
We present Segment Anything Model 2 (SAM 2), a foundation model towards solving promptable visual segmentation in images and videos. We build a data engine, which improves model and data via user interaction, to collect the largest video segmentation dataset to date. Our model is a simple transformer architecture with streaming memory for real-time video processing.
arXiv Detail & Related papers (2024-08-01T17:00:08Z)
MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation [58.53672866662472]
We introduce a modality-agnostic SAM adaptation framework, named as MA-SAM. Our method roots in the parameter-efficient fine-tuning strategy to update only a small portion of weight increments. By injecting a series of 3D adapters into the transformer blocks of the image encoder, our method enables the pre-trained 2D backbone to extract third-dimensional information from input data.
arXiv Detail & Related papers (2023-09-16T02:41:53Z)
SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation [65.52097667738884]
We introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation. Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes. In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning.
arXiv Detail & Related papers (2023-08-17T02:51:01Z)
Medical SAM Adapter: Adapting Segment Anything Model for Medical Image Segmentation [51.770805270588625]
The Segment Anything Model (SAM) has recently gained popularity in the field of image segmentation. Recent studies and individual experiments have shown that SAM underperforms in medical image segmentation. We propose the Medical SAM Adapter (Med-SA), which incorporates domain-specific medical knowledge into the segmentation model.
arXiv Detail & Related papers (2023-04-25T07:34:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.