SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation
- URL: http://arxiv.org/abs/2408.04593v1
- Date: Thu, 8 Aug 2024 17:08:57 GMT
- Title: SAM 2 in Robotic Surgery: An Empirical Evaluation for Robustness and Generalization in Surgical Video Segmentation
- Authors: Jieming Yu, An Wang, Wenzhen Dong, Mengya Xu, Mobarakol Islam, Jie Wang, Long Bai, Hongliang Ren
- Abstract summary: This study explores the zero-shot, prompt-based segmentation performance of SAM 2 in robot-assisted surgery.
For static images, we employ two forms of prompts, 1-point and bounding box; for video sequences, the 1-point prompt is applied to the initial frame only.
The results with point prompts also show a substantial improvement over SAM's capabilities, nearing or even surpassing existing unprompted SOTA methods.
- Score: 13.609341065893739
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The recent Segment Anything Model (SAM) 2 has demonstrated remarkable foundational competence in semantic segmentation, with its memory mechanism and mask decoder further addressing challenges in video tracking and object occlusion, thereby achieving superior results in interactive segmentation for both images and videos. Building upon our previous empirical studies, we further explore the zero-shot segmentation performance of SAM 2 in robot-assisted surgery based on prompts, alongside its robustness against real-world corruption. For static images, we employ two forms of prompts, 1-point and bounding box; for video sequences, the 1-point prompt is applied to the initial frame. Through extensive experimentation on the MICCAI EndoVis 2017 and EndoVis 2018 benchmarks, SAM 2, when utilizing bounding-box prompts, outperforms state-of-the-art (SOTA) methods in comparative evaluations. The results with point prompts also exhibit a substantial improvement over SAM's capabilities, nearing or even surpassing existing unprompted SOTA methods. In addition, SAM 2 demonstrates faster inference and less performance degradation under various image corruptions. Although results remain slightly unsatisfactory at certain edges and regions, SAM 2's robust adaptability to 1-point prompts underscores its potential for downstream surgical tasks with limited prompt requirements.
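As a concrete illustration of the prompting protocol described in the abstract, the sketch below uses the public SAM 2 API from https://github.com/facebookresearch/sam2. The checkpoint path, config name, image file, and all prompt coordinates are placeholders rather than values from the paper, and entry points may differ slightly across SAM 2 releases.

```python
# Hedged sketch of the static-image protocol: 1-point and bounding-box
# prompts with the public SAM 2 image predictor. All paths/coordinates
# below are placeholders, not values from the paper.
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

checkpoint = "checkpoints/sam2_hiera_large.pt"  # placeholder path
model_cfg = "sam2_hiera_l.yaml"                 # placeholder config
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

image = np.array(Image.open("endovis_frame.png").convert("RGB"))  # placeholder

with torch.inference_mode():
    predictor.set_image(image)

    # 1-point prompt: one positive click (label 1) on the instrument.
    masks_pt, scores_pt, _ = predictor.predict(
        point_coords=np.array([[320, 240]]),  # (x, y) placeholder click
        point_labels=np.array([1]),
        multimask_output=False,
    )

    # Bounding-box prompt: (x_min, y_min, x_max, y_max) around the instrument.
    masks_box, scores_box, _ = predictor.predict(
        box=np.array([200, 150, 440, 330]),  # placeholder box
        multimask_output=False,
    )
```

For video sequences, the paper prompts only the initial frame with a single point and relies on SAM 2's memory mechanism to track the instrument through the rest of the sequence; a minimal sketch under the same assumptions:

```python
import numpy as np
import torch
from sam2.build_sam import build_sam2_video_predictor

predictor = build_sam2_video_predictor("sam2_hiera_l.yaml",
                                       "checkpoints/sam2_hiera_large.pt")

with torch.inference_mode():
    # `video_path` points to a directory of JPEG frames, as in the SAM 2 demos.
    state = predictor.init_state(video_path="endovis_sequence/")  # placeholder

    # Single positive click on the first frame only (frame_idx=0).
    predictor.add_new_points_or_box(
        state, frame_idx=0, obj_id=1,
        points=np.array([[320, 240]], dtype=np.float32),  # placeholder click
        labels=np.array([1], dtype=np.int32),
    )

    # SAM 2's memory bank propagates the mask through the remaining frames.
    for frame_idx, obj_ids, mask_logits in predictor.propagate_in_video(state):
        masks = (mask_logits > 0.0).cpu().numpy()  # logits -> binary masks
```

The robustness experiments can be approximated in the same setup by applying synthetic corruptions (e.g., noise or blur in the style of ImageNet-C) to the input before `set_image` and measuring the drop in Dice/IoU relative to the clean image.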
Related papers
- SAM2-UNet: Segment Anything 2 Makes Strong Encoder for Natural and Medical Image Segmentation [51.90445260276897]
We prove that the Segment Anything Model 2 (SAM2) can be a strong encoder for U-shaped segmentation models.
We propose a simple but effective framework, termed SAM2-UNet, for versatile image segmentation.
arXiv Detail & Related papers (2024-08-16T17:55:38Z)
- Surgical SAM 2: Real-time Segment Anything in Surgical Video by Efficient Frame Pruning [13.90996725220123]
We introduce Surgical SAM 2 (SurgSAM-2), an advanced model that combines SAM2 with an Efficient Frame Pruning mechanism (a hypothetical sketch of such pruning appears after this list).
SurgSAM-2 significantly improves both efficiency and segmentation accuracy compared to the vanilla SAM2.
Remarkably, SurgSAM-2 achieves 3× the FPS of SAM2, while also delivering state-of-the-art performance after fine-tuning with lower-resolution data.
arXiv Detail & Related papers (2024-08-15T04:59:12Z)
- Is SAM 2 Better than SAM in Medical Image Segmentation? [0.6144680854063939]
The Segment Anything Model (SAM) has demonstrated impressive performance in zero-shot promptable segmentation on natural images.
The recently released Segment Anything Model 2 (SAM 2) claims to outperform SAM on images and extends the model's capabilities to video segmentation.
We conducted extensive studies using multiple datasets to compare the performance of SAM and SAM 2.
arXiv Detail & Related papers (2024-08-08T04:34:29Z)
- Evaluating SAM2's Role in Camouflaged Object Detection: From SAM to SAM2 [10.751277821864916]
The report reveals a decline in SAM2's ability to perceive different objects in images without prompts in its auto mode.
Specifically, we employ the challenging task of camouflaged object detection to assess this performance decrease.
arXiv Detail & Related papers (2024-07-31T13:32:10Z)
- Segment Anything for Videos: A Systematic Survey [52.28931543292431]
The recent wave of foundation models has witnessed tremendous success in computer vision (CV) and beyond.
The segment anything model (SAM) has sparked a passion for exploring task-agnostic visual foundation models.
This work conducts a systematic review on SAM for videos in the era of foundation models.
arXiv Detail & Related papers (2024-07-31T02:24:53Z)
- FocSAM: Delving Deeply into Focused Objects in Segmenting Anything [58.042354516491024]
The Segment Anything Model (SAM) marks a notable milestone in segmentation models.
We propose FocSAM with a pipeline redesigned on two pivotal aspects.
First, we propose Dynamic Window Multi-head Self-Attention (Dwin-MSA) to dynamically refocus SAM's image embeddings on the target object.
Second, we propose Pixel-wise Dynamic ReLU (P-DyReLU) to enable sufficient integration of interactive information from a few initial clicks.
arXiv Detail & Related papers (2024-05-29T02:34:13Z)
- Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively [69.97238935096094]
The Open-Vocabulary SAM is a SAM-inspired model designed for simultaneous interactive segmentation and recognition.
Our method can segment and recognize approximately 22,000 classes.
arXiv Detail & Related papers (2024-01-05T18:59:22Z)
- Guided Prompting in SAM for Weakly Supervised Cell Segmentation in Histopathological Images [27.14641973632063]
This paper focuses on using weak supervision -- annotation from related tasks -- to induce a segmenter.
Recent foundation models, such as Segment Anything (SAM), can use prompts to leverage additional supervision during inference.
All SAM-based solutions substantially outperform existing weakly supervised image segmentation models, obtaining 9-15 pt Dice gains (the Dice coefficient is recapped after this list).
arXiv Detail & Related papers (2023-11-29T11:18:48Z)
- Stable Segment Anything Model [79.9005670886038]
The Segment Anything Model (SAM) achieves remarkable promptable segmentation given high-quality prompts.
This paper presents the first comprehensive analysis on SAM's segmentation stability across a diverse spectrum of prompt qualities.
Our solution, termed Stable-SAM, offers several advantages: 1) it improves SAM's segmentation stability across a wide range of prompt qualities, while 2) retaining SAM's powerful promptable segmentation efficiency and generality.
arXiv Detail & Related papers (2023-11-27T12:51:42Z)
- SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation [65.52097667738884]
We introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation.
Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes.
In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning.
arXiv Detail & Related papers (2023-08-17T02:51:01Z)
- SAM Meets Robotic Surgery: An Empirical Study in Robustness Perspective [21.2080716792596]
The Segment Anything Model (SAM) is a foundation model for semantic segmentation.
We investigate the robustness and zero-shot generalizability of the SAM in the domain of robotic surgery.
arXiv Detail & Related papers (2023-04-28T08:06:33Z)
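As referenced in the SurgSAM-2 entry above, "Efficient Frame Pruning" reduces the cost of SAM 2's growing per-frame memory bank. The summary does not state the selection criterion, so the sketch below is a hypothetical illustration only: it scores memory entries (here by a generic confidence value) and keeps the top-k in temporal order.

```python
# Hypothetical memory-bank pruning sketch; the actual SurgSAM-2 criterion
# may differ. Illustrates retaining only the k most informative frames.
import heapq
from dataclasses import dataclass
from typing import Any

@dataclass
class MemoryEntry:
    frame_idx: int   # which frame this memory comes from
    features: Any    # per-frame memory features (e.g., a tensor)
    score: float     # hypothetical informativeness, e.g., mask confidence

def prune_memory_bank(bank: list[MemoryEntry], k: int) -> list[MemoryEntry]:
    """Keep the k highest-scoring entries (hypothetical criterion), restoring
    temporal order so memory attention still sees a coherent history."""
    if len(bank) <= k:
        return bank
    kept = heapq.nlargest(k, bank, key=lambda e: e.score)
    return sorted(kept, key=lambda e: e.frame_idx)
```

For reference, the "pt Dice gains" quoted in the Guided Prompting entry are percentage points of the standard Dice coefficient on binary masks:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|); multiply by 100 for 'points'."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))
```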