From Generalization to Precision: Exploring SAM for Tool Segmentation in
Surgical Environments
- URL: http://arxiv.org/abs/2402.17972v1
- Date: Wed, 28 Feb 2024 01:33:49 GMT
- Title: From Generalization to Precision: Exploring SAM for Tool Segmentation in
Surgical Environments
- Authors: Kanyifeechukwu J. Oguine, Roger D. Soberanis-Mukul, Nathan Drenkow,
Mathias Unberath
- Abstract summary: We argue that the Segment Anything Model drastically over-segments images with high corruption levels, resulting in degraded performance.
We employ the ground-truth tool mask to analyze the results of SAM when the best single mask is selected as prediction.
We analyze the Endovis18 and Endovis17 instrument segmentation datasets using synthetic corruptions of various strengths and an In-House dataset featuring counterfactually created real-world corruptions.
- Score: 7.01085327371458
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Purpose: Accurate tool segmentation is essential in computer-aided
procedures. However, this task presents challenges due to the presence of
artifacts and the limited training data in medical scenarios. Methods that
generalize to unseen data represent an interesting avenue, where zero-shot
segmentation offers an option to account for data limitations. Initial
exploratory works with the Segment Anything Model (SAM) show that
bounding-box-based prompting yields notable zero-shot generalization. However,
point-based prompting leads to degraded performance that further deteriorates
under image corruption. We argue that SAM drastically over-segments images with
high corruption levels, resulting in degraded performance when only a single
segmentation mask is considered, while combining the masks that overlap the
object of interest generates an accurate prediction. Method: We use SAM to
generate the over-segmented prediction of endoscopic frames. Then, we employ
the ground-truth tool mask to analyze the results of SAM when the best single
mask is selected as prediction and when all the individual masks overlapping
the object of interest are combined to obtain the final predicted mask. We
analyze the Endovis18 and Endovis17 instrument segmentation datasets using
synthetic corruptions of various strengths and an In-House dataset featuring
counterfactually created real-world corruptions. Results: Combining the
over-segmented masks contributes to improvements in the IoU. Furthermore,
selecting the best single segmentation presents a competitive IoU score for
clean images. Conclusions: Combined SAM predictions present improved results
and robustness up to a certain corruption level. However, appropriate prompting
strategies are fundamental for implementing these models in the medical domain.
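The two selection strategies compared in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes SAM's automatic mask generator has already produced a list of boolean masks for a frame, and the function names and the `min_overlap` threshold are illustrative choices.

```python
import numpy as np

def iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / union if union else 0.0

def best_single_mask(masks: list, gt: np.ndarray) -> np.ndarray:
    """Oracle selection: the single SAM mask with the highest IoU
    against the ground-truth tool mask."""
    return max(masks, key=lambda m: iou(m, gt))

def combine_overlapping_masks(masks: list, gt: np.ndarray,
                              min_overlap: float = 0.5) -> np.ndarray:
    """Union of all masks whose area mostly lies inside the
    ground-truth tool mask (illustrative overlap criterion)."""
    combined = np.zeros_like(gt, dtype=bool)
    for m in masks:
        area = m.sum()
        if area and np.logical_and(m, gt).sum() / area >= min_overlap:
            combined |= m
    return combined
```

When an instrument is fragmented into several SAM masks, each fragment alone scores a low IoU, while their union can recover the full tool, which is the effect the paper measures under increasing corruption.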
Related papers
- Bridge the Points: Graph-based Few-shot Segment Anything Semantically [79.1519244940518]
Recent advancements in pre-training techniques have enhanced the capabilities of vision foundation models.
Recent studies extend SAM to Few-shot Semantic Segmentation (FSS).
We propose a simple yet effective approach based on graph analysis.
arXiv Detail & Related papers (2024-10-09T15:02:28Z)
- Adapting Segment Anything Model for Unseen Object Instance Segmentation [70.60171342436092]
Unseen Object Instance Segmentation (UOIS) is crucial for autonomous robots operating in unstructured environments.
We propose UOIS-SAM, a data-efficient solution for the UOIS task.
UOIS-SAM integrates two key components: (i) a Heatmap-based Prompt Generator (HPG) to generate class-agnostic point prompts with precise foreground prediction, and (ii) a Hierarchical Discrimination Network (HDNet) that adapts SAM's mask decoder.
arXiv Detail & Related papers (2024-09-23T19:05:50Z)
- MAS-SAM: Segment Any Marine Animal with Aggregated Features [55.91291540810978]
We propose a novel feature learning framework named MAS-SAM for marine animal segmentation.
Our method extracts richer marine information, from global contextual cues to fine-grained local details.
arXiv Detail & Related papers (2024-04-24T07:38:14Z)
- PosSAM: Panoptic Open-vocabulary Segment Anything [58.72494640363136]
PosSAM is an open-vocabulary panoptic segmentation model that unifies the strengths of the Segment Anything Model (SAM) with the vision-native CLIP model in an end-to-end framework.
We introduce a Mask-Aware Selective Ensembling (MASE) algorithm that adaptively enhances the quality of generated masks and boosts the performance of open-vocabulary classification during inference for each image.
arXiv Detail & Related papers (2024-03-14T17:55:03Z)
- BLO-SAM: Bi-level Optimization Based Overfitting-Preventing Finetuning of SAM [37.1263294647351]
We introduce BLO-SAM, which finetunes the Segment Anything Model (SAM) based on bi-level optimization (BLO)
BLO-SAM reduces the risk of overfitting by training the model's weight parameters and the prompt embedding on two separate subsets of the training dataset.
Results demonstrate BLO-SAM's superior performance over various state-of-the-art image semantic segmentation methods.
arXiv Detail & Related papers (2024-02-26T06:36:32Z)
- Variance-insensitive and Target-preserving Mask Refinement for Interactive Image Segmentation [68.16510297109872]
Point-based interactive image segmentation can ease the burden of mask annotation in applications such as semantic segmentation and image editing.
We introduce a novel method, Variance-Insensitive and Target-Preserving Mask Refinement to enhance segmentation quality with fewer user inputs.
Experiments on GrabCut, Berkeley, SBD, and DAVIS datasets demonstrate our method's state-of-the-art performance in interactive image segmentation.
arXiv Detail & Related papers (2023-12-22T02:31:31Z)
- PWISeg: Point-based Weakly-supervised Instance Segmentation for Surgical Instruments [27.89003436883652]
We propose a weakly-supervised surgical instrument segmentation approach, named Point-based Weakly-supervised Instance Segmentation (PWISeg).
PWISeg adopts an FCN-based architecture with point-to-box and point-to-mask branches to model the relationships between feature points and bounding boxes.
Based on this, we propose a key pixel association loss and a key pixel distribution loss, driving the point-to-mask branch to generate more accurate segmentation predictions.
arXiv Detail & Related papers (2023-11-16T11:48:29Z)
- DeSAM: Decoupled Segment Anything Model for Generalizable Medical Image Segmentation [22.974876391669685]
Segment Anything Model (SAM) shows potential for improving the cross-domain robustness of medical image segmentation.
SAM performs significantly worse in automatic segmentation scenarios than when manually prompted.
Decoupled SAM modifies SAM's mask decoder by introducing two new modules.
arXiv Detail & Related papers (2023-06-01T09:49:11Z)
- Semantic Attention and Scale Complementary Network for Instance Segmentation in Remote Sensing Images [54.08240004593062]
We propose an end-to-end multi-category instance segmentation model, which consists of a Semantic Attention (SEA) module and a Scale Complementary Mask Branch (SCMB).
SEA module contains a simple fully convolutional semantic segmentation branch with extra supervision to strengthen the activation of interest instances on the feature map.
SCMB extends the original single mask branch to trident mask branches and introduces complementary mask supervision at different scales.
arXiv Detail & Related papers (2021-07-25T08:53:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.