Related papers: SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation

SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation

URL: http://arxiv.org/abs/2312.14481v2
Date: Sat, 23 Mar 2024 03:13:35 GMT
Title: SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation
Authors: Wenxi Yue, Jing Zhang, Kun Hu, Qiuxia Wu, Zongyuan Ge, Yong Xia, Jiebo Luo, Zhiyong Wang,
Abstract summary: The Segment Anything Model (SAM) exhibits promise in generic object segmentation and offers potential for various applications. Existing methods have applied SAM to surgical instrument segmentation (SIS) by tuning SAM-based frameworks with surgical data. We propose SurgicalPart-SAM (SP-SAM), a novel SAM efficient-tuning approach that explicitly integrates instrument structure knowledge with SAM's generic knowledge.
Score: 66.21356751558011
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Segment Anything Model (SAM) exhibits promise in generic object segmentation and offers potential for various applications. Existing methods have applied SAM to surgical instrument segmentation (SIS) by tuning SAM-based frameworks with surgical data. However, they fall short in two crucial aspects: (1) Straightforward model tuning with instrument masks treats each instrument as a single entity, neglecting their complex structures and fine-grained details; and (2) Instrument category-based prompts are not flexible and informative enough to describe instrument structures. To address these problems, in this paper, we investigate text promptable SIS and propose SurgicalPart-SAM (SP-SAM), a novel SAM efficient-tuning approach that explicitly integrates instrument structure knowledge with SAM's generic knowledge, guided by expert knowledge on instrument part compositions. Specifically, we achieve this by proposing (1) Collaborative Prompts that describe instrument structures via collaborating category-level and part-level texts; (2) Cross-Modal Prompt Encoder that encodes text prompts jointly with visual embeddings into discriminative part-level representations; and (3) Part-to-Whole Adaptive Fusion and Hierarchical Decoding that adaptively fuse the part-level representations into a whole for accurate instrument segmentation in surgical scenarios. Built upon them, SP-SAM acquires a better capability to comprehend surgical instruments in terms of both overall structure and part-level details. Extensive experiments on both the EndoVis2018 and EndoVis2017 datasets demonstrate SP-SAM's state-of-the-art performance with minimal tunable parameters. The code will be available at https://github.com/wenxi-yue/SurgicalPart-SAM.

Related papers

Organ-aware Multi-scale Medical Image Segmentation Using Text Prompt Engineering [17.273290949721975]
Existing medical image segmentation methods rely on uni-modal visual inputs, such as images or videos, requiring labor-intensive manual annotations. Medical imaging techniques capture multiple intertwined organs within a single scan, further complicating segmentation accuracy. To address these challenges, MedSAM was developed to enhance segmentation accuracy by integrating image features with user-provided prompts.
arXiv Detail & Related papers (2025-03-18T01:35:34Z)
Learnable Prompting SAM-induced Knowledge Distillation for Semi-supervised Medical Image Segmentation [47.789013598970925]
We propose a learnable prompting SAM-induced Knowledge distillation framework (KnowSAM) for semi-supervised medical image segmentation. Our model outperforms the state-of-the-art semi-supervised segmentation approaches.
arXiv Detail & Related papers (2024-12-18T11:19:23Z)
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining [55.15365161143354]
OphCLIP is a hierarchical retrieval-augmented vision-language pretraining framework for ophthalmic surgical workflow understanding. OphCLIP learns both fine-grained and long-term visual representations by aligning short video clips with detailed narrative descriptions and full videos with structured titles. Our OphCLIP also designs a retrieval-augmented pretraining framework to leverage the underexplored large-scale silent surgical procedure videos.
arXiv Detail & Related papers (2024-11-23T02:53:08Z)
Amodal Segmentation for Laparoscopic Surgery Video Instruments [30.39518393494816]
We introduce AmodalVis to the realm of surgical instruments in the medical field. This technique identifies both the visible and occluded parts of an object. To achieve this, we introduce a new Amoal Instruments dataset.
arXiv Detail & Related papers (2024-08-02T07:40:34Z)
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation [88.80792308991867]
Segment Anything model (SAM) has shown ability to group image pixels into patches, but applying it to semantic-aware segmentation still faces major challenges. This paper presents SAM-CP, a simple approach that establishes two types of composable prompts beyond SAM and composes them for versatile segmentation. Experiments show that SAM-CP achieves semantic, instance, and panoptic segmentation in both open and closed domains.
arXiv Detail & Related papers (2024-07-23T17:47:25Z)
Surgical-DeSAM: Decoupling SAM for Instrument Segmentation in Robotic Surgery [9.466779367920049]
In safety-critical surgical tasks, prompting is not possible due to lack of per-frame prompts for supervised learning. It is unrealistic to prompt frame-by-frame in a real-time tracking application, and it is expensive to annotate prompts for offline applications. We develop Surgical-DeSAM to generate automatic bounding box prompts for decoupling SAM to obtain instrument segmentation in real-time robotic surgery.
arXiv Detail & Related papers (2024-04-22T09:53:55Z)
Learning to Prompt Segment Anything Models [55.805816693815835]
Segment Anything Models (SAMs) have demonstrated great potential in learning to segment anything. SAMs work with two types of prompts including spatial prompts (e.g., points) and semantic prompts (e.g., texts) We propose spatial-semantic prompt learning (SSPrompt) that learns effective semantic and spatial prompts for better SAMs.
arXiv Detail & Related papers (2024-01-09T16:24:25Z)
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively [69.97238935096094]
The Open-Vocabulary SAM is a SAM-inspired model designed for simultaneous interactive segmentation and recognition. Our method can segment and recognize approximately 22,000 classes.
arXiv Detail & Related papers (2024-01-05T18:59:22Z)
SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation [65.52097667738884]
We introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation. Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes. In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning.
arXiv Detail & Related papers (2023-08-17T02:51:01Z)
SAM Meets Robotic Surgery: An Empirical Study on Generalization, Robustness and Adaptation [15.995869434429274]
The Segment Anything Model (SAM) serves as a fundamental model for semantic segmentation. We examine SAM's robustness and zero-shot generalizability in the field of robotic surgery.
arXiv Detail & Related papers (2023-08-14T14:09:41Z)
SAM Meets Robotic Surgery: An Empirical Study in Robustness Perspective [21.2080716792596]
Segment Anything Model (SAM) is a foundation model for semantic segmentation. We investigate the robustness and zero-shot generalizability of the SAM in the domain of robotic surgery.
arXiv Detail & Related papers (2023-04-28T08:06:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.