Text Promptable Surgical Instrument Segmentation with Vision-Language
Models
- URL: http://arxiv.org/abs/2306.09244v3
- Date: Wed, 8 Nov 2023 15:36:17 GMT
- Title: Text Promptable Surgical Instrument Segmentation with Vision-Language
Models
- Authors: Zijian Zhou, Oluwatosin Alabi, Meng Wei, Tom Vercauteren, Miaojing Shi
- Abstract summary: We propose a novel text promptable surgical instrument segmentation approach to overcome challenges associated with diversity and differentiation of surgical instruments.
We leverage pretrained image and text encoders as our model backbone and design a text promptable mask decoder.
Experiments on several surgical instrument segmentation datasets demonstrate our model's superior performance and promising generalization capability.
- Score: 16.203166812021045
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a novel text promptable surgical instrument
segmentation approach to overcome challenges associated with diversity and
differentiation of surgical instruments in minimally invasive surgeries. We
redefine the task as text promptable, thereby enabling a more nuanced
comprehension of surgical instruments and adaptability to new instrument types.
Inspired by recent advancements in vision-language models, we leverage
pretrained image and text encoders as our model backbone and design a text
promptable mask decoder consisting of attention- and convolution-based
prompting schemes for surgical instrument segmentation prediction. Our model
leverages multiple text prompts for each surgical instrument through a new
mixture of prompts mechanism, resulting in enhanced segmentation performance.
Additionally, we introduce a hard instrument area reinforcement module to
improve image feature comprehension and segmentation precision. Extensive
experiments on several surgical instrument segmentation datasets demonstrate
our model's superior performance and promising generalization capability. To
our knowledge, this is the first implementation of a promptable approach to
surgical instrument segmentation, offering significant potential for practical
application in the field of robotic-assisted surgery. Code is available at
https://github.com/franciszzj/TP-SIS.
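To make the mechanism concrete, here is a minimal PyTorch-style sketch of the mixture-of-prompts idea described in the abstract: several text prompts per instrument are encoded, combined via learned mixture weights, and correlated with image features to produce a per-pixel mask. All names (e.g. `MixtureOfPromptsSegmenter`) are illustrative assumptions, not taken from the released TP-SIS code.

```python
import torch
import torch.nn as nn

class MixtureOfPromptsSegmenter(nn.Module):
    def __init__(self, embed_dim=512):
        super().__init__()
        # Scores how much each text-prompt variant should contribute.
        self.mixture_head = nn.Linear(embed_dim, 1)
        # 1x1 conv acts as a light convolution-based prompting projection.
        self.mask_head = nn.Conv2d(embed_dim, embed_dim, kernel_size=1)

    def forward(self, image_feats, prompt_embeds):
        """
        image_feats:   (B, C, H, W) features from a pretrained image encoder.
        prompt_embeds: (K, C) embeddings of K text prompts for one instrument,
                       e.g. from a pretrained CLIP text encoder.
        """
        # Mixture of prompts: convex combination of the K prompt embeddings.
        weights = torch.softmax(self.mixture_head(prompt_embeds), dim=0)  # (K, 1)
        text_embed = (weights * prompt_embeds).sum(dim=0)                 # (C,)
        # Correlate the mixed text embedding with projected image features.
        feats = self.mask_head(image_feats)                               # (B, C, H, W)
        return torch.einsum("bchw,c->bhw", feats, text_embed)  # per-pixel logits

# Toy usage with random tensors standing in for encoder outputs.
model = MixtureOfPromptsSegmenter()
mask_logits = model(torch.randn(2, 512, 14, 14), torch.randn(4, 512))
print(mask_logits.shape)  # torch.Size([2, 14, 14]); upsample + sigmoid for masks
```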
Related papers
- Surgical Scene Segmentation by Transformer With Asymmetric Feature Enhancement [7.150163844454341]
Vision-specific transformer methods are a promising way to achieve surgical scene understanding.
We propose a novel Transformer-based framework with an Asymmetric Feature Enhancement module (TAFE).
The proposed method outperforms SOTA methods on several surgical segmentation tasks and additionally demonstrates its ability to recognize fine-grained structures.
arXiv Detail & Related papers (2024-10-23T07:58:47Z)
- HecVL: Hierarchical Video-Language Pretraining for Zero-shot Surgical Phase Recognition [51.222684687924215]
HecVL is a novel hierarchical video-language pretraining approach for building a generalist surgical model.
We propose a novel fine-to-coarse contrastive learning framework that learns separate embedding spaces for the three video-text hierarchies.
By disentangling embedding spaces of different hierarchical levels, the learned multi-modal representations encode short-term and long-term surgical concepts in the same model.
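As a rough illustration of the fine-to-coarse scheme, the sketch below applies a standard symmetric InfoNCE loss at three hierarchy levels, with a separate projection head per level to keep the embedding spaces disentangled. The level names and pairing are assumptions, not HecVL's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE between matched rows of a and b, each (B, D)."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0))
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

class HierarchicalProjections(nn.Module):
    """One projection head per hierarchy level keeps the spaces disentangled."""
    def __init__(self, dim=512):
        super().__init__()
        self.heads = nn.ModuleDict({
            level: nn.Linear(dim, dim) for level in ("clip", "phase", "video")
        })

    def forward(self, feats, level):
        return self.heads[level](feats)

proj_v, proj_t = HierarchicalProjections(), HierarchicalProjections()
video_feats = {lvl: torch.randn(8, 512) for lvl in ("clip", "phase", "video")}
text_feats = {lvl: torch.randn(8, 512) for lvl in ("clip", "phase", "video")}

# Fine-to-coarse: sum the contrastive losses of the three levels.
loss = sum(
    info_nce(proj_v(video_feats[lvl], lvl), proj_t(text_feats[lvl], lvl))
    for lvl in ("clip", "phase", "video")
)
print(float(loss))
```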
arXiv Detail & Related papers (2024-05-16T13:14:43Z)
- Pixel-Wise Recognition for Holistic Surgical Scene Understanding [31.338288460529046]
This paper presents the Holistic and Multi-Granular Surgical Scene Understanding of Prostatectomies (GraSP) dataset.
GraSP is a curated benchmark that models surgical scene understanding as a hierarchy of complementary tasks with varying levels of granularity.
We introduce the Transformers for Actions, Phases, Steps, and Instrument Segmentation (TAPIS) model, a general architecture that combines a global video feature extractor with localized region proposals.
arXiv Detail & Related papers (2024-01-20T09:09:52Z)
- SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation [66.21356751558011]
The Segment Anything Model (SAM) exhibits promise in generic object segmentation and offers potential for various applications.
Existing methods have applied SAM to surgical instrument segmentation (SIS) by tuning SAM-based frameworks with surgical data.
We propose SurgicalPart-SAM (SP-SAM), a novel SAM efficient-tuning approach that explicitly integrates instrument structure knowledge with SAM's generic knowledge.
arXiv Detail & Related papers (2023-12-22T07:17:51Z)
- Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries [29.201385352740555]
We propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip across various surgical procedures.
Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both image and kinematics.
A cross-modal contrastive loss is designed to incorporate robust geometric prior from kinematics to image for tip segmentation.
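A minimal sketch of such a cross-modal contrastive loss, assuming paired image and kinematics features per instrument part; the pairing scheme and the stub features are stand-ins, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def cross_modal_contrastive(img_feats, kin_feats, temperature=0.1):
    """
    img_feats: (N, D) image features for N instrument parts.
    kin_feats: (N, D) kinematics features for the same parts; matched pairs
               sit on the diagonal, every other part acts as a negative.
    """
    img = F.normalize(img_feats, dim=-1)
    kin = F.normalize(kin_feats, dim=-1)
    logits = img @ kin.t() / temperature
    targets = torch.arange(img.size(0))
    return F.cross_entropy(logits, targets)

# Random tensors stand in for the image and kinematics encoders' outputs.
loss = cross_modal_contrastive(torch.randn(6, 128), torch.randn(6, 128))
```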
arXiv Detail & Related papers (2023-09-02T14:52:58Z)
- SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation [65.52097667738884]
We introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation.
Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes.
In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning.
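A hedged sketch of what a prototype-based class prompt encoder with a contrastive prototype term could look like; the class count, token count, and exact contrastive form are assumptions rather than SurgicalSAM's released design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypePromptEncoder(nn.Module):
    def __init__(self, num_classes=7, dim=256, num_tokens=2):
        super().__init__()
        # One learnable prototype per instrument class.
        self.prototypes = nn.Parameter(torch.randn(num_classes, dim))
        # Maps a class prototype to a small set of prompt tokens for the decoder.
        self.to_prompts = nn.Linear(dim, num_tokens * dim)
        self.num_tokens, self.dim = num_tokens, dim

    def forward(self, class_id):
        proto = self.prototypes[class_id]                        # (dim,)
        return self.to_prompts(proto).view(self.num_tokens, self.dim)

    def prototype_contrast(self, temperature=0.1):
        """Push prototypes apart so low inter-class variance is easier to resolve."""
        p = F.normalize(self.prototypes, dim=-1)
        logits = p @ p.t() / temperature
        targets = torch.arange(p.size(0))
        return F.cross_entropy(logits, targets)

enc = PrototypePromptEncoder()
prompt_tokens = enc(class_id=3)   # feed into a SAM-style mask decoder
reg_loss = enc.prototype_contrast()
```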
arXiv Detail & Related papers (2023-08-17T02:51:01Z)
- Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures [51.78027546947034]
Recent advancements in surgical computer vision have been driven by vision-only models, which lack language semantics.
We propose leveraging surgical video lectures from e-learning platforms to provide effective vision and language supervisory signals.
We address surgery-specific linguistic challenges using multiple automatic speech recognition systems for text transcriptions.
arXiv Detail & Related papers (2023-07-27T22:38:12Z)
- Surgical tool classification and localization: results and methods from the MICCAI 2022 SurgToolLoc challenge [69.91670788430162]
We present the results of the SurgToolLoc 2022 challenge.
The goal was to leverage tool presence data as weak labels for machine learning models trained to detect tools.
We conclude by discussing these results in the broader context of machine learning and surgical data science.
arXiv Detail & Related papers (2023-05-11T21:44:39Z)
- FUN-SIS: a Fully UNsupervised approach for Surgical Instrument Segmentation [16.881624842773604]
We present FUN-SIS, a Fully UNsupervised approach for binary Surgical Instrument Segmentation.
We train a per-frame segmentation model on completely unlabelled endoscopic videos, by relying on implicit motion information and instrument shape-priors.
The obtained fully-unsupervised results for surgical instrument segmentation are almost on par with the ones of fully-supervised state-of-the-art approaches.
arXiv Detail & Related papers (2022-02-16T15:32:02Z)
- Surgical Instruction Generation with Transformers [6.97857490403095]
We introduce a transformer-backboned encoder-decoder network with self-critical reinforcement learning to generate instructions from surgical images.
We evaluate the effectiveness of our method on the DAISI dataset, which includes 290 procedures from various medical disciplines.
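The self-critical reinforcement learning mentioned here most plausibly follows self-critical sequence training (SCST), where the reward of a greedy-decoded instruction serves as the baseline for sampled instructions. A compact sketch, with the reward values standing in for a metric such as CIDEr:

```python
import torch

def self_critical_loss(sample_logprobs, sample_reward, greedy_reward):
    """
    sample_logprobs: (B,) summed log-probs of sampled instruction tokens.
    sample_reward:   (B,) reward of the sampled instructions (e.g. CIDEr).
    greedy_reward:   (B,) reward of greedy-decoded instructions (the baseline).
    """
    advantage = sample_reward - greedy_reward          # self-critical baseline
    return -(advantage.detach() * sample_logprobs).mean()

# Toy values stand in for a decoder's log-probs and a caption metric's scores.
loss = self_critical_loss(
    torch.randn(4, requires_grad=True),
    torch.tensor([0.6, 0.4, 0.7, 0.5]),
    torch.tensor([0.5, 0.5, 0.5, 0.5]),
)
loss.backward()
```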
arXiv Detail & Related papers (2021-07-14T19:54:50Z)
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims at providing a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.