ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding
- URL: http://arxiv.org/abs/2407.19435v1
- Date: Sun, 28 Jul 2024 09:25:59 GMT
- Title: ASI-Seg: Audio-Driven Surgical Instrument Segmentation with Surgeon Intention Understanding
- Authors: Zhen Chen, Zongming Zhang, Wenwu Guo, Xingjian Luo, Long Bai, Jinlin Wu, Hongliang Ren, Hongbin Liu,
- Abstract summary: We propose an audio-driven surgical instrument segmentation framework, named ASI-Seg, to accurately segment the required surgical instruments.
We propose an intention-oriented multimodal fusion to interpret the segmentation intention from audio commands and retrieve relevant instrument details.
Our ASI-Seg framework reveals remarkable advantages over classical state-of-the-art and medical SAMs in both semantic segmentation and intention-oriented segmentation.
- Score: 9.751642361457309
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgical instrument segmentation is crucial in surgical scene understanding, thereby facilitating surgical safety. Existing algorithms directly detected all instruments of pre-defined categories in the input image, lacking the capability to segment specific instruments according to the surgeon's intention. During different stages of surgery, surgeons exhibit varying preferences and focus toward different surgical instruments. Therefore, an instrument segmentation algorithm that adheres to the surgeon's intention can minimize distractions from irrelevant instruments and assist surgeons to a great extent. The recent Segment Anything Model (SAM) reveals the capability to segment objects following prompts, but the manual annotations for prompts are impractical during the surgery. To address these limitations in operating rooms, we propose an audio-driven surgical instrument segmentation framework, named ASI-Seg, to accurately segment the required surgical instruments by parsing the audio commands of surgeons. Specifically, we propose an intention-oriented multimodal fusion to interpret the segmentation intention from audio commands and retrieve relevant instrument details to facilitate segmentation. Moreover, to guide our ASI-Seg segment of the required surgical instruments, we devise a contrastive learning prompt encoder to effectively distinguish the required instruments from the irrelevant ones. Therefore, our ASI-Seg promotes the workflow in the operating rooms, thereby providing targeted support and reducing the cognitive load on surgeons. Extensive experiments are performed to validate the ASI-Seg framework, which reveals remarkable advantages over classical state-of-the-art and medical SAMs in both semantic segmentation and intention-oriented segmentation. The source code is available at https://github.com/Zonmgin-Zhang/ASI-Seg.
Related papers
- SURGIVID: Annotation-Efficient Surgical Video Object Discovery [42.16556256395392]
We propose an annotation-efficient framework for the semantic segmentation of surgical scenes.
We employ image-based self-supervised object discovery to identify the most salient tools and anatomical structures in surgical videos.
Our unsupervised setup reinforced with only 36 annotation labels indicates comparable localization performance with fully-supervised segmentation models.
arXiv Detail & Related papers (2024-09-12T07:12:20Z) - Amodal Segmentation for Laparoscopic Surgery Video Instruments [30.39518393494816]
We introduce AmodalVis to the realm of surgical instruments in the medical field.
This technique identifies both the visible and occluded parts of an object.
To achieve this, we introduce a new Amoal Instruments dataset.
arXiv Detail & Related papers (2024-08-02T07:40:34Z) - Surgical-DeSAM: Decoupling SAM for Instrument Segmentation in Robotic Surgery [9.466779367920049]
In safety-critical surgical tasks, prompting is not possible due to lack of per-frame prompts for supervised learning.
It is unrealistic to prompt frame-by-frame in a real-time tracking application, and it is expensive to annotate prompts for offline applications.
We develop Surgical-DeSAM to generate automatic bounding box prompts for decoupling SAM to obtain instrument segmentation in real-time robotic surgery.
arXiv Detail & Related papers (2024-04-22T09:53:55Z) - SAR-RARP50: Segmentation of surgical instrumentation and Action
Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo, dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP)
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
arXiv Detail & Related papers (2023-12-31T13:32:18Z) - SurgicalPart-SAM: Part-to-Whole Collaborative Prompting for Surgical Instrument Segmentation [66.21356751558011]
The Segment Anything Model (SAM) exhibits promise in generic object segmentation and offers potential for various applications.
Existing methods have applied SAM to surgical instrument segmentation (SIS) by tuning SAM-based frameworks with surgical data.
We propose SurgicalPart-SAM (SP-SAM), a novel SAM efficient-tuning approach that explicitly integrates instrument structure knowledge with SAM's generic knowledge.
arXiv Detail & Related papers (2023-12-22T07:17:51Z) - SurgicalSAM: Efficient Class Promptable Surgical Instrument Segmentation [65.52097667738884]
We introduce SurgicalSAM, a novel end-to-end efficient-tuning approach for SAM to integrate surgical-specific information with SAM's pre-trained knowledge for improved generalisation.
Specifically, we propose a lightweight prototype-based class prompt encoder for tuning, which directly generates prompt embeddings from class prototypes.
In addition, to address the low inter-class variance among surgical instrument categories, we propose contrastive prototype learning.
arXiv Detail & Related papers (2023-08-17T02:51:01Z) - Text Promptable Surgical Instrument Segmentation with Vision-Language
Models [16.203166812021045]
We propose a novel text promptable surgical instrument segmentation approach to overcome challenges associated with diversity and differentiation of surgical instruments.
We leverage pretrained image and text encoders as our model backbone and design a text promptable mask decoder.
Experiments on several surgical instrument segmentation datasets demonstrate our model's superior performance and promising generalization capability.
arXiv Detail & Related papers (2023-06-15T16:26:20Z) - CholecTriplet2021: A benchmark challenge for surgical action triplet
recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z) - TraSeTR: Track-to-Segment Transformer with Contrastive Query for
Instance-level Instrument Segmentation in Robotic Surgery [60.439434751619736]
We propose TraSeTR, a Track-to-Segment Transformer that exploits tracking cues to assist surgical instrument segmentation.
TraSeTR jointly reasons about the instrument type, location, and identity with instance-level predictions.
The effectiveness of our method is demonstrated with state-of-the-art instrument type segmentation results on three public datasets.
arXiv Detail & Related papers (2022-02-17T05:52:18Z) - Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.