Video-Instrument Synergistic Network for Referring Video Instrument
Segmentation in Robotic Surgery
- URL: http://arxiv.org/abs/2308.09475v1
- Date: Fri, 18 Aug 2023 11:24:06 GMT
- Title: Video-Instrument Synergistic Network for Referring Video Instrument
Segmentation in Robotic Surgery
- Authors: Hongqiu Wang, Lei Zhu, Guang Yang, Yike Guo, Shichen Zhang, Bo Xu,
Yueming Jin
- Abstract summary: This work explores a new task of Referring Surgical Video Instrument Segmentation (RSVIS).
It aims to automatically identify and segment the corresponding surgical instruments based on the given language expression.
We devise a novel Video-Instrument Synergistic Network (VIS-Net) to learn both video-level and instrument-level knowledge to boost performance.
- Score: 29.72271827272853
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robot-assisted surgery has made significant progress, with instrument
segmentation being a critical factor in surgical intervention quality. It
serves as the building block to facilitate surgical robot navigation and
surgical education for the next generation of operating intelligence. Although
existing methods have achieved accurate instrument segmentation results, they
simultaneously generate segmentation masks for all instruments, without the
capability to specify a target object and allow an interactive experience. This
work explores a new task of Referring Surgical Video Instrument Segmentation
(RSVIS), which aims to automatically identify and segment the corresponding
surgical instruments based on the given language expression. To achieve this,
we devise a novel Video-Instrument Synergistic Network (VIS-Net) to learn both
video-level and instrument-level knowledge to boost performance, while previous
work only used video-level information. Meanwhile, we design a Graph-based
Relation-aware Module (GRM) to model the correlation between multi-modal
information (i.e., textual description and video frame) to facilitate the
extraction of instrument-level information. We are also the first to produce
two RSVIS datasets to promote related research. Our method is verified on these
datasets, and experimental results exhibit that the VIS-Net can significantly
outperform existing state-of-the-art referring segmentation methods. Our code
and our datasets will be released upon the publication of this work.
Related papers
- Amodal Segmentation for Laparoscopic Surgery Video Instruments [30.39518393494816]
We introduce AmodalVis to the realm of surgical instruments in the medical field.
This technique identifies both the visible and occluded parts of an object.
To achieve this, we introduce a new Amodal Instruments dataset.
(2024-08-02T07:40:34Z)
- Instrument-tissue Interaction Detection Framework for Surgical Video Understanding [31.822025965225016]
We present an Instrument-Tissue Interaction Detection Network (ITIDNet) to detect the quintuple for surgical video understanding.
Specifically, we propose a Snippet Consecutive Feature (SCF) Layer to enhance features by modeling relationships of proposals in the current frame using global context information in the video snippet.
To reason relationships between instruments and tissues, a Temporal Graph (TG) Layer is proposed with intra-frame connections to exploit relationships between instruments and tissues in the same frame and inter-frame connections to model the temporal information for the same instance.
(2024-03-30T11:21:11Z)
- SAR-RARP50: Segmentation of surgical instrumentation and Action
Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP).
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
(2023-12-31T13:32:18Z)
- Hierarchical Semi-Supervised Learning Framework for Surgical Gesture
Segmentation and Recognition Based on Multi-Modality Data [2.8770761243361593]
We develop a hierarchical semi-supervised learning framework for surgical gesture segmentation using multi-modality data.
A Transformer-based network with a pre-trained ResNet-18 backbone is used to extract visual features from the surgical operation videos.
The proposed approach has been evaluated using data from the publicly available JIGSAWS database, including Suturing, Needle Passing, and Knot Tying tasks.
(2023-07-31T21:17:59Z)
- Surgical tool classification and localization: results and methods from
the MICCAI 2022 SurgToolLoc challenge [69.91670788430162]
We present the results of the SurgToolLoc 2022 challenge.
The goal was to leverage tool presence data as weak labels for machine learning models trained to detect tools.
We conclude by discussing these results in the broader context of machine learning and surgical data science.
(2023-05-11T21:44:39Z)
- Exploring Intra- and Inter-Video Relation for Surgical Semantic Scene
Segmentation [58.74791043631219]
We propose a novel framework STswinCL that explores the complementary intra- and inter-video relations to boost segmentation performance.
We extensively validate our approach on two public surgical video benchmarks, the EndoVis18 Challenge and the CaDIS dataset.
Experimental results demonstrate the promising performance of our method, which consistently exceeds previous state-of-the-art approaches.
(2022-03-29T05:52:23Z)
- One to Many: Adaptive Instrument Segmentation via Meta Learning and
Dynamic Online Adaptation in Robotic Surgical Video [71.43912903508765]
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
(2021-03-24T05:02:18Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
(2020-11-03T11:00:10Z)
- Real-Time Instrument Segmentation in Robotic Surgery using Auxiliary
Supervised Deep Adversarial Learning [15.490603884631764]
Real-time semantic segmentation of the robotic instruments and tissues is a crucial step in robot-assisted surgery.
We have developed a light-weight cascaded convolutional neural network (CNN) to segment the surgical instruments from high-resolution videos.
We show that our model surpasses existing algorithms for pixel-wise segmentation of surgical instruments in both prediction accuracy and segmentation time of high-resolution videos.
(2020-07-22T10:16:07Z)
- Synthetic and Real Inputs for Tool Segmentation in Robotic Surgery [10.562627972607892]
We show that it may be possible to use robot kinematic data coupled with laparoscopic images to alleviate the labelling problem.
We propose a new deep learning based model for parallel processing of both laparoscopic and simulation images.
(2020-07-17T16:33:33Z)
- Learning Motion Flows for Semi-supervised Instrument Segmentation from
Robotic Surgical Video [64.44583693846751]
We study the semi-supervised instrument segmentation from robotic surgical videos with sparse annotations.
By exploiting generated data pairs, our framework can recover and even enhance temporal consistency of training sequences.
Results show that our method outperforms the state-of-the-art semi-supervised methods by a large margin.
(2020-07-06T02:39:32Z)