Hierarchical Semi-Supervised Learning Framework for Surgical Gesture
Segmentation and Recognition Based on Multi-Modality Data
- URL: http://arxiv.org/abs/2308.02529v1
- Date: Mon, 31 Jul 2023 21:17:59 GMT
- Title: Hierarchical Semi-Supervised Learning Framework for Surgical Gesture
Segmentation and Recognition Based on Multi-Modality Data
- Authors: Zhili Yuan, Jialin Lin, Dandan Zhang
- Abstract summary: We develop a hierarchical semi-supervised learning framework for surgical gesture segmentation using multi-modality data.
A Transformer-based network with a pre-trained `ResNet-18' backbone is used to extract visual features from the surgical operation videos.
The proposed approach has been evaluated using data from the publicly available JIGSAWS database, including Suturing, Needle Passing, and Knot Tying tasks.
- Score: 2.8770761243361593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Segmenting and recognizing surgical operation trajectories into distinct,
meaningful gestures is a critical preliminary step in surgical workflow
analysis for robot-assisted surgery. This step is necessary for facilitating
learning from demonstrations for autonomous robotic surgery, evaluating
surgical skills, and so on. In this work, we develop a hierarchical
semi-supervised learning framework for surgical gesture segmentation using
multi-modality data (i.e. kinematics and vision data). More specifically,
surgical tasks are initially segmented using profiles based on distance and
variance characteristics constructed from the kinematics data. Subsequently, a
Transformer-based network with a pre-trained
`ResNet-18' backbone is used to extract visual features from the surgical
operation videos. By combining the potential segmentation points obtained from
both modalities, we can determine the final segmentation points. Furthermore,
gesture recognition can be implemented based on supervised learning. The
proposed approach has been evaluated using data from the publicly available
JIGSAWS database, including Suturing, Needle Passing, and Knot Tying tasks. The
results reveal an average F1 score of 0.623 for segmentation and an accuracy of
0.856 for recognition.
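The abstract gives the pipeline but no implementation detail. Below is a minimal Python sketch, not the authors' code, of how distance- and variance-based kinematics profiles can yield candidate segmentation points that are then confirmed against a second modality; the window size, peak prominence, and fusion tolerance are all hypothetical choices.

```python
# Illustrative sketch (not the paper's code): candidate gesture boundaries
# from kinematics profiles, plus a simple cross-modality fusion rule.
import numpy as np
from scipy.signal import find_peaks

def distance_profile(positions: np.ndarray) -> np.ndarray:
    """Per-frame displacement magnitude of the tool tip; positions: (T, 3)."""
    return np.linalg.norm(np.diff(positions, axis=0), axis=1)

def variance_profile(signal: np.ndarray, window: int = 15) -> np.ndarray:
    """Rolling variance of a 1-D profile; low variance suggests a gesture pause."""
    pad = window // 2
    padded = np.pad(signal, pad, mode="edge")
    return np.array([padded[t:t + window].var() for t in range(len(signal))])

def candidate_boundaries(profile: np.ndarray, prominence: float = 0.1) -> np.ndarray:
    """Local minima of the profile as potential segmentation points."""
    valleys, _ = find_peaks(-profile, prominence=prominence * float(np.ptp(profile)))
    return valleys

def fuse_candidates(kin_pts, vis_pts, tol: int = 10):
    """Keep kinematics candidates confirmed by a visual candidate within tol frames."""
    vis = np.asarray(vis_pts)
    return [int(p) for p in kin_pts if vis.size and np.abs(vis - p).min() <= tol]

# Toy usage with synthetic kinematics.
rng = np.random.default_rng(0)
positions = np.cumsum(rng.normal(scale=0.01, size=(300, 3)), axis=0)
var = variance_profile(distance_profile(positions))
kin_pts = candidate_boundaries(var)
print(fuse_candidates(kin_pts, vis_pts=kin_pts + 3))
```

In the paper's framework the visual candidates would come from the Transformer over `ResNet-18' features rather than the placeholder used here.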
Related papers
- SAR-RARP50: Segmentation of surgical instrumentation and Action
Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP).
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
arXiv Detail & Related papers (2023-12-31T13:32:18Z)
- Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries [29.201385352740555]
We propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip given various surgical procedures.
Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both image and kinematics.
A cross-modal contrastive loss is designed to incorporate robust geometric prior from kinematics to image for tip segmentation.
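The summary does not spell the loss out; one common instantiation of a cross-modal contrastive objective is a symmetric InfoNCE over paired embeddings. A hedged sketch, assuming (B, D) frame embeddings and a hypothetical temperature:

```python
# Hedged sketch: InfoNCE-style cross-modal contrastive loss that pulls
# paired image and kinematics embeddings together. The authors' exact
# formulation may differ; dimensions and temperature are illustrative.
import torch
import torch.nn.functional as F

def cross_modal_info_nce(img_emb, kin_emb, temperature: float = 0.07):
    img = F.normalize(img_emb, dim=1)
    kin = F.normalize(kin_emb, dim=1)
    logits = img @ kin.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(img.size(0))        # positives on the diagonal
    # Symmetric: image-to-kinematics plus kinematics-to-image direction.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

loss = cross_modal_info_nce(torch.randn(8, 128), torch.randn(8, 128))
```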
arXiv Detail & Related papers (2023-09-02T14:52:58Z)
- Video-Instrument Synergistic Network for Referring Video Instrument Segmentation in Robotic Surgery [29.72271827272853]
This work explores a new task, Referring Surgical Video Instrument Segmentation (RSVIS), which aims to automatically identify and segment the corresponding surgical instruments based on a given language expression.
We devise a novel Video-Instrument Synergistic Network (VIS-Net) to learn both video-level and instrument-level knowledge to boost performance.
arXiv Detail & Related papers (2023-08-18T11:24:06Z)
- Surgical Phase Recognition in Laparoscopic Cholecystectomy [57.929132269036245]
We propose a Transformer-based method that utilizes calibrated confidence scores for a 2-stage inference pipeline.
Our method outperforms the baseline model on the Cholec80 dataset, and can be applied to a variety of action segmentation methods.
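The two-stage pipeline is not detailed in this summary; temperature scaling is a standard way to obtain calibrated confidence scores, and a simple second stage can revisit low-confidence frames. A sketch under those assumptions (threshold, temperature, and voting window are hypothetical):

```python
# Hedged sketch: calibrated confidences via temperature scaling, then a
# second pass that re-labels uncertain frames by local majority vote.
# This is an assumed reading, not the paper's actual pipeline.
import torch
import torch.nn.functional as F

def calibrate(logits: torch.Tensor, temperature: float = 1.5) -> torch.Tensor:
    """Temperature-scaled class probabilities; logits: (T, C) per-frame scores."""
    return F.softmax(logits / temperature, dim=1)

def two_stage_inference(logits: torch.Tensor, threshold: float = 0.8):
    conf, labels = calibrate(logits).max(dim=1)
    # Stage 2: revisit frames whose calibrated confidence is below threshold.
    for t in (conf < threshold).nonzero(as_tuple=True)[0].tolist():
        lo, hi = max(0, t - 5), min(len(labels), t + 6)
        labels[t] = labels[lo:hi].mode().values   # local majority vote
    return labels

labels = two_stage_inference(torch.randn(100, 7))
```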
arXiv Detail & Related papers (2022-06-14T22:55:31Z)
- Surgical Gesture Recognition Based on Bidirectional Multi-Layer Independently RNN with Explainable Spatial Feature Extraction [10.469989981471254]
We aim to develop an effective surgical gesture recognition approach with an explainable feature extraction process.
A Bidirectional Multi-Layer Independently RNN (BML-indRNN) model is proposed in this paper.
To eliminate the black-box effect of the underlying DCNN feature extractor, Gradient-weighted Class Activation Mapping (Grad-CAM) is employed.
Results indicated that the testing accuracy for the suturing task based on our proposed method is 87.13%, which outperforms most of the state-of-the-art algorithms.
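Grad-CAM itself is a standard technique; the following is a generic PyTorch sketch over the last convolutional block of a ResNet-18, not the paper's specific network or data:

```python
# Minimal Grad-CAM sketch: weight the target layer's activations by the
# spatially averaged gradients of the top-class score, then apply ReLU.
import torch
from torchvision import models

model = models.resnet18(weights=None).eval()
store = {}
layer = model.layer4[-1]
layer.register_forward_hook(lambda m, i, o: store.update(act=o.detach()))
layer.register_full_backward_hook(lambda m, gi, go: store.update(grad=go[0].detach()))

x = torch.randn(1, 3, 224, 224)       # placeholder input frame
model(x)[0].max().backward()          # backprop the top-class score

weights = store["grad"].mean(dim=(2, 3), keepdim=True)    # GAP of gradients
cam = torch.relu((weights * store["act"]).sum(dim=1))     # (1, H, W) heatmap
cam = cam / (cam.max() + 1e-8)                            # normalize to [0, 1]
```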
arXiv Detail & Related papers (2021-05-02T12:47:19Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
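MRG-Net's internals are not described in this summary; the sketch below only illustrates the general idea of message passing between visual and kinematics embeddings, with every layer and dimension hypothetical:

```python
# Hedged sketch: one message-passing step between visual and kinematics
# node embeddings, a simplified stand-in for relational graph fusion.
import torch
import torch.nn as nn

class VisKinFusion(nn.Module):
    def __init__(self, dim: int = 128, n_classes: int = 10):
        super().__init__()
        self.msg_k2v = nn.Linear(dim, dim)   # message: kinematics -> vision
        self.msg_v2k = nn.Linear(dim, dim)   # message: vision -> kinematics
        self.head = nn.Linear(2 * dim, n_classes)

    def forward(self, vis: torch.Tensor, kin: torch.Tensor) -> torch.Tensor:
        # Each modality node updates with a message from the other one.
        vis_new = torch.relu(vis + self.msg_k2v(kin))
        kin_new = torch.relu(kin + self.msg_v2k(vis))
        return self.head(torch.cat([vis_new, kin_new], dim=-1))

logits = VisKinFusion()(torch.randn(4, 128), torch.randn(4, 128))
```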
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture long-range temporal dependency.
We validate our approach on a large surgical video dataset (Cholec80) by performing surgical workflow recognition task.
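The non-local block is a published building block (Wang et al., 2018); below is a minimal 1-D variant of the kind NL-RCNet reportedly inserts to capture long-range temporal dependency, with illustrative channel sizes:

```python
# Sketch of a 1-D non-local block: self-attention over all time steps of a
# clip-level feature sequence, followed by a residual connection.
import torch
import torch.nn as nn

class NonLocalBlock1D(nn.Module):
    def __init__(self, channels: int = 256):
        super().__init__()
        inter = channels // 2
        self.theta = nn.Conv1d(channels, inter, 1)   # query projection
        self.phi = nn.Conv1d(channels, inter, 1)     # key projection
        self.g = nn.Conv1d(channels, inter, 1)       # value projection
        self.out = nn.Conv1d(inter, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, T); attention relates every time step to every other.
        q, k, v = self.theta(x), self.phi(x), self.g(x)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)    # (B, T, T)
        y = (attn @ v.transpose(1, 2)).transpose(1, 2)         # (B, inter, T)
        return x + self.out(y)                                 # residual add

out = NonLocalBlock1D()(torch.randn(2, 256, 64))
```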
arXiv Detail & Related papers (2020-04-21T09:21:22Z)
- Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions.
Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures.
The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)
- Multi-Task Recurrent Neural Network for Surgical Gesture Recognition and Progress Prediction [17.63619129438996]
We propose a multi-task recurrent neural network for simultaneous recognition of surgical gestures and estimation of a novel formulation of surgical task progress.
We demonstrate that recognition performance improves in the multi-task framework with progress estimation, without any additional manual labelling or training.
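The progress formulation is novel to that paper and not given here; a generic two-head recurrent sketch shows the multi-task shape of the idea, with the 76-dimensional input chosen to match JIGSAWS kinematics and everything else hypothetical:

```python
# Hedged sketch: shared recurrent trunk with a gesture-classification head
# and a task-progress regression head, trained jointly.
import torch
import torch.nn as nn

class MultiTaskGRU(nn.Module):
    def __init__(self, in_dim: int = 76, hidden: int = 128, n_gestures: int = 15):
        super().__init__()
        self.rnn = nn.GRU(in_dim, hidden, batch_first=True)
        self.gesture_head = nn.Linear(hidden, n_gestures)   # per-frame labels
        self.progress_head = nn.Linear(hidden, 1)           # progress in [0, 1]

    def forward(self, x: torch.Tensor):
        h, _ = self.rnn(x)                                  # x: (B, T, in_dim)
        return self.gesture_head(h), torch.sigmoid(self.progress_head(h))

gestures, progress = MultiTaskGRU()(torch.randn(2, 50, 76))
# A joint loss would combine cross-entropy on gestures with, e.g., MSE on
# progress; the weighting between the two heads is a design choice.
```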
arXiv Detail & Related papers (2020-03-10T14:28:02Z)
- Automatic Gesture Recognition in Robot-assisted Surgery with Reinforcement Learning and Tree Search [63.07088785532908]
We propose a framework based on reinforcement learning and tree search for joint surgical gesture segmentation and classification.
Our framework consistently outperforms existing methods on the suturing task of the JIGSAWS dataset in terms of accuracy, edit score, and F1 score.
arXiv Detail & Related papers (2020-02-20T13:12:38Z)