Automatic Gesture Recognition in Robot-assisted Surgery with
Reinforcement Learning and Tree Search
- URL: http://arxiv.org/abs/2002.08718v1
- Date: Thu, 20 Feb 2020 13:12:38 GMT
- Title: Automatic Gesture Recognition in Robot-assisted Surgery with
Reinforcement Learning and Tree Search
- Authors: Xiaojie Gao, Yueming Jin, Qi Dou, and Pheng-Ann Heng
- Abstract summary: We propose a framework based on reinforcement learning and tree search for joint surgical gesture segmentation and classification.
Our framework consistently outperforms the existing methods on the suturing task of the JIGSAWS dataset in terms of accuracy, edit score, and F1 score.
- Score: 63.07088785532908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic surgical gesture recognition is fundamental for improving
intelligence in robot-assisted surgery, enabling complicated tasks such as
surgical surveillance and skill evaluation. However, current methods treat each
frame individually and produce outcomes without effectively considering future
information. In this paper, we propose a framework based on reinforcement
learning and tree search for joint surgical gesture segmentation and
classification. An agent is trained to segment and classify the surgical video
in a human-like manner, with its direct decisions reconsidered by tree search
where appropriate. Our proposed tree search algorithm unites the outputs of two
purpose-designed neural networks, i.e., a policy network and a value network.
By integrating complementary information from these distinct models, our
framework achieves better performance than baseline methods that use either
network alone. In an overall evaluation, our approach consistently outperforms
existing methods on the suturing task of the JIGSAWS dataset in terms of
accuracy, edit score, and F1 score. Our study highlights the use of tree search
to refine actions within a reinforcement learning framework for surgical
robotic applications.
Related papers
- SAR-RARP50: Segmentation of surgical instrumentation and Action
Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robot-Assisted Radical Prostatectomy (RARP).
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
arXiv Detail & Related papers (2023-12-31T13:32:18Z)
- Hierarchical Semi-Supervised Learning Framework for Surgical Gesture Segmentation and Recognition Based on Multi-Modality Data [2.8770761243361593]
We develop a hierarchical semi-supervised learning framework for surgical gesture segmentation using multi-modality data.
A Transformer-based network with a pre-trained ResNet-18 backbone is used to extract visual features from the surgical operation videos.
The proposed approach has been evaluated using data from the publicly available JIGSAWS database, including Suturing, Needle Passing, and Knot Tying tasks.
arXiv Detail & Related papers (2023-07-31T21:17:59Z)
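A rough sketch of the feature-extraction stage summarized in the entry above: a pre-trained ResNet-18 backbone turns frames into per-frame embeddings that a Transformer encoder contextualizes over time. The clip length, head count, and layer sizes are assumptions for illustration, not the paper's configuration.

```python
# Sketch only: frame features from a pre-trained ResNet-18, then a
# Transformer encoder over the time axis. Shapes are illustrative.
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()     # keep the 512-d pooled features
backbone.eval()

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
    num_layers=2,
)

frames = torch.randn(1, 16, 3, 224, 224)    # (batch, time, C, H, W)
with torch.no_grad():
    feats = backbone(frames.flatten(0, 1))  # (batch*time, 512)
    feats = feats.view(1, 16, 512)          # restore the time axis
    temporal = encoder(feats)               # per-frame contextual features
print(temporal.shape)                       # torch.Size([1, 16, 512])
```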
- Video-based Contrastive Learning on Decision Trees: from Action Recognition to Autism Diagnosis [17.866016075963437]
We present a new contrastive learning-based framework for decision tree-based classification of actions.
The key idea is to translate the original multi-class action recognition into a series of binary classification tasks on a pre-constructed decision tree.
We demonstrate promising performance for video-based autism spectrum disorder diagnosis on the CalTech interview video database.
arXiv Detail & Related papers (2023-04-20T04:02:04Z)
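A runnable sketch of the routing idea in the entry above: multi-class action recognition decomposed into binary decisions on a pre-constructed tree. The tree layout, action names, and the stub node classifier are illustrative assumptions, not the paper's learned components.

```python
# Sketch: descend a pre-built decision tree via binary classifiers.
import numpy as np

rng = np.random.default_rng(1)

class Node:
    def __init__(self, label=None, left=None, right=None):
        self.label, self.left, self.right = label, left, right

def branch_right(node, embedding):
    """Stub for a node-level binary classifier on a video embedding."""
    return rng.random() > 0.5

def classify(root, embedding):
    node = root
    while node.label is None:   # descend binary decisions to a leaf
        node = node.right if branch_right(node, embedding) else node.left
    return node.label

# Pre-constructed tree over four hypothetical action classes.
tree = Node(
    left=Node(left=Node("wave"), right=Node("point")),
    right=Node(left=Node("clap"), right=Node("reach")),
)
print(classify(tree, rng.normal(size=128)))
```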
- CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks [62.22920673080208]
A single-step generative model can dramatically simplify the search process and be optimized in an end-to-end manner.
We name the pre-trained generative retrieval model CorpusBrain, as all information about the corpus is encoded in its parameters without the need to construct an additional index.
arXiv Detail & Related papers (2022-08-16T10:22:49Z)
- ALBench: A Framework for Evaluating Active Learning in Object Detection [102.81795062493536]
This paper contributes an active learning benchmark framework named ALBench for evaluating active learning in object detection.
Developed on an automatic deep model training system, the ALBench framework is easy to use, compatible with different active learning algorithms, and ensures the same training and testing protocols.
arXiv Detail & Related papers (2022-07-27T07:46:23Z)
- Video-based assessment of intraoperative surgical skill [7.79874072121082]
We present and validate two deep learning methods that directly assess skill using RGB videos.
In the first method, we predict instrument tips as keypoints, and learn surgical skill using temporal convolutional neural networks.
In the second method, we propose a novel architecture for surgical skill assessment that includes a frame-wise encoder (2D convolutional neural network) followed by a temporal model (recurrent neural network).
arXiv Detail & Related papers (2022-05-13T01:45:22Z)
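A sketch of the second method in the entry above: a frame-wise 2D CNN encoder followed by a recurrent temporal model that emits a skill score. The tiny encoder, hidden sizes, and the single-logit head are illustrative assumptions.

```python
# Sketch: frame-wise CNN encoder + RNN temporal model for skill assessment.
import torch
import torch.nn as nn

class SkillAssessor(nn.Module):
    def __init__(self, feat_dim=128, hidden=64):
        super().__init__()
        self.encoder = nn.Sequential(      # frame-wise 2D CNN encoder
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, feat_dim),
        )
        self.temporal = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # e.g. expert-vs-novice logit

    def forward(self, video):              # (batch, time, C, H, W)
        b, t = video.shape[:2]
        feats = self.encoder(video.flatten(0, 1)).view(b, t, -1)
        _, h = self.temporal(feats)        # final hidden state
        return self.head(h[-1])            # one skill logit per video

print(SkillAssessor()(torch.randn(2, 8, 3, 64, 64)).shape)  # [2, 1]
```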
- Surgical Gesture Recognition Based on Bidirectional Multi-Layer Independently RNN with Explainable Spatial Feature Extraction [10.469989981471254]
We aim to develop an effective surgical gesture recognition approach with an explainable feature extraction process.
A Bidirectional Multi-Layer independently RNN (BML-indRNN) model is proposed in this paper.
To eliminate the black-box effects of the DCNN, Gradient-weighted Class Activation Mapping (Grad-CAM) is employed.
Results indicated that the testing accuracy for the suturing task based on our proposed method is 87.13%, which outperforms most of the state-of-the-art algorithms.
arXiv Detail & Related papers (2021-05-02T12:47:19Z)
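Grad-CAM is a standard technique; this sketch shows its core computation (gradient-weighted averaging of a convolutional feature map) with a torchvision ResNet-18 standing in for the paper's DCNN feature extractor.

```python
# Sketch: Grad-CAM via forward/backward hooks on the last conv stage.
import torch
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT).eval()
feats, grads = {}, {}
layer = model.layer4   # last convolutional stage

layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))

x = torch.randn(1, 3, 224, 224, requires_grad=True)
model(x)[0].max().backward()   # backprop the top-class logit

weights = grads["a"].mean(dim=(2, 3), keepdim=True)  # pooled gradients
cam = torch.relu((weights * feats["a"]).sum(dim=1))  # class activation map
print(cam.shape)   # (1, 7, 7); upsampled onto the frame in practice
```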
- Joint Learning of Neural Transfer and Architecture Adaptation for Image Recognition [77.95361323613147]
Current state-of-the-art visual recognition systems rely on pretraining a neural network on a large-scale dataset and finetuning the network weights on a smaller dataset.
In this work, we show that dynamically adapting network architectures tailored to each domain task, along with weight finetuning, benefits both efficiency and effectiveness.
Our method can be easily generalized to an unsupervised paradigm by replacing supernet training with self-supervised learning in the source domain tasks and performing linear evaluation in the downstream tasks.
arXiv Detail & Related papers (2021-03-31T08:15:17Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network approach (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
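MRG-Net itself is a relational graph network; the following is only a generic late-fusion sketch of the two ingredients it integrates, a visual embedding and a kinematics embedding, combined into per-frame gesture logits. Dimensions and the gating-style fusion rule are illustrative assumptions (JIGSAWS kinematics is 76-dimensional).

```python
# Sketch: generic fusion of visual and kinematics embeddings (not MRG-Net).
import torch
import torch.nn as nn

class VisKinFusion(nn.Module):
    def __init__(self, vis_dim=512, kin_dim=76, hidden=128, num_gestures=10):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, hidden)
        self.kin_proj = nn.Linear(kin_dim, hidden)
        self.classifier = nn.Linear(hidden, num_gestures)

    def forward(self, vis, kin):   # (batch, time, dim) each
        fused = torch.tanh(self.vis_proj(vis)) * torch.tanh(self.kin_proj(kin))
        return self.classifier(fused)   # per-frame gesture logits

logits = VisKinFusion()(torch.randn(1, 30, 512), torch.randn(1, 30, 76))
print(logits.shape)   # torch.Size([1, 30, 10])
```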
- Multi-Task Recurrent Neural Network for Surgical Gesture Recognition and Progress Prediction [17.63619129438996]
We propose a multi-task recurrent neural network for simultaneous recognition of surgical gestures and estimation of a novel formulation of surgical task progress.
We demonstrate that recognition performance improves in multi-task frameworks with progress estimation, without any additional manual labelling or training.
arXiv Detail & Related papers (2020-03-10T14:28:02Z)
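A sketch of the multi-task idea in the entry above: a shared recurrent trunk with one head for per-frame gesture classes and one regressing task progress in [0, 1]. Layer sizes are illustrative assumptions; the progress target below is just the normalized frame index, which is why no extra manual labels are needed.

```python
# Sketch: shared RNN trunk with gesture and progress heads.
import torch
import torch.nn as nn

class MultiTaskRNN(nn.Module):
    def __init__(self, in_dim=76, hidden=64, num_gestures=10):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hidden, batch_first=True)
        self.gesture_head = nn.Linear(hidden, num_gestures)
        self.progress_head = nn.Linear(hidden, 1)

    def forward(self, x):   # (batch, time, in_dim), e.g. kinematics
        h, _ = self.rnn(x)
        return self.gesture_head(h), torch.sigmoid(self.progress_head(h))

gestures, progress = MultiTaskRNN()(torch.randn(2, 50, 76))
target = torch.linspace(0, 1, 50).expand(2, 50).unsqueeze(-1)
loss = nn.functional.mse_loss(progress, target)   # progress-regression loss
print(gestures.shape, progress.shape, float(loss))
```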
This list is automatically generated from the titles and abstracts of the papers in this site.