Deep learning-based computer vision to recognize and classify suturing
gestures in robot-assisted surgery
- URL: http://arxiv.org/abs/2008.11833v1
- Date: Wed, 26 Aug 2020 21:45:04 GMT
- Title: Deep learning-based computer vision to recognize and classify suturing
gestures in robot-assisted surgery
- Authors: Francisco Luongo (1), Ryan Hakim (2), Jessica H. Nguyen (2),
Animashree Anandkumar (3), Andrew J Hung (2) ((1) Department of Biology and
Biological Engineering, Caltech (2) Center for Robotic Simulation &
Education, Catherine & Joseph Aresty Department of Urology, USC Institute of
Urology, University of Southern California (3) Department of Computing &
Mathematical Sciences, Caltech)
- Abstract summary: We train deep-learning based computer vision (CV) to automate the identification and classification of suturing gestures for needle driving attempts.
Our results demonstrate CV's ability to recognize features that not only can identify the action of suturing but also distinguish between different classifications of suturing gestures.
- Score: 9.248851083946048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our previous work classified a taxonomy of suturing gestures during a
vesicourethral anastomosis of robotic radical prostatectomy in association with
tissue tears and patient outcomes. Herein, we train deep-learning based
computer vision (CV) to automate the identification and classification of
suturing gestures for needle driving attempts. Using two independent raters, we
manually annotated live suturing video clips to label timepoints and gestures.
Identification (2395 videos) and classification (511 videos) datasets were
compiled to train CV models to produce two- and five-class label predictions,
respectively. Networks were trained on inputs of raw RGB pixels as well as
optical flow for each frame. Each model was trained on 80/20 train/test splits.
In this study, all models were able to reliably predict both the presence of
a gesture (identification, AUC: 0.88) and the type of gesture
(classification, AUC: 0.87) at levels significantly above chance. For both
gesture identification and classification datasets, we observed no effect of
recurrent classification model choice (LSTM vs. convLSTM) on performance. Our
results demonstrate CV's ability to recognize features that not only can
identify the action of suturing but also distinguish between different
classifications of suturing gestures. This demonstrates the potential to
utilize deep learning CV towards future automation of surgical skill
assessment.
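Since the abstract describes per-frame RGB and optical-flow inputs feeding a recurrent classifier (LSTM or convLSTM) with two- or five-class outputs, a minimal PyTorch sketch of that kind of pipeline is shown below. The tiny per-frame CNN, feature sizes, and clip dimensions are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch (not the authors' code): per-frame CNN features -> LSTM -> gesture logits.
import torch
import torch.nn as nn

class FrameCNN(nn.Module):
    """Tiny per-frame encoder standing in for the paper's RGB/optical-flow feature extractor."""
    def __init__(self, in_channels=3, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
    def forward(self, x):            # x: (B, C, H, W)
        return self.net(x)

class GestureLSTM(nn.Module):
    """Recurrent clip classifier: 2 classes for identification, 5 for gesture classification."""
    def __init__(self, num_classes=2, feat_dim=128, hidden=256):
        super().__init__()
        self.encoder = FrameCNN(feat_dim=feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)
    def forward(self, clip):         # clip: (B, T, C, H, W)
        b, t = clip.shape[:2]
        feats = self.encoder(clip.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])      # one logit vector per clip

model = GestureLSTM(num_classes=2)                        # identification variant
logits = model(torch.randn(4, 16, 3, 112, 112))           # 4 clips of 16 RGB frames
print(logits.shape)                                       # torch.Size([4, 2])
```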
Related papers
- Hierarchical Semi-Supervised Learning Framework for Surgical Gesture Segmentation and Recognition Based on Multi-Modality Data [2.8770761243361593]
We develop a hierarchical semi-supervised learning framework for surgical gesture segmentation using multi-modality data.
A Transformer-based network with a pre-trained ResNet-18 backbone is used to extract visual features from the surgical operation videos.
The proposed approach has been evaluated using data from the publicly available JIGSAWS dataset, including Suturing, Needle Passing, and Knot Tying tasks.
arXiv Detail & Related papers (2023-07-31T21:17:59Z)
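A rough sketch of the visual branch described in the summary above (per-frame ResNet-18 features fed to a Transformer encoder). The gesture count, model dimensions, and use of untrained weights are assumptions for illustration only.

```python
# Illustrative sketch only: ResNet-18 frame features fed to a Transformer encoder.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class VisualGestureEncoder(nn.Module):
    def __init__(self, num_gestures=10, d_model=512, nhead=8, num_layers=2):
        super().__init__()
        backbone = resnet18(weights=None)        # pre-trained weights would normally be loaded
        backbone.fc = nn.Identity()              # 512-d feature per frame
        self.backbone = backbone
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_gestures)
    def forward(self, frames):                   # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
        return self.head(self.temporal(feats))   # per-frame gesture logits: (B, T, num_gestures)

logits = VisualGestureEncoder()(torch.randn(2, 8, 3, 224, 224))
print(logits.shape)                              # torch.Size([2, 8, 10])
```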
- Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations [72.15956198507281]
We propose PGV-CL, a novel pseudo-label guided cross-video contrast learning method to boost scene segmentation.
We extensively evaluate our method on a public robotic surgery dataset EndoVis18 and a public cataract dataset CaDIS.
arXiv Detail & Related papers (2022-07-20T05:42:19Z)
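A hedged sketch of the pseudo-label guided pixel-contrast idea summarized above: pixels that share a pseudo-label (possibly from different videos) act as positives in an InfoNCE-style loss. Embedding size, label count, and sampling are placeholders, not the PGV-CL implementation.

```python
# Sketch of pseudo-label guided pixel contrast (not the PGV-CL code).
import torch
import torch.nn.functional as F

def pixel_contrast_loss(emb_a, emb_b, pl_a, pl_b, tau=0.1):
    """emb_*: (N, D) sampled pixel embeddings from two videos; pl_*: (N,) pseudo-labels."""
    emb = F.normalize(torch.cat([emb_a, emb_b]), dim=1)
    labels = torch.cat([pl_a, pl_b])
    sim = emb @ emb.t() / tau                                 # pairwise cosine similarities
    self_mask = torch.eye(sim.size(0), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, -1e9)                    # a pixel is never its own pair
    pos = (labels[:, None] == labels[None, :]) & ~self_mask   # same pseudo-label => positive
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()

loss = pixel_contrast_loss(torch.randn(64, 32), torch.randn(64, 32),
                           torch.randint(0, 4, (64,)), torch.randint(0, 4, (64,)))
print(loss.item())
```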
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
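The summary above describes using a pre-trained language model to supply semantic targets for video classification. A minimal sketch of that general idea follows, with random vectors standing in for the class-name embeddings a language model would actually produce.

```python
# Sketch of the general idea (not this paper's implementation): frozen text embeddings of
# the class names act as the classifier weights for pooled video features.
import torch
import torch.nn.functional as F

num_classes, dim = 400, 512
text_embeddings = F.normalize(torch.randn(num_classes, dim), dim=1)   # stand-in, one per class name

def classify_video(frame_features):        # frame_features: (T, dim) from a visual encoder
    video_feat = F.normalize(frame_features.mean(0), dim=0)           # temporal average pooling
    return video_feat @ text_embeddings.t()                           # similarity logits (num_classes,)

logits = classify_video(torch.randn(16, dim))
print(logits.argmax().item())
```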
- SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
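A small sketch of the SCARF-style view construction described above: a view is formed by replacing a random subset of a row's features with values resampled from the batch (a proxy for the per-feature marginals), then contrasted against the original row. The encoder, corruption rate, and temperature are illustrative choices.

```python
# Hedged sketch of SCARF-style contrastive learning on tabular rows.
import torch
import torch.nn.functional as F

def corrupt(x, corruption_rate=0.6):
    """x: (B, F) batch of rows; each corrupted entry is resampled from its own column."""
    b, f = x.shape
    mask = torch.rand(b, f) < corruption_rate
    idx = torch.randint(0, b, (b, f))
    random_rows = torch.gather(x, 0, idx)        # column-wise samples from the batch
    return torch.where(mask, random_rows, x)

def info_nce(z1, z2, tau=0.5):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                   # positives lie on the diagonal
    return F.cross_entropy(logits, torch.arange(len(z1)))

encoder = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 32))
x = torch.randn(128, 20)
loss = info_nce(encoder(x), encoder(corrupt(x)))
print(loss.item())
```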
- GAN for Vision, KG for Relation: a Two-stage Deep Network for Zero-shot Action Recognition [33.23662792742078]
We propose a two-stage deep neural network for zero-shot action recognition.
In the sampling stage, we utilize a generative adversarial network (GAN) trained on action features and word vectors of seen classes.
In the classification stage, we construct a knowledge graph based on the relationship between word vectors of action classes and related objects.
arXiv Detail & Related papers (2021-05-25T09:34:42Z)
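A loose sketch of the first (sampling) stage described above: a conditional generator maps class word vectors plus noise to synthetic action features for unseen classes, on which an ordinary classifier could then be trained. The knowledge-graph stage is omitted and all dimensions are assumptions.

```python
# Rough sketch of the sampling stage only; GAN training and the KG stage are omitted.
import torch
import torch.nn as nn

word_dim, feat_dim, noise_dim = 300, 512, 64

generator = nn.Sequential(                      # word vector + noise -> synthetic action feature
    nn.Linear(word_dim + noise_dim, 512), nn.ReLU(), nn.Linear(512, feat_dim))

def synthesize(word_vecs, n_per_class=50):
    """word_vecs: (C_unseen, word_dim). Returns synthetic features and their class labels."""
    c = word_vecs.size(0)
    w = word_vecs.repeat_interleave(n_per_class, dim=0)
    z = torch.randn(w.size(0), noise_dim)
    feats = generator(torch.cat([w, z], dim=1))
    labels = torch.arange(c).repeat_interleave(n_per_class)
    return feats, labels

feats, labels = synthesize(torch.randn(5, word_dim))   # 5 unseen classes
print(feats.shape, labels.shape)                       # torch.Size([250, 512]) torch.Size([250])
```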
- Anomaly Detection in Cybersecurity: Unsupervised, Graph-Based and Supervised Learning Methods in Adversarial Environments [63.942632088208505]
Inherent to today's operating environment is the practice of adversarial machine learning.
In this work, we examine the feasibility of unsupervised learning and graph-based methods for anomaly detection.
We incorporate a realistic adversarial training mechanism when training our supervised models to enable strong classification performance in adversarial environments.
arXiv Detail & Related papers (2021-05-14T10:05:10Z)
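The summary above mentions adversarial training of the supervised models. As a generic illustration only (not the authors' setup), a single FGSM-style adversarial training step might look like this.

```python
# Generic FGSM-style adversarial training step; model, data, and epsilon are placeholders.
import torch
import torch.nn.functional as F

def adversarial_training_step(model, x, y, optimizer, epsilon=0.1):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = (x + epsilon * grad.sign()).detach()       # perturb the input features
    optimizer.zero_grad()
    total = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    total.backward()
    optimizer.step()
    return total.item()

model = torch.nn.Sequential(torch.nn.Linear(30, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
print(adversarial_training_step(model, torch.randn(32, 30), torch.randint(0, 2, (32,)), opt))
```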
- Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and the inherent correlations across modalities for gesture recognition.
Results show that our approach recovers performance with large gains, up to 12.91% in accuracy and 20.16% in F1 score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
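A very schematic sketch of aligning simulator and real-robot features for both modalities, using simple first-moment matching as a stand-in for the unsupervised alignment described above; the encoder shapes and the 38-d kinematics vector are assumptions.

```python
# Schematic stand-in for kinematic-visual domain alignment (not the paper's method).
import torch
import torch.nn as nn

visual_enc = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 64))
kin_enc = nn.Sequential(nn.Linear(38, 128), nn.ReLU(), nn.Linear(128, 64))   # 38-d kinematics assumed

def alignment_loss(sim_feats, real_feats):
    """First-moment matching between simulator and real-robot embeddings."""
    return (sim_feats.mean(0) - real_feats.mean(0)).pow(2).sum()

sim_v, real_v = torch.randn(16, 512), torch.randn(16, 512)
sim_k, real_k = torch.randn(16, 38), torch.randn(16, 38)
loss = alignment_loss(visual_enc(sim_v), visual_enc(real_v)) + \
       alignment_loss(kin_enc(sim_k), kin_enc(real_k))
print(loss.item())
```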
- Self-Supervised Learning via multi-Transformation Classification for Action Recognition [10.676377556393527]
We introduce a self-supervised video representation learning method based on the multi-transformation classification to efficiently classify human actions.
The representation of the video is learned in a self-supervised manner by classifying seven different transformations.
We conducted experiments on the UCF101 and HMDB51 datasets with C3D and 3D ResNet-18 as backbone networks.
arXiv Detail & Related papers (2021-02-20T16:11:26Z)
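A toy version of the transformation-classification pretext task summarized above: the network must predict which transformation was applied to a clip. The paper uses seven transformations and 3D backbones (C3D, 3D ResNet-18); here a few simple transforms and a flat encoder stand in.

```python
# Toy transformation-classification pretext task; clips are (T, C, H, W) per sample.
import torch
import torch.nn as nn
import torch.nn.functional as F

TRANSFORMS = [
    lambda clip: clip,                                   # identity
    lambda clip: torch.flip(clip, dims=[0]),             # temporal reversal
    lambda clip: torch.flip(clip, dims=[3]),             # horizontal flip
    lambda clip: torch.rot90(clip, 1, dims=[2, 3]),      # 90-degree spatial rotation
]

encoder = nn.Sequential(nn.Flatten(), nn.Linear(8 * 3 * 32 * 32, 256), nn.ReLU(),
                        nn.Linear(256, len(TRANSFORMS)))

def pretext_step(clips):                                 # clips: (B, T, C, H, W)
    labels = torch.randint(0, len(TRANSFORMS), (clips.size(0),))
    transformed = torch.stack([TRANSFORMS[l.item()](c) for c, l in zip(clips, labels)])
    return F.cross_entropy(encoder(transformed), labels)

print(pretext_step(torch.randn(8, 8, 3, 32, 32)).item())
```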
- Machine Learning-based Classification of Active Walking Tasks in Older Adults using fNIRS [2.0953361712358025]
Cortical control of gait, specifically in the pre-frontal cortex as measured by functional near infrared spectroscopy (fNIRS), has been shown to be moderated by age, gender, cognitive status, and various age-related disease conditions.
We develop classification models using machine learning methods to classify active walking tasks in older adults based on fNIRS signals.
arXiv Detail & Related papers (2021-02-08T03:44:24Z)
- Self supervised contrastive learning for digital histopathology [0.0]
We use a contrastive self-supervised learning method called SimCLR that achieved state-of-the-art results on natural-scene images.
We find that combining multiple multi-organ datasets with different types of staining and resolution properties improves the quality of the learned features.
Linear classifiers trained on top of the learned features show that networks pretrained on digital histopathology datasets perform better than ImageNet pretrained networks.
arXiv Detail & Related papers (2020-11-27T19:18:45Z)
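The histopathology summary above mentions training linear classifiers on top of the learned features. A minimal linear-evaluation sketch follows, with a random encoder and random data standing in for a SimCLR-pretrained network and histopathology patches.

```python
# Linear-evaluation sketch: freeze the pre-trained encoder, train only a linear probe.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, 128))  # stand-in for a pretrained net
for p in encoder.parameters():
    p.requires_grad = False                      # freeze the representation

probe = nn.Linear(128, 4)                        # e.g. 4 tissue classes (assumed); trained from scratch
opt = torch.optim.SGD(probe.parameters(), lr=0.1)

x, y = torch.randn(256, 2048), torch.randint(0, 4, (256,))
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(encoder(x)), y)
    loss.backward()
    opt.step()
print(loss.item())
```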
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose MRG-Net, a novel online multi-modal graph network that dynamically integrates visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
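A loose sketch of fusing visual and kinematics embeddings through one round of message passing over a tiny relation graph, in the spirit of the multi-modal graph network summarized above; the node design, 76-d kinematics, and gesture count are assumptions rather than MRG-Net itself.

```python
# Illustrative visual + kinematics fusion via one message-passing step (not MRG-Net).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphFusion(nn.Module):
    def __init__(self, vis_dim=512, kin_dim=76, d=128, num_gestures=10):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, d)
        self.kin_proj = nn.Linear(kin_dim, d)
        self.msg = nn.Linear(d, d)
        self.head = nn.Linear(d, num_gestures)
    def forward(self, vis_feat, kin_feat):               # (B, vis_dim), (B, kin_dim)
        nodes = torch.stack([self.vis_proj(vis_feat), self.kin_proj(kin_feat)], dim=1)   # (B, 2, d)
        attn = F.softmax(nodes @ nodes.transpose(1, 2) / nodes.size(-1) ** 0.5, dim=-1)  # (B, 2, 2)
        nodes = nodes + attn @ self.msg(nodes)            # one message-passing step
        return self.head(nodes.mean(dim=1))               # gesture logits

logits = GraphFusion()(torch.randn(4, 512), torch.randn(4, 76))
print(logits.shape)                                       # torch.Size([4, 10])
```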
This list is automatically generated from the titles and abstracts of the papers on this site.