Deep learning-based computer vision to recognize and classify suturing
gestures in robot-assisted surgery
- URL: http://arxiv.org/abs/2008.11833v1
- Date: Wed, 26 Aug 2020 21:45:04 GMT
- Title: Deep learning-based computer vision to recognize and classify suturing
gestures in robot-assisted surgery
- Authors: Francisco Luongo (1), Ryan Hakim (2), Jessica H. Nguyen (2),
Animashree Anandkumar (3), Andrew J Hung (2) ((1) Department of Biology and
Biological Engineering, Caltech (2) Center for Robotic Simulation &
Education, Catherine & Joseph Aresty Department of Urology, USC Institute of
Urology, University of Southern California (3) Department of Computing &
Mathematical Sciences, Caltech)
- Abstract summary: We train deep-learning based computer vision (CV) to automate the identification and classification of suturing gestures for needle driving attempts.
Our results demonstrate CV's ability to recognize features that not only can identify the action of suturing but also distinguish between different classifications of suturing gestures.
- Score: 9.248851083946048
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Our previous work classified a taxonomy of suturing gestures during a
vesicourethral anastomosis of robotic radical prostatectomy in association with
tissue tears and patient outcomes. Herein, we train deep-learning based
computer vision (CV) to automate the identification and classification of
suturing gestures for needle driving attempts. Using two independent raters, we
manually annotated live suturing video clips to label timepoints and gestures.
Identification (2395 videos) and classification (511 videos) datasets were
compiled to train CV models to produce two- and five-class label predictions,
respectively. Networks were trained on inputs of raw RGB pixels as well as
optical flow for each frame. Each model was trained on 80/20 train/test splits.
In this study, all models were able to reliably predict both the presence of
a gesture (identification, AUC: 0.88) and the type of gesture
(classification, AUC: 0.87) at levels significantly above chance. For both
gesture identification and classification datasets, we observed no effect of
recurrent classification model choice (LSTM vs. convLSTM) on performance. Our
results demonstrate CV's ability to recognize features that not only can
identify the action of suturing but also distinguish between different
classifications of suturing gestures. This demonstrates the potential to
utilize deep learning CV towards future automation of surgical skill
assessment.
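Since the abstract describes per-frame RGB and optical-flow inputs feeding a recurrent classifier (LSTM or convLSTM) with two- or five-class outputs, a minimal PyTorch sketch of that kind of pipeline is shown below. The tiny per-frame CNN, feature sizes, and clip dimensions are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch (not the authors' code): per-frame CNN features -> LSTM -> gesture logits.
import torch
import torch.nn as nn

class FrameCNN(nn.Module):
    """Tiny per-frame encoder standing in for the paper's RGB/optical-flow feature extractor."""
    def __init__(self, in_channels=3, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
    def forward(self, x):            # x: (B, C, H, W)
        return self.net(x)

class GestureLSTM(nn.Module):
    """Recurrent clip classifier: 2 classes for identification, 5 for gesture classification."""
    def __init__(self, num_classes=2, feat_dim=128, hidden=256):
        super().__init__()
        self.encoder = FrameCNN(feat_dim=feat_dim)
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)
    def forward(self, clip):         # clip: (B, T, C, H, W)
        b, t = clip.shape[:2]
        feats = self.encoder(clip.flatten(0, 1)).view(b, t, -1)
        _, (h, _) = self.lstm(feats)
        return self.head(h[-1])      # one logit vector per clip

model = GestureLSTM(num_classes=2)                        # identification variant
logits = model(torch.randn(4, 16, 3, 112, 112))           # 4 clips of 16 RGB frames
print(logits.shape)                                       # torch.Size([4, 2])
```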
Related papers
- Hierarchical Semi-Supervised Learning Framework for Surgical Gesture Segmentation and Recognition Based on Multi-Modality Data [2.8770761243361593]
We develop a hierarchical semi-supervised learning framework for surgical gesture segmentation using multi-modality data.
A Transformer-based network with a pre-trained ResNet-18 backbone is used to extract visual features from the surgical operation videos.
The proposed approach has been evaluated using data from the publicly available JIGSAWS dataset, including Suturing, Needle Passing, and Knot Tying tasks.
arXiv Detail & Related papers (2023-07-31T21:17:59Z)
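A rough sketch of the visual branch described in the summary above (per-frame ResNet-18 features fed to a Transformer encoder). The gesture count, model dimensions, and use of untrained weights are assumptions for illustration only.

```python
# Illustrative sketch only: ResNet-18 frame features fed to a Transformer encoder.
import torch
import torch.nn as nn
from torchvision.models import resnet18

class VisualGestureEncoder(nn.Module):
    def __init__(self, num_gestures=10, d_model=512, nhead=8, num_layers=2):
        super().__init__()
        backbone = resnet18(weights=None)        # pre-trained weights would normally be loaded
        backbone.fc = nn.Identity()              # 512-d feature per frame
        self.backbone = backbone
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(d_model, num_gestures)
    def forward(self, frames):                   # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.backbone(frames.flatten(0, 1)).view(b, t, -1)
        return self.head(self.temporal(feats))   # per-frame gesture logits: (B, T, num_gestures)

logits = VisualGestureEncoder()(torch.randn(2, 8, 3, 224, 224))
print(logits.shape)                              # torch.Size([2, 8, 10])
```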
- Pseudo-label Guided Cross-video Pixel Contrast for Robotic Surgical Scene Segmentation with Limited Annotations [72.15956198507281]
We propose PGV-CL, a novel pseudo-label guided cross-video contrast learning method to boost scene segmentation.
We extensively evaluate our method on a public robotic surgery dataset EndoVis18 and a public cataract dataset CaDIS.
arXiv Detail & Related papers (2022-07-20T05:42:19Z)
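A hedged sketch of the pseudo-label guided pixel-contrast idea summarized above: pixels that share a pseudo-label (possibly from different videos) act as positives in an InfoNCE-style loss. Embedding size, label count, and sampling are placeholders, not the PGV-CL implementation.

```python
# Sketch of pseudo-label guided pixel contrast (not the PGV-CL code).
import torch
import torch.nn.functional as F

def pixel_contrast_loss(emb_a, emb_b, pl_a, pl_b, tau=0.1):
    """emb_*: (N, D) sampled pixel embeddings from two videos; pl_*: (N,) pseudo-labels."""
    emb = F.normalize(torch.cat([emb_a, emb_b]), dim=1)
    labels = torch.cat([pl_a, pl_b])
    sim = emb @ emb.t() / tau                                 # pairwise cosine similarities
    self_mask = torch.eye(sim.size(0), dtype=torch.bool)
    sim = sim.masked_fill(self_mask, -1e9)                    # a pixel is never its own pair
    pos = (labels[:, None] == labels[None, :]) & ~self_mask   # same pseudo-label => positive
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    return -(pos * log_prob).sum(1).div(pos.sum(1).clamp(min=1)).mean()

loss = pixel_contrast_loss(torch.randn(64, 32), torch.randn(64, 32),
                           torch.randint(0, 4, (64,)), torch.randint(0, 4, (64,)))
print(loss.item())
```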
- Revisiting Classifier: Transferring Vision-Language Models for Video Recognition [102.93524173258487]
Transferring knowledge from task-agnostic pre-trained deep models for downstream tasks is an important topic in computer vision research.
In this study, we focus on transferring knowledge for video classification tasks.
We utilize a well-pretrained language model to generate good semantic targets for efficient transfer learning.
arXiv Detail & Related papers (2022-07-04T10:00:47Z)
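The summary above describes using a pre-trained language model to supply semantic targets for video classification. A minimal sketch of that general idea follows, with random vectors standing in for the class-name embeddings a language model would actually produce.

```python
# Sketch of the general idea (not this paper's implementation): frozen text embeddings of
# the class names act as the classifier weights for pooled video features.
import torch
import torch.nn.functional as F

num_classes, dim = 400, 512
text_embeddings = F.normalize(torch.randn(num_classes, dim), dim=1)   # stand-in, one per class name

def classify_video(frame_features):        # frame_features: (T, dim) from a visual encoder
    video_feat = F.normalize(frame_features.mean(0), dim=0)           # temporal average pooling
    return video_feat @ text_embeddings.t()                           # similarity logits (num_classes,)

logits = classify_video(torch.randn(16, dim))
print(logits.argmax().item())
```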
- SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption [72.35532598131176]
We propose SCARF, a technique for contrastive learning, where views are formed by corrupting a random subset of features.
We show that SCARF complements existing strategies and outperforms alternatives like autoencoders.
arXiv Detail & Related papers (2021-06-29T08:08:33Z)
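A small sketch of the SCARF-style view construction described above: a view is formed by replacing a random subset of a row's features with values resampled from the batch (a proxy for the per-feature marginals), then contrasted against the original row. The encoder, corruption rate, and temperature are illustrative choices.

```python
# Hedged sketch of SCARF-style contrastive learning on tabular rows.
import torch
import torch.nn.functional as F

def corrupt(x, corruption_rate=0.6):
    """x: (B, F) batch of rows; each corrupted entry is resampled from its own column."""
    b, f = x.shape
    mask = torch.rand(b, f) < corruption_rate
    idx = torch.randint(0, b, (b, f))
    random_rows = torch.gather(x, 0, idx)        # column-wise samples from the batch
    return torch.where(mask, random_rows, x)

def info_nce(z1, z2, tau=0.5):
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                   # positives lie on the diagonal
    return F.cross_entropy(logits, torch.arange(len(z1)))

encoder = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 32))
x = torch.randn(128, 20)
loss = info_nce(encoder(x), encoder(corrupt(x)))
print(loss.item())
```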
- GAN for Vision, KG for Relation: a Two-stage Deep Network for Zero-shot Action Recognition [33.23662792742078]
We propose a two-stage deep neural network for zero-shot action recognition.
In the sampling stage, we utilize a generative adversarial network (GAN) trained on action features and word vectors of seen classes.
In the classification stage, we construct a knowledge graph based on the relationship between word vectors of action classes and related objects.
arXiv Detail & Related papers (2021-05-25T09:34:42Z)
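A loose sketch of the first (sampling) stage described above: a conditional generator maps class word vectors plus noise to synthetic action features for unseen classes, on which an ordinary classifier could then be trained. The knowledge-graph stage is omitted and all dimensions are assumptions.

```python
# Rough sketch of the sampling stage only; GAN training and the KG stage are omitted.
import torch
import torch.nn as nn

word_dim, feat_dim, noise_dim = 300, 512, 64

generator = nn.Sequential(                      # word vector + noise -> synthetic action feature
    nn.Linear(word_dim + noise_dim, 512), nn.ReLU(), nn.Linear(512, feat_dim))

def synthesize(word_vecs, n_per_class=50):
    """word_vecs: (C_unseen, word_dim). Returns synthetic features and their class labels."""
    c = word_vecs.size(0)
    w = word_vecs.repeat_interleave(n_per_class, dim=0)
    z = torch.randn(w.size(0), noise_dim)
    feats = generator(torch.cat([w, z], dim=1))
    labels = torch.arange(c).repeat_interleave(n_per_class)
    return feats, labels

feats, labels = synthesize(torch.randn(5, word_dim))   # 5 unseen classes
print(feats.shape, labels.shape)                       # torch.Size([250, 512]) torch.Size([250])
```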
- Anomaly Detection in Cybersecurity: Unsupervised, Graph-Based and Supervised Learning Methods in Adversarial Environments [63.942632088208505]
Inherent to today's operating environment is the practice of adversarial machine learning.
In this work, we examine the feasibility of unsupervised learning and graph-based methods for anomaly detection.
We incorporate a realistic adversarial training mechanism when training our supervised models to enable strong classification performance in adversarial environments.
arXiv Detail & Related papers (2021-05-14T10:05:10Z)
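The summary above mentions adversarial training of the supervised models. As a generic illustration only (not the authors' setup), a single FGSM-style adversarial training step might look like this.

```python
# Generic FGSM-style adversarial training step; model, data, and epsilon are placeholders.
import torch
import torch.nn.functional as F

def adversarial_training_step(model, x, y, optimizer, epsilon=0.1):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = (x + epsilon * grad.sign()).detach()       # perturb the input features
    optimizer.zero_grad()
    total = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
    total.backward()
    optimizer.step()
    return total.item()

model = torch.nn.Sequential(torch.nn.Linear(30, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
print(adversarial_training_step(model, torch.randn(32, 30), torch.randint(0, 2, (32,)), opt))
```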
- Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and the inherent correlations across modalities for gesture recognition.
Results show that our approach recovers performance with large gains, up to 12.91% in accuracy and 20.16% in F1 score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
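A very schematic sketch of aligning simulator and real-robot features for both modalities, using simple first-moment matching as a stand-in for the unsupervised alignment described above; the encoder shapes and the 38-d kinematics vector are assumptions.

```python
# Schematic stand-in for kinematic-visual domain alignment (not the paper's method).
import torch
import torch.nn as nn

visual_enc = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 64))
kin_enc = nn.Sequential(nn.Linear(38, 128), nn.ReLU(), nn.Linear(128, 64))   # 38-d kinematics assumed

def alignment_loss(sim_feats, real_feats):
    """First-moment matching between simulator and real-robot embeddings."""
    return (sim_feats.mean(0) - real_feats.mean(0)).pow(2).sum()

sim_v, real_v = torch.randn(16, 512), torch.randn(16, 512)
sim_k, real_k = torch.randn(16, 38), torch.randn(16, 38)
loss = alignment_loss(visual_enc(sim_v), visual_enc(real_v)) + \
       alignment_loss(kin_enc(sim_k), kin_enc(real_k))
print(loss.item())
```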
- Self-Supervised Learning via multi-Transformation Classification for Action Recognition [10.676377556393527]
We introduce a self-supervised video representation learning method based on the multi-transformation classification to efficiently classify human actions.
The representation of the video is learned in a self-supervised manner by classifying seven different transformations.
We conducted experiments on the UCF101 and HMDB51 datasets with C3D and 3D ResNet-18 as backbone networks.
arXiv Detail & Related papers (2021-02-20T16:11:26Z)
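A toy version of the transformation-classification pretext task summarized above: the network must predict which transformation was applied to a clip. The paper uses seven transformations and 3D backbones (C3D, 3D ResNet-18); here a few simple transforms and a flat encoder stand in.

```python
# Toy transformation-classification pretext task; clips are (T, C, H, W) per sample.
import torch
import torch.nn as nn
import torch.nn.functional as F

TRANSFORMS = [
    lambda clip: clip,                                   # identity
    lambda clip: torch.flip(clip, dims=[0]),             # temporal reversal
    lambda clip: torch.flip(clip, dims=[3]),             # horizontal flip
    lambda clip: torch.rot90(clip, 1, dims=[2, 3]),      # 90-degree spatial rotation
]

encoder = nn.Sequential(nn.Flatten(), nn.Linear(8 * 3 * 32 * 32, 256), nn.ReLU(),
                        nn.Linear(256, len(TRANSFORMS)))

def pretext_step(clips):                                 # clips: (B, T, C, H, W)
    labels = torch.randint(0, len(TRANSFORMS), (clips.size(0),))
    transformed = torch.stack([TRANSFORMS[l.item()](c) for c, l in zip(clips, labels)])
    return F.cross_entropy(encoder(transformed), labels)

print(pretext_step(torch.randn(8, 8, 3, 32, 32)).item())
```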
- Machine Learning-based Classification of Active Walking Tasks in Older Adults using fNIRS [2.0953361712358025]
Cortical control of gait, specifically in the pre-frontal cortex as measured by functional near infrared spectroscopy (fNIRS), has been shown to be moderated by age, gender, cognitive status, and various age-related disease conditions.
We develop classification models using machine learning methods to classify active walking tasks in older adults based on fNIRS signals.
arXiv Detail & Related papers (2021-02-08T03:44:24Z)
- Self supervised contrastive learning for digital histopathology [0.0]
We use a contrastive self-supervised learning method called SimCLR that achieved state-of-the-art results on natural-scene images.
We find that combining multiple multi-organ datasets with different types of staining and resolution properties improves the quality of the learned features.
Linear classifiers trained on top of the learned features show that networks pretrained on digital histopathology datasets perform better than ImageNet pretrained networks.
arXiv Detail & Related papers (2020-11-27T19:18:45Z)
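The histopathology summary above mentions training linear classifiers on top of the learned features. A minimal linear-evaluation sketch follows, with a random encoder and random data standing in for a SimCLR-pretrained network and histopathology patches.

```python
# Linear-evaluation sketch: freeze the pre-trained encoder, train only a linear probe.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, 128))  # stand-in for a pretrained net
for p in encoder.parameters():
    p.requires_grad = False                      # freeze the representation

probe = nn.Linear(128, 4)                        # e.g. 4 tissue classes (assumed); trained from scratch
opt = torch.optim.SGD(probe.parameters(), lr=0.1)

x, y = torch.randn(256, 2048), torch.randint(0, 4, (256,))
for _ in range(10):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(encoder(x)), y)
    loss.backward()
    opt.step()
print(loss.item())
```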
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose MRG-Net, a novel online multi-modal graph network that dynamically integrates visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
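A loose sketch of fusing visual and kinematics embeddings through one round of message passing over a tiny relation graph, in the spirit of the multi-modal graph network summarized above; the node design, 76-d kinematics, and gesture count are assumptions rather than MRG-Net itself.

```python
# Illustrative visual + kinematics fusion via one message-passing step (not MRG-Net).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphFusion(nn.Module):
    def __init__(self, vis_dim=512, kin_dim=76, d=128, num_gestures=10):
        super().__init__()
        self.vis_proj = nn.Linear(vis_dim, d)
        self.kin_proj = nn.Linear(kin_dim, d)
        self.msg = nn.Linear(d, d)
        self.head = nn.Linear(d, num_gestures)
    def forward(self, vis_feat, kin_feat):               # (B, vis_dim), (B, kin_dim)
        nodes = torch.stack([self.vis_proj(vis_feat), self.kin_proj(kin_feat)], dim=1)   # (B, 2, d)
        attn = F.softmax(nodes @ nodes.transpose(1, 2) / nodes.size(-1) ** 0.5, dim=-1)  # (B, 2, 2)
        nodes = nodes + attn @ self.msg(nodes)            # one message-passing step
        return self.head(nodes.mean(dim=1))               # gesture logits

logits = GraphFusion()(torch.randn(4, 512), torch.randn(4, 76))
print(logits.shape)                                       # torch.Size([4, 10])
```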
This list is automatically generated from the titles and abstracts of the papers on this site.