Detection and Localization of Robotic Tools in Robot-Assisted Surgery
Videos Using Deep Neural Networks for Region Proposal and Detection
- URL: http://arxiv.org/abs/2008.00936v1
- Date: Wed, 29 Jul 2020 10:59:15 GMT
- Title: Detection and Localization of Robotic Tools in Robot-Assisted Surgery
Videos Using Deep Neural Networks for Region Proposal and Detection
- Authors: Duygu Sarikaya, Jason J. Corso and Khurshid A. Guru
- Abstract summary: We propose a solution to the open problem of tool detection and localization in RAS video understanding.
We propose an architecture using multimodal convolutional neural networks for fast detection and localization of tools in RAS videos.
Our results, with an Average Precision (AP) of 91% and a mean detection time of 0.1 seconds per test frame, indicate that our approach outperforms methods conventionally used in medical imaging.
- Score: 30.042965489804356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Video understanding of robot-assisted surgery (RAS) videos is an active
research area. Modeling the gestures and skill level of surgeons presents an
interesting problem. The insights drawn may be applied in effective skill
acquisition, objective skill assessment, real-time feedback, and human-robot
collaborative surgeries. We propose a solution to the open problem of tool
detection and localization in RAS video understanding, using a strictly computer
vision approach and recent advances in deep learning. We propose an
architecture using multimodal convolutional neural networks for fast detection
and localization of tools in RAS videos. To our knowledge, this is the first
approach to incorporate deep neural networks for tool detection and
localization in RAS videos. Our architecture applies a Region Proposal Network
(RPN) and a multi-modal two-stream convolutional network for object detection
to jointly predict objectness and localization on a fusion of image and
temporal motion cues. Our results, with an Average Precision (AP) of 91% and a
mean computation time of 0.1 seconds per test frame, indicate that our approach
outperforms methods conventionally used in medical imaging, while also
highlighting the benefits of using an RPN for precision and efficiency. We also
introduce a new dataset, ATLAS Dione, for RAS video understanding. Our dataset
provides video data of ten surgeons from Roswell Park Cancer Institute (RPCI)
(Buffalo, NY) performing six different surgical tasks on the da Vinci Surgical
System (dVSS®), with annotations of robotic tools per frame.
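
The abstract describes the architecture only at a high level, so below is a minimal sketch, not the authors' implementation, of the general idea: an RPN-based detector operating on fused appearance and motion features. It approximates the multimodal two-stream design with torchvision's Faster R-CNN components and assumes a recent torchvision; the class name (TwoStreamBackbone), the 3+2 channel layout (RGB plus a dense optical-flow field, e.g. from OpenCV's calcOpticalFlowFarneback), the single tool-vs-background class, and the normalization statistics are all assumptions made for illustration.

```python
# Hypothetical sketch (not the authors' code): an RPN-based detector over fused
# RGB appearance features and optical-flow motion features, loosely mirroring
# the two-stream, multimodal design described in the abstract.
import torch
import torch.nn as nn
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator


class TwoStreamBackbone(nn.Module):
    """Extracts and fuses features from an RGB frame and its optical-flow map."""

    def __init__(self, out_channels: int = 256):
        super().__init__()
        rgb = torchvision.models.resnet18(weights=None)
        flow = torchvision.models.resnet18(weights=None)
        # Flow maps have 2 channels (dx, dy), so adapt the first convolution.
        flow.conv1 = nn.Conv2d(2, 64, kernel_size=7, stride=2, padding=3, bias=False)
        # Keep everything up to (but not including) average pooling / classifier.
        self.rgb_stream = nn.Sequential(*list(rgb.children())[:-2])
        self.flow_stream = nn.Sequential(*list(flow.children())[:-2])
        # A 1x1 convolution fuses the concatenated streams into one feature map.
        self.fuse = nn.Conv2d(512 * 2, out_channels, kernel_size=1)
        self.out_channels = out_channels  # required by torchvision's detection heads

    def forward(self, x):
        # x is a 5-channel tensor per image: 3 RGB channels + 2 optical-flow channels.
        rgb_feat = self.rgb_stream(x[:, :3])
        flow_feat = self.flow_stream(x[:, 3:])
        return self.fuse(torch.cat([rgb_feat, flow_feat], dim=1))


backbone = TwoStreamBackbone()
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
detector = FasterRCNN(
    backbone,
    num_classes=2,  # assumed: robotic tool vs. background
    rpn_anchor_generator=anchor_generator,
    # 5-element stats so the built-in transform can normalize RGB + flow input.
    image_mean=[0.485, 0.456, 0.406, 0.0, 0.0],
    image_std=[0.229, 0.224, 0.225, 1.0, 1.0],
)

detector.eval()
frames = [torch.randn(5, 480, 640)]  # one RGB+flow frame (random stand-in data)
with torch.no_grad():
    detections = detector(frames)  # list of dicts with 'boxes', 'labels', 'scores'
print(detections[0]["boxes"].shape)
```

In the paper, the image and temporal motion cues are processed by two streams and fused for joint objectness and localization prediction; the 1x1 fusion convolution above is just one simple way to combine the two feature maps.
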
Related papers
- Robotic Navigation Autonomy for Subretinal Injection via Intelligent
Real-Time Virtual iOCT Volume Slicing [88.99939660183881]
We propose a framework for autonomous robotic navigation for subretinal injection.
Our method consists of an instrument pose estimation method, an online registration between the robotic and the iOCT system, and trajectory planning tailored for navigation to an injection target.
Our experiments on ex-vivo porcine eyes demonstrate the precision and repeatability of the method.
arXiv Detail & Related papers (2023-01-17T21:41:21Z) - Deep Learning Computer Vision Algorithms for Real-time UAVs On-board
Camera Image Processing [77.34726150561087]
This paper describes how advanced deep learning based computer vision algorithms are applied to enable real-time on-board sensor processing for small UAVs.
All algorithms have been developed using state-of-the-art image processing methods based on deep neural networks.
arXiv Detail & Related papers (2022-11-02T11:10:42Z) - Video-based assessment of intraoperative surgical skill [7.79874072121082]
We present and validate two deep learning methods that directly assess skill using RGB videos.
In the first method, we predict instrument tips as keypoints, and learn surgical skill using temporal convolutional neural networks.
In the second method, we propose a novel architecture for surgical skill assessment that includes a frame-wise encoder (2D convolutional neural network) followed by a temporal model (recurrent neural network).
arXiv Detail & Related papers (2022-05-13T01:45:22Z) - Dynamic Gesture Recognition [0.0]
It is possible to use machine learning to classify images and/or videos instead of the traditional computer vision algorithms.
The aim of this project is to build a symbiosis between a convolutional neural network (CNN) and a recurrent neural network (RNN).
arXiv Detail & Related papers (2021-09-20T09:45:29Z) - A Novel Deep ML Architecture by Integrating Visual Simultaneous
Localization and Mapping (vSLAM) into Mask R-CNN for Real-time Surgical Video
Analysis [0.0]
In this research, a novel machine learning architecture, RPM-CNN, is created to perform real-time surgical analysis.
RPM-CNN integrates visual simultaneous localization and mapping (vSLAM) into Mask R-CNN.
To apply RPM-CNN's real-time top performance to the real world, a Microsoft HoloLens 2 application is developed.
arXiv Detail & Related papers (2021-03-31T06:59:13Z) - One to Many: Adaptive Instrument Segmentation via Meta Learning and
Dynamic Online Adaptation in Robotic Surgical Video [71.43912903508765]
MDAL is a dynamic online adaptive learning scheme for instrument segmentation in robot-assisted surgery.
It learns the general knowledge of instruments and the fast adaptation ability through the video-specific meta-learning paradigm.
It outperforms other state-of-the-art methods on two datasets.
arXiv Detail & Related papers (2021-03-24T05:02:18Z) - Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach, a multi-modal graph network (MRG-Net), that dynamically integrates visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - Real-Time Instrument Segmentation in Robotic Surgery using Auxiliary
Supervised Deep Adversarial Learning [15.490603884631764]
Real-time semantic segmentation of the robotic instruments and tissues is a crucial step in robot-assisted surgery.
We have developed a light-weight cascaded convolutional neural network (CNN) to segment the surgical instruments from high-resolution videos.
We show that our model surpasses existing algorithms for pixel-wise segmentation of surgical instruments in both prediction accuracy and segmentation time of high-resolution videos.
arXiv Detail & Related papers (2020-07-22T10:16:07Z) - Automatic Operating Room Surgical Activity Recognition for
Robot-Assisted Surgery [1.1033115844630357]
We investigate automatic surgical activity recognition in robot-assisted operations.
We collect the first large-scale dataset including 400 full-length multi-perspective videos.
We densely annotate the videos with the 10 most recognized and clinically relevant classes of activities.
arXiv Detail & Related papers (2020-06-29T16:30:31Z) - LRTD: Long-Range Temporal Dependency based Active Learning for Surgical
Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture long-range temporal dependency.
We validate our approach on a large surgical video dataset (Cholec80) by performing surgical workflow recognition task.
arXiv Detail & Related papers (2020-04-21T09:21:22Z) - Automatic Gesture Recognition in Robot-assisted Surgery with
Reinforcement Learning and Tree Search [63.07088785532908]
We propose a framework based on reinforcement learning and tree search for joint surgical gesture segmentation and classification.
Our framework consistently outperforms existing methods on the suturing task of the JIGSAWS dataset in terms of accuracy, edit score, and F1 score.
arXiv Detail & Related papers (2020-02-20T13:12:38Z)