Automatic Operating Room Surgical Activity Recognition for
Robot-Assisted Surgery
- URL: http://arxiv.org/abs/2006.16166v1
- Date: Mon, 29 Jun 2020 16:30:31 GMT
- Title: Automatic Operating Room Surgical Activity Recognition for
Robot-Assisted Surgery
- Authors: Aidean Sharghi, Helene Haugerud, Daniel Oh, Omid Mohareri
- Abstract summary: We investigate automatic surgical activity recognition in robot-assisted operations.
We collect the first large-scale dataset including 400 full-length multi-perspective videos.
We densely annotate the videos with the 10 most recognized and clinically relevant classes of activities.
- Score: 1.1033115844630357
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Automatic recognition of surgical activities in the operating room (OR) is a
key technology for creating next generation intelligent surgical devices and
workflow monitoring/support systems. Such systems can potentially enhance
efficiency in the OR, resulting in lower costs and improved care delivery to
the patients. In this paper, we investigate automatic surgical activity
recognition in robot-assisted operations. We collect the first large-scale
dataset including 400 full-length multi-perspective videos from a variety of
robotic surgery cases captured using Time-of-Flight cameras. We densely
annotate the videos with the 10 most recognized and clinically relevant classes of
activities. Furthermore, we investigate state-of-the-art computer vision action
recognition techniques and adapt them for the OR environment and the dataset.
First, we fine-tune the Inflated 3D ConvNet (I3D) for clip-level activity
recognition on our dataset and use it to extract features from the videos.
These features are then fed to a stack of three Temporal Gaussian Mixture (TGM)
layers, which extract temporal context from neighboring clips; the outputs then
pass through a Long Short-Term Memory (LSTM) network that learns the order of
activities in full-length videos. We extensively assess the model and reach a
peak performance of 88%
mean Average Precision.
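A minimal sketch of this clip-to-video pipeline in PyTorch, assuming clip features have already been extracted by the fine-tuned I3D. The three temporal 1-D convolutions below stand in for the Temporal Gaussian Mixture layers (the paper's Gaussian kernel parameterization is not reproduced), and all module names and dimensions are illustrative.

```python
import torch
import torch.nn as nn

class TemporalActivityModel(nn.Module):
    """Clip features -> temporal context -> LSTM -> per-clip activity logits."""

    def __init__(self, feat_dim=1024, hidden_dim=512, num_classes=10):
        super().__init__()
        # Stack of 3 temporal layers mixing context from neighboring clips
        # (stand-ins for the paper's Temporal Gaussian Mixture layers).
        self.temporal = nn.Sequential(
            nn.Conv1d(feat_dim, hidden_dim, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=9, padding=4), nn.ReLU(),
            nn.Conv1d(hidden_dim, hidden_dim, kernel_size=9, padding=4), nn.ReLU(),
        )
        # LSTM models the order of activities across the full-length video.
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, clip_feats):
        # clip_feats: (batch, num_clips, feat_dim), e.g. from a fine-tuned I3D.
        x = self.temporal(clip_feats.transpose(1, 2)).transpose(1, 2)
        x, _ = self.lstm(x)
        return self.head(x)  # (batch, num_clips, num_classes)

model = TemporalActivityModel()
feats = torch.randn(2, 120, 1024)  # 2 videos, 120 clips each
print(model(feats).shape)          # torch.Size([2, 120, 10])
```

Training such a model on per-clip labels over full-length videos and scoring each class with mean Average Precision mirrors the evaluation setup described in the abstract.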
Related papers
- Thoracic Surgery Video Analysis for Surgical Phase Recognition [0.08706730566331035]
We analyse and evaluate both frame-based and video clip-based phase recognition on a thoracic surgery dataset consisting of 11 phase classes.
We show that Masked Video Distillation (MVD) exhibits superior performance, achieving a top-1 accuracy of 72.9%, compared to 52.31% achieved by an ImageNet-pretrained ViT.
arXiv Detail & Related papers (2024-06-13T14:47:57Z)
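To illustrate the frame-based versus clip-based distinction in the entry above, a minimal (hypothetical) aggregation: per-frame phase logits are averaged into a single clip-level prediction.

```python
import torch

def clip_prediction(frame_logits: torch.Tensor) -> int:
    """Aggregate per-frame phase logits (frames, classes) into one clip label."""
    return int(frame_logits.mean(dim=0).argmax())

frame_logits = torch.randn(16, 11)  # 16 frames, 11 phase classes (dummy logits)
print(clip_prediction(frame_logits))
```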
- Creating a Digital Twin of Spinal Surgery: A Proof of Concept [68.37190859183663]
Surgery digitalization is the process of creating a virtual replica of real-world surgery.
We present a proof of concept (PoC) for surgery digitalization that is applied to an ex-vivo spinal surgery.
We employ five RGB-D cameras for dynamic 3D reconstruction of the surgeon, a high-end camera for 3D reconstruction of the anatomy, an infrared stereo camera for surgical instrument tracking, and a laser scanner for 3D reconstruction of the operating room and data fusion.
arXiv Detail & Related papers (2024-03-25T13:09:40Z)
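As a sketch of the data-fusion step such a multi-camera setup implies, the snippet below back-projects each depth map and transforms it into a shared world frame using camera intrinsics and extrinsics; all calibration values and shapes are illustrative, not the authors' pipeline.

```python
import numpy as np

def depth_to_world(depth, K, T_world_cam):
    """Back-project a depth map (H, W) in meters to world-frame points (N, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.ravel()
    x = (u.ravel() - K[0, 2]) * z / K[0, 0]
    y = (v.ravel() - K[1, 2]) * z / K[1, 1]
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)  # homogeneous coords
    return (pts_cam @ T_world_cam.T)[:, :3]

# Fuse clouds from several calibrated RGB-D cameras into one world-frame cloud.
K = np.array([[525.0, 0, 320.0], [0, 525.0, 240.0], [0, 0, 1]])
depths = [np.full((480, 640), 1.5), np.full((480, 640), 2.0)]  # dummy depth maps
extrinsics = [np.eye(4), np.eye(4)]  # camera-to-world transforms (illustrative)
fused = np.concatenate([depth_to_world(d, K, T) for d, T in zip(depths, extrinsics)])
print(fused.shape)  # (614400, 3)
```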
- GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z)
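A minimal sketch of the gated long/short idea behind GLSFormer: one attention stream over a short recent window, one over the full sequence, fused by a learned gate. The module below is illustrative, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GatedLongShort(nn.Module):
    """Fuse a short-span and a long-span temporal stream with a learned gate."""

    def __init__(self, dim=256, num_heads=4, short_len=4):
        super().__init__()
        self.short_len = short_len
        self.short_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.long_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, x):
        # x: (batch, time, dim) sequence of per-frame patch embeddings.
        short = x[:, -self.short_len:]               # recent frames only
        s, _ = self.short_attn(short, short, short)  # fine-grained recent motion
        l, _ = self.long_attn(x, x, x)               # long-range step context
        s_last, l_last = s[:, -1], l[:, -1]
        g = self.gate(torch.cat([s_last, l_last], dim=-1))
        return g * s_last + (1 - g) * l_last         # gated fusion, (batch, dim)

feats = torch.randn(2, 32, 256)
print(GatedLongShort()(feats).shape)  # torch.Size([2, 256])
```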
- Surgical tool classification and localization: results and methods from the MICCAI 2022 SurgToolLoc challenge [69.91670788430162]
We present the results of the SurgToolLoc 2022 challenge.
The goal was to leverage tool presence data as weak labels for machine learning models trained to detect tools.
We conclude by discussing these results in the broader context of machine learning and surgical data science.
arXiv Detail & Related papers (2023-05-11T21:44:39Z)
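A minimal sketch of the weak-supervision setup described above: a classifier is trained only on clip-level tool-presence labels (multi-label, no bounding boxes); the backbone and tool vocabulary size are placeholders.

```python
import torch
import torch.nn as nn

NUM_TOOLS = 14  # illustrative size of the tool vocabulary

# Any frame encoder works here; a tiny CNN stands in for a real backbone.
encoder = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, NUM_TOOLS),
)
criterion = nn.BCEWithLogitsLoss()  # multi-label: several tools per clip

frames = torch.randn(8, 3, 224, 224)  # frames sampled from labeled clips
present = torch.zeros(8, NUM_TOOLS)   # clip-level presence labels (weak)
present[:, [2, 5]] = 1.0              # e.g. two tools present in these clips
loss = criterion(encoder(frames), present)
loss.backward()  # presence-trained activations can later seed localization
print(float(loss))
```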
- AutoLaparo: A New Dataset of Integrated Multi-tasks for Image-guided Surgical Automation in Laparoscopic Hysterectomy [42.20922574566824]
We present and release the first integrated dataset with multiple image-based perception tasks to facilitate learning-based automation in hysterectomy surgery.
Our AutoLaparo dataset is developed based on full-length videos of entire hysterectomy procedures.
Specifically, three different yet highly correlated tasks are formulated in the dataset, including surgical workflow recognition, laparoscope motion prediction, and instrument and key anatomy segmentation.
arXiv Detail & Related papers (2022-08-03T13:17:23Z)
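A hypothetical sketch of how a multi-task sample from such a dataset might look, with one target per task; field names and shapes are invented for illustration and are not AutoLaparo's actual schema.

```python
import torch
from torch.utils.data import Dataset

class MultiTaskSurgicalDataset(Dataset):
    """One frame with workflow, motion, and segmentation targets (hypothetical)."""

    def __init__(self, num_samples=100):
        self.num_samples = num_samples

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        return {
            "frame": torch.randn(3, 256, 256),     # stand-in for a video frame
            "phase": torch.tensor(idx % 7),        # workflow recognition label
            "motion": torch.tensor(idx % 4),       # laparoscope motion class
            "mask": torch.zeros(256, 256).long(),  # instrument/anatomy mask
        }

sample = MultiTaskSurgicalDataset()[0]
print({k: tuple(v.shape) if v.dim() else int(v) for k, v in sample.items()})
```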
- Adaptation of Surgical Activity Recognition Models Across Operating Rooms [10.625208343893911]
We study the generalizability of surgical activity recognition models across operating rooms.
We propose a new domain adaptation method to improve the performance of the surgical activity recognition model.
arXiv Detail & Related papers (2022-07-07T04:41:34Z)
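The paper's specific adaptation method is not described in this summary; as a generic illustration of feature-level domain adaptation, the sketch below computes a maximum mean discrepancy (MMD) penalty between feature batches from two operating rooms, which could be added to the task loss.

```python
import torch

def rbf_mmd(x, y, sigma=1.0):
    """MMD^2 between feature batches x, y under an RBF kernel (biased estimate)."""
    def k(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return torch.exp(-d2 / (2 * sigma**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

src = torch.randn(32, 128)        # features from the source OR
tgt = torch.randn(32, 128) + 0.5  # features from a new OR (distribution shift)
print(float(rbf_mmd(src, tgt)))   # add to the task loss to align the two ORs
```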
- CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
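A minimal sketch of what surgical action triplet recognition amounts to: three classification heads, for instrument, verb, and target, over a shared clip feature. The vocabulary sizes (6/10/15) follow the commonly used CholecT50 convention; everything else is illustrative.

```python
import torch
import torch.nn as nn

class TripletHeads(nn.Module):
    """Predict <instrument, verb, target> from a shared clip feature."""

    def __init__(self, dim=512, n_inst=6, n_verb=10, n_targ=15):
        super().__init__()
        self.instrument = nn.Linear(dim, n_inst)
        self.verb = nn.Linear(dim, n_verb)
        self.target = nn.Linear(dim, n_targ)

    def forward(self, feat):
        return self.instrument(feat), self.verb(feat), self.target(feat)

i, v, t = TripletHeads()(torch.randn(4, 512))
print(i.shape, v.shape, t.shape)  # component logits; mAP is computed per class
```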
- Using Human Gaze For Surgical Activity Recognition [0.40611352512781856]
We propose to use human gaze with a spatio-temporal attention mechanism for activity recognition in surgical videos.
Our model builds on an I3D-based architecture, learning temporal features with 3D convolutions together with an attention map derived from human gaze.
arXiv Detail & Related papers (2022-03-09T14:28:00Z)
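A minimal sketch of gaze-derived spatial attention, assuming normalized gaze coordinates: a Gaussian heatmap centered on the gaze point reweights a convolutional feature map. The heatmap construction is illustrative, not the paper's learned attention.

```python
import torch

def gaze_attention(features, gaze_xy, sigma=0.15):
    """Reweight a feature map (B, C, H, W) by a Gaussian gaze heatmap.

    gaze_xy: (B, 2) normalized gaze coordinates in [0, 1].
    """
    b, c, h, w = features.shape
    ys = torch.linspace(0, 1, h).view(1, h, 1)
    xs = torch.linspace(0, 1, w).view(1, 1, w)
    gx = gaze_xy[:, 0].view(b, 1, 1)
    gy = gaze_xy[:, 1].view(b, 1, 1)
    heat = torch.exp(-((xs - gx) ** 2 + (ys - gy) ** 2) / (2 * sigma**2))
    return features * heat.unsqueeze(1)  # broadcast weights over channels

feats = torch.randn(2, 64, 14, 14)
gaze = torch.tensor([[0.5, 0.5], [0.2, 0.8]])
print(gaze_attention(feats, gaze).shape)  # torch.Size([2, 64, 14, 14])
```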
- Using Computer Vision to Automate Hand Detection and Tracking of Surgeon Movements in Videos of Open Surgery [8.095095522269352]
We leverage advances in computer vision to introduce an automated approach to video analysis of surgical execution.
A state-of-the-art convolutional neural network architecture for object detection was used to detect operating hands in open surgery videos.
Our model's spatial detections of operating hands significantly outperform the detections achieved using pre-existing hand-detection datasets.
arXiv Detail & Related papers (2020-12-13T03:10:09Z)
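A standard torchvision fine-tuning recipe, shown as an illustration rather than the authors' exact detector: the box predictor of a pretrained Faster R-CNN is replaced with a two-class head (background, hand).

```python
import torch
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Pretrained detector; swap the head for 2 classes: background + hand.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)
model.eval()

# Inference on a dummy frame; fine-tuning would pass (images, targets) instead.
with torch.no_grad():
    out = model([torch.rand(3, 480, 640)])
print(out[0]["boxes"].shape)  # candidate hand boxes, (N, 4)
```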
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose MRG-Net, a novel online multi-modal relational graph network that dynamically integrates visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
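One simple way to let visual and kinematics embeddings exchange messages, sketched below as an illustration of the relational-graph idea; it is not MRG-Net's actual architecture.

```python
import torch
import torch.nn as nn

class CrossModalMessage(nn.Module):
    """One round of message passing between a visual and a kinematics node."""

    def __init__(self, dim=128):
        super().__init__()
        self.to_vis = nn.Linear(dim, dim)   # message kinematics -> visual
        self.to_kin = nn.Linear(dim, dim)   # message visual -> kinematics
        self.head = nn.Linear(2 * dim, 10)  # e.g. 10 gesture classes

    def forward(self, vis, kin):
        vis2 = torch.relu(vis + self.to_vis(kin))  # update with neighbor message
        kin2 = torch.relu(kin + self.to_kin(vis))
        return self.head(torch.cat([vis2, kin2], dim=-1))

vis = torch.randn(4, 128)  # per-frame visual embedding (e.g. from a CNN)
kin = torch.randn(4, 128)  # embedded robot kinematics for the same frames
print(CrossModalMessage()(vis, kin).shape)  # torch.Size([4, 10])
```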
- LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture long-range temporal dependencies.
We validate our approach on a large surgical video dataset (Cholec80) by performing surgical workflow recognition task.
arXiv Detail & Related papers (2020-04-21T09:21:22Z)
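A simplified 1-D non-local block in the spirit of NL-RCNet's long-range temporal modeling: every clip attends to every other clip in the sequence. Dimensions are illustrative.

```python
import torch
import torch.nn as nn

class NonLocal1D(nn.Module):
    """Simplified embedded-Gaussian non-local block over a clip sequence."""

    def __init__(self, dim=256, inner=128):
        super().__init__()
        self.theta = nn.Linear(dim, inner)
        self.phi = nn.Linear(dim, inner)
        self.g = nn.Linear(dim, inner)
        self.out = nn.Linear(inner, dim)

    def forward(self, x):
        # x: (batch, time, dim); every clip attends to every other clip.
        attn = torch.softmax(self.theta(x) @ self.phi(x).transpose(1, 2), dim=-1)
        return x + self.out(attn @ self.g(x))  # residual connection

clips = torch.randn(2, 40, 256)   # 40 clip features per video
print(NonLocal1D()(clips).shape)  # torch.Size([2, 40, 256])
```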