Aggregating Long-Term Context for Learning Laparoscopic and
Robot-Assisted Surgical Workflows
- URL: http://arxiv.org/abs/2009.00681v4
- Date: Mon, 10 May 2021 20:02:18 GMT
- Title: Aggregating Long-Term Context for Learning Laparoscopic and
Robot-Assisted Surgical Workflows
- Authors: Yutong Ban, Guy Rosman, Thomas Ward, Daniel Hashimoto, Taisei Kondo,
Hidekazu Iwaki, Ozanan Meireles, Daniela Rus
- Abstract summary: We propose a new temporal network structure that leverages task-specific network representation to collect long-term sufficient statistics.
We demonstrate superior results over existing and novel state-of-the-art segmentation techniques on two laparoscopic cholecystectomy datasets.
- Score: 40.48632897750319
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Analyzing surgical workflow is crucial for surgical assistance robots to
understand surgeries. With the understanding of the complete surgical workflow,
the robots are able to assist the surgeons in intra-operative events, such as
by giving a warning when the surgeon is entering specific key or high-risk
phases. Deep learning techniques have recently been widely applied to
recognizing surgical workflows. Many of the existing temporal neural network
models are limited in their capability to handle long-term dependencies in the
data, relying instead upon the strong performance of the underlying per-frame
visual models. We propose a new temporal network structure that leverages
task-specific network representation to collect long-term sufficient statistics
that are propagated by a sufficient statistics model (SSM). We implement our
approach within an LSTM backbone for the task of surgical phase recognition and
explore several choices for propagated statistics. We demonstrate superior
results over existing and novel state-of-the-art segmentation techniques on two
laparoscopic cholecystectomy datasets: the publicly available Cholec80 dataset
and MGH100, a novel dataset with more challenging and clinically meaningful
segment labels.
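To make the idea concrete, here is a minimal sketch (not the authors' code; the LSTM-cell backbone, all dimensions, and the choice of running mean/max as the propagated statistics are illustrative assumptions) of a phase recognizer whose per-frame prediction also conditions on statistics aggregated over the entire hidden-state history:

# Minimal sketch: an LSTM phase recognizer that augments its per-frame
# hidden state with long-term sufficient statistics (here, a running mean
# and element-wise max of past hidden states).
import torch
import torch.nn as nn

class SufficientStatsLSTM(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, num_phases=7):
        super().__init__()
        self.lstm = nn.LSTMCell(feat_dim, hidden_dim)
        # The classifier sees the current hidden state plus the aggregates.
        self.classifier = nn.Linear(hidden_dim * 3, num_phases)

    def forward(self, frames):                      # frames: (T, B, feat_dim)
        T, B, _ = frames.shape
        h = frames.new_zeros(B, self.lstm.hidden_size)
        c = frames.new_zeros(B, self.lstm.hidden_size)
        running_sum = torch.zeros_like(h)           # for the running mean
        running_max = torch.full_like(h, float('-inf'))
        logits = []
        for t in range(T):
            h, c = self.lstm(frames[t], (h, c))
            running_sum = running_sum + h
            running_max = torch.maximum(running_max, h)
            stats = torch.cat([h, running_sum / (t + 1), running_max], dim=-1)
            logits.append(self.classifier(stats))
        return torch.stack(logits)                  # (T, B, num_phases)

# Usage: per-frame CNN features for a 100-frame clip.
model = SufficientStatsLSTM()
out = model(torch.randn(100, 1, 2048))              # (100, 1, 7)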
Related papers
- LLM-Assisted Multi-Teacher Continual Learning for Visual Question Answering in Robotic Surgery [57.358568111574314]
Patient data privacy often restricts the availability of old data when updating the model.
Prior CL studies overlooked two vital problems in the surgical domain.
This paper proposes addressing these problems with a multimodal large language model (LLM) and an adaptive weight assignment methodology.
arXiv Detail & Related papers (2024-02-26T15:35:24Z)
- SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge [72.97934765570069]
We release the first multimodal, publicly available, in-vivo dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP).
The aim of the challenge is to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain.
A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
arXiv Detail & Related papers (2023-12-31T13:32:18Z)
- Identification of Cognitive Workload during Surgical Tasks with Multimodal Deep Learning [20.706268332427157]
Dealing with unexpected and repetitive tasks leads to an increase in the associated Cognitive Workload (CWL).
In this paper, a cascade of two machine learning approaches is proposed for the multimodal recognition of CWL.
A Convolutional Neural Network (CNN) then uses this information to identify the different types of CWL associated with each surgical task.
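A hedged sketch of such a cascade (module names, channel counts, and signal shapes are assumptions for illustration, not the paper's architecture): a first stage flags whether CWL is present, and a 1D CNN over multimodal physiological channels then classifies its type.

# Illustrative cascade: stage 1 detects CWL presence; stage 2, a 1D CNN
# over multimodal physiological channels, classifies the type of CWL.
import torch
import torch.nn as nn

class CWLTypeCNN(nn.Module):
    def __init__(self, channels=8, num_types=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=7, padding=3), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                 # pool over time
        )
        self.fc = nn.Linear(64, num_types)

    def forward(self, x):                            # x: (B, channels, time)
        return self.fc(self.net(x).squeeze(-1))

signals = torch.randn(2, 8, 512)                     # assumed multimodal windows
detector = nn.Sequential(nn.Flatten(), nn.Linear(8 * 512, 2))  # stage 1: CWL yes/no
if detector(signals).argmax(-1).any():               # if any window shows CWL
    types = CWLTypeCNN()(signals)                    # stage 2: type of CWL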
arXiv Detail & Related papers (2022-09-12T18:29:34Z)
- Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z)
- Federated Cycling (FedCy): Semi-supervised Federated Learning of Surgical Phases [57.90226879210227]
FedCy is a federated semi-supervised learning (FSSL) method that combines FL and self-supervised learning to exploit a decentralized dataset of both labeled and unlabeled videos.
We demonstrate significant performance gains over state-of-the-art FSSL methods on the task of automatic recognition of surgical phases.
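A rough sketch of the federated semi-supervised pattern (plain FedAvg with heterogeneous local objectives; this is illustrative, not FedCy's actual algorithm): clients with labels train supervised, clients without labels train a self-supervised proxy, and the server averages the resulting weights.

# Illustrative FedAvg round; Client and train_locally are assumed interfaces.
import copy
import torch

def fedavg_round(global_model, clients):
    states = []
    for client in clients:
        local = copy.deepcopy(global_model)
        client.train_locally(local)        # supervised or self-supervised epoch
        states.append(local.state_dict())
    # Server step: element-wise average of the client weights.
    avg = {k: torch.stack([s[k].float() for s in states]).mean(0)
           for k in states[0]}
    global_model.load_state_dict(avg)
    return global_model

class Client:
    def __init__(self, labeled): self.labeled = labeled
    def train_locally(self, model):
        # Placeholder for local optimization (stand-in for real SGD steps).
        for p in model.parameters():
            p.data.add_(0.01 * torch.randn_like(p))

model = torch.nn.Linear(10, 7)
model = fedavg_round(model, [Client(True), Client(False), Client(False)])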
arXiv Detail & Related papers (2022-03-14T17:44:53Z)
- LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces non-local block to capture the long-range temporal dependency.
We validate our approach on a large surgical video dataset (Cholec80) by performing surgical workflow recognition task.
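A minimal sketch of a non-local block applied over time, the kind of self-attention-style component NL-RCNet introduces for long-range temporal dependency (shapes and names are assumptions, not the authors' code):

# Non-local block over a sequence of clip features: every frame attends
# to every other frame, then a residual connection preserves the input.
import torch
import torch.nn as nn

class NonLocalTemporalBlock(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.theta = nn.Linear(dim, dim)   # query projection
        self.phi = nn.Linear(dim, dim)     # key projection
        self.g = nn.Linear(dim, dim)       # value projection
        self.out = nn.Linear(dim, dim)

    def forward(self, x):                  # x: (B, T, dim) clip features
        attn = torch.softmax(
            self.theta(x) @ self.phi(x).transpose(1, 2)   # (B, T, T)
            / x.size(-1) ** 0.5, dim=-1)
        return x + self.out(attn @ self.g(x))             # residual connection

feats = torch.randn(1, 64, 512)            # 64 frames of clip features
mixed = NonLocalTemporalBlock()(feats)     # same shape, globally mixed in time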
arXiv Detail & Related papers (2020-04-21T09:21:22Z)
- TeCNO: Surgical Phase Recognition with Multi-Stage Temporal Convolutional Networks [43.95869213955351]
We propose a Multi-Stage Temporal Convolutional Network (MS-TCN) that performs hierarchical prediction refinement for surgical phase recognition.
Our method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos with and without the use of additional surgical tool information.
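A sketch of multi-stage temporal refinement in the spirit of MS-TCN, where each stage is a stack of dilated 1D convolutions and later stages refine the previous stage's predictions (layer sizes are assumptions, not TeCNO's configuration):

# Each stage widens its temporal receptive field via exponentially growing
# dilations; stages after the first consume the previous stage's predictions.
import torch
import torch.nn as nn

def tcn_stage(in_ch, num_classes, hidden=64, layers=4):
    mods, ch = [], in_ch
    for i in range(layers):
        d = 2 ** i                          # dilation: 1, 2, 4, 8, ...
        mods += [nn.Conv1d(ch, hidden, 3, padding=d, dilation=d), nn.ReLU()]
        ch = hidden
    mods.append(nn.Conv1d(hidden, num_classes, 1))
    return nn.Sequential(*mods)

class MultiStageTCN(nn.Module):
    def __init__(self, feat_dim=2048, num_classes=7, stages=3):
        super().__init__()
        self.stages = nn.ModuleList(
            [tcn_stage(feat_dim, num_classes)] +
            [tcn_stage(num_classes, num_classes) for _ in range(stages - 1)])

    def forward(self, x):                   # x: (B, feat_dim, T)
        outputs = []
        for stage in self.stages:
            x = stage(x if not outputs else torch.softmax(outputs[-1], 1))
            outputs.append(x)
        return outputs                      # per-stage (B, num_classes, T) logits

video_feats = torch.randn(1, 2048, 300)    # features for a 300-frame video
preds = MultiStageTCN()(video_feats)[-1]   # final, refined stage predictions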
arXiv Detail & Related papers (2020-03-24T10:12:30Z)
- Multi-Task Recurrent Neural Network for Surgical Gesture Recognition and Progress Prediction [17.63619129438996]
We propose a multi-task recurrent neural network for simultaneous recognition of surgical gestures and estimation of a novel formulation of surgical task progress.
We demonstrate that recognition performance improves in multi-task frameworks with progress estimation without any additional manual labelling and training.
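A hedged sketch of the multi-task pattern (dimensions and the progress formulation are illustrative assumptions): a shared LSTM with one head classifying gestures per frame and one regressing task progress in [0, 1], whose targets can be derived without manual labelling as frame index over sequence length.

# Shared recurrent trunk, two task-specific heads.
import torch
import torch.nn as nn

class MultiTaskRNN(nn.Module):
    def __init__(self, feat_dim=128, hidden=256, num_gestures=10):
        super().__init__()
        self.rnn = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.gesture_head = nn.Linear(hidden, num_gestures)
        self.progress_head = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (B, T, feat_dim) kinematics
        h, _ = self.rnn(x)
        gestures = self.gesture_head(h)        # (B, T, num_gestures) logits
        progress = torch.sigmoid(self.progress_head(h)).squeeze(-1)  # (B, T)
        return gestures, progress

x = torch.randn(2, 50, 128)
g, p = MultiTaskRNN()(x)
# Joint loss: cross-entropy on gestures + MSE on progress, where progress
# targets are simply frame_index / sequence_length.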
arXiv Detail & Related papers (2020-03-10T14:28:02Z)
- Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources [14.677001578868872]
We propose a unified surgical state estimation model based on the actions performed or events that occur as the task progresses.
We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) and a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging.
Our model achieves a superior frame-wise state estimation accuracy of up to 89.4%, improving upon state-of-the-art surgical state estimation models.
arXiv Detail & Related papers (2020-02-07T17:49:08Z)