Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources
- URL: http://arxiv.org/abs/2002.02921v1
- Date: Fri, 7 Feb 2020 17:49:08 GMT
- Title: Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources
- Authors: Yidan Qin, Sahba Aghajani Pedram, Seyedshams Feyzabadi, Max Allan, A. Jonathan McLeod, Joel W. Burdick, Mahdi Azizian
- Abstract summary: We propose a unified surgical state estimation model that estimates the current state of a surgical task from the actions performed and the events that occur as the task progresses.
We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) and a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging.
Our model achieves frame-wise state estimation accuracy of up to 89.4%, improving on state-of-the-art surgical state estimation models.
- Score: 14.677001578868872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many tasks in robot-assisted surgeries (RAS) can be represented by finite-state machines (FSMs), where each state represents either an action (such as picking up a needle) or an observation (such as bleeding). A crucial step towards the automation of such surgical tasks is the temporal perception of the current surgical scene, which requires a real-time estimation of the states in the FSMs. The objective of this work is to estimate the current state of the surgical task based on the actions performed or the events that occur as the task progresses. We propose Fusion-KVE, a unified surgical state estimation model that incorporates multiple data sources, including Kinematics, Vision, and system Events. Additionally, we examine the strengths and weaknesses of different state estimation models in segmenting states with different representative features or levels of granularity. We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS), as well as on a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging, created using the da Vinci Xi surgical system. Our model achieves frame-wise state estimation accuracy of up to 89.4%, improving on state-of-the-art surgical state estimation models on both the JIGSAWS suturing dataset and our RIOUS dataset.
Related papers
- Realistic Data Generation for 6D Pose Estimation of Surgical Instruments [4.226502078427161]
6D pose estimation of surgical instruments is critical to enable the automatic execution of surgical maneuvers.
In household and industrial settings, synthetic data generated with 3D computer graphics software has been shown to be a viable alternative for minimizing annotation costs.
We propose an improved simulation environment for surgical robotics that enables the automatic generation of large and diverse datasets.
arXiv Detail & Related papers (2024-06-11T14:59:29Z)
- BEHAVIOR Vision Suite: Customizable Dataset Generation via Simulation [57.40024206484446]
We introduce the BEHAVIOR Vision Suite (BVS), a set of tools and assets to generate fully customized synthetic data for systematic evaluation of computer vision models.
BVS supports a large number of adjustable parameters at the scene level.
We showcase three example application scenarios.
arXiv Detail & Related papers (2024-05-15T17:57:56Z)
- Robotic Navigation Autonomy for Subretinal Injection via Intelligent Real-Time Virtual iOCT Volume Slicing [88.99939660183881]
We propose a framework for autonomous robotic navigation for subretinal injection.
Our framework combines an instrument pose estimation method, an online registration between the robotic and iOCT systems, and trajectory planning tailored for navigation to an injection target.
Our experiments on ex-vivo porcine eyes demonstrate the precision and repeatability of the method.
arXiv Detail & Related papers (2023-01-17T21:41:21Z)
- ProcTHOR: Large-Scale Embodied AI Using Procedural Generation [55.485985317538194]
ProcTHOR is a framework for procedural generation of Embodied AI environments.
We demonstrate state-of-the-art results across 6 embodied AI benchmarks for navigation, rearrangement, and arm manipulation.
arXiv Detail & Related papers (2022-06-14T17:09:35Z)
- CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos.
We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge.
A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims at providing a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
- Learning Invariant Representation of Tasks for Robust Surgical State Estimation [39.515036686428836]
We propose StiseNet, a Surgical Task Invariance State Estimation Network.
StiseNet minimizes the effects of variations in surgical technique and operating environments inherent to RAS datasets.
It is shown to outperform state-of-the-art state estimation methods on three datasets.
arXiv Detail & Related papers (2021-02-18T02:32:50Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- daVinciNet: Joint Prediction of Motion and Surgical State in Robot-Assisted Surgery [13.928484202934651]
We propose daVinciNet - an end-to-end dual-task model for robot motion and surgical state predictions.
Our model achieves up to 93.85% short-term (0.5s) and 82.11% long-term (2s) state prediction accuracy, as well as 1.07mm short-term and 5.62mm long-term trajectory prediction error.
arXiv Detail & Related papers (2020-09-24T20:28:06Z)
- Aggregating Long-Term Context for Learning Laparoscopic and Robot-Assisted Surgical Workflows [40.48632897750319]
We propose a new temporal network structure that leverages task-specific network representation to collect long-term sufficient statistics.
We demonstrate superior results over existing and novel state-of-the-art segmentation techniques on two laparoscopic cholecystectomy datasets.
arXiv Detail & Related papers (2020-09-01T20:29:14Z)
- Multi-Task Recurrent Neural Network for Surgical Gesture Recognition and Progress Prediction [17.63619129438996]
We propose a multi-task recurrent neural network for simultaneous recognition of surgical gestures and estimation of a novel formulation of surgical task progress.
We demonstrate that recognition performance improves in a multi-task framework with progress estimation, without any additional manual labelling or training.
arXiv Detail & Related papers (2020-03-10T14:28:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.