daVinciNet: Joint Prediction of Motion and Surgical State in
Robot-Assisted Surgery
- URL: http://arxiv.org/abs/2009.11937v1
- Date: Thu, 24 Sep 2020 20:28:06 GMT
- Title: daVinciNet: Joint Prediction of Motion and Surgical State in
Robot-Assisted Surgery
- Authors: Yidan Qin, Seyedshams Feyzabadi, Max Allan, Joel W. Burdick, Mahdi
Azizian
- Abstract summary: We propose daVinciNet - an end-to-end dual-task model for robot motion and surgical state predictions.
Our model achieves up to 93.85% short-term (0.5s) and 82.11% long-term (2s) state prediction accuracy, as well as 1.07mm short-term and 5.62mm long-term trajectory prediction error.
- Score: 13.928484202934651
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a technique to concurrently and jointly predict the
future trajectories of surgical instruments and the future state(s) of surgical
subtasks in robot-assisted surgeries (RAS) using multiple input sources. Such
predictions are a necessary first step towards shared control and supervised
autonomy of surgical subtasks. Minute-long surgical subtasks, such as suturing
or ultrasound scanning, often have distinguishable tool kinematics and visual
features, and can be described as a series of fine-grained states with
transition schematics. We propose daVinciNet - an end-to-end dual-task model
for robot motion and surgical state predictions. daVinciNet performs concurrent
end-effector trajectory and surgical state predictions using features extracted
from multiple data streams, including robot kinematics, endoscopic vision, and
system events. We evaluate our proposed model on an extended Robotic
Intra-Operative Ultrasound (RIOUS+) imaging dataset collected on a da Vinci Xi
surgical system and the JHU-ISI Gesture and Skill Assessment Working Set
(JIGSAWS). Our model achieves up to 93.85% short-term (0.5s) and 82.11%
long-term (2s) state prediction accuracy, as well as 1.07mm short-term and
5.62mm long-term trajectory prediction error.
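To make the dual-task setup concrete, the following is a minimal sketch, under assumptions of my own rather than the authors' released implementation, of how fused kinematics, vision, and event features can drive a shared sequence encoder with separate trajectory-regression and state-classification heads; all dimensions, the LSTM backbone, and the prediction horizon are illustrative placeholders.

```python
# Minimal sketch (not the authors' implementation) of a dual-task predictor in the
# spirit of daVinciNet: fused multi-stream features pass through a shared sequence
# encoder, and two heads emit a future end-effector trajectory and future states.
# All dimensions, the LSTM backbone, and the horizon are illustrative assumptions.
import torch
import torch.nn as nn

class DualTaskPredictor(nn.Module):
    def __init__(self, kin_dim=19, vis_dim=128, evt_dim=4,
                 hidden=256, horizon=20, n_states=8):
        super().__init__()
        in_dim = kin_dim + vis_dim + evt_dim              # per-timestep fused feature
        self.encoder = nn.LSTM(in_dim, hidden, batch_first=True)
        self.traj_head = nn.Linear(hidden, horizon * 3)   # future (x, y, z) per step
        self.state_head = nn.Linear(hidden, horizon * n_states)  # future state logits
        self.horizon, self.n_states = horizon, n_states

    def forward(self, kin, vis, evt):
        # kin/vis/evt: (batch, T, dim) time-synchronized feature streams
        x = torch.cat([kin, vis, evt], dim=-1)
        _, (h, _) = self.encoder(x)                       # summary of observed window
        h = h[-1]
        traj = self.traj_head(h).view(-1, self.horizon, 3)
        states = self.state_head(h).view(-1, self.horizon, self.n_states)
        return traj, states
```

Training such a model would typically combine an L2 loss on the predicted trajectory with a cross-entropy loss on the predicted state sequence, evaluated at short (0.5 s) and long (2 s) horizons as reported in the abstract.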
Related papers
- VISAGE: Video Synthesis using Action Graphs for Surgery [34.21344214645662]
We introduce the novel task of future video generation in laparoscopic surgery.
Our proposed method, VISAGE, leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures.
Results of our experiments demonstrate high-fidelity video generation for laparoscopic procedures.
arXiv Detail & Related papers (2024-10-23T10:28:17Z)
- Hypergraph-Transformer (HGT) for Interactive Event Prediction in
Laparoscopic and Robotic Surgery [50.3022015601057]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video.
We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets.
Our results demonstrate the superiority of our approach compared to unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z)
- Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip
Segmentation in Robotic Surgeries [29.201385352740555]
We propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip across various surgical procedures.
Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both images and kinematics.
A cross-modal contrastive loss is designed to transfer a robust geometric prior from kinematics to the image domain for tip segmentation (a rough sketch follows below).
arXiv Detail & Related papers (2023-09-02T14:52:58Z)
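To illustrate how such a cross-modal contrastive term can be set up in practice (the paper's exact formulation may differ; everything below is an assumption for illustration), an InfoNCE-style loss that aligns per-frame kinematics and image embeddings could look like this:

```python
# Illustrative InfoNCE-style cross-modal contrastive loss (an assumption, not the
# paper's exact formulation): kinematics and image embeddings of the same frame are
# pulled together, while embeddings of other frames in the batch are pushed apart.
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(img_emb, kin_emb, temperature=0.07):
    # img_emb, kin_emb: (batch, d) embeddings of matched frames
    img = F.normalize(img_emb, dim=-1)
    kin = F.normalize(kin_emb, dim=-1)
    logits = img @ kin.t() / temperature                 # pairwise similarities
    targets = torch.arange(img.size(0), device=img.device)
    # Symmetric loss: image-to-kinematics and kinematics-to-image directions
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

In practice the two embeddings would presumably come from the segmentation model's image and kinematics encoders, and this term would be added to the segmentation loss.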
- GLSFormer: Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots (a simplified sketch of the masked-token idea follows below).
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
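As a heavily simplified illustration of masked sensorimotor-token prediction (my own sketch under arbitrary dimension assumptions, not the released RPT code), camera, proprioception, and action features can be projected into a shared token space, partially masked, and reconstructed by a Transformer encoder:

```python
# Rough sketch (an illustrative assumption) of masked prediction over sensorimotor
# tokens: per-timestep camera, proprioception, and action features are projected
# into a shared token space, some tokens are masked, and a Transformer encoder is
# trained to reconstruct the masked ones.
import torch
import torch.nn as nn

class SensorimotorMaskedModel(nn.Module):
    def __init__(self, cam_dim=512, prop_dim=7, act_dim=7, d_model=256):
        super().__init__()
        self.proj = nn.ModuleDict({
            "cam": nn.Linear(cam_dim, d_model),
            "prop": nn.Linear(prop_dim, d_model),
            "act": nn.Linear(act_dim, d_model),
        })
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, d_model)           # reconstruct token embeddings

    def forward(self, cam, prop, act, mask):
        # cam/prop/act: (B, T, dim); mask: (B, 3*T) bool, True = hidden token
        tokens = torch.cat([self.proj["cam"](cam),
                            self.proj["prop"](prop),
                            self.proj["act"](act)], dim=1)
        target = tokens.detach()
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)
        pred = self.head(self.encoder(tokens))
        # Reconstruction loss only on the masked positions
        return ((pred - target) ** 2)[mask].mean()
```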
- Robotic Navigation Autonomy for Subretinal Injection via Intelligent Real-Time Virtual iOCT Volume Slicing [88.99939660183881]
We propose a framework for autonomous robotic navigation for subretinal injection.
Our method consists of an instrument pose estimation method, an online registration between the robotic and the iOCT system, and trajectory planning tailored for navigation to an injection target.
Our experiments on ex-vivo porcine eyes demonstrate the precision and repeatability of the method.
arXiv Detail & Related papers (2023-01-17T21:41:21Z)
- Recognition and Prediction of Surgical Gestures and Trajectories Using
Transformer Models in Robot-Assisted Surgery [10.719885390990433]
Transformer models were first developed for Natural Language Processing (NLP) to model word sequences.
We propose the novel use of a Transformer model for three tasks: gesture recognition, gesture prediction, and trajectory prediction during RAS (a minimal sketch follows below).
arXiv Detail & Related papers (2022-12-03T20:26:48Z)
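The following is a minimal sketch of how a single Transformer backbone could serve both gesture and trajectory prediction; the architecture, feature dimensions, and gesture vocabulary are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's implementation): a causal Transformer encoder
# reads the observed kinematic window and predicts the next gesture label and the
# next end-effector position.
import torch
import torch.nn as nn

class GestureTrajectoryTransformer(nn.Module):
    def __init__(self, kin_dim=14, d_model=128, n_gestures=15):
        super().__init__()
        self.embed = nn.Linear(kin_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)
        self.gesture_head = nn.Linear(d_model, n_gestures)  # next-gesture logits
        self.traj_head = nn.Linear(d_model, 3)               # next (x, y, z)

    def forward(self, kin):
        # kin: (B, T, kin_dim) observed kinematics
        T = kin.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf"),
                                       device=kin.device), diagonal=1)
        h = self.encoder(self.embed(kin), mask=causal)       # causal self-attention
        last = h[:, -1]                                       # summary at final step
        return self.gesture_head(last), self.traj_head(last)
```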
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Graph Scene (MSSG) which aims at providing unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
- Future Frame Prediction for Robot-assisted Surgery [57.18185972461453]
We propose a ternary prior guided variational autoencoder (TPG-VAE) model for future frame prediction in robotic surgical video sequences.
In addition to the content distribution, our model learns a motion distribution, which is novel in handling the small movements of surgical tools (a generic sketch of the latent-variable setup follows below).
arXiv Detail & Related papers (2021-03-18T15:12:06Z)
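As a rough, generic illustration of future-frame prediction with separate content and motion latents, the skeleton below shows one plausible setup; it does not reproduce the paper's ternary prior guidance, and all layer sizes and the toy output resolution are placeholder assumptions.

```python
# Generic conditional-VAE skeleton (an assumption for illustration; it does not
# reproduce TPG-VAE's ternary prior guidance) for next-frame prediction with
# separate content and motion latents. Output is a toy fixed 32x32 frame.
import torch
import torch.nn as nn

class FramePredVAE(nn.Module):
    def __init__(self, frame_ch=3, z_content=64, z_motion=32):
        super().__init__()
        self.enc = nn.Sequential(                          # encode last observed frame
            nn.Conv2d(frame_ch, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.to_content = nn.Linear(64, 2 * z_content)     # mean and log-variance
        self.to_motion = nn.Linear(64, 2 * z_motion)
        self.dec = nn.Sequential(                          # decode next frame
            nn.Linear(z_content + z_motion, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, frame_ch, 4, 2, 1), nn.Sigmoid())

    @staticmethod
    def sample(stats):
        mu, logvar = stats.chunk(2, dim=-1)
        return mu + torch.randn_like(mu) * (0.5 * logvar).exp(), mu, logvar

    def forward(self, frame):
        h = self.enc(frame)
        zc, mu_c, lv_c = self.sample(self.to_content(h))   # content latent
        zm, mu_m, lv_m = self.sample(self.to_motion(h))    # motion latent
        next_frame = self.dec(torch.cat([zc, zm], dim=-1))
        return next_frame, (mu_c, lv_c, mu_m, lv_m)        # train with an ELBO loss
```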
- Temporal Segmentation of Surgical Sub-tasks through Deep Learning with Multiple Data Sources [14.677001578868872]
We propose a unified surgical state estimation model based on the actions performed and events that occur as the task progresses.
We evaluate our model on the JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS) and a more complex dataset involving robotic intra-operative ultrasound (RIOUS) imaging.
Our model achieves a superior frame-wise state estimation accuracy of up to 89.4%, improving on state-of-the-art surgical state estimation models.
arXiv Detail & Related papers (2020-02-07T17:49:08Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.