Recognition and Prediction of Surgical Gestures and Trajectories Using
Transformer Models in Robot-Assisted Surgery
- URL: http://arxiv.org/abs/2212.01683v1
- Date: Sat, 3 Dec 2022 20:26:48 GMT
- Title: Recognition and Prediction of Surgical Gestures and Trajectories Using
Transformer Models in Robot-Assisted Surgery
- Authors: Chang Shi, Yi Zheng, Ann Majewicz Fey
- Abstract summary: Transformer models were first developed for Natural Language Processing (NLP) to model word sequences.
We propose the novel use of a Transformer model for three tasks: gesture recognition, gesture prediction, and trajectory prediction during RAS.
- Score: 10.719885390990433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgical activity recognition and prediction can help provide important
context in many Robot-Assisted Surgery (RAS) applications, for example,
surgical progress monitoring and estimation, surgical skill evaluation, and
shared control strategies during teleoperation. Transformer models were first
developed for Natural Language Processing (NLP) to model word sequences and
soon the method gained popularity for general sequence modeling tasks. In this
paper, we propose the novel use of a Transformer model for three tasks: gesture
recognition, gesture prediction, and trajectory prediction during RAS. We
modify the original Transformer architecture to be able to generate the current
gesture sequence, future gesture sequence, and future trajectory sequence
estimations using only the current kinematic data of the surgical robot
end-effectors. We evaluate our proposed models on the JHU-ISI Gesture and Skill
Assessment Working Set (JIGSAWS) and use Leave-One-User-Out (LOUO)
cross-validation to ensure the generalizability of our results. Our models
achieve up to 89.3% gesture recognition accuracy, 84.6% gesture prediction
accuracy (1 second ahead), and 2.71 mm trajectory prediction error (1 second
ahead). Our models match, and in some cases outperform, state-of-the-art
methods while using only the kinematic data channel. This approach can enable
near-real-time surgical activity recognition and prediction.
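To make the described approach concrete, below is a minimal sketch, not the authors' released code, of a Transformer encoder that maps a window of end-effector kinematics to per-frame gesture logits and trajectory estimates, in the spirit of the model described above. The kinematic dimensionality, window length, gesture vocabulary size, and trajectory output size are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch of a kinematics-only Transformer for gesture recognition and
# trajectory estimation. All dimensions below are illustrative assumptions,
# not the paper's exact configuration.
import torch
import torch.nn as nn

class KinematicTransformer(nn.Module):
    def __init__(self, kin_dim=38, d_model=128, n_heads=4, n_layers=4,
                 n_gestures=15, traj_dim=6, max_len=512):
        super().__init__()
        self.input_proj = nn.Linear(kin_dim, d_model)                 # embed each kinematic frame
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positional embeddings
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.gesture_head = nn.Linear(d_model, n_gestures)            # per-frame gesture logits
        self.traj_head = nn.Linear(d_model, traj_dim)                 # per-frame trajectory estimate

    def forward(self, kin):                       # kin: (batch, frames, kin_dim)
        x = self.input_proj(kin) + self.pos_embed[:, :kin.size(1)]
        h = self.encoder(x)                       # (batch, frames, d_model)
        return self.gesture_head(h), self.traj_head(h)

# Toy forward pass on random data shaped like a 2-second kinematic window at 30 Hz.
model = KinematicTransformer()
gesture_logits, traj = model(torch.randn(2, 60, 38))
print(gesture_logits.shape, traj.shape)           # torch.Size([2, 60, 15]) torch.Size([2, 60, 6])
```

The LOUO evaluation mentioned above can be approximated by grouping trials by surgeon and holding out one surgeon per fold; one way to set this up (again an assumption about the protocol, not the paper's code) is shown below.

```python
from sklearn.model_selection import LeaveOneGroupOut
# X: per-trial kinematic windows, y: labels, user_ids: surgeon ID for each trial
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=user_ids):
    ...  # train on train_idx, evaluate on the held-out surgeon in test_idx
```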
Related papers
- Latent Action Pretraining from Videos [156.88613023078778]
We introduce Latent Action Pretraining for general Action models (LAPA).
LAPA is an unsupervised method for pretraining Vision-Language-Action (VLA) models without ground-truth robot action labels.
We propose a method to learn from internet-scale videos that do not have robot action labels.
arXiv Detail & Related papers (2024-10-15T16:28:09Z)
- Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics.
Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens.
We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
- CaRTS: Causality-driven Robot Tool Segmentation from Vision and Kinematics Data [11.92904350972493]
Vision-based segmentation of the robotic tool during robot-assisted surgery enables downstream applications, such as augmented reality feedback.
With the introduction of deep learning, many methods were presented to solve instrument segmentation directly and solely from images.
We present CaRTS, a causality-driven robot tool segmentation algorithm, that is designed based on a complementary causal model of the robot tool segmentation task.
arXiv Detail & Related papers (2022-03-15T22:26:19Z)
- STAR: Sparse Transformer-based Action Recognition [61.490243467748314]
This work proposes a novel skeleton-based human action recognition model with sparse attention on the spatial dimension and segmented linear attention on the temporal dimension of data.
Experiments show that our model achieves comparable performance while using far fewer trainable parameters, and attains high speed in training and inference.
arXiv Detail & Related papers (2021-07-15T02:53:11Z)
- Future Frame Prediction for Robot-assisted Surgery [57.18185972461453]
We propose a ternary prior guided variational autoencoder (TPG-VAE) model for future frame prediction in robotic surgical video sequences.
In addition to the content distribution, our model learns the motion distribution, which is novel in handling the small movements of surgical tools.
arXiv Detail & Related papers (2021-03-18T15:12:06Z)
- Domain Adaptive Robotic Gesture Recognition with Unsupervised Kinematic-Visual Data Alignment [60.31418655784291]
We propose a novel unsupervised domain adaptation framework which can simultaneously transfer multi-modality knowledge, i.e., both kinematic and visual data, from simulator to real robot.
It remedies the domain gap with enhanced transferable features by using temporal cues in videos and inherent correlations across the modalities towards recognizing gestures.
Results show that our approach recovers performance with large gains, up to 12.91% in accuracy and 20.16% in F1-score, without using any annotations on the real robot.
arXiv Detail & Related papers (2021-03-06T09:10:03Z)
- daVinciNet: Joint Prediction of Motion and Surgical State in Robot-Assisted Surgery [13.928484202934651]
We propose daVinciNet - an end-to-end dual-task model for robot motion and surgical state predictions.
Our model achieves up to 93.85% short-term (0.5 s) and 82.11% long-term (2 s) state prediction accuracy, as well as 1.07 mm short-term and 5.62 mm long-term trajectory prediction error.
arXiv Detail & Related papers (2020-09-24T20:28:06Z)
- Predictive Modeling of Periodic Behavior for Human-Robot Symbiotic Walking [13.68799310875662]
We extend Interaction Primitives to periodic movement regimes, i.e., walking.
We show that this model is particularly well-suited for learning data-driven, customized models of human walking.
We also demonstrate how the same framework can be used to learn controllers for a robotic prosthesis.
arXiv Detail & Related papers (2020-05-27T03:30:48Z)
- Learning Predictive Models From Observation and Interaction [137.77887825854768]
Learning predictive models from interaction with the world allows an agent, such as a robot, to learn about how the world works.
However, learning a model that captures the dynamics of complex skills represents a major challenge.
We propose a method to augment the training set with observational data of other agents, such as humans.
arXiv Detail & Related papers (2019-12-30T01:10:41Z)