Related papers: Adaptive Graph Learning from Spatial Information for Surgical Workflow Anticipation

URL: http://arxiv.org/abs/2412.06454v1
Date: Mon, 09 Dec 2024 12:53:08 GMT
Title: Adaptive Graph Learning from Spatial Information for Surgical Workflow Anticipation
Authors: Francis Xiatian Zhang, Jingjing Deng, Robert Lieck, Hubert P. H. Shum
Abstract summary: We propose an adaptive graph learning framework for surgical workflow anticipation based on a novel spatial representation. We develop a multi-horizon objective that balances learning objectives for different time horizons, allowing for unconstrained predictions.
Score: 9.329654505950199
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Surgical workflow anticipation is the task of predicting the timing of relevant surgical events from live video data, which is critical in Robotic-Assisted Surgery (RAS). Accurate predictions require the use of spatial information to model surgical interactions. However, current methods focus solely on surgical instruments, assume static interactions between instruments, and only anticipate surgical events within a fixed time horizon. To address these challenges, we propose an adaptive graph learning framework for surgical workflow anticipation based on a novel spatial representation, featuring three key innovations. First, we introduce a new representation of spatial information based on bounding boxes of surgical instruments and targets, including their detection confidence levels. These are trained on additional annotations we provide for two benchmark datasets. Second, we design an adaptive graph learning method to capture dynamic interactions. Third, we develop a multi-horizon objective that balances learning objectives for different time horizons, allowing for unconstrained predictions. Evaluations on two benchmarks reveal superior performance in short-to-mid-term anticipation, with an error reduction of approximately 3% for surgical phase anticipation and 9% for remaining surgical duration anticipation. These performance improvements demonstrate the effectiveness of our method and highlight its potential for enhancing preparation and coordination within the RAS team. This can improve surgical safety and the efficiency of operating room usage.

Related papers

EndoARSS: Adapting Spatially-Aware Foundation Model for Efficient Activity Recognition and Semantic Segmentation in Endoscopic Surgery [11.286605039002419]
Endoscopic surgery is the gold standard for robotic-assisted minimally invasive surgery. Traditional deep learning models often struggle with cross-activity interference, leading to suboptimal performance in each downstream task. We propose EndoARSS, a novel multi-task learning framework specifically designed for endoscopic surgery activity recognition and semantic segmentation.
arXiv Detail & Related papers (2025-06-07T15:18:43Z)
Multi-Modal Self-Supervised Learning for Surgical Feedback Effectiveness Assessment [66.6041949490137]
We propose a method that integrates information from transcribed verbal feedback and corresponding surgical video to predict feedback effectiveness. Our findings show that both transcribed feedback and surgical video are individually predictive of trainee behavior changes. Our results demonstrate the potential of multi-modal learning to advance the automated assessment of surgical feedback.
arXiv Detail & Related papers (2024-11-17T00:13:00Z)
Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery [50.3022015601057]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video. We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets. Our results demonstrate the superiority of our approach compared to unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z)
Jumpstarting Surgical Computer Vision [2.7396997668655163]
We employ self-supervised learning to flexibly leverage diverse surgical datasets. We study phase recognition and critical view of safety in laparoscopic cholecystectomy and laparoscopic hysterectomy. The composition of pre-training datasets can severely affect the effectiveness of SSL methods for various downstream tasks.
arXiv Detail & Related papers (2023-12-10T18:54:16Z)
Next-generation Surgical Navigation: Marker-less Multi-view 6DoF Pose Estimation of Surgical Instruments [66.74633676595889]
First, we present a multi-camera capture setup consisting of static and head-mounted cameras. Second, we publish a multi-view RGB-D video dataset of ex-vivo spine surgeries, captured in a surgical wet lab and a real operating theatre. Third, we evaluate three state-of-the-art single-view and multi-view methods for the task of 6DoF pose estimation of surgical instruments.
arXiv Detail & Related papers (2023-05-05T13:42:19Z)
Using Hand Pose Estimation To Automate Open Surgery Training Feedback [0.0]
This research aims to facilitate the use of state-of-the-art computer vision algorithms for the automated training of surgeons. By estimating 2D hand poses, we model the movement of the practitioner's hands, and their interaction with surgical instruments.
arXiv Detail & Related papers (2022-11-13T21:47:31Z)
Towards Graph Representation Learning Based Surgical Workflow Anticipation [15.525314212209562]
We propose a graph representation learning framework to represent instrument motions in the surgical workflow anticipation problem. In our proposed graph representation, we map the bounding box information of instruments to the graph nodes in the consecutive frames. We also build inter-frame/inter-instrument graph edges to represent the trajectory and interaction of the instruments over time.
arXiv Detail & Related papers (2022-08-07T21:28:22Z)
CholecTriplet2021: A benchmark challenge for surgical action triplet recognition [66.51610049869393]
This paper presents CholecTriplet 2021: an endoscopic vision challenge organized at MICCAI 2021 for the recognition of surgical action triplets in laparoscopic videos. We present the challenge setup and assessment of the state-of-the-art deep learning methods proposed by the participants during the challenge. A total of 4 baseline methods and 19 new deep learning algorithms are presented to recognize surgical action triplets directly from surgical videos, achieving mean average precision (mAP) ranging from 4.2% to 38.1%.
arXiv Detail & Related papers (2022-04-10T18:51:55Z)
Real-time Informative Surgical Skill Assessment with Gaussian Process Learning [12.019641896240245]
This work presents a novel Gaussian Process Learning-based automatic objective surgical skill assessment method for ESSBSs. The proposed method projects the instrument movements into the endoscope coordinate to reduce the data dimensionality. The experimental results show that the proposed method reaches 100% prediction precision for complete surgical procedures and 90% precision for real-time prediction assessment.
arXiv Detail & Related papers (2021-12-05T15:35:40Z)
Real-time landmark detection for precise endoscopic submucosal dissection via shape-aware relation network [51.44506007844284]
We propose a shape-aware relation network for accurate and real-time landmark detection in endoscopic submucosal dissection surgery. We first devise an algorithm to automatically generate relation keypoint heatmaps, which intuitively represent the prior knowledge of spatial relations among landmarks. We then develop two complementary regularization schemes to progressively incorporate the prior knowledge into the training process.
arXiv Detail & Related papers (2021-11-08T07:57:30Z)
Robust Medical Instrument Segmentation Challenge 2019 [56.148440125599905]
Intraoperative tracking of laparoscopic instruments is often a prerequisite for computer and robotic-assisted interventions. Our challenge was based on a surgical data set comprising 10,040 annotated images acquired from a total of 30 surgical procedures. The results confirm the initial hypothesis, namely that algorithm performance degrades with an increasing domain gap.
arXiv Detail & Related papers (2020-03-23T14:35:08Z)

This list is automatically generated from the titles and abstracts of the papers on this site.