Towards Graph Representation Learning Based Surgical Workflow
Anticipation
- URL: http://arxiv.org/abs/2208.03824v1
- Date: Sun, 7 Aug 2022 21:28:22 GMT
- Title: Towards Graph Representation Learning Based Surgical Workflow
Anticipation
- Authors: Xiatian Zhang, Noura Al Moubayed, Hubert P. H. Shum
- Abstract summary: We propose a graph representation learning framework to represent instrument motions in the surgical workflow anticipation problem.
In our proposed graph representation, we map the bounding box information of instruments to graph nodes in consecutive frames.
We also build inter-frame/inter-instrument graph edges to represent the trajectory and interaction of the instruments over time.
- Score: 15.525314212209562
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgical workflow anticipation can give predictions on what steps to conduct
or what instruments to use next, which is an essential part of the
computer-assisted intervention system for surgery, e.g. workflow reasoning in
robotic surgery. However, current approaches are limited by their insufficient
expressive power for relationships between instruments. Hence, we propose a
graph representation learning framework to comprehensively represent instrument
motions in the surgical workflow anticipation problem. In our proposed graph
representation, we map the bounding box information of instruments to the
graph nodes in consecutive frames and build inter-frame/inter-instrument
graph edges to represent the trajectory and interaction of the instruments over
time. This design enhances the ability of our network in modeling both the
spatial and temporal patterns of surgical instruments and their interactions.
In addition, we design a multi-horizon learning strategy to balance the
understanding of various horizons in different anticipation tasks, which
significantly improves the model performance in anticipation with various
horizons. Experiments on the Cholec80 dataset demonstrate that our proposed
method can exceed the state-of-the-art method based on richer backbones,
especially in instrument anticipation (1.27 vs. 1.48 for inMAE; 1.48 vs. 2.68
for eMAE). To the best of our knowledge, we are the first to
introduce a spatial-temporal graph representation into surgical workflow
anticipation.
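The graph construction described in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's implementation: each detected instrument bounding box becomes a node, inter-instrument edges connect instruments co-occurring in a frame (interaction), and inter-frame edges link the same instrument across consecutive frames (trajectory). The instrument identifiers, the `(x, y, w, h)` box format, and the use of box centers as node features are all assumptions.

```python
def build_instrument_graph(frames):
    """Build a spatial-temporal instrument graph from per-frame detections.

    frames: list of dicts mapping instrument_id -> (x, y, w, h) bounding box.
    Returns (nodes, edges): nodes maps (frame_idx, instrument_id) to a
    box-center feature; edges is a set of node-key pairs.
    """
    nodes = {}
    edges = set()
    for t, boxes in enumerate(frames):
        # One node per detected instrument, using the box center as its feature.
        for iid, (x, y, w, h) in boxes.items():
            nodes[(t, iid)] = (x + w / 2.0, y + h / 2.0)
        # Inter-instrument edges: connect instruments within the same frame.
        ids = sorted(boxes)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                edges.add(((t, ids[i]), (t, ids[j])))
        # Inter-frame edges: link the same instrument across consecutive frames.
        if t > 0:
            for iid in boxes:
                if iid in frames[t - 1]:
                    edges.add(((t - 1, iid), (t, iid)))
    return nodes, edges
```

In practice such a graph would be fed to a graph neural network; the inter-frame edges carry the trajectory signal and the inter-instrument edges the interaction signal described above.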
Related papers
- Adaptive Graph Learning from Spatial Information for Surgical Workflow Anticipation [9.329654505950199]
We propose an adaptive graph learning framework for surgical workflow anticipation based on a novel spatial representation.
We develop a multi-horizon objective that balances learning objectives for different time horizons, allowing for unconstrained predictions.
arXiv Detail & Related papers (2024-12-09T12:53:08Z)
- Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery [50.3022015601057]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video.
We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets.
Our results demonstrate the superiority of our approach compared to unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z)
- Encoding Surgical Videos as Latent Spatiotemporal Graphs for Object and Anatomy-Driven Reasoning [2.9724186623561435]
We use latent graphs to represent a surgical video in terms of the constituent anatomical structures and tools over time.
We introduce a novel graph-editing module that incorporates prior knowledge and temporal coherence to correct errors in the graph.
arXiv Detail & Related papers (2023-12-11T20:42:27Z)
- SimTeG: A Frustratingly Simple Approach Improves Textual Graph Learning [131.04781590452308]
We present SimTeG, a frustratingly Simple approach for Textual Graph learning.
We first perform supervised parameter-efficient fine-tuning (PEFT) on a pre-trained LM on the downstream task.
We then generate node embeddings using the last hidden states of the fine-tuned LM.
arXiv Detail & Related papers (2023-08-03T07:00:04Z)
- LABRAD-OR: Lightweight Memory Scene Graphs for Accurate Bimodal Reasoning in Dynamic Operating Rooms [39.11134330259464]
Holistic modeling of the operating room (OR) is a challenging but essential task.
We introduce memory scene graphs, where the scene graphs of previous time steps act as the temporal representation guiding the current prediction.
We design an end-to-end architecture that intelligently fuses the temporal information of our lightweight memory scene graphs with the visual information from point clouds and images.
arXiv Detail & Related papers (2023-03-23T14:26:16Z)
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Graph Scene (MSSG) which aims at providing unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Learning and Reasoning with the Graph Structure Representation in Robotic Surgery [15.490603884631764]
Learning to infer graph representations can play a vital role in surgical scene understanding in robotic surgery.
We develop an approach to generate the scene graph and predict surgical interactions between instruments and the surgical region of interest.
arXiv Detail & Related papers (2020-07-07T11:49:34Z)
- Learning Motion Flows for Semi-supervised Instrument Segmentation from Robotic Surgical Video [64.44583693846751]
We study the semi-supervised instrument segmentation from robotic surgical videos with sparse annotations.
By exploiting generated data pairs, our framework can recover and even enhance temporal consistency of training sequences.
Results show that our method outperforms the state-of-the-art semi-supervised methods by a large margin.
arXiv Detail & Related papers (2020-07-06T02:39:32Z)
- Graph Representation Learning via Graphical Mutual Information Maximization [86.32278001019854]
We propose a novel concept, Graphical Mutual Information (GMI), to measure the correlation between input graphs and high-level hidden representations.
We develop an unsupervised learning model trained by maximizing GMI between the input and output of a graph neural encoder.
arXiv Detail & Related papers (2020-02-04T08:33:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.