VISAGE: Video Synthesis using Action Graphs for Surgery
- URL: http://arxiv.org/abs/2410.17751v2
- Date: Wed, 30 Oct 2024 10:13:18 GMT
- Title: VISAGE: Video Synthesis using Action Graphs for Surgery
- Authors: Yousef Yeganeh, Rachmadio Lazuardi, Amir Shamseddin, Emine Dari, Yash Thirani, Nassir Navab, Azade Farshad
- Abstract summary: We introduce the novel task of future video generation in laparoscopic surgery.
Our proposed method, VISAGE, leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures.
Results of our experiments demonstrate high-fidelity video generation for laparoscopic procedures.
- Score: 34.21344214645662
- License:
- Abstract: Surgical data science (SDS) is a field that analyzes patient data before, during, and after surgery to improve surgical outcomes and skills. However, surgical data is scarce, heterogeneous, and complex, which limits the applicability of existing machine learning methods. In this work, we introduce the novel task of future video generation in laparoscopic surgery. This task can augment and enrich the existing surgical data and enable various applications, such as simulation, analysis, and robot-aided surgery. Ultimately, it involves not only understanding the current state of the operation but also accurately predicting the dynamic and often unpredictable nature of surgical procedures. Our proposed method, VISAGE (VIdeo Synthesis using Action Graphs for Surgery), leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures and utilizes diffusion models to synthesize temporally coherent video sequences. Given only a single initial frame and the corresponding action graph triplets, VISAGE predicts the future frames. By incorporating domain-specific knowledge through the action graph, VISAGE ensures that the generated videos adhere to the expected visual and motion patterns observed in real laparoscopic procedures. The results of our experiments demonstrate high-fidelity video generation for laparoscopic procedures, which enables various applications in SDS.
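To make the conditioning idea concrete, the following is a minimal sketch, not the authors' code: it shows one way (instrument, verb, target) triplets and an encoded initial frame could condition a denoiser that predicts a future-frame latent. All module names, vocabulary sizes, and the noise schedule are illustrative assumptions.

```python
# Illustrative sketch of triplet-conditioned future-frame denoising.
# Not the VISAGE implementation; all sizes/vocabularies are assumed.
import torch
import torch.nn as nn

class TripletEncoder(nn.Module):
    """Embed (instrument, verb, target) triplets into one conditioning vector."""
    def __init__(self, n_instruments=10, n_verbs=12, n_targets=20, dim=128):
        super().__init__()
        self.instrument = nn.Embedding(n_instruments, dim)
        self.verb = nn.Embedding(n_verbs, dim)
        self.target = nn.Embedding(n_targets, dim)
        self.proj = nn.Linear(3 * dim, dim)

    def forward(self, triplets):  # triplets: (B, 3) integer ids
        e = torch.cat([self.instrument(triplets[:, 0]),
                       self.verb(triplets[:, 1]),
                       self.target(triplets[:, 2])], dim=-1)
        return self.proj(e)  # (B, dim)

class ConditionalDenoiser(nn.Module):
    """Toy denoiser: predicts the noise added to a future-frame latent,
    conditioned on the initial-frame latent and the triplet embedding."""
    def __init__(self, latent_dim=256, cond_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + latent_dim + cond_dim + 1, 512),
            nn.SiLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, noisy_future, init_frame, cond, t):
        # t: diffusion timestep in [0, 1], shape (B, 1)
        x = torch.cat([noisy_future, init_frame, cond, t], dim=-1)
        return self.net(x)

# One training step on random latents, just to show the data flow.
B, latent_dim = 4, 256
enc = TripletEncoder()
denoiser = ConditionalDenoiser()
init_frame = torch.randn(B, latent_dim)  # encoded first frame (assumed)
future = torch.randn(B, latent_dim)      # encoded future frame (assumed)
triplets = torch.randint(0, 10, (B, 3))  # e.g. (grasper, retract, gallbladder)
t = torch.rand(B, 1)
noise = torch.randn_like(future)
# Simplistic interpolation schedule, standing in for a real diffusion schedule.
noisy_future = torch.sqrt(1 - t) * future + torch.sqrt(t) * noise
pred = denoiser(noisy_future, init_frame, enc(triplets), t)
loss = nn.functional.mse_loss(pred, noise)
loss.backward()
print(f"toy denoising loss: {loss.item():.4f}")
```

The key design point illustrated here is that the action graph enters only as a compact conditioning vector concatenated into the denoiser's input, so the same generative backbone can be steered by different surgical actions.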
Related papers
- Thoracic Surgery Video Analysis for Surgical Phase Recognition [0.08706730566331035]
We analyse and evaluate both frame-based and video clip-based phase recognition on a thoracic surgery dataset consisting of 11 phase classes.
We show that Masked Video Distillation (MVD) exhibits superior performance, achieving a top-1 accuracy of 72.9%, compared to 52.31% achieved by ImageNet ViT.
arXiv Detail & Related papers (2024-06-13T14:47:57Z)
- CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset consisting of unseen synthetic data and images collected from silicone aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z)
- Hypergraph-Transformer (HGT) for Interactive Event Prediction in Laparoscopic and Robotic Surgery [50.3022015601057]
We propose a predictive neural network that is capable of understanding and predicting critical interactive aspects of surgical workflow from intra-abdominal video.
We verify our approach on established surgical datasets and applications, including the detection and prediction of action triplets.
Our results demonstrate the superiority of our approach compared to unstructured alternatives.
arXiv Detail & Related papers (2024-02-03T00:58:05Z)
- AiAReSeg: Catheter Detection and Segmentation in Interventional Ultrasound using Transformers [75.20925220246689]
Endovascular surgeries are performed using the gold standard of fluoroscopy, which uses ionising radiation to visualise catheters and vasculature.
This work proposes a solution using an adaptation of a state-of-the-art machine learning transformer architecture to detect and segment catheters in axial interventional Ultrasound image sequences.
arXiv Detail & Related papers (2023-09-25T19:34:12Z)
- Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries [29.201385352740555]
We propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip given various surgical procedures.
Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both image and kinematics.
A cross-modal contrastive loss is designed to incorporate a robust geometric prior from kinematics into the image branch for tip segmentation (a minimal sketch of such a loss appears after this list).
arXiv Detail & Related papers (2023-09-02T14:52:58Z)
- GLSFormer : Gated - Long, Short Sequence Transformer for Step Recognition in Surgical Videos [57.93194315839009]
We propose a vision transformer-based approach to learn temporal features directly from sequence-level patches.
We extensively evaluate our approach on two cataract surgery video datasets, Cataract-101 and D99, and demonstrate superior performance compared to various state-of-the-art methods.
arXiv Detail & Related papers (2023-07-20T17:57:04Z)
- AutoLaparo: A New Dataset of Integrated Multi-tasks for Image-guided Surgical Automation in Laparoscopic Hysterectomy [42.20922574566824]
We present and release the first integrated dataset with multiple image-based perception tasks to facilitate learning-based automation in hysterectomy surgery.
Our AutoLaparo dataset is developed based on full-length videos of entire hysterectomy procedures.
Specifically, three different yet highly correlated tasks are formulated in the dataset, including surgical workflow recognition, laparoscope motion prediction, and instrument and key anatomy segmentation.
arXiv Detail & Related papers (2022-08-03T13:17:23Z)
- Know your sensORs – A Modality Study For Surgical Action Classification [39.546197658791]
The medical community seeks to leverage this wealth of data to develop automated methods to advance interventional care, lower costs, and improve patient outcomes.
Existing datasets from operating room (OR) cameras are thus far limited in size or in the modalities acquired, leaving it unclear which sensor modalities are best suited for tasks such as recognizing surgical actions from video.
This study demonstrates that surgical action recognition performance can vary depending on the image modalities used.
arXiv Detail & Related papers (2022-03-16T15:01:17Z)
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims at providing a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Recurrent and Spiking Modeling of Sparse Surgical Kinematics [0.8458020117487898]
A growing number of studies have used machine learning to analyze video and kinematic data captured from surgical robots.
In this study, we explore whether kinematic data alone can distinguish between surgeons of similar skill levels.
We report that it is possible to identify surgical fellows receiving near perfect scores in the simulation exercises based on their motion characteristics alone.
arXiv Detail & Related papers (2020-05-12T15:41:45Z)
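The cross-modal contrastive loss mentioned in the Visual-Kinematics Graph Learning entry above can be illustrated with a minimal, assumed sketch. The symmetric InfoNCE form, the embedding shapes, and the temperature below are illustrative assumptions, not the paper's implementation.

```python
# Assumed sketch of a cross-modal InfoNCE-style contrastive loss that
# pulls together image and kinematics embeddings of the same frame.
import torch
import torch.nn.functional as F

def cross_modal_contrastive_loss(img_emb, kin_emb, temperature=0.07):
    """img_emb, kin_emb: (B, D) embeddings of the same B frames,
    one from the image branch and one from the kinematics branch."""
    img = F.normalize(img_emb, dim=-1)
    kin = F.normalize(kin_emb, dim=-1)
    logits = img @ kin.t() / temperature  # (B, B) similarity matrix
    labels = torch.arange(img.size(0), device=img.device)
    # Matched image/kinematics pairs sit on the diagonal; all other
    # entries in each row/column act as negatives (symmetric InfoNCE).
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Toy usage with random embeddings:
B, D = 8, 128
loss = cross_modal_contrastive_loss(torch.randn(B, D), torch.randn(B, D))
print(f"contrastive loss: {loss.item():.4f}")
```

Each row of the similarity matrix treats the matched pair as the positive and the rest of the batch as negatives, which is the standard way such a loss transfers structure from one modality (here, kinematics) to the other (images).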