Dynamic Scene Graph Representation for Surgical Video
- URL: http://arxiv.org/abs/2309.14538v2
- Date: Tue, 24 Oct 2023 10:24:00 GMT
- Title: Dynamic Scene Graph Representation for Surgical Video
- Authors: Felix Holm, Ghazal Ghazaei, Tobias Czempiel, Ege Özsoy, Stefan Saur, Nassir Navab
- Abstract summary: We exploit scene graphs as a more holistic, semantically meaningful and human-readable way to represent surgical videos.
We create a scene graph dataset from semantic segmentations from the CaDIS and CATARACTS datasets.
We demonstrate the benefits of surgical scene graphs regarding the explainability and robustness of model decisions.
- Score: 37.22552586793163
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgical videos captured from microscopic or endoscopic imaging devices are
rich but complex sources of information, depicting different tools and
anatomical structures used over an extended period of time. Despite
containing crucial workflow information and being commonly recorded in many
procedures, usage of surgical videos for automated surgical workflow
understanding is still limited.
In this work, we exploit scene graphs as a more holistic, semantically
meaningful and human-readable way to represent surgical videos while encoding
all anatomical structures, tools, and their interactions. To properly evaluate
the impact of our solutions, we create a scene graph dataset from semantic
segmentations from the CaDIS and CATARACTS datasets. We demonstrate that scene
graphs can be leveraged through the use of graph convolutional networks (GCNs)
to tackle surgical downstream tasks such as surgical workflow recognition with
competitive performance. Moreover, we demonstrate the benefits of surgical
scene graphs regarding the explainability and robustness of model decisions,
which are crucial in the clinical setting.
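As a rough illustration of how a per-frame scene graph (nodes for tools and anatomical structures, edges for their interactions) can be fed to a GCN for workflow recognition, a minimal sketch follows. It uses PyTorch Geometric; the feature sizes, phase count, and layer choices are illustrative assumptions rather than the authors' exact architecture.

```python
# Minimal sketch (not the paper's exact model): classify one surgical scene graph
# into a workflow phase with a small GCN. Feature sizes, the number of phases,
# and the layer choices below are illustrative assumptions.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool


class SceneGraphPhaseGCN(torch.nn.Module):
    def __init__(self, in_dim=32, hidden_dim=64, num_phases=10):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)      # message passing over tool/anatomy nodes
        self.conv2 = GCNConv(hidden_dim, hidden_dim)
        self.classifier = torch.nn.Linear(hidden_dim, num_phases)

    def forward(self, x, edge_index, batch):
        h = F.relu(self.conv1(x, edge_index))
        h = F.relu(self.conv2(h, edge_index))
        g = global_mean_pool(h, batch)                # one embedding per scene graph
        return self.classifier(g)                     # workflow-phase logits


# Toy usage: 3 nodes (e.g., instrument, cornea, iris) and 2 interaction edges.
x = torch.randn(3, 32)                                          # node features
edge_index = torch.tensor([[0, 1], [1, 2]], dtype=torch.long)   # edges 0->1 and 1->2
batch = torch.zeros(3, dtype=torch.long)                        # all nodes belong to graph 0
logits = SceneGraphPhaseGCN()(x, edge_index, batch)
```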
Related papers
- OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining [55.15365161143354]
OphCLIP is a hierarchical retrieval-augmented vision-language pretraining framework for ophthalmic surgical workflow understanding.
OphCLIP learns both fine-grained and long-term visual representations by aligning short video clips with detailed narrative descriptions and full videos with structured titles.
OphCLIP also employs a retrieval-augmented pretraining framework to leverage the underexplored, large-scale silent surgical procedure videos (a generic sketch of the clip-narration alignment follows this entry).
arXiv Detail & Related papers (2024-11-23T02:53:08Z)
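The clip-narration alignment described above is, at its core, a CLIP-style symmetric contrastive objective; the snippet below is a generic sketch of that objective under assumed embedding shapes, not OphCLIP's actual implementation or hyperparameters.

```python
# Generic CLIP-style symmetric contrastive loss between paired video-clip and
# narration embeddings; a sketch of the alignment idea, not OphCLIP's code.
import torch
import torch.nn.functional as F


def clip_style_contrastive_loss(video_emb, text_emb, temperature=0.07):
    """video_emb, text_emb: (batch, dim) embeddings of paired clips and narrations."""
    v = F.normalize(video_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = v @ t.t() / temperature                 # pairwise cosine similarities
    targets = torch.arange(v.size(0), device=v.device)
    loss_v2t = F.cross_entropy(logits, targets)      # match each clip to its narration
    loss_t2v = F.cross_entropy(logits.t(), targets)  # and each narration to its clip
    return 0.5 * (loss_v2t + loss_t2v)


# Usage with placeholder encoder outputs:
loss = clip_style_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```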
- VISAGE: Video Synthesis using Action Graphs for Surgery [34.21344214645662]
We introduce the novel task of future video generation in laparoscopic surgery.
Our proposed method, VISAGE, leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures.
Results of our experiments demonstrate high-fidelity video generation for laparoscopic procedures.
arXiv Detail & Related papers (2024-10-23T10:28:17Z)
- SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction [37.86132786212667]
We introduce an end-to-end framework for the generation and optimization of surgical scene graphs.
Our solution outperforms the SOTA on the CATARACTS dataset by 8% accuracy and 10% F1 score in surgical workflow prediction.
arXiv Detail & Related papers (2024-07-29T17:44:34Z)
- OphNet: A Large-Scale Video Benchmark for Ophthalmic Surgical Workflow Understanding [26.962250661485967]
OphNet is a large-scale, expert-annotated video benchmark for ophthalmic surgical workflow understanding.
It comprises a diverse collection of 2,278 surgical videos spanning 66 types of cataract, glaucoma, and corneal surgeries, with detailed annotations for 102 unique surgical phases and 150 fine-grained operations.
OphNet is about 20 times larger than the largest existing surgical workflow analysis benchmark.
arXiv Detail & Related papers (2024-06-11T17:18:11Z)
- Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries [29.201385352740555]
We propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip across various surgical procedures.
Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both images and kinematics.
A cross-modal contrastive loss is designed to incorporate a robust geometric prior from kinematics into the image domain for tip segmentation.
arXiv Detail & Related papers (2023-09-02T14:52:58Z)
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims at providing a unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online multi-modal graph network (MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Towards Unsupervised Learning for Instrument Segmentation in Robotic Surgery with Cycle-Consistent Adversarial Networks [54.00217496410142]
We propose an unpaired image-to-image translation approach whose goal is to learn the mapping between an input endoscopic image and a corresponding annotation.
Our approach allows training image segmentation models without the need to acquire expensive annotations.
We test our proposed method on the Endovis 2017 challenge dataset and show that it is competitive with supervised segmentation methods (a generic sketch of the cycle-consistency term follows this entry).
arXiv Detail & Related papers (2020-07-09T01:39:39Z)
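The unpaired image-to-annotation translation above relies on a cycle-consistency constraint; the snippet below is a generic CycleGAN-style cycle loss with placeholder generator modules, not the paper's implementation.

```python
# Generic CycleGAN-style cycle-consistency term for unpaired image <-> annotation
# translation; the two generators are placeholders, not the paper's networks.
import torch.nn.functional as F


def cycle_consistency_loss(G_img2ann, G_ann2img, image, annotation, weight=10.0):
    """Penalize failing to reconstruct each input after a round trip through both mappings."""
    rec_image = G_ann2img(G_img2ann(image))             # image -> annotation -> image
    rec_annotation = G_img2ann(G_ann2img(annotation))   # annotation -> image -> annotation
    return weight * (F.l1_loss(rec_image, image) + F.l1_loss(rec_annotation, annotation))
```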
- Learning and Reasoning with the Graph Structure Representation in Robotic Surgery [15.490603884631764]
Learning to infer graph representations can play a vital role in surgical scene understanding in robotic surgery.
We develop an approach to generate the scene graph and predict surgical interactions between instruments and the surgical region of interest.
arXiv Detail & Related papers (2020-07-07T11:49:34Z)
- LRTD: Long-Range Temporal Dependency based Active Learning for Surgical Workflow Recognition [67.86810761677403]
We propose a novel active learning method for cost-effective surgical video analysis.
Specifically, we propose a non-local recurrent convolutional network (NL-RCNet), which introduces a non-local block to capture long-range temporal dependencies (a generic sketch of such a block appears after this entry).
We validate our approach on a large surgical video dataset (Cholec80) by performing the surgical workflow recognition task.
arXiv Detail & Related papers (2020-04-21T09:21:22Z)
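The non-local block that NL-RCNet builds on is essentially self-attention over a sequence of per-frame features; the sketch below shows that generic form with assumed dimensions, not the paper's exact block.

```python
# Generic non-local (self-attention) block over a clip of per-frame features,
# illustrating long-range temporal dependency modelling; dimensions are assumptions.
import torch


class NonLocalTemporalBlock(torch.nn.Module):
    def __init__(self, dim=512, bottleneck=256):
        super().__init__()
        self.theta = torch.nn.Linear(dim, bottleneck)   # queries
        self.phi = torch.nn.Linear(dim, bottleneck)     # keys
        self.g = torch.nn.Linear(dim, bottleneck)       # values
        self.out = torch.nn.Linear(bottleneck, dim)

    def forward(self, feats):                           # feats: (batch, time, dim)
        q, k, v = self.theta(feats), self.phi(feats), self.g(feats)
        attn = torch.softmax(q @ k.transpose(1, 2) / k.size(-1) ** 0.5, dim=-1)
        return feats + self.out(attn @ v)               # residual connection


# Usage: refine a clip of 16 frame embeddings.
refined = NonLocalTemporalBlock()(torch.randn(2, 16, 512))
```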
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.