Towards Holistic Surgical Scene Graph
- URL: http://arxiv.org/abs/2507.15541v2
- Date: Thu, 24 Jul 2025 00:51:53 GMT
- Title: Towards Holistic Surgical Scene Graph
- Authors: Jongmin Shin, Enki Cho, Ka Young Kim, Jung Yong Kim, Seong Tae Kim, Namkee Oh
- Abstract summary: Surgical scene understanding is crucial for computer-assisted intervention systems. To represent the complex information in surgical scenes, graph-based approaches have been explored. We propose the Endoscapes-SG201 dataset, which includes annotations for tool-action-target combinations and hand identity. We also introduce SSG-Com, a graph-based method designed to learn and represent these critical elements.
- Score: 2.6272547208243338
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Surgical scene understanding is crucial for computer-assisted intervention systems, requiring visual comprehension of surgical scenes that involves diverse elements such as surgical tools, anatomical structures, and their interactions. To effectively represent the complex information in surgical scenes, graph-based approaches have been explored to structurally model surgical entities and their relationships. Previous surgical scene graph studies have demonstrated the feasibility of representing surgical scenes using graphs. However, certain aspects of surgical scenes, such as the diverse combinations of tool, action, and target and the identity of the hand operating the tool, remain underexplored in graph-based representations despite their importance. To incorporate these aspects into graph representations, we propose the Endoscapes-SG201 dataset, which includes annotations for tool-action-target combinations and hand identity. We also introduce SSG-Com, a graph-based method designed to learn and represent these critical elements. Through experiments on downstream tasks such as critical view of safety assessment and action triplet recognition, we demonstrate the importance of integrating these essential scene graph components, highlighting their significant contribution to surgical scene understanding. The code and dataset are available at https://github.com/ailab-kyunghee/SSG-Com
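To make the paper's two highlighted elements concrete, here is a minimal sketch of how a scene-graph record might carry tool-action-target triplets together with hand identity. The class and field names are illustrative placeholders, not the actual Endoscapes-SG201 schema or the SSG-Com implementation.

```python
# Minimal sketch of a scene-graph record carrying tool-action-target
# triplets and hand identity. Field names are illustrative only; they
# are not the actual Endoscapes-SG201 schema.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: int
    category: str            # e.g. "grasper", "gallbladder"
    bbox: tuple              # (x1, y1, x2, y2) in pixels

@dataclass
class Edge:
    subject: int             # node_id of the tool
    target: int              # node_id of the anatomy
    action: str              # e.g. "retract", "dissect"
    hand: str                # identity of the operating hand

@dataclass
class SceneGraph:
    frame_id: int
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def triplets(self):
        """Yield (tool, action, target, hand) tuples for downstream tasks."""
        by_id = {n.node_id: n for n in self.nodes}
        for e in self.edges:
            yield (by_id[e.subject].category, e.action,
                   by_id[e.target].category, e.hand)

g = SceneGraph(frame_id=0)
g.nodes += [Node(0, "grasper", (10, 10, 80, 60)),
            Node(1, "gallbladder", (50, 40, 200, 180))]
g.edges.append(Edge(subject=0, target=1, action="retract", hand="left-hand"))
print(list(g.triplets()))  # [('grasper', 'retract', 'gallbladder', 'left-hand')]
```

Keeping hand identity on the edge rather than on the tool node reflects the intuition that the same tool class can be driven by different hands across frames.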
Related papers
- Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance [50.486523249499115]
Real-time video understanding is critical to guide procedures in minimally invasive surgery (MIS). We propose Compress-to-Explore (C2E), a novel self-supervised framework to learn compact, informative representations from surgical videos. C2E uses entropy-maximizing decoders to compress images while preserving clinically relevant details, improving encoder performance without labeled data.
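As a rough illustration of the kind of objective the summary describes, the sketch below pairs a reconstruction term with an entropy term computed on the compressed code; the tiny autoencoder and the loss weighting are assumptions for illustration, not the actual C2E design.

```python
# Rough sketch of a compress-while-maximizing-entropy objective; the
# tiny autoencoder and beta weighting are assumptions, not C2E itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyAutoencoder(nn.Module):
    def __init__(self, code_dim=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 16, 4, 2, 1), nn.ReLU(),
                                 nn.Conv2d(16, code_dim, 4, 2, 1))
        self.dec = nn.Sequential(nn.ConvTranspose2d(code_dim, 16, 4, 2, 1), nn.ReLU(),
                                 nn.ConvTranspose2d(16, 3, 4, 2, 1))

    def forward(self, x):
        code = self.enc(x)
        return code, self.dec(code)

def c2e_style_loss(x, code, recon, beta=0.1):
    # Reconstruction keeps clinically relevant detail; the entropy term
    # (estimated from a softmax over code channels) spreads the code.
    recon_loss = F.mse_loss(recon, x)
    p = code.flatten(2).softmax(dim=1)            # per-location channel dist.
    entropy = -(p * (p + 1e-8).log()).sum(1).mean()
    return recon_loss - beta * entropy            # subtracting maximizes entropy

model = TinyAutoencoder()
frames = torch.randn(2, 3, 64, 64)                # stand-in for video frames
code, recon = model(frames)
print(c2e_style_loss(frames, code, recon).item())
```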
arXiv Detail & Related papers (2025-05-16T14:02:24Z)
- Dynamic Scene Graph Representation for Surgical Video [37.22552586793163]
We exploit scene graphs as a more holistic, semantically meaningful and human-readable way to represent surgical videos.
We create a scene graph dataset from semantic segmentations from the CaDIS and CATARACTS datasets.
We demonstrate the benefits of surgical scene graphs regarding the explainability and robustness of model decisions.
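A minimal sketch of the graph-construction idea, assuming nodes are the segmented classes and edges connect classes whose masks touch; the contact rule is an illustrative choice, not necessarily the exact procedure used for CaDIS/CATARACTS.

```python
# Minimal sketch of deriving a scene graph from a semantic segmentation:
# each labeled region becomes a node, and regions whose masks touch are
# connected by an edge. The pixel-adjacency contact rule is an assumption.
import numpy as np

def masks_to_graph(seg, background=0):
    """seg: HxW int array of class labels. Returns (nodes, edges)."""
    nodes = sorted(int(c) for c in np.unique(seg) if c != background)
    edges = set()
    # Two classes are adjacent if they occur in horizontally or
    # vertically neighboring pixels.
    for a, b in [(seg[:, :-1], seg[:, 1:]), (seg[:-1, :], seg[1:, :])]:
        touching = (a != b) & (a != background) & (b != background)
        for u, v in zip(a[touching], b[touching]):
            edges.add((min(int(u), int(v)), max(int(u), int(v))))
    return nodes, sorted(edges)

seg = np.zeros((8, 8), dtype=int)
seg[2:6, 1:4] = 1   # e.g. an instrument
seg[2:6, 4:7] = 2   # e.g. adjacent tissue
print(masks_to_graph(seg))  # ([1, 2], [(1, 2)])
```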
arXiv Detail & Related papers (2023-09-25T21:28:14Z)
- Visual-Kinematics Graph Learning for Procedure-agnostic Instrument Tip Segmentation in Robotic Surgeries [29.201385352740555]
We propose a novel visual-kinematics graph learning framework to accurately segment the instrument tip across various surgical procedures.
Specifically, a graph learning framework is proposed to encode relational features of instrument parts from both image and kinematics.
A cross-modal contrastive loss is designed to incorporate robust geometric prior from kinematics to image for tip segmentation.
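The sketch below shows one standard (InfoNCE-style) way such a cross-modal contrastive loss can be written, treating the image and kinematics embeddings of the same frame as the positive pair; the encoders and temperature are placeholders, not the paper's actual components.

```python
# Hedged sketch of a cross-modal contrastive loss pairing image and
# kinematics embeddings of the same frame (InfoNCE-style).
import torch
import torch.nn.functional as F

def cross_modal_infonce(img_feat, kin_feat, tau=0.07):
    """img_feat, kin_feat: (N, D) embeddings; row i of each modality
    describes the same frame and forms the positive pair."""
    img = F.normalize(img_feat, dim=1)
    kin = F.normalize(kin_feat, dim=1)
    logits = img @ kin.t() / tau                  # (N, N) similarity matrix
    labels = torch.arange(img.size(0))            # positives on the diagonal
    # Symmetrize over both matching directions.
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

img_feat = torch.randn(8, 128)   # visual features, e.g. of instrument parts
kin_feat = torch.randn(8, 128)   # kinematics features of the same parts
print(cross_modal_infonce(img_feat, kin_feat).item())
```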
arXiv Detail & Related papers (2023-09-02T14:52:58Z)
- SurGNN: Explainable visual scene understanding and assessment of surgical skill using graph neural networks [19.57785997767885]
This paper explores how graph neural networks (GNNs) can be used to enhance visual scene understanding and surgical skill assessment.
GNNs provide interpretable results, revealing the specific actions, instruments, or anatomical structures that contribute to the predicted skill metrics.
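As an illustration of how a GNN readout can expose such contributions, the sketch below uses attention weights over nodes as per-node importance scores; this is a generic plain-PyTorch construction, not SurGNN itself.

```python
# Illustrative sketch: an attention readout over graph nodes yields a
# skill score plus per-node importance weights as the explanation.
import torch
import torch.nn as nn

class ExplainableGraphScorer(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.attn = nn.Linear(dim, 1)
        self.head = nn.Linear(dim, 1)

    def forward(self, x, adj):
        # One round of mean-neighbor message passing.
        deg = adj.sum(1, keepdim=True).clamp(min=1)
        h = torch.relu(x + self.msg(adj @ x / deg))
        # Attention readout: weights indicate which nodes drove the score.
        w = torch.softmax(self.attn(h).squeeze(-1), dim=0)
        score = self.head((w.unsqueeze(0) @ h).squeeze(0))
        return score, w                     # skill score + node importances

x = torch.randn(4, 16)                      # instrument / anatomy node features
adj = torch.tensor([[0, 1, 1, 0], [1, 0, 0, 1],
                    [1, 0, 0, 1], [0, 1, 1, 0]], dtype=torch.float)
score, importance = ExplainableGraphScorer()(x, adj)
print(score.item(), importance.tolist())
```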
arXiv Detail & Related papers (2023-08-24T20:32:57Z)
- Latent Graph Representations for Critical View of Safety Assessment [2.9724186623561435]
We propose a method for CVS prediction wherein we first represent a surgical image using a disentangled latent scene graph, then process this representation using a graph neural network.
Our graph representations explicitly encode semantic information to improve anatomy-driven reasoning, as well as visual features to retain differentiability and thereby provide robustness to semantic errors.
We show that our method not only outperforms several baseline methods when trained with bounding box annotations, but also scales effectively when trained with segmentation masks, maintaining state-of-the-art performance.
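A minimal sketch of that node-feature design, assuming each detected box contributes both a soft class posterior (semantic, differentiable) and a pooled visual feature; the dimensions and fusion by concatenation are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of node features mixing semantic class probabilities
# (anatomy-driven reasoning) with visual features (differentiability and
# robustness to semantic errors). Dimensions are illustrative.
import torch
import torch.nn as nn

class LatentGraphNode(nn.Module):
    def __init__(self, n_classes=6, vis_dim=64, out_dim=32):
        super().__init__()
        self.fuse = nn.Linear(n_classes + vis_dim, out_dim)

    def forward(self, class_logits, vis_feat):
        # The soft class posterior keeps the graph differentiable end to
        # end, while raw visual features hedge against wrong detections.
        sem = class_logits.softmax(dim=-1)
        return torch.relu(self.fuse(torch.cat([sem, vis_feat], dim=-1)))

class_logits = torch.randn(5, 6)   # detector outputs for 5 boxes
vis_feat = torch.randn(5, 64)      # pooled box features
nodes = LatentGraphNode()(class_logits, vis_feat)
print(nodes.shape)                 # torch.Size([5, 32])
```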
arXiv Detail & Related papers (2022-12-08T09:21:09Z)
- Learning Action-Effect Dynamics from Pairs of Scene-graphs [50.72283841720014]
We propose a novel method that leverages scene-graph representation of images to reason about the effects of actions described in natural language.
Our proposed approach is effective in terms of performance, data efficiency, and generalization capability compared to existing models.
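A toy sketch of the task setup, assuming pre-action node states and an embedded action description are scored pairwise to predict the post-action adjacency; everything here (dimensions, the pairwise head) is illustrative, not the paper's model.

```python
# Toy sketch of action-effect reasoning on a scene graph: given
# pre-action node states and an embedded action description, predict
# the post-action edge matrix. Entirely illustrative.
import torch
import torch.nn as nn

class ActionEffectHead(nn.Module):
    def __init__(self, node_dim=32, act_dim=16):
        super().__init__()
        self.pair = nn.Linear(2 * node_dim + act_dim, 1)

    def forward(self, nodes, action):
        n = nodes.size(0)
        # Score every ordered node pair under the action context.
        src = nodes.unsqueeze(1).expand(n, n, -1)
        dst = nodes.unsqueeze(0).expand(n, n, -1)
        act = action.expand(n, n, -1)
        return torch.sigmoid(self.pair(torch.cat([src, dst, act], -1))).squeeze(-1)

nodes = torch.randn(4, 32)          # pre-action scene-graph node states
action = torch.randn(1, 1, 16)      # embedding of an action description
post_edges = ActionEffectHead()(nodes, action)
print(post_edges.shape)             # torch.Size([4, 4]) predicted adjacency
```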
arXiv Detail & Related papers (2022-12-07T03:36:37Z)
- SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning [61.57887011165744]
Multimodal Transformers have made great progress in the task of Visual Commonsense Reasoning.
We propose a Scene Graph Enhanced Image-Text Learning framework to incorporate visual scene graphs in commonsense reasoning.
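One simple way to realize the idea is sketched below, under the assumption that scene-graph node embeddings are appended as extra tokens so attention can mix them with text and region features; this is an illustration, not the SGEITL architecture itself.

```python
# Hedged sketch: append scene-graph node embeddings as extra tokens so
# a multimodal Transformer can attend over object relations alongside
# text and vision. Token construction and sizes are assumptions.
import torch
import torch.nn as nn

dim = 64
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2)

text_tokens = torch.randn(1, 12, dim)    # embedded question / caption
visual_tokens = torch.randn(1, 9, dim)   # region features
graph_tokens = torch.randn(1, 5, dim)    # scene-graph node embeddings

tokens = torch.cat([text_tokens, visual_tokens, graph_tokens], dim=1)
out = encoder(tokens)
print(out.shape)                         # torch.Size([1, 26, 64])
```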
arXiv Detail & Related papers (2021-12-16T03:16:30Z)
- Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Scene Graph (MSSG), which aims at providing a unified symbolic and semantic representation of surgical procedures.
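A hedged sketch of the unification idea: observations from each modality are merged into one symbolic graph keyed by entity identity. The schema below is an assumption for illustration, not the MSSG definition.

```python
# Illustrative sketch of unifying per-modality observations into one
# symbolic record per entity; the schema is assumed, not MSSG's.
from collections import defaultdict

def merge_modalities(*observations):
    """Each observation: (entity, modality, attributes) tuples.
    Returns one node per entity with attributes grouped by modality."""
    graph = defaultdict(dict)
    for obs in observations:
        for entity, modality, attrs in obs:
            graph[entity][modality] = attrs
    return dict(graph)

camera_obs = [("scalpel", "vision", {"bbox": (40, 30, 90, 55)})]
robot_obs = [("scalpel", "kinematics", {"tip_pose": (0.1, 0.2, 0.05)})]
audio_obs = [("surgeon", "speech", {"utterance": "begin incision"})]
print(merge_modalities(camera_obs, robot_obs, audio_obs))
```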
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
- Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose MRG-Net, a novel online multi-modal graph network that dynamically integrates visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
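As a toy illustration of relational visual-kinematics fusion, the sketch below lets the two modality embeddings exchange messages before a gesture classifier; the layer sizes and message scheme are assumptions, not MRG-Net.

```python
# Toy sketch of relational fusion: treat each modality as a graph node,
# exchange messages, then classify the gesture. Sizes are illustrative.
import torch
import torch.nn as nn

class VisKinFusion(nn.Module):
    def __init__(self, dim=64, n_gestures=10):
        super().__init__()
        self.v2k = nn.Linear(dim, dim)    # message: vision -> kinematics
        self.k2v = nn.Linear(dim, dim)    # message: kinematics -> vision
        self.cls = nn.Linear(2 * dim, n_gestures)

    def forward(self, vis, kin):
        vis_new = torch.relu(vis + self.k2v(kin))
        kin_new = torch.relu(kin + self.v2k(vis))
        return self.cls(torch.cat([vis_new, kin_new], dim=-1))

vis = torch.randn(2, 64)   # per-frame visual embedding
kin = torch.randn(2, 64)   # per-frame kinematics embedding
print(VisKinFusion()(vis, kin).shape)   # torch.Size([2, 10]) gesture logits
```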
arXiv Detail & Related papers (2020-11-03T11:00:10Z)
- Learning and Reasoning with the Graph Structure Representation in Robotic Surgery [15.490603884631764]
Learning to infer graph representations can play a vital role in surgical scene understanding in robotic surgery.
We develop an approach to generate the scene graph and predict surgical interactions between instruments and the surgical region of interest.
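A minimal sketch of the interaction-prediction step, assuming a pairwise classifier over instrument and region-of-interest features; the feature sizes and the interaction label set are made up for illustration.

```python
# Minimal sketch: classify the interaction for an instrument / region
# pair. Feature sizes and the label set are illustrative assumptions.
import torch
import torch.nn as nn

INTERACTIONS = ["idle", "grasping", "cutting", "suturing"]

pair_classifier = nn.Sequential(
    nn.Linear(2 * 128, 64), nn.ReLU(),
    nn.Linear(64, len(INTERACTIONS)))

instrument = torch.randn(1, 128)   # detected instrument feature
roi = torch.randn(1, 128)          # surgical region-of-interest feature
logits = pair_classifier(torch.cat([instrument, roi], dim=-1))
print(INTERACTIONS[logits.argmax(-1).item()])
```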
arXiv Detail & Related papers (2020-07-07T11:49:34Z)
- GoGNN: Graph of Graphs Neural Network for Predicting Structured Entity Interactions [70.9481395807354]
We propose a Graph of Graphs Neural Network (GoGNN), which extracts the features in both structured entity graphs and the entity interaction graph in a hierarchical way.
GoGNN outperforms the state-of-the-art methods on two representative structured entity interaction prediction tasks.
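A hedged sketch of the graph-of-graphs idea: each entity's internal graph is pooled to one embedding, and message passing then runs on the interaction graph between entities; the layers below are illustrative, not GoGNN itself.

```python
# Hedged sketch of hierarchical graph-of-graphs processing: pool each
# entity's inner graph, then propagate over the interaction graph.
import torch
import torch.nn as nn

class GraphOfGraphs(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.inner = nn.Linear(dim, dim)   # within-entity aggregation
        self.outer = nn.Linear(dim, dim)   # between-entity aggregation

    def forward(self, entity_graphs, interaction_adj):
        # Level 1: mean-pool each entity's node features to one vector.
        pooled = torch.stack([torch.relu(self.inner(g)).mean(0)
                              for g in entity_graphs])
        # Level 2: propagate over the entity-interaction graph.
        deg = interaction_adj.sum(1, keepdim=True).clamp(min=1)
        return torch.relu(pooled + self.outer(interaction_adj @ pooled / deg))

# Three entities (e.g. molecules) with different numbers of inner nodes.
entity_graphs = [torch.randn(n, 32) for n in (4, 6, 3)]
interaction_adj = torch.tensor([[0, 1, 0], [1, 0, 1], [0, 1, 0]],
                               dtype=torch.float)
print(GraphOfGraphs()(entity_graphs, interaction_adj).shape)  # (3, 32)
```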
arXiv Detail & Related papers (2020-05-12T03:46:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.