Decoding the Surgical Scene: A Scoping Review of Scene Graphs in Surgery
- URL: http://arxiv.org/abs/2509.20941v1
- Date: Thu, 25 Sep 2025 09:25:46 GMT
- Title: Decoding the Surgical Scene: A Scoping Review of Scene Graphs in Surgery
- Authors: Angelo Henriques, Korab Hoxha, Daniel Zapp, Peter C. Issa, Nassir Navab, M. Ali Nasseri,
- Abstract summary: Scene graphs (SGs) provide structured representations crucial for decoding complex, dynamic surgical environments.<n>This PRISMA-ScR-guided scoping review systematically maps the evolving landscape of SG research in surgery.<n>Our analysis reveals rapid growth, yet uncovers a critical 'data divide'<n>SGs are maturing into an essential semantic bridge, enabling a new generation of intelligent systems to improve surgical safety, efficiency, and training.
- Score: 36.192962258966105
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Scene graphs (SGs) provide structured relational representations crucial for decoding complex, dynamic surgical environments. This PRISMA-ScR-guided scoping review systematically maps the evolving landscape of SG research in surgery, charting its applications, methodological advancements, and future directions. Our analysis reveals rapid growth, yet uncovers a critical 'data divide': internal-view research (e.g., triplet recognition) almost exclusively uses real-world 2D video, while external-view 4D modeling relies heavily on simulated data, exposing a key translational research gap. Methodologically, the field has advanced from foundational graph neural networks to specialized foundation models that now significantly outperform generalist large vision-language models in surgical contexts. This progress has established SGs as a cornerstone technology for both analysis, such as workflow recognition and automated safety monitoring, and generative tasks like controllable surgical simulation. Although challenges in data annotation and real-time implementation persist, they are actively being addressed through emerging techniques. Surgical SGs are maturing into an essential semantic bridge, enabling a new generation of intelligent systems to improve surgical safety, efficiency, and training.
Related papers
- How Far Are Surgeons from Surgical World Models? A Pilot Study on Zero-shot Surgical Video Generation with Expert Assessment [69.13598421861654]
We present SurgVeo, the first expert-curated benchmark for video generation model evaluation in surgery.<n>We task the advanced Veo-3 model with a zero-shot prediction task on surgical clips from laparoscopic and neurosurgical procedures.<n>Our results reveal a distinct "plausibility gap": while Veo-3 achieves exceptional Visual Perceptual Plausibility, it fails critically at higher levels of the Surgical Plausibility Pyramid.
arXiv Detail & Related papers (2025-11-03T17:28:54Z) - Data-Efficient Learning for Generalizable Surgical Video Understanding [0.0]
This research aims to bridge gap between deep learning-based surgical video analysis in research and its real-world clinical environments.<n>I benchmarked state-of-the-art neural network architectures to identify the most effective designs for each task.<n>We developed semi-driven frameworks that improve model performance across tasks by leveraging large amounts of unlabeled surgical video.
arXiv Detail & Related papers (2025-08-13T22:00:23Z) - SurgTPGS: Semantic 3D Surgical Scene Understanding with Text Promptable Gaussian Splatting [45.16104996137126]
We present SurgTPGS, a novel text-promptable Gaussian Splatting method to fill this gap.<n>We propose semantic-aware deformation tracking to capture the seamless deformation of semantic features, providing a more precise reconstruction for both texture and semantic features.<n>We conduct comprehensive experiments on two real-world surgical datasets to demonstrate the superiority of SurgTPGS over state-of-the-art methods.
arXiv Detail & Related papers (2025-06-29T15:55:01Z) - CAT-SG: A Large Dynamic Scene Graph Dataset for Fine-Grained Understanding of Cataract Surgery [38.59801869721841]
This paper introduces the Cataract Surgery Scene Graph (CAT-SG) dataset, the first to provide structured annotations of tool-tissue interactions, procedural variations, and temporal variations.<n>By incorporating detailed semantic relations, CAT-SG offers a holistic view of surgical dependencies, enabling more accurate recognition of surgical phases and techniques.
arXiv Detail & Related papers (2025-06-26T23:25:23Z) - Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance [50.486523249499115]
Real-time video understanding is critical to guide procedures in minimally invasive surgery (MIS)<n>We propose Compress-to-Explore (C2E), a novel self-supervised framework to learn compact, informative representations from surgical videos.<n>C2E uses entropy-maximizing decoders to compress images while preserving clinically relevant details, improving encoder performance without labeled data.
arXiv Detail & Related papers (2025-05-16T14:02:24Z) - Surgeons vs. Computer Vision: A comparative analysis on surgical phase recognition capabilities [65.66373425605278]
Automated Surgical Phase Recognition (SPR) uses Artificial Intelligence (AI) to segment the surgical workflow into its key events.<n>Previous research has focused on short and linear surgical procedures and has not explored if temporal context influences experts' ability to better classify surgical phases.<n>This research addresses these gaps, focusing on Robot-Assisted Partial Nephrectomy (RAPN) as a highly non-linear procedure.
arXiv Detail & Related papers (2025-04-26T15:37:22Z) - VISAGE: Video Synthesis using Action Graphs for Surgery [34.21344214645662]
We introduce the novel task of future video generation in laparoscopic surgery.
Our proposed method, VISAGE, leverages the power of action scene graphs to capture the sequential nature of laparoscopic procedures.
Results of our experiments demonstrate high-fidelity video generation for laparoscopy procedures.
arXiv Detail & Related papers (2024-10-23T10:28:17Z) - Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z) - Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical
Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Graph Scene (MSSG) which aims at providing unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.