ProtoFlow: Interpretable and Robust Surgical Workflow Modeling with Learned Dynamic Scene Graph Prototypes
- URL: http://arxiv.org/abs/2512.14092v1
- Date: Tue, 16 Dec 2025 04:59:58 GMT
- Title: ProtoFlow: Interpretable and Robust Surgical Workflow Modeling with Learned Dynamic Scene Graph Prototypes
- Authors: Felix Holm, Ghazal Ghazaei, Nassir Navab
- Abstract summary: ProtoFlow is a novel framework that learns dynamic graph prototypes to model complex surgical events. We evaluate our approach on the fine-grained CAT-SG dataset.
- Score: 42.15644075070622
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Purpose: Detailed surgical recognition is critical for advancing AI-assisted surgery, yet progress is hampered by high annotation costs, data scarcity, and a lack of interpretable models. While scene graphs offer a structured abstraction of surgical events, their full potential remains untapped. In this work, we introduce ProtoFlow, a novel framework that learns dynamic scene graph prototypes to model complex surgical workflows in an interpretable and robust manner. Methods: ProtoFlow leverages a graph neural network (GNN) encoder-decoder architecture that combines self-supervised pretraining for rich representation learning with a prototype-based fine-tuning stage. This process discovers and refines core prototypes that encapsulate recurring, clinically meaningful patterns of surgical interaction, forming an explainable foundation for workflow analysis. Results: We evaluate our approach on the fine-grained CAT-SG dataset. ProtoFlow not only outperforms standard GNN baselines in overall accuracy but also demonstrates exceptional robustness in limited-data, few-shot scenarios, maintaining strong performance when trained on as few as one surgical video. Our qualitative analyses further show that the learned prototypes successfully identify distinct surgical sub-techniques and provide clear, interpretable insights into workflow deviations and rare complications. Conclusion: By uniting robust representation learning with inherent explainability, ProtoFlow represents a significant step toward developing more transparent, reliable, and data-efficient AI systems, accelerating their potential for clinical adoption in surgical training, real-time decision support, and workflow optimization.
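The abstract describes prototype-based workflow recognition: graph embeddings are compared against a small set of learned prototypes, and the nearest prototype both determines the prediction and serves as the human-readable explanation. The following is a minimal illustrative sketch of that general idea, not ProtoFlow's actual implementation; all names, shapes, and the nearest-prototype rule here are assumptions for demonstration.

```python
import numpy as np

# Hypothetical sketch of prototype-based classification: each learned
# prototype vector carries a clinical label (e.g. a surgical phase), and a
# query embedding is assigned the label of its nearest prototype. The
# matched prototype id doubles as the model's explanation.

rng = np.random.default_rng(0)

n_prototypes, dim = 4, 8
prototypes = rng.normal(size=(n_prototypes, dim))  # stand-in for learned prototypes
prototype_labels = np.array([0, 0, 1, 2])          # assumed phase label per prototype

def classify(embedding: np.ndarray) -> tuple[int, int]:
    """Return (predicted phase, id of the matched prototype)."""
    dists = np.linalg.norm(prototypes - embedding, axis=1)
    nearest = int(np.argmin(dists))
    return int(prototype_labels[nearest]), nearest

# An embedding lying close to prototype 2 is labeled with that prototype's phase.
emb = prototypes[2] + 0.01 * rng.normal(size=dim)
phase, proto_id = classify(emb)
print(phase, proto_id)  # → 1 2
```

In a real system the embeddings would come from the GNN encoder and the prototypes would be refined during fine-tuning; the sketch only shows why the nearest-prototype assignment is inherently interpretable.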
Related papers
- Med-CRAFT: Automated Construction of Interpretable and Multi-Hop Video Workloads via Knowledge Graph Traversal [13.216513001286812]
Med-CRAFT is a novel neuro-symbolic data engineering framework. Med-CRAFT extracts structured visual primitives from raw video streams and instantiates them into a dynamic Spatiotemporal Knowledge Graph. We instantiate this pipeline to produce M3-Med-Auto, a large-scale medical video reasoning benchmark.
arXiv Detail & Related papers (2025-11-30T19:24:10Z) - Supervise Less, See More: Training-free Nuclear Instance Segmentation with Prototype-Guided Prompting [53.799446807827714]
We introduce SPROUT, a fully training- and annotation-free prompting framework for nuclear instance segmentation. SPROUT leverages histology-informed priors to construct slide-specific reference prototypes. The resulting foreground and background features are transformed into positive and negative point prompts, enabling the Segment Anything Model (SAM) to produce precise nuclear delineations.
arXiv Detail & Related papers (2025-11-25T05:58:33Z) - An Active Learning Pipeline for Biomedical Image Instance Segmentation with Minimal Human Intervention [4.805039406228118]
Deep learning models such as U-Net have set new benchmarks in segmentation performance. nnU-Net requires a substantial amount of annotated data for cross-validation. This work proposes a data-centric AI workflow that leverages active learning and pseudo-labeling.
arXiv Detail & Related papers (2025-11-06T21:07:26Z) - Interpretable Clinical Classification with Kolmogorov-Arnold Networks [70.72819760172744]
Kolmogorov-Arnold Networks (KANs) offer intrinsic interpretability through transparent, symbolic representations. KANs support built-in patient-level insights, intuitive visualizations, and nearest-patient retrieval. These results position KANs as a promising step toward trustworthy AI that clinicians can understand, audit, and act upon.
arXiv Detail & Related papers (2025-09-20T17:21:58Z) - Data-Efficient Learning for Generalizable Surgical Video Understanding [0.0]
This research aims to bridge the gap between deep learning-based surgical video analysis in research and real-world clinical environments. I benchmarked state-of-the-art neural network architectures to identify the most effective designs for each task. We developed semi-supervised frameworks that improve model performance across tasks by leveraging large amounts of unlabeled surgical video.
arXiv Detail & Related papers (2025-08-13T22:00:23Z) - Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance [50.486523249499115]
Real-time video understanding is critical to guide procedures in minimally invasive surgery (MIS). We propose Compress-to-Explore (C2E), a novel self-supervised framework to learn compact, informative representations from surgical videos. C2E uses entropy-maximizing decoders to compress images while preserving clinically relevant details, improving encoder performance without labeled data.
arXiv Detail & Related papers (2025-05-16T14:02:24Z) - Rethinking model prototyping through the MedMNIST+ dataset collection [0.11999555634662634]
This work introduces a comprehensive benchmark for the MedMNIST+ dataset collection. We reassess commonly used Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) architectures across distinct medical datasets. Our findings suggest that computationally efficient training schemes and modern foundation models offer viable alternatives to costly end-to-end training.
arXiv Detail & Related papers (2024-04-24T10:19:25Z) - Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z) - CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks [62.22920673080208]
A single-step generative model can dramatically simplify the search process and be optimized in an end-to-end manner.
We name the pre-trained generative retrieval model CorpusBrain, as all information about the corpus is encoded in its parameters without the need to construct an additional index.
arXiv Detail & Related papers (2022-08-16T10:22:49Z) - Aggregating Long-Term Context for Learning Laparoscopic and Robot-Assisted Surgical Workflows [40.48632897750319]
We propose a new temporal network structure that leverages task-specific network representation to collect long-term sufficient statistics.
We demonstrate superior results over existing and novel state-of-the-art segmentation techniques on two laparoscopic cholecystectomy datasets.
arXiv Detail & Related papers (2020-09-01T20:29:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.