ProtoFlow: Interpretable and Robust Surgical Workflow Modeling with Learned Dynamic Scene Graph Prototypes
- URL: http://arxiv.org/abs/2512.14092v1
- Date: Tue, 16 Dec 2025 04:59:58 GMT
- Title: ProtoFlow: Interpretable and Robust Surgical Workflow Modeling with Learned Dynamic Scene Graph Prototypes
- Authors: Felix Holm, Ghazal Ghazaei, Nassir Navab
- Abstract summary: ProtoFlow is a novel framework that learns dynamic graph prototypes to model complex surgical events. We evaluate our approach on the fine-grained CAT-SG dataset.
- Score: 42.15644075070622
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Purpose: Detailed surgical recognition is critical for advancing AI-assisted surgery, yet progress is hampered by high annotation costs, data scarcity, and a lack of interpretable models. While scene graphs offer a structured abstraction of surgical events, their full potential remains untapped. In this work, we introduce ProtoFlow, a novel framework that learns dynamic scene graph prototypes to model complex surgical workflows in an interpretable and robust manner. Methods: ProtoFlow leverages a graph neural network (GNN) encoder-decoder architecture that combines self-supervised pretraining for rich representation learning with a prototype-based fine-tuning stage. This process discovers and refines core prototypes that encapsulate recurring, clinically meaningful patterns of surgical interaction, forming an explainable foundation for workflow analysis. Results: We evaluate our approach on the fine-grained CAT-SG dataset. ProtoFlow not only outperforms standard GNN baselines in overall accuracy but also demonstrates exceptional robustness in limited-data, few-shot scenarios, maintaining strong performance when trained on as few as one surgical video. Our qualitative analyses further show that the learned prototypes successfully identify distinct surgical sub-techniques and provide clear, interpretable insights into workflow deviations and rare complications. Conclusion: By uniting robust representation learning with inherent explainability, ProtoFlow represents a significant step toward developing more transparent, reliable, and data-efficient AI systems, accelerating their potential for clinical adoption in surgical training, real-time decision support, and workflow optimization.
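The abstract describes prototype-based workflow recognition: graph embeddings are compared against a small set of learned prototypes, and the nearest prototype both determines the prediction and serves as the human-readable explanation. The following is a minimal illustrative sketch of that general idea, not ProtoFlow's actual implementation; all names, shapes, and the nearest-prototype rule here are assumptions for demonstration.

```python
import numpy as np

# Hypothetical sketch of prototype-based classification: each learned
# prototype vector carries a clinical label (e.g. a surgical phase), and a
# query embedding is assigned the label of its nearest prototype. The
# matched prototype id doubles as the model's explanation.

rng = np.random.default_rng(0)

n_prototypes, dim = 4, 8
prototypes = rng.normal(size=(n_prototypes, dim))  # stand-in for learned prototypes
prototype_labels = np.array([0, 0, 1, 2])          # assumed phase label per prototype

def classify(embedding: np.ndarray) -> tuple[int, int]:
    """Return (predicted phase, id of the matched prototype)."""
    dists = np.linalg.norm(prototypes - embedding, axis=1)
    nearest = int(np.argmin(dists))
    return int(prototype_labels[nearest]), nearest

# An embedding lying close to prototype 2 is labeled with that prototype's phase.
emb = prototypes[2] + 0.01 * rng.normal(size=dim)
phase, proto_id = classify(emb)
print(phase, proto_id)  # → 1 2
```

In a real system the embeddings would come from the GNN encoder and the prototypes would be refined during fine-tuning; the sketch only shows why the nearest-prototype assignment is inherently interpretable.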
Related papers
- Med-CRAFT: Automated Construction of Interpretable and Multi-Hop Video Workloads via Knowledge Graph Traversal [13.216513001286812]
Med-CRAFT is a novel neuro-symbolic data engineering framework. Med-CRAFT extracts structured visual primitives from raw video streams and instantiates them into a dynamic Spatiotemporal Knowledge Graph. We instantiate this pipeline to produce M3-Med-Auto, a large-scale medical video reasoning benchmark.
arXiv Detail & Related papers (2025-11-30T19:24:10Z) - Supervise Less, See More: Training-free Nuclear Instance Segmentation with Prototype-Guided Prompting [53.799446807827714]
We introduce SPROUT, a fully training- and annotation-free prompting framework for nuclear instance segmentation. SPROUT leverages histology-informed priors to construct slide-specific reference prototypes. The resulting foreground and background features are transformed into positive and negative point prompts, enabling the Segment Anything Model (SAM) to produce precise nuclear delineations.
arXiv Detail & Related papers (2025-11-25T05:58:33Z) - An Active Learning Pipeline for Biomedical Image Instance Segmentation with Minimal Human Intervention [4.805039406228118]
Deep learning models such as U-Net have set new benchmarks in segmentation performance. nnU-Net requires a substantial amount of annotated data for cross-validation. This work proposes a data-centric AI workflow that leverages active learning and pseudo-labeling.
arXiv Detail & Related papers (2025-11-06T21:07:26Z) - Interpretable Clinical Classification with Kolmogorov-Arnold Networks [70.72819760172744]
Kolmogorov-Arnold Networks (KANs) offer intrinsic interpretability through transparent, symbolic representations. KANs support built-in patient-level insights, intuitive visualizations, and nearest-patient retrieval. These results position KANs as a promising step toward trustworthy AI that clinicians can understand, audit, and act upon.
arXiv Detail & Related papers (2025-09-20T17:21:58Z) - Data-Efficient Learning for Generalizable Surgical Video Understanding [0.0]
This research aims to bridge the gap between deep learning-based surgical video analysis in research and real-world clinical environments. I benchmarked state-of-the-art neural network architectures to identify the most effective designs for each task. We developed semi-supervised frameworks that improve model performance across tasks by leveraging large amounts of unlabeled surgical video.
arXiv Detail & Related papers (2025-08-13T22:00:23Z) - Surgical Foundation Model Leveraging Compression and Entropy Maximization for Image-Guided Surgical Assistance [50.486523249499115]
Real-time video understanding is critical to guide procedures in minimally invasive surgery (MIS). We propose Compress-to-Explore (C2E), a novel self-supervised framework to learn compact, informative representations from surgical videos. C2E uses entropy-maximizing decoders to compress images while preserving clinically relevant details, improving encoder performance without labeled data.
arXiv Detail & Related papers (2025-05-16T14:02:24Z) - Rethinking model prototyping through the MedMNIST+ dataset collection [0.11999555634662634]
This work introduces a comprehensive benchmark for the MedMNIST+ dataset collection. We reassess commonly used Convolutional Neural Networks (CNNs) and Vision Transformer (ViT) architectures across distinct medical datasets. Our findings suggest that computationally efficient training schemes and modern foundation models offer viable alternatives to costly end-to-end training.
arXiv Detail & Related papers (2024-04-24T10:19:25Z) - Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z) - CorpusBrain: Pre-train a Generative Retrieval Model for Knowledge-Intensive Language Tasks [62.22920673080208]
A single-step generative model can dramatically simplify the search process and be optimized in an end-to-end manner.
We name the pre-trained generative retrieval model CorpusBrain, as all information about the corpus is encoded in its parameters without the need to construct an additional index.
arXiv Detail & Related papers (2022-08-16T10:22:49Z) - Aggregating Long-Term Context for Learning Laparoscopic and Robot-Assisted Surgical Workflows [40.48632897750319]
We propose a new temporal network structure that leverages task-specific network representation to collect long-term sufficient statistics.
We demonstrate superior results over existing and novel state-of-the-art segmentation techniques on two laparoscopic cholecystectomy datasets.
arXiv Detail & Related papers (2020-09-01T20:29:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.