Neural Finite-State Machines for Surgical Phase Recognition
- URL: http://arxiv.org/abs/2411.18018v2
- Date: Sun, 02 Mar 2025 04:05:24 GMT
- Title: Neural Finite-State Machines for Surgical Phase Recognition
- Authors: Hao Ding, Zhongpai Gao, Benjamin Planche, Tianyu Luan, Abhishek Sharma, Meng Zheng, Ange Lou, Terrence Chen, Mathias Unberath, Ziyan Wu,
- Abstract summary: Surgical phase recognition is crucial for applications in workflow optimization, performance evaluation, and real-time intervention guidance.<n>We propose the Neural Finite-State Machine (NFSM), a novel approach that enforces temporal coherence by integrating classical state-transition priors with modern neural networks.<n>We demonstrate state-of-the-art performance across multiple benchmarks, including a significant improvement on the BernBypass70 dataset.
- Score: 30.912252237906724
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Surgical phase recognition (SPR) is crucial for applications in workflow optimization, performance evaluation, and real-time intervention guidance. However, current deep learning models often struggle with fragmented predictions, failing to capture the sequential nature of surgical workflows. We propose the Neural Finite-State Machine (NFSM), a novel approach that enforces temporal coherence by integrating classical state-transition priors with modern neural networks. NFSM leverages learnable global state embeddings as unique phase identifiers and dynamic transition tables to model phase-to-phase progressions. Additionally, a future phase forecasting mechanism employs repeated frame padding to anticipate upcoming transitions. Implemented as a plug-and-play module, NFSM can be integrated into existing SPR pipelines without changing their core architectures. We demonstrate state-of-the-art performance across multiple benchmarks, including a significant improvement on the BernBypass70 dataset - raising video-level accuracy by 0.9 points and phase-level precision, recall, F1-score, and mAP by 3.8, 3.1, 3.3, and 4.1, respectively. Ablation studies confirm each component's effectiveness and the module's adaptability to various architectures. By unifying finite-state principles with deep learning, NFSM offers a robust path toward consistent, long-term surgical video analysis.
Related papers
- Implicit Neural Differential Model for Spatiotemporal Dynamics [5.1854032131971195]
We introduce Im-PiNDiff, a novel implicit physics-integrated neural differentiable solver for stabletemporal dynamics.
Inspired by deep equilibrium models, Im-PiNDiff advances the state using implicit fixed-point layers, enabling robust long-term simulation.
Im-PiNDiff achieves superior predictive performance, enhanced numerical stability, and substantial reductions in memory and cost.
arXiv Detail & Related papers (2025-04-03T04:07:18Z) - Efficient Transformed Gaussian Process State-Space Models for Non-Stationary High-Dimensional Dynamical Systems [49.819436680336786]
We propose an efficient transformed Gaussian process state-space model (ETGPSSM) for scalable and flexible modeling of high-dimensional, non-stationary dynamical systems.
Specifically, our ETGPSSM integrates a single shared GP with input-dependent normalizing flows, yielding an expressive implicit process prior that captures complex, non-stationary transition dynamics.
Our ETGPSSM outperforms existing GPSSMs and neural network-based SSMs in terms of computational efficiency and accuracy.
arXiv Detail & Related papers (2025-03-24T03:19:45Z) - TS-LIF: A Temporal Segment Spiking Neuron Network for Time Series Forecasting [27.91825785119938]
Spiking Neural Networks (SNNs) offer a promising, biologically inspired approach for processing data for time series forecasting.
We introduce the Temporal Leaky Segment Integrate-and-Fire model, featuring a dual-compartment architecture.
Experimental results show that TS-LIF outperforms traditional SNNs in time series forecasting.
arXiv Detail & Related papers (2025-03-07T03:06:21Z) - PreAdaptFWI: Pretrained-Based Adaptive Residual Learning for Full-Waveform Inversion Without Dataset Dependency [8.719356558714246]
Full-waveform inversion (FWI) is a method that utilizes seismic data to invert the physical parameters of subsurface media.
Due to its ill-posed nature, FWI is susceptible to getting trapped in local minima.
Various research efforts have attempted to combine neural networks with FWI to stabilize the inversion process.
arXiv Detail & Related papers (2025-02-17T15:30:17Z) - Input layer regularization and automated regularization hyperparameter tuning for myelin water estimation using deep learning [1.9594393134885413]
We propose a novel deep learning method which combines classical regularization with data augmentation for estimating myelin water fraction (MWF) in the brain via biexponential analysis.
In particular, we study the biexponential model, one of the signal models used for MWF estimation.
arXiv Detail & Related papers (2025-01-30T00:56:28Z) - MASSFormer: Mobility-Aware Spectrum Sensing using Transformer-Driven
Tiered Structure [3.6194127685460553]
We develop a mobility-aware transformer-driven structure (MASSFormer) based cooperative sensing method.
Our method considers a dynamic scenario involving mobile primary users (PUs) and secondary users (SUs)
The proposed method is tested under imperfect reporting channel scenarios to show robustness.
arXiv Detail & Related papers (2024-09-26T05:25:25Z) - Pediatric TSC-Related Epilepsy Classification from Clinical MR Images Using Quantum Neural Network [17.788579893962492]
This study introduces QResNet, a novel deep learning model seamlessly integrating conventional convolutional neural networks with quantum neural networks.
A comprehensive evaluation, demonstrates the superior performance of QResNet in TSC MRI image classification compared to conventional 3D-ResNet models.
arXiv Detail & Related papers (2024-08-08T14:11:06Z) - Stragglers-Aware Low-Latency Synchronous Federated Learning via Layer-Wise Model Updates [71.81037644563217]
Synchronous federated learning (FL) is a popular paradigm for collaborative edge learning.
As some of the devices may have limited computational resources and varying availability, FL latency is highly sensitive to stragglers.
We propose straggler-aware layer-wise federated learning (SALF) that leverages the optimization procedure of NNs via backpropagation to update the global model in a layer-wise fashion.
arXiv Detail & Related papers (2024-03-27T09:14:36Z) - Surgical Temporal Action-aware Network with Sequence Regularization for
Phase Recognition [28.52533700429284]
We propose a Surgical Temporal Action-aware Network with sequence Regularization, named STAR-Net, to recognize surgical phases more accurately from input videos.
MS-STA module integrates visual features with spatial and temporal knowledge of surgical actions at the cost of 2D networks.
Our STAR-Net with MS-STA and DSR can exploit visual features of surgical actions with effective regularization, thereby leading to the superior performance of surgical phase recognition.
arXiv Detail & Related papers (2023-11-21T13:43:16Z) - LoViT: Long Video Transformer for Surgical Phase Recognition [59.06812739441785]
We present a two-stage method, called Long Video Transformer (LoViT) for fusing short- and long-term temporal information.
Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently.
arXiv Detail & Related papers (2023-05-15T20:06:14Z) - Robotic Navigation Autonomy for Subretinal Injection via Intelligent
Real-Time Virtual iOCT Volume Slicing [88.99939660183881]
We propose a framework for autonomous robotic navigation for subretinal injection.
Our method consists of an instrument pose estimation method, an online registration between the robotic and the i OCT system, and trajectory planning tailored for navigation to an injection target.
Our experiments on ex-vivo porcine eyes demonstrate the precision and repeatability of the method.
arXiv Detail & Related papers (2023-01-17T21:41:21Z) - Deep learning applied to computational mechanics: A comprehensive
review, state of the art, and the classics [77.34726150561087]
Recent developments in artificial neural networks, particularly deep learning (DL), are reviewed in detail.
Both hybrid and pure machine learning (ML) methods are discussed.
History and limitations of AI are recounted and discussed, with particular attention at pointing out misstatements or misconceptions of the classics.
arXiv Detail & Related papers (2022-12-18T02:03:00Z) - Reconstructing high-order sequence features of dynamic functional
connectivity networks based on diversified covert attention patterns for
Alzheimer's disease classification [22.57052592437276]
We introduce self-attention mechanism, a core module of Transformers, to model diversified attention patterns and apply these patterns to reconstruct high-order sequence features of dFCNs.
We propose a CRN method based on diversified attention patterns, DCA-CRN, which combines the advantages of CRNs capturing local in-temporal-temporal features and sequence change patterns, as well as Transformers in learning global and high-order correlation features.
arXiv Detail & Related papers (2022-11-19T02:13:21Z) - Machine Learning model for gas-liquid interface reconstruction in CFD
numerical simulations [59.84561168501493]
The volume of fluid (VoF) method is widely used in multi-phase flow simulations to track and locate the interface between two immiscible fluids.
A major bottleneck of the VoF method is the interface reconstruction step due to its high computational cost and low accuracy on unstructured grids.
We propose a machine learning enhanced VoF method based on Graph Neural Networks (GNN) to accelerate the interface reconstruction on general unstructured meshes.
arXiv Detail & Related papers (2022-07-12T17:07:46Z) - Dissecting Self-Supervised Learning Methods for Surgical Computer Vision [51.370873913181605]
Self-Supervised Learning (SSL) methods have begun to gain traction in the general computer vision community.
The effectiveness of SSL methods in more complex and impactful domains, such as medicine and surgery, remains limited and unexplored.
We present an extensive analysis of the performance of these methods on the Cholec80 dataset for two fundamental and popular tasks in surgical context understanding, phase recognition and tool presence detection.
arXiv Detail & Related papers (2022-07-01T14:17:11Z) - Multimodal Semantic Scene Graphs for Holistic Modeling of Surgical
Procedures [70.69948035469467]
We take advantage of the latest computer vision methodologies for generating 3D graphs from camera views.
We then introduce the Multimodal Semantic Graph Scene (MSSG) which aims at providing unified symbolic and semantic representation of surgical procedures.
arXiv Detail & Related papers (2021-06-09T14:35:44Z) - Influence Estimation and Maximization via Neural Mean-Field Dynamics [60.91291234832546]
We propose a novel learning framework using neural mean-field (NMF) dynamics for inference and estimation problems.
Our framework can simultaneously learn the structure of the diffusion network and the evolution of node infection probabilities.
arXiv Detail & Related papers (2021-06-03T00:02:05Z) - Relational Graph Learning on Visual and Kinematics Embeddings for
Accurate Gesture Recognition in Robotic Surgery [84.73764603474413]
We propose a novel online approach of multi-modal graph network (i.e., MRG-Net) to dynamically integrate visual and kinematics information.
The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset.
arXiv Detail & Related papers (2020-11-03T11:00:10Z) - Aggregating Long-Term Context for Learning Laparoscopic and
Robot-Assisted Surgical Workflows [40.48632897750319]
We propose a new temporal network structure that leverages task-specific network representation to collect long-term sufficient statistics.
We demonstrate superior results over existing and novel state-of-the-art segmentation techniques on two laparoscopic cholecystectomy datasets.
arXiv Detail & Related papers (2020-09-01T20:29:14Z) - Learn to cycle: Time-consistent feature discovery for action recognition [83.43682368129072]
Generalizing over temporal variations is a prerequisite for effective action recognition in videos.
We introduce Squeeze Re Temporal Gates (SRTG), an approach that favors temporal activations with potential variations.
We show consistent improvement when using SRTPG blocks, with only a minimal increase in the number of GFLOs.
arXiv Detail & Related papers (2020-06-15T09:36:28Z) - TeCNO: Surgical Phase Recognition with Multi-Stage Temporal
Convolutional Networks [43.95869213955351]
We propose a Multi-Stage Temporal Convolutional Network (MS-TCN) that performs hierarchical prediction refinement for surgical phase recognition.
Our method is thoroughly evaluated on two datasets of laparoscopic cholecystectomy videos with and without the use of additional surgical tool information.
arXiv Detail & Related papers (2020-03-24T10:12:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.