Related papers: End to End AI System for Surgical Gesture Sequence Recognition and Clinical Outcome Prediction

URL: http://arxiv.org/abs/2511.11899v1
Date: Fri, 14 Nov 2025 22:02:46 GMT
Title: End to End AI System for Surgical Gesture Sequence Recognition and Clinical Outcome Prediction
Authors: Xi Li, Nicholas Matsumoto, Ujjwal Pasupulety, Atharva Deo, Cherine Yang, Jay Moran, Miguel E. Hernandez, Peter Wager, Jasmine Lin, Jeanine Kim, Alvin C. Goh, Christian Wagner, Geoffrey A. Sonn, Andrew J. Hung
Abstract summary: We present Frame-to-Outcome (F2O), an end-to-end system that translates tissue dissection videos into gesture sequences. F2O robustly detects consecutive short (2 seconds) gestures in the nerve-sparing step of robot-assisted radical prostatectomy.
Score: 5.409483209009106
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Fine-grained analysis of intraoperative behavior and its impact on patient outcomes remains a longstanding challenge. We present Frame-to-Outcome (F2O), an end-to-end system that translates tissue dissection videos into gesture sequences and uncovers patterns associated with postoperative outcomes. Leveraging transformer-based spatial and temporal modeling and frame-wise classification, F2O robustly detects consecutive short (~2 seconds) gestures in the nerve-sparing step of robot-assisted radical prostatectomy (AUC: 0.80 frame-level; 0.81 video-level). F2O-derived features (gesture frequency, duration, and transitions) predicted postoperative outcomes with accuracy comparable to human annotations (0.79 vs. 0.75; overlapping 95% CI). Across 25 shared features, effect size directions were concordant with small differences (~0.07) and strong correlation (r = 0.96, p < 1e-14). F2O also captured key patterns linked to erectile function recovery, including prolonged tissue peeling and reduced energy use. By enabling automatic interpretable assessment, F2O establishes a foundation for data-driven surgical feedback and prospective clinical decision support.
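The abstract names three feature families derived from frame-wise predictions: gesture frequency, duration, and transitions. The following is a minimal sketch of how such features could be computed from a sequence of per-frame labels. It is not the authors' implementation: the function name `gesture_features`, the gesture labels "peel" and "cut", and the 30 fps frame rate are all assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby


def gesture_features(frame_labels, fps=30.0):
    """Summarize per-frame gesture labels (hypothetical helper, not from the paper)."""
    # Run-length encode: consecutive identical labels form one gesture segment.
    segments = [(label, sum(1 for _ in run)) for label, run in groupby(frame_labels)]

    # Frequency: number of segments per gesture type.
    frequency = Counter(label for label, _ in segments)

    # Duration: mean segment length per gesture type, converted to seconds.
    total_frames = Counter()
    for label, n_frames in segments:
        total_frames[label] += n_frames
    mean_duration_s = {
        label: total_frames[label] / frequency[label] / fps for label in frequency
    }

    # Transitions: counts of (previous gesture, next gesture) pairs.
    transitions = Counter(
        (prev[0], nxt[0]) for prev, nxt in zip(segments, segments[1:])
    )
    return frequency, mean_duration_s, transitions


# Illustrative labels only: 2 s of "peel", 1 s of "cut", 3 s of "peel" at an assumed 30 fps.
labels = ["peel"] * 60 + ["cut"] * 30 + ["peel"] * 90
freq, dur, trans = gesture_features(labels, fps=30.0)
```

Run-length encoding is used here because it turns a frame-level prediction stream into discrete gestures in a single pass; a real pipeline would likely also smooth spurious single-frame label flips before segmenting.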

Related papers

Automated glenoid bone loss measurement and segmentation in CT scans for pre-operative planning in shoulder instability [4.618498494409548]
Reliable measurement of glenoid bone loss is essential for operative planning in shoulder instability. We developed and validated a fully automated deep learning pipeline for measuring glenoid bone loss on three-dimensional computed tomography (CT) scans.
arXiv Detail & Related papers (2025-11-18T03:12:22Z)
Causal Machine Learning for Surgical Interventions [3.701687265960785]
Surgical decision-making is complex and requires understanding causal relationships between patient characteristics, interventions, and outcomes. In this study, we develop a multi-task meta-learning framework, X-MultiTask, for ITE estimation. By providing robust, patient-specific causal estimates, X-MultiTask offers a powerful tool to advance personalized surgical care and improve patient outcomes.
arXiv Detail & Related papers (2025-09-24T02:31:43Z)
Organ-Agents: Virtual Human Physiology Simulator via LLMs [66.40796430669158]
Organ-Agents is a multi-agent framework that simulates human physiology via LLM-driven agents. We curated data from 7,134 sepsis patients and 7,895 controls, generating high-resolution trajectories across 9 systems and 125 variables. Organ-Agents achieved high simulation accuracy on 4,509 held-out patients, with per-system MSEs 0.16 and robustness across SOFA-based severity strata.
arXiv Detail & Related papers (2025-08-20T01:58:45Z)
A Novel Attention-Augmented Wavelet YOLO System for Real-time Brain Vessel Segmentation on Transcranial Color-coded Doppler [49.03919553747297]
We propose an AI-powered, real-time CoW auto-segmentation system capable of efficiently capturing cerebral arteries. No prior studies have explored AI-driven cerebrovascular segmentation using Transcranial Color-coded Doppler (TCCD). The proposed AAW-YOLO demonstrated strong performance in segmenting both ipsilateral and contralateral CoW vessels.
arXiv Detail & Related papers (2025-08-19T14:41:22Z)
Predicting Postoperative Stroke in Elderly SICU Patients: An Interpretable Machine Learning Model Using MIMIC Data [0.0]
Postoperative stroke remains a critical complication in elderly surgical intensive care unit (SICU) patients. We constructed a combined cohort of 19,085 elderly SICU admissions from the MIMIC-III and MIMIC-IV databases. We developed an interpretable machine learning framework to predict in-hospital stroke using clinical data from the first 24 hours of intensive care unit stay.
arXiv Detail & Related papers (2025-06-02T22:53:12Z)
WMKA-Net: A Weighted Multi-Kernel Attention Network for Retinal Vessel Segmentation [0.48536814705421105]
This study proposes a dual-stage solution to address the issues of insufficient multi-scale feature fusion, disruption of contextual continuity, and noise interference. The first stage employs a Multi-Scale Fusion Module (RMS) that uses hierarchical adaptive convolution to dynamically merge cross-scale features from capillaries to main vessels. The second stage introduces a Vascular-Oriented Attention Mechanism, which models long-distance vascular continuity through an axial pathway.
arXiv Detail & Related papers (2025-04-21T06:32:25Z)
Biological and Radiological Dictionary of Radiomics Features: Addressing Understandable AI Issues in Personalized Prostate Cancer; Dictionary Version PM1.0 [1.2200133485912512]
We created a standardized dictionary of biological/radiological RFs for PI-RADS and associated risk factors. We then utilized the dictionary to interpret the best-predictive models. This approach achieved the highest average accuracy of 0.78, significantly outperforming single-sequence methods.
arXiv Detail & Related papers (2024-12-14T20:55:31Z)
A Federated Learning Framework for Stenosis Detection [70.27581181445329]
This study explores the use of Federated Learning (FL) for stenosis detection in coronary angiography images (CA). Two heterogeneous datasets from two institutions were considered: dataset 1 includes 1219 images from 200 patients, which we acquired at the Ospedale Riuniti of Ancona (Italy); dataset 2 includes 7492 sequential images from 90 patients from a previous study available in the literature.
arXiv Detail & Related papers (2023-10-30T11:13:40Z)
LoViT: Long Video Transformer for Surgical Phase Recognition [59.06812739441785]
We present a two-stage method, called Long Video Transformer (LoViT) for fusing short- and long-term temporal information. Our approach outperforms state-of-the-art methods on the Cholec80 and AutoLaparo datasets consistently.
arXiv Detail & Related papers (2023-05-15T20:06:14Z)
Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images. Good results were obtained in sub-fractures with the largest and richest dataset ever.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
Joint Prediction and Time Estimation of COVID-19 Developing Severe Symptoms using Chest CT Scan [49.209225484926634]
We propose a joint classification and regression method to determine whether the patient would develop severe symptoms in the later time. To do this, the proposed method takes into account 1) the weight for each sample to reduce the outliers' influence and explore the problem of imbalance classification. Our proposed method yields 76.97% of accuracy for predicting the severe cases, 0.524 of the correlation coefficient, and 0.55 days difference for the converted time.
arXiv Detail & Related papers (2020-05-07T12:16:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.