Generalisable prediction model of surgical case duration: multicentre development and temporal validation
- URL: http://arxiv.org/abs/2511.08994v1
- Date: Thu, 13 Nov 2025 01:24:45 GMT
- Title: Generalisable prediction model of surgical case duration: multicentre development and temporal validation
- Authors: Daijiro Kabata, Mari Ito, Tokito Koga, Kazuma Yunoki,
- Abstract summary: Existing models often depend on site- or surgeon-specific inputs and rarely undergo external validation. We undertook a retrospective multicentre study using routinely collected data from two general hospitals in Japan. A stacked machine-learning model using only widely available preoperative variables achieved accurate, well-calibrated predictions in temporal external validation.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background: Accurate prediction of surgical case duration underpins operating room (OR) scheduling, yet existing models often depend on site- or surgeon-specific inputs and rarely undergo external validation, limiting generalisability. Methods: We undertook a retrospective multicentre study using routinely collected perioperative data from two general hospitals in Japan (development: 1 January 2021-31 December 2023; temporal test: 1 January-31 December 2024). Elective weekday procedures with American Society of Anesthesiologists (ASA) Physical Status 1-4 were included. Pre-specified preoperative predictors comprised surgical context (year, month, weekday, scheduled duration, general anaesthesia indicator, body position) and patient factors (sex, age, body mass index, allergy, infection, comorbidity, ASA). Missing data were addressed by multiple imputation by chained equations. Four learners (elastic-net, generalised additive models, random forest, gradient-boosted trees) were tuned within internal-external cross-validation (IECV; leave-one-cluster-out by centre-year) and combined by stacked generalisation to predict log-transformed duration. Results: We analysed 63,206 procedures (development 45,647; temporal test 17,559). Cluster-specific and pooled errors and calibrations from IECV are provided with consistent performance across centres and years. In the 2024 temporal test cohort, calibration was good (intercept 0.423, 95%CI 0.372 to 0.474; slope 0.921, 95%CI 0.911 to 0.932). Conclusions: A stacked machine-learning model using only widely available preoperative variables achieved accurate, well-calibrated predictions in temporal external validation, supporting transportability across sites and over time. Such general-purpose tools may improve OR scheduling without relying on idiosyncratic inputs.
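The validation design described in the abstract (leave-one-cluster-out IECV by centre-year, stacking of base learners, calibration intercept and slope) can be illustrated with a minimal standard-library Python sketch. It is not the authors' code. The function names `iecv_folds`, `fit_stacker` and `calibration_intercept_slope` are hypothetical; a cluster is assumed to be a (centre, year) pair; the meta-learner is assumed to be unconstrained least squares without an intercept, because the abstract does not name the meta-learner; and calibration is assumed to be the ordinary least-squares fit of observed on predicted log duration, where an intercept of 0 and a slope of 1 would indicate perfect calibration.

```python
"""Sketch: IECV folds, a least-squares stacker, and calibration intercept/slope.

Assumptions (not taken from the paper): a cluster is a (centre, year) pair;
the meta-learner is unconstrained least squares without an intercept; and
calibration is the OLS fit observed = a + b * predicted on the log scale.
"""
from collections import defaultdict
from statistics import fmean


def iecv_folds(centres, years):
    """Leave-one-cluster-out folds; returns (held_out, train_idx, test_idx) tuples."""
    clusters = defaultdict(list)
    for i, key in enumerate(zip(centres, years)):
        clusters[key].append(i)
    folds = []
    for held_out in sorted(clusters):
        # Train on every other centre-year cluster, test on the held-out one.
        train_idx = sorted(i for key, idx in clusters.items()
                           if key != held_out for i in idx)
        folds.append((held_out, train_idx, clusters[held_out]))
    return folds


def fit_stacker(oof_preds, y):
    """Least-squares weights w minimising sum_i (y_i - sum_k w_k * p_ik) ** 2.

    oof_preds holds one row per procedure with K out-of-fold base-learner
    predictions. Solves the normal equations (P^T P) w = P^T y by Gaussian
    elimination with partial pivoting.
    """
    k = len(oof_preds[0])
    # Augmented matrix [P^T P | P^T y].
    m = [[sum(r[i] * r[j] for r in oof_preds) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(oof_preds, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, k):
            factor = m[row][col] / m[col][col]
            for c in range(col, k + 1):
                m[row][c] -= factor * m[col][c]
    # Back substitution.
    w = [0.0] * k
    for row in reversed(range(k)):
        tail = sum(m[row][c] * w[c] for c in range(row + 1, k))
        w[row] = (m[row][k] - tail) / m[row][row]
    return w


def calibration_intercept_slope(observed, predicted):
    """OLS fit of observed = intercept + slope * predicted."""
    mx, my = fmean(predicted), fmean(observed)
    sxx = sum((x - mx) ** 2 for x in predicted)
    sxy = sum((x - mx) * (o - my) for x, o in zip(predicted, observed))
    slope = sxy / sxx
    return my - slope * mx, slope


if __name__ == "__main__":
    # Toy data with three centre-year clusters.
    for held_out, train, test in iecv_folds(["A", "A", "B", "B"],
                                            [2021, 2022, 2021, 2021]):
        print(held_out, train, test)
    # Made-up out-of-fold log-duration predictions from two base learners.
    oof = [[4.1, 4.3], [4.8, 4.6], [5.2, 5.5], [3.9, 4.0]]
    y = [4.2, 4.7, 5.4, 3.9]
    w = fit_stacker(oof, y)
    stacked = [sum(wk * pk for wk, pk in zip(w, row)) for row in oof]
    print("weights", w)
    print("calibration (intercept, slope)", calibration_intercept_slope(y, stacked))
```

In a full pipeline, each base learner (elastic-net, GAM, random forest, gradient-boosted trees) would be tuned and fitted on `train_idx` of every fold and used to predict `test_idx`, producing the out-of-fold rows that `fit_stacker` expects; the calibration function would then be applied to the held-out 2024 cohort. Fitting the stacker on out-of-fold rather than in-sample predictions is what keeps the ensemble weights from favouring the most overfitted learner.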
Related papers
- What Drives Length of Stay After Elective Spine Surgery? Insights from a Decade of Predictive Modeling
Predicting length of stay after elective spine surgery is essential for optimizing patient outcomes and hospital resource use.
Machine learning models consistently outperformed traditional statistical models.
There is growing interest in artificial intelligence and machine learning in length of stay prediction, but lack of standardization and external validation limits clinical utility.
Date: 2026-01-24T01:52:06Z
- Transparent Early ICU Mortality Prediction with Clinical Transformer and Per-Case Modality Attribution
We present a lightweight, transparent multimodal ensemble that fuses physiological time-series measurements with unstructured clinical notes from the first 48 hours of an ICU stay.
A logistic regression model combines predictions from two modality-specific models: a bidirectional LSTM for vitals and a finetuned ClinicalModernBERT transformer for notes.
On the MIMIC-III benchmark, our late-fusion ensemble improves discrimination over the best single model while maintaining well-calibrated predictions.
Date: 2025-11-19T20:11:49Z
- SurgeryLSTM: A Time-Aware Neural Model for Accurate and Explainable Length of Stay Prediction After Spine Surgery
We develop and evaluate machine learning (ML) models for predicting length of stay (LOS) in elective spine surgery.
We compare traditional ML models with our developed model, SurgeryLSTM, a masked bidirectional long short-term memory (BiLSTM) network with attention.
Performance was evaluated using the coefficient of determination (R²), and key predictors were identified using explainable AI.
Date: 2025-07-15T01:18:28Z
- A SHAP-based explainable multi-level stacking ensemble learning method for predicting the length of stay in acute stroke
Existing machine learning models have shown suboptimal predictive performance, limited generalisability, and have overlooked system-level factors.
We developed an interpretable multi-level stacking ensemble model for ischaemic and haemorrhagic stroke.
An explainable ensemble model effectively predicted the prolonged LOS in ischaemic stroke.
Further validation is needed for haemorrhagic stroke.
Date: 2025-05-30T01:08:26Z
- Predicting Length of Stay in Neurological ICU Patients Using Classical Machine Learning and Neural Network Models: A Benchmark Study on MIMIC-IV
This study explores multiple ML approaches for predicting ICU LOS specifically in patients with neurological diseases, based on the MIMIC-IV dataset.
The evaluated models include classic ML algorithms (K-Nearest Neighbors, Random Forest, XGBoost and CatBoost) and neural networks (LSTM, BERT and Temporal Fusion Transformer).
Date: 2025-05-23T14:06:42Z
- SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing
Sepsis is the leading cause of in-hospital mortality in the USA.
Existing predictive models are usually trained on high-quality data with little missing information.
For potentially high-risk patients whose predictions have low confidence because of limited observations, we propose a robust active sensing algorithm.
Date: 2024-07-24T04:47:36Z
- Using Pre-training and Interaction Modeling for ancestry-specific disease prediction in UK Biobank
Recent genome-wide association studies (GWAS) have uncovered the genetic basis of complex traits, but show an under-representation of individuals of non-European descent.
Here, we assess whether we can improve disease prediction across diverse ancestries using multiomic data.
Date: 2024-04-26T16:39:50Z
- A multi-cohort study on prediction of acute brain dysfunction states using selective state space models
Acute brain dysfunction (ABD) is a critical challenge due to its prevalence and severe implications for patient outcomes.
Our research attempts to solve these problems by harnessing Electronic Health Records (EHR) data.
Existing models predict only a single state (e.g., either delirium or coma) and require at least 24 hours of observation data to make predictions.
Our research fills these gaps in the existing literature by dynamically predicting delirium, coma, mortality and fluctuating for 12-hour intervals throughout an ICU stay.
Date: 2024-03-11T22:58:11Z
- Generalizable and Robust Deep Learning Algorithm for Atrial Fibrillation Diagnosis Across Ethnicities, Ages and Sexes
This study is the first to develop and assess the generalization performance of a deep learning (DL) model for AF events detection.
The model, ArNet2, was developed on a large retrospective dataset of 2,147 patients totaling 51,386 hours of continuous electrocardiogram (ECG).
It was validated on a retrospective dataset of 1,730 consecutive Holter recordings from the Rambam Hospital Holter clinic, Haifa, Israel.
Date: 2022-07-20T05:49:16Z
- Identifying and mitigating bias in algorithms used to manage patients in a pandemic
Logistic regression models were created to predict COVID-19 mortality, ventilator status and inpatient status using a real-world dataset.
Models showed a 57% decrease in the number of biased trials.
After calibration, the average sensitivity of the predictive models increased from 0.527 to 0.955.
Date: 2021-10-30T21:10:56Z
- On the explainability of hospitalization prediction on a large COVID-19 patient dataset
We develop various AI models to predict hospitalization in a large cohort (over 110k) of US patients who tested positive for COVID-19.
Despite high class imbalance, the models reach average precision 0.96-0.98 (0.75-0.85), recall 0.96-0.98 (0.74-0.85), and F-score 0.97-0.98 (0.79-0.83) on the non-hospitalized (or hospitalized) class.
Date: 2021-10-28T10:23:38Z
- Joint Prediction and Time Estimation of COVID-19 Developing Severe Symptoms using Chest CT Scan
We propose a joint classification and regression method to determine whether the patient will develop severe symptoms at a later time.
To do this, the proposed method takes into account 1) a weight for each sample, to reduce the influence of outliers and to address the imbalanced classification problem.
Our proposed method achieves 76.97% accuracy in predicting severe cases, a correlation coefficient of 0.524, and a 0.55-day difference for the conversion time.
Date: 2020-05-07T12:16:37Z