Comparison of the Cox proportional hazards model and Random Survival Forest algorithm for predicting patient-specific survival probabilities in clinical trial data
- URL: http://arxiv.org/abs/2502.03119v2
- Date: Tue, 27 May 2025 16:40:35 GMT
- Title: Comparison of the Cox proportional hazards model and Random Survival Forest algorithm for predicting patient-specific survival probabilities in clinical trial data
- Authors: Ricarda Graf, Susan Todd, M. Fazil Baksh
- Abstract summary: The Cox proportional hazards model is often used to analyze data from Randomized Controlled Trials (RCT) with time-to-event outcomes. Random survival forest (RSF) is a machine-learning algorithm known for its high predictive performance. We compare the performance of Cox regression and RSF in various simulation scenarios based on two reference datasets from RCTs.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Cox proportional hazards model is often used to analyze data from Randomized Controlled Trials (RCT) with time-to-event outcomes. Random survival forest (RSF) is a machine-learning algorithm known for its high predictive performance. We conduct a comprehensive neutral comparison study to compare the performance of Cox regression and RSF in various simulation scenarios based on two reference datasets from RCTs. The motivation is to identify settings in which one method is preferable over the other when comparing different aspects of performance using measures according to the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis) recommendations. Our results show that conclusions solely based on the C index, a performance measure that has been predominantly used in previous studies comparing predictive accuracy of the Cox-PH and RSF model based on real-world observational time-to-event data and that has been criticized by methodologists, may not be generalizable to other aspects of predictive performance. We found that measures of overall performance may generally give more reasonable results, and that the standard log-rank splitting rule used for the RSF may be outperformed by alternative splitting rules, in particular in nonproportional hazards settings. In our simulations, performance of the RSF suffers less in data with treatment-covariate interactions compared to data where these are absent. Performance of the Cox-PH model is affected by the violation of the proportional hazards assumption.
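As a rough illustration of the C index discussed in the abstract, the following sketch computes Harrell's concordance index for right-censored data in plain Python. It is not the authors' code; the function name `harrell_c_index`, the toy data, and the tie handling (tied risk scores count as 0.5, pairs with tied times are skipped) are assumptions made for this example.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C for right-censored data (illustrative sketch).

    times       -- observed follow-up times
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            # A censored subject cannot be the earlier member of a pair.
            continue
        for j in range(n):
            # Pair is comparable only if subject i had the event strictly first.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # assumption: tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four patients, the third one censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_risk = [0.9, 0.7, 0.5, 0.1]

c_original = harrell_c_index(toy_times, toy_events, toy_risk)
# Rescaling the scores keeps their order, so the index does not change.
c_rescaled = harrell_c_index(toy_times, toy_events, [r / 10 for r in toy_risk])
print(c_original, c_rescaled)
```

Because the index depends only on the ordering of the risk scores, it measures discrimination and is blind to calibration, which is consistent with the abstract's caution against drawing conclusions from the C index alone.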
Related papers
- Debiased maximum-likelihood estimators for hazard ratios under machine-learning adjustment [0.0]
We propose abandoning the baseline hazard and using machine learning to explicitly model the change in the risk set with or without latent variables. Numerical simulations confirm that the proposed method identifies the ground truth with minimal bias.
arXiv Detail & Related papers (2025-07-23T16:51:09Z) - Robust estimation of heterogeneous treatment effects in randomized trials leveraging external data [1.3124513975412255]
We propose the QR-learner, a model-agnostic learner that estimates conditional average treatment effects (CATE) within the trial population. It has the potential to reduce the CATE prediction mean squared error while maintaining consistency, even when the external data is not aligned with the trial. We apply the methods to a real-world dataset, demonstrating improvements in both CATE estimation and statistical power for detecting heterogeneous effects.
arXiv Detail & Related papers (2025-07-04T16:01:05Z) - SeqRisk: Transformer-augmented latent variable model for improved survival prediction with longitudinal data [4.1476925904032464]
We propose SeqRisk, a method that combines variational autoencoder (VAE) or longitudinal VAE (LVAE) with a transformer encoder and Cox proportional hazards module for risk prediction.
We demonstrate that SeqRisk performs competitively compared to existing approaches on both simulated and real-world datasets.
arXiv Detail & Related papers (2024-09-19T12:35:25Z) - CoxKAN: Kolmogorov-Arnold Networks for Interpretable, High-Performance Survival Analysis [0.3213991044370425]
Kolmogorov-Arnold Networks (KANs) were recently proposed as an interpretable and accurate alternative to multi-layer perceptrons (MLPs).
We introduce CoxKAN, a Cox proportional hazards Kolmogorov-Arnold Network for interpretable, high-performance survival analysis.
arXiv Detail & Related papers (2024-09-06T13:59:58Z) - Risk and cross validation in ridge regression with correlated samples [72.59731158970894]
We provide training examples for the in- and out-of-sample risks of ridge regression when the data points have arbitrary correlations. We demonstrate that in this setting, the generalized cross validation estimator (GCV) fails to correctly predict the out-of-sample risk. We further extend our analysis to the case where the test point has nontrivial correlations with the training set, a setting often encountered in time series forecasting.
arXiv Detail & Related papers (2024-08-08T17:27:29Z) - Optimizing Cox Models with Stochastic Gradient Descent: Theoretical Foundations and Practical Guidances [9.745755948802499]
Stochastic gradient descent (SGD) has recently been adapted to optimize Cox models.
We demonstrate that the SGD estimator targets an objective function that is batch-size-dependent.
We provide guidance for selecting batch sizes in SGD applications.
arXiv Detail & Related papers (2024-08-05T21:25:10Z) - High Precision Causal Model Evaluation with Conditional Randomization [10.23470075454725]
We introduce a novel low-variance estimator for causal error, dubbed as the pairs estimator.
By applying the same IPW estimator to both the model and true experimental effects, our estimator effectively cancels out the variance due to IPW and achieves a smaller variance.
Our method offers a simple yet powerful solution to evaluate causal inference models in conditional randomization settings without complicated modification of the IPW estimator itself.
arXiv Detail & Related papers (2023-11-03T13:22:27Z) - Clinical Deterioration Prediction in Brazilian Hospitals Based on
Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD)
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv Detail & Related papers (2022-12-17T23:29:14Z) - FastCPH: Efficient Survival Analysis for Neural Networks [57.03275837523063]
We propose FastCPH, a new method that runs in linear time and supports both the standard Breslow and Efron methods for tied events.
We also demonstrate the performance of FastCPH combined with LassoNet, a neural network that provides interpretability through feature sparsity.
arXiv Detail & Related papers (2022-08-21T03:35:29Z) - A Federated Cox Model with Non-Proportional Hazards [8.98624781242271]
Recent research has shown the potential for neural networks to improve upon classical survival models such as the Cox model.
We present a federated Cox model that accommodates this data setting and relaxes the proportional hazards assumption.
We experiment with publicly available clinical datasets and demonstrate that the federated model is able to perform as well as a standard model.
arXiv Detail & Related papers (2022-07-11T17:58:54Z) - Continuous-Time Modeling of Counterfactual Outcomes Using Neural
Controlled Differential Equations [84.42837346400151]
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare.
Existing causal inference approaches consider regular, discrete-time intervals between observations and treatment decisions.
We propose a controllable simulation environment based on a model of tumor growth for a range of scenarios.
arXiv Detail & Related papers (2022-06-16T17:15:15Z) - Targeted-BEHRT: Deep learning for observational causal inference on
longitudinal electronic health records [1.3192560874022086]
We investigate causal modelling of an RCT-established null causal association: the effect of antihypertensive use on incident cancer risk.
We develop a dataset for our observational study and a Transformer-based model, Targeted BEHRT coupled with doubly robust estimation.
We find that our model provides more accurate estimates of RR compared to benchmarks for risk ratio estimation on high-dimensional EHR.
arXiv Detail & Related papers (2022-02-07T20:05:05Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - CDSM -- Casual Inference using Deep Bayesian Dynamic Survival Models [3.9169188005935927]
We have developed a causal dynamic survival model (CDSM) that uses the potential outcomes framework with the Bayesian recurrent sub-networks to estimate the difference in survival curves.
Using simulated survival datasets, CDSM has shown good causal effect estimation performance across scenarios of sample dimension, event rate, confounding and overlapping.
arXiv Detail & Related papers (2021-01-26T09:15:49Z) - Increasing the efficiency of randomized trial estimates via linear
adjustment for a prognostic score [59.75318183140857]
Estimating causal effects from randomized experiments is central to clinical research.
Most methods for historical borrowing achieve reductions in variance by sacrificing strict type-I error rate control.
arXiv Detail & Related papers (2020-12-17T21:10:10Z) - Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, i.e., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z) - A General Framework for Survival Analysis and Multi-State Modelling [70.31153478610229]
We use neural ordinary differential equations as a flexible and general method for estimating multi-state survival models.
We show that our model exhibits state-of-the-art performance on popular survival data sets and demonstrate its efficacy in a multi-state setting.
arXiv Detail & Related papers (2020-06-08T19:24:54Z) - Machine learning for causal inference: on the use of cross-fit estimators [77.34726150561087]
Doubly-robust cross-fit estimators have been proposed to yield better statistical properties.
We conducted a simulation study to assess the performance of several estimators for the average causal effect (ACE).
When used with machine learning, the doubly-robust cross-fit estimators substantially outperformed all of the other estimators in terms of bias, variance, and confidence interval coverage.
arXiv Detail & Related papers (2020-04-21T23:09:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.