Stop Chasing the C-index: This Is How We Should Evaluate Our Survival Models
- URL: http://arxiv.org/abs/2506.02075v1
- Date: Mon, 02 Jun 2025 07:59:34 GMT
- Title: Stop Chasing the C-index: This Is How We Should Evaluate Our Survival Models
- Authors: Christian Marius Lillelund, Shi-ang Qi, Russell Greiner, Christian Fischer Pedersen
- Abstract summary: We argue that many survival analysis and time-to-event models are incorrectly evaluated. We present a set of key desiderata for choosing the right evaluation metric and discuss their pros and cons.
- Score: 4.389420785110098
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We argue that many survival analysis and time-to-event models are incorrectly evaluated. First, we survey many examples of evaluation approaches in the literature and find that most rely on concordance (C-index). However, the C-index only measures a model's discriminative ability and does not assess other important aspects, such as the accuracy of the time-to-event predictions or the calibration of the model's probabilistic estimates. Next, we present a set of key desiderata for choosing the right evaluation metric and discuss their pros and cons. These are tailored to the challenges in survival analysis, such as sensitivity to miscalibration and various censoring assumptions. We hypothesize that the current development of survival metrics conforms to a double-helix ladder, and that model validity and metric validity must stand on the same rung of the assumption ladder. Finally, we discuss the appropriate methods for evaluating a survival model in practice and summarize various viewpoints opposing our analysis.
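To make the abstract's point concrete, the sketch below (not the authors' code; the synthetic data, model names, and helper functions are illustrative assumptions) computes Harrell's C-index and a naive MAE over uncensored individuals. Because the C-index depends only on the ranking of predictions, scaling a model's predicted times by a positive constant leaves it unchanged while the time-to-event error explodes, which is exactly the blind spot the abstract describes.
```python
# Minimal sketch (not the paper's code): Harrell's C-index and a naive MAE over
# uncensored individuals, computed with plain NumPy on synthetic data. All names
# below (harrell_c_index, mae_uncensored, model A/B) are illustrative.
import numpy as np


def harrell_c_index(pred_time, event_time, event_observed):
    """Fraction of comparable pairs whose predicted times are ordered correctly.

    A pair (i, j) is comparable when the earlier observed time belongs to an
    actual event; tied predictions count as half-concordant.
    """
    concordant, comparable = 0.0, 0
    n = len(event_time)
    for i in range(n):
        for j in range(n):
            if event_observed[i] and event_time[i] < event_time[j]:
                comparable += 1
                if pred_time[i] < pred_time[j]:
                    concordant += 1.0
                elif pred_time[i] == pred_time[j]:
                    concordant += 0.5
    return concordant / comparable


def mae_uncensored(pred_time, event_time, event_observed):
    """Mean absolute error restricted to individuals whose event was observed."""
    mask = event_observed.astype(bool)
    return np.mean(np.abs(pred_time[mask] - event_time[mask]))


rng = np.random.default_rng(0)
true_time = rng.exponential(scale=10.0, size=200)       # latent event times
censor_time = rng.exponential(scale=15.0, size=200)     # independent censoring times
event_observed = (true_time <= censor_time).astype(int)
event_time = np.minimum(true_time, censor_time)         # what we actually observe

pred_a = true_time + rng.normal(0.0, 1.0, size=200)     # near-oracle predicted times
pred_b = 100.0 * pred_a                                 # same ranking, absurd scale

for name, pred in [("model A", pred_a), ("model B", pred_b)]:
    print(name,
          "C-index = %.3f" % harrell_c_index(pred, event_time, event_observed),
          "MAE (uncensored) = %.1f" % mae_uncensored(pred, event_time, event_observed))
```
In this toy run both models report the identical C-index, while model B's MAE is far larger; calibration-oriented metrics (e.g., the Brier score or distribution calibration) would expose a similar gap that concordance alone cannot.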
Related papers
- SurvUnc: A Meta-Model Based Uncertainty Quantification Framework for Survival Analysis [8.413107141283502]
Survival analysis is fundamental in numerous real-world applications, particularly in high-stakes domains such as healthcare and risk assessment. Despite advances in numerous survival models, quantifying the uncertainty of predictions remains underexplored and challenging. We introduce SurvUnc, a novel meta-model based framework for post-hoc uncertainty quantification for survival models.
arXiv Detail & Related papers (2025-05-20T18:12:20Z)
- Predictive Multiplicity in Survival Models: A Method for Quantifying Model Uncertainty in Predictive Maintenance Applications [0.0]
We frame predictive multiplicity as a critical concern in survival-based models. We introduce formal measures -- ambiguity, discrepancy, and obscurity -- to quantify it. This is particularly relevant for downstream tasks such as maintenance scheduling.
arXiv Detail & Related papers (2025-04-16T15:04:00Z)
- Evidential time-to-event prediction with calibrated uncertainty quantification [12.446406577462069]
Time-to-event analysis provides insights into clinical prognosis and treatment recommendations. We propose an evidential regression model specifically designed for time-to-event prediction. We show that our model delivers both accurate and reliable performance, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2024-11-12T15:06:04Z)
- A Probabilistic Perspective on Unlearning and Alignment for Large Language Models [48.96686419141881]
We introduce the first formal probabilistic evaluation framework for Large Language Models (LLMs). Namely, we propose novel metrics with high probability guarantees concerning the output distribution of a model. Our metrics are application-independent and allow practitioners to make more reliable estimates about model capabilities before deployment.
arXiv Detail & Related papers (2024-10-04T15:44:23Z)
- Challenges and Considerations in the Evaluation of Bayesian Causal Discovery [49.0053848090947]
Representing uncertainty in causal discovery is a crucial component for experimental design, and more broadly, for safe and reliable causal decision making.
Unlike non-Bayesian causal discovery, which relies on a single estimated causal graph and model parameters for assessment, Bayesian causal discovery is harder to evaluate because the inferred quantity is a posterior distribution over causal graphs.
There is no consensus on the most suitable metric for evaluation.
arXiv Detail & Related papers (2024-06-05T12:45:23Z)
- Selective Nonparametric Regression via Testing [54.20569354303575]
We develop an abstention procedure based on testing a hypothesis about the value of the conditional variance at a given point.
Unlike existing methods, the proposed procedure accounts not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor.
arXiv Detail & Related papers (2023-09-28T13:04:11Z)
- CenTime: Event-Conditional Modelling of Censoring in Survival Analysis [49.44664144472712]
We introduce CenTime, a novel approach to survival analysis that directly estimates the time to event.
Our method features an innovative event-conditional censoring mechanism that performs robustly even when uncensored data is scarce.
Our results indicate that CenTime offers state-of-the-art performance in predicting time-to-death while maintaining comparable ranking performance.
arXiv Detail & Related papers (2023-09-07T17:07:33Z)
- Position: AI Evaluation Should Learn from How We Test Humans [65.36614996495983]
We argue that psychometrics, a theory originating in the 20th century for human assessment, could be a powerful solution to the challenges in today's AI evaluations.
arXiv Detail & Related papers (2023-06-18T09:54:33Z)
- An Effective Meaningful Way to Evaluate Survival Models [34.21432603301076]
In practice, the test set includes (right) censored individuals, meaning we do not know when a censored individual actually experienced the event.
We introduce a novel and effective approach for generating realistic semi-synthetic survival datasets.
Our proposed metric ranks models accurately based on their performance, and often closely matches the true MAE (an illustrative censoring-aware MAE sketch appears after this list).
arXiv Detail & Related papers (2023-06-01T23:22:46Z)
- The Implicit Delta Method [61.36121543728134]
In this paper, we propose an alternative, the implicit delta method, which works by infinitesimally regularizing the training loss of uncertainty.
We show that the change in the evaluation due to regularization is consistent for the variance of the evaluation estimator, even when the infinitesimal change is approximated by a finite difference.
arXiv Detail & Related papers (2022-11-11T19:34:17Z)
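The censoring-aware MAE discussed in the "An Effective Meaningful Way to Evaluate Survival Models" entry above can be illustrated with a hinge-style rule: an uncensored individual contributes the usual absolute error, while a censored individual is penalized only if the prediction falls before their last known follow-up time. The sketch below is an assumption-laden illustration in that spirit, not that paper's implementation; the function and argument names are ours.
```python
# Illustrative hinge-style MAE for right-censored test sets (not the paper's code).
# The hinge rule: penalize a prediction only when it falls before a censored
# individual's last known follow-up time.
import numpy as np


def mae_hinge(pred_time, observed_time, event_observed):
    """Censoring-aware mean absolute error.

    Uncensored rows contribute |prediction - event time|; censored rows contribute
    max(0, censoring time - prediction), since predicting any time at or beyond
    the censoring time is consistent with what was actually observed.
    """
    pred_time = np.asarray(pred_time, dtype=float)
    observed_time = np.asarray(observed_time, dtype=float)
    event = np.asarray(event_observed, dtype=bool)

    errors = np.where(event,
                      np.abs(pred_time - observed_time),           # event seen: absolute error
                      np.maximum(0.0, observed_time - pred_time))  # censored: hinge penalty only
    return errors.mean()


# Toy usage: the second individual is censored at t=5, so a prediction of 8 incurs
# no penalty, while a prediction of 3 would be penalized by 2.
print(mae_hinge(pred_time=[6.0, 8.0, 2.0],
                observed_time=[5.0, 5.0, 4.0],
                event_observed=[1, 0, 1]))
```
A design note: the hinge rule never rewards over-prediction for censored rows; it only stops penalizing predictions that are consistent with the observed follow-up. Metrics in this family therefore still need care under heavy censoring, which is why the entry above also proposes realistic semi-synthetic datasets where the true event times are known.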