Practical Evaluation of Copula-based Survival Metrics: Beyond the Independent Censoring Assumption
- URL: http://arxiv.org/abs/2502.19460v2
- Date: Mon, 14 Apr 2025 12:22:06 GMT
- Title: Practical Evaluation of Copula-based Survival Metrics: Beyond the Independent Censoring Assumption
- Authors: Christian Marius Lillelund, Shi-ang Qi, Russell Greiner
- Abstract summary: We propose three copula-based metrics to evaluate survival models in the presence of dependent censoring. Our empirical analyses on synthetic and semi-synthetic datasets show that our metrics can give error estimates that are closer to the true error.
- Score: 4.795126873893598
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conventional survival metrics, such as Harrell's concordance index and the Brier Score, rely on the independent censoring assumption for valid inference in the presence of right-censored data. However, when instances are censored for reasons related to the event of interest, this assumption no longer holds, as this kind of dependent censoring biases the marginal survival estimates of popular nonparametric estimators. In this paper, we propose three copula-based metrics to evaluate survival models in the presence of dependent censoring, and design a framework to create realistic, semi-synthetic datasets with dependent censoring to facilitate the evaluation of the metrics. Our empirical analyses on synthetic and semi-synthetic datasets show that our metrics can give error estimates that are closer to the true error, mainly in terms of prediction accuracy.
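The dependent-censoring setup the abstract's framework targets can be sketched by coupling the uniform ranks of event and censoring times through a Clayton copula, one common Archimedean choice. The marginals, rates, and parameter below are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def sample_clayton(n, theta, rng):
    """Draw (u, v) from a Clayton copula via conditional inversion:
    solve C(v | u) = w for v, with u, w independent uniforms."""
    u = rng.uniform(size=n)
    w = rng.uniform(size=n)
    v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
    return u, v

rng = np.random.default_rng(0)
theta = 2.0  # Kendall's tau = theta / (theta + 2) = 0.5
u, v = sample_clayton(100_000, theta, rng)

# Plug the coupled ranks into exponential marginals: event and censoring
# times are now positively dependent, violating independent censoring.
event_time = -np.log(u)          # Exp(1) event times
censor_time = -np.log(v) / 0.5   # Exp(0.5) censoring times
observed = np.minimum(event_time, censor_time)
uncensored = event_time <= censor_time
```

Evaluating a model on `(observed, uncensored)` pairs generated this way is what makes standard metrics biased, since the Kaplan-Meier censoring distribution they rely on is no longer valid.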
Related papers
- Deep Copula-Based Survival Analysis for Dependent Censoring with Identifiability Guarantees [14.251687262492377]
Censoring is the central problem in survival analysis, where either the time-to-event (for instance, death) or the time-to-censoring is observed for each sample.
We propose a flexible deep learning-based survival analysis method that simultaneously accommodates dependent censoring and eliminates the requirement to specify the ground-truth copula.
arXiv Detail & Related papers (2023-12-24T23:34:01Z)
- Bring Your Own Data! Self-Supervised Evaluation for Large Language Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs)
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z)
- An Effective Meaningful Way to Evaluate Survival Models [34.21432603301076]
In practice, the test set includes (right) censored individuals, meaning we do not know when a censored individual actually experienced the event.
We introduce a novel and effective approach for generating realistic semi-synthetic survival datasets.
Our proposed metric is able to rank models accurately based on their performance, and often closely matches the true MAE.
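The difficulty this entry addresses can be illustrated with a simple hinged MAE, where a censored subject is penalized only when the prediction undershoots its censoring time. This is a deliberately simplified stand-in, not the pseudo-observation metric the paper proposes:

```python
import numpy as np

def mae_hinge(pred, time, event):
    """Uncensored subjects contribute |pred - t|; censored subjects are
    penalized only when the prediction undershoots the censoring time.
    A deliberately simplified stand-in for a censoring-aware MAE."""
    pred, time, event = (np.asarray(a, dtype=float) for a in (pred, time, event))
    err = np.where(event > 0,
                   np.abs(pred - time),
                   np.maximum(time - pred, 0.0))
    return float(err.mean())

# Subject 1: event at t=4, predicted 5 -> error 1.
# Subject 2: censored at t=6, predicted 3 -> error max(6 - 3, 0) = 3.
score = mae_hinge([5, 3], [4, 6], [1, 0])  # mean error 2.0
```

The hinge makes the censoring time a lower bound on the true event time, which is exactly the information a right-censored observation carries.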
arXiv Detail & Related papers (2023-06-01T23:22:46Z)
- On the Blind Spots of Model-Based Evaluation Metrics for Text Generation [79.01422521024834]
We explore a useful but often neglected methodology for robustness analysis of text generation evaluation metrics.
We design and synthesize a wide range of potential errors and check whether they result in a commensurate drop in the metric scores.
Our experiments reveal interesting insensitivities, biases, or even loopholes in existing metrics.
arXiv Detail & Related papers (2022-12-20T06:24:25Z)
- A copula-based boosting model for time-to-event prediction with dependent censoring [0.0]
This paper introduces Clayton-boost, a boosting approach built upon the accelerated failure time model.
It uses a Clayton copula to handle the dependency between the event and censoring distributions.
It shows a strong ability to remove prediction bias in the presence of dependent censoring.
arXiv Detail & Related papers (2022-10-10T17:38:00Z)
- Measuring Fairness of Text Classifiers via Prediction Sensitivity [63.56554964580627]
ACCUMULATED PREDICTION SENSITIVITY measures fairness in machine learning models based on the model's prediction sensitivity to perturbations in input features.
We show that the metric can be theoretically linked with a specific notion of group fairness (statistical parity) and individual fairness.
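The underlying idea of a sensitivity-based metric can be sketched with a finite-difference accumulation over input features. This is a generic illustration of prediction sensitivity, not the paper's exact ACCUMULATED PREDICTION SENSITIVITY definition:

```python
import numpy as np

def prediction_sensitivity(model_fn, x, eps=1e-6):
    """Accumulate the absolute finite-difference sensitivity of a scalar
    prediction to each input feature; a generic sketch of the
    prediction-sensitivity idea."""
    x = np.asarray(x, dtype=float)
    base = model_fn(x)
    total = 0.0
    for i in range(x.size):
        bumped = x.copy()
        bumped[i] += eps
        total += abs((model_fn(bumped) - base) / eps)
    return total

# For a linear model the accumulated sensitivity is the sum of |weights|.
w = np.array([0.5, -2.0, 1.0])
s = prediction_sensitivity(lambda z: float(w @ z), np.zeros(3))  # approx 3.5
```

A fairness audit would then compare such sensitivities across protected-attribute features versus the rest.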
arXiv Detail & Related papers (2022-03-16T15:00:33Z)
- A Statistical Analysis of Summarization Evaluation Metrics using Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are.
Although many metrics fail to show statistical improvements over ROUGE, two recent works, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z)
- Conformalized Survival Analysis [6.92027612631023]
Existing survival analysis techniques heavily rely on strong modelling assumptions.
We develop an inferential method based on ideas from conformal prediction.
The validity and efficiency of our procedure are demonstrated on synthetic data and real COVID-19 data from the UK Biobank.
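The split-conformal machinery this entry builds on can be sketched as a calibrated lower bound on survival time. The sketch below assumes fully observed calibration times; handling censoring is the harder part the paper actually addresses:

```python
import numpy as np

def conformal_lower_bound(cal_pred, cal_time, test_pred, alpha=0.1):
    """Split-conformal lower bound on survival time: take the (1 - alpha)
    empirical quantile (with finite-sample correction) of the signed
    residuals pred - time on a calibration set, then subtract it from new
    predictions. Assumes uncensored calibration times."""
    residuals = np.asarray(cal_pred, dtype=float) - np.asarray(cal_time, dtype=float)
    n = residuals.size
    level = min(1.0, np.ceil((1.0 - alpha) * (n + 1)) / n)
    q = np.quantile(residuals, level)
    return np.asarray(test_pred, dtype=float) - q

# Nine calibration subjects all over-predicted by 1 -> bound is pred - 1.
lb = conformal_lower_bound([5.0] * 9, [4.0] * 9, [10.0])
```

Under exchangeability, the true survival time exceeds this bound with probability at least 1 - alpha, without any modelling assumptions on the survival distribution.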
arXiv Detail & Related papers (2021-03-17T16:32:26Z)
- Survival Estimation for Missing not at Random Censoring Indicators based on Copula Models [1.52292571922932]
We provide a new estimator for the conditional survival function with missing not at random (MNAR) censoring indicators based on a conditional copula model for the missingness mechanism.
In addition to the theoretical results, we illustrate how the estimators work for small samples through a simulation study and show their practical applicability by analyzing synthetic and real data.
arXiv Detail & Related papers (2020-09-03T15:04:27Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
- Nonparametric Score Estimators [49.42469547970041]
Estimating the score from a set of samples generated by an unknown distribution is a fundamental task in inference and learning of probabilistic models.
We provide a unifying view of these estimators under the framework of regularized nonparametric regression.
We propose score estimators based on iterative regularization that enjoy computational benefits from curl-free kernels and fast convergence.
arXiv Detail & Related papers (2020-05-20T15:01:03Z)
- Censored Quantile Regression Forest [81.9098291337097]
We develop a new estimating equation that adapts to censoring and reduces to the standard quantile score whenever the data do not exhibit censoring.
The proposed procedure, named censored quantile regression forest, allows us to estimate quantiles of time-to-event outcomes without any parametric modeling assumption.
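The uncensored special case the entry mentions is the standard quantile (pinball) score, sketched here; the entry's estimating equation is constructed so that it reduces to this score when nothing is censored:

```python
import numpy as np

def pinball_loss(pred, y, tau):
    """Quantile (pinball) score at level tau: under-predictions are
    weighted by tau, over-predictions by (1 - tau), so the minimizer
    is the tau-quantile of y."""
    diff = np.asarray(y, dtype=float) - np.asarray(pred, dtype=float)
    return float(np.mean(np.maximum(tau * diff, (tau - 1.0) * diff)))
```

The asymmetric weighting is what lets a single loss target any quantile of the time-to-event distribution rather than its mean.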
arXiv Detail & Related papers (2020-01-08T23:20:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.