An Effective Meaningful Way to Evaluate Survival Models
- URL: http://arxiv.org/abs/2306.01196v1
- Date: Thu, 1 Jun 2023 23:22:46 GMT
- Title: An Effective Meaningful Way to Evaluate Survival Models
- Authors: Shi-ang Qi, Neeraj Kumar, Mahtab Farrokh, Weijie Sun, Li-Hao Kuan,
Rajesh Ranganath, Ricardo Henao, Russell Greiner
- Abstract summary: In practice, the test set includes (right) censored individuals, meaning we do not know when a censored individual actually experienced the event.
We introduce a novel and effective approach for generating realistic semi-synthetic survival datasets.
Our proposed metric is able to rank models accurately based on their performance, and often closely matches the true MAE.
- Score: 34.21432603301076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One straightforward metric to evaluate a survival prediction model is based
on the Mean Absolute Error (MAE) -- the average of the absolute difference
between the time predicted by the model and the true event time, over all
subjects. Unfortunately, this is challenging because, in practice, the test set
includes (right) censored individuals, meaning we do not know when a censored
individual actually experienced the event. In this paper, we explore various
metrics to estimate MAE for survival datasets that include (many) censored
individuals. Moreover, we introduce a novel and effective approach for
generating realistic semi-synthetic survival datasets to facilitate the
evaluation of metrics. Our findings, based on the analysis of the
semi-synthetic datasets, reveal that our proposed metric (MAE using
pseudo-observations) is able to rank models accurately based on their
performance, and often closely matches the true MAE -- in particular, it is better
than several alternative methods.
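To make the difficulty described in the abstract concrete, the following minimal Python sketch computes the MAE as defined there (the average absolute difference between predicted and true event times) on a small, entirely hypothetical toy dataset, and compares it with a naive workaround that simply drops censored subjects. The function names, the toy numbers, and the "uncensored only" baseline are illustrative assumptions; this is not the pseudo-observation metric proposed in the paper.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], true_times: Sequence[float]) -> float:
    """Average absolute difference between predicted and true event times."""
    if len(predicted) != len(true_times) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, true_times)) / len(predicted)


def mae_uncensored_only(
    predicted: Sequence[float],
    observed: Sequence[float],
    event: Sequence[bool],
) -> float:
    """Naive baseline (an assumption for illustration): ignore censored subjects."""
    kept = [(p, t) for p, t, e in zip(predicted, observed, event) if e]
    if not kept:
        raise ValueError("no uncensored subjects to evaluate")
    return sum(abs(p - t) for p, t in kept) / len(kept)


# Hypothetical toy data. In a real test set the true event times of
# censored subjects are unknown; they are listed here only so the
# two quantities can be compared.
true_event_times = [5, 8, 12, 20]
censoring_times = [10, 6, 15, 9]
predicted_times = [6, 9, 10, 14]

# Under right censoring we observe min(event time, censoring time)
# and a flag saying whether the event itself was seen.
observed_times = [min(t, c) for t, c in zip(true_event_times, censoring_times)]
event_observed = [t <= c for t, c in zip(true_event_times, censoring_times)]

true_mae = mean_absolute_error(predicted_times, true_event_times)
naive_mae = mae_uncensored_only(predicted_times, observed_times, event_observed)

print(f"true MAE (needs unknown event times): {true_mae}")
print(f"MAE on uncensored subjects only:      {naive_mae}")
```

In this toy example the two censored subjects are the ones with the latest event times, so discarding them changes the estimate. This illustrates why an estimator that accounts for censored individuals is needed rather than one that ignores them.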
Related papers
- LiveXiv -- A Multi-Modal Live Benchmark Based on Arxiv Papers Content [62.816876067499415]
We propose LiveXiv: a scalable evolving live benchmark based on scientific ArXiv papers.
LiveXiv accesses domain-specific manuscripts at any given timestamp and proposes to automatically generate visual question-answer pairs.
We benchmark multiple open and proprietary Large Multi-modal Models (LMMs) on the first version of our benchmark, showing its challenging nature and exposing the models' true abilities.
arXiv Detail & Related papers (2024-10-14T17:51:23Z)
- Few-Shot Load Forecasting Under Data Scarcity in Smart Grids: A Meta-Learning Approach [0.18641315013048293]
This paper proposes adapting an established model-agnostic meta-learning algorithm for short-term load forecasting.
The proposed method can rapidly adapt and generalize within any unknown load time series of arbitrary length.
The proposed model is evaluated using a dataset of historical load consumption data from real-world consumers.
arXiv Detail & Related papers (2024-06-09T18:59:08Z)
- A Large-Scale Neutral Comparison Study of Survival Models on Low-Dimensional Data [7.199059106376138]
This work presents the first large-scale neutral benchmark experiment focused on single-event, right-censored, low-dimensional survival data.
We benchmark 18 models, ranging from classical statistical approaches to many common machine learning methods, on 32 publicly available datasets.
arXiv Detail & Related papers (2024-06-06T14:13:38Z)
- TripleSurv: Triplet Time-adaptive Coordinate Loss for Survival Analysis [15.496918127515665]
We propose a time-adaptive coordinate loss function, TripleSurv, to handle the complexities of the learning process and exploit valuable survival time values.
Our TripleSurv is evaluated on three real-world survival datasets and a public synthetic dataset.
arXiv Detail & Related papers (2024-01-05T08:37:57Z)
- Composite Survival Analysis: Learning with Auxiliary Aggregated Baselines and Survival Scores [0.0]
Survival Analysis (SA) constitutes the default method for time-to-event modeling.
We show how to improve the training and inference of SA models by decoupling their full expression into (1) an aggregated baseline hazard, which captures the overall behavior of a given population, and (2) independently distributed survival scores, which model idiosyncratic probabilistic dynamics of its given members, in a fully parametric setting.
arXiv Detail & Related papers (2023-12-10T11:13:22Z)
- CenTime: Event-Conditional Modelling of Censoring in Survival Analysis [49.44664144472712]
We introduce CenTime, a novel approach to survival analysis that directly estimates the time to event.
Our method features an innovative event-conditional censoring mechanism that performs robustly even when uncensored data is scarce.
Our results indicate that CenTime offers state-of-the-art performance in predicting time-to-death while maintaining comparable ranking performance.
arXiv Detail & Related papers (2023-09-07T17:07:33Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Evaluating Predictive Uncertainty and Robustness to Distributional Shift Using Real World Data [0.0]
We propose metrics for general regression tasks using the Shifts Weather Prediction dataset.
We also present an evaluation of the baseline methods using these metrics.
arXiv Detail & Related papers (2021-11-08T17:32:10Z)
- Performance metrics for intervention-triggering prediction models do not reflect an expected reduction in outcomes from using the model [71.9860741092209]
Clinical researchers often select among and evaluate risk prediction models.
Standard metrics calculated from retrospective data are only related to model utility under certain assumptions.
When predictions are delivered repeatedly throughout time, the relationship between standard metrics and utility is further complicated.
arXiv Detail & Related papers (2020-06-02T16:26:49Z)
- Meta-Learned Confidence for Few-shot Learning [60.6086305523402]
A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples.
We propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries.
We validate our few-shot learning model with meta-learned confidence on four benchmark datasets.
arXiv Detail & Related papers (2020-02-27T10:22:17Z)