Behavior of prediction performance metrics with rare events
- URL: http://arxiv.org/abs/2504.16185v2
- Date: Mon, 03 Nov 2025 18:10:28 GMT
- Title: Behavior of prediction performance metrics with rare events
- Authors: Emily Minus, R. Yates Coley, Susan M. Shortreed, Brian D. Williamson
- Abstract summary: Area under the receiver operating characteristic curve (AUC) is commonly reported alongside prediction models for binary outcomes. Recent articles have raised concerns that AUC might be a misleading measure of prediction performance in the rare event setting. We conducted a simulation study to determine when or whether AUC is unstable in the rare event setting.
- Score: 0.8773694701994543
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Objective: Area under the receiver operating characteristic curve (AUC) is commonly reported alongside prediction models for binary outcomes. Recent articles have raised concerns that AUC might be a misleading measure of prediction performance in the rare event setting. This setting is common since many events of clinical importance are rare. We aimed to determine whether the bias and variance of AUC are driven by the number of events or the event rate. We also investigated the behavior of other commonly used measures of prediction performance, including positive predictive value, accuracy, sensitivity, and specificity. Study Design and Setting: We conducted a simulation study to determine when or whether AUC is unstable in the rare event setting by varying the size of datasets used to train and evaluate prediction models. This plasmode simulation study was based on data from the Mental Health Research Network; the data contained 149 predictors and the outcome of interest, suicide attempt, which had an event rate of 0.92% in the original dataset. Results: Our results indicate that poor AUC behavior (as measured by empirical bias, variability of cross-validated AUC estimates, and empirical coverage of confidence intervals) is driven by the number of events in a rare-event setting, not the event rate. Performance of sensitivity is driven by the number of events, while that of specificity is driven by the number of non-events. Other measures, including positive predictive value and accuracy, depend on the event rate even in large samples. Conclusion: AUC is reliable in the rare event setting provided that the total number of events is moderately large; in our simulations, we observed near-zero bias with 1000 events.
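The abstract's central claim, that the variability of AUC tracks the number of events and not the event rate, can be illustrated with a toy simulation. The sketch below is not the authors' plasmode procedure: it assumes a fixed, already-trained score that is N(1, 1) for events and N(0, 1) for non-events, holds the number of events fixed in each replicate, and uses hypothetical helper names (`auc_rank`, `auc_spread`). It only looks at the sampling spread of the evaluation-set AUC, not at cross-validation or confidence-interval coverage.

```python
import numpy as np


def auc_rank(scores, labels):
    """AUC via the Mann-Whitney rank-sum identity (assumes no tied scores)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)  # rank 1 = lowest score
    n_pos = int(labels.sum())
    n_neg = len(labels) - n_pos
    rank_sum_pos = ranks[labels == 1].sum()
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def auc_spread(n_events, event_rate, n_rep=200, seed=0):
    """Mean and SD of the AUC estimate over repeated simulated evaluation sets.

    Toy assumption: scores are N(1, 1) for events and N(0, 1) for non-events.
    """
    rng = np.random.default_rng(seed)
    n_total = int(round(n_events / event_rate))
    n_non = n_total - n_events
    labels = np.concatenate([np.ones(n_events, dtype=int),
                             np.zeros(n_non, dtype=int)])
    estimates = np.empty(n_rep)
    for r in range(n_rep):
        scores = np.concatenate([rng.normal(1.0, 1.0, n_events),
                                 rng.normal(0.0, 1.0, n_non)])
        estimates[r] = auc_rank(scores, labels)
    return estimates.mean(), estimates.std(ddof=1)


if __name__ == "__main__":
    # Same number of events at two event rates, then more events.
    for n_events, rate in [(50, 0.01), (50, 0.10), (1000, 0.10)]:
        mean_auc, sd_auc = auc_spread(n_events, rate)
        print(f"events={n_events:5d} rate={rate:.2f} "
              f"mean AUC={mean_auc:.3f} SD={sd_auc:.4f}")
```

Under these assumptions, one would expect the two 50-event settings to show a similar spread despite a tenfold difference in event rate, and the 1000-event setting to show a much smaller one, which is consistent in spirit with the results described above.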
Related papers
- ICODEN: Ordinary Differential Equation Neural Networks for Interval-Censored Data [4.9839207502291805]
ICODEN is an ordinary differential equation-based neural network for interval-censored data. It consistently achieves satisfactory predictive accuracy and remains stable as the number of predictors increases. These results establish ICODEN as a practical assumption-lean tool for prediction with interval-censored survival data in high-dimensional biomedical settings.
arXiv Detail & Related papers (2026-02-10T21:18:38Z) - Evidential time-to-event prediction with calibrated uncertainty quantification [12.446406577462069]
Time-to-event analysis provides insights into clinical prognosis and treatment recommendations. We propose an evidential regression model specifically designed for time-to-event prediction. We show that our model delivers both accurate and reliable performance, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2024-11-12T15:06:04Z) - MENSA: A Multi-Event Network for Survival Analysis with Trajectory-based Likelihood Estimation [4.0913802846346625]
We introduce MENSA, a novel deep learning approach for multi-event survival analysis. MENSA jointly learns representations of the input features while capturing the complex dependence structure among events. It consistently gives good discrimination performance and accurate time-to-event predictions in single-event, competing-risk, and multi-event problems.
arXiv Detail & Related papers (2024-09-10T14:02:34Z) - Evaluating the Role of Data Enrichment Approaches Towards Rare Event Analysis in Manufacturing [1.3980986259786223]
Rare events are occurrences that take place with a significantly lower frequency than more common regular events.
In manufacturing, predicting such events is particularly important, as they lead to unplanned downtime, shortened equipment lifespan, and high energy consumption.
This paper evaluates the role of data enrichment techniques combined with supervised machine-learning techniques for rare event detection and prediction.
arXiv Detail & Related papers (2024-07-01T00:05:56Z) - SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process [76.98721879039559]
We propose SMURF-THP, a score-based method for learning the Transformer Hawkes process and quantifying prediction uncertainty.
Specifically, SMURF-THP learns the score function of events' arrival time based on a score-matching objective.
We conduct extensive experiments in both event type prediction and uncertainty quantification of arrival time.
arXiv Detail & Related papers (2023-10-25T03:33:45Z) - Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification [59.81904428056924]
We introduce SMASH: a Score MAtching estimator for learning marked spatio-temporal point processes (STPPs) with uncertainty quantification.
Specifically, our framework adopts a normalization-free objective by estimating the pseudolikelihood of marked STPPs through score-matching.
The superior performance of our proposed framework is demonstrated through extensive experiments in both event prediction and uncertainty quantification.
arXiv Detail & Related papers (2023-10-25T02:37:51Z) - CenTime: Event-Conditional Modelling of Censoring in Survival Analysis [49.44664144472712]
We introduce CenTime, a novel approach to survival analysis that directly estimates the time to event.
Our method features an innovative event-conditional censoring mechanism that performs robustly even when uncensored data is scarce.
Our results indicate that CenTime offers state-of-the-art performance in predicting time-to-death while maintaining comparable ranking performance.
arXiv Detail & Related papers (2023-09-07T17:07:33Z) - Causal inference for the expected number of recurrent events in the presence of a terminal event [0.2446672595462589]
We develop a multiply robust estimation framework for causal inference in recurrent event data with a terminal failure event. We show that the estimand can be identified under a weaker condition than conditionally independent censoring.
arXiv Detail & Related papers (2023-06-28T21:31:25Z) - Abnormal Event Detection via Hypergraph Contrastive Learning [54.80429341415227]
Abnormal event detection plays an important role in many real applications.
In this paper, we study the unsupervised abnormal event detection problem in an Attributed Heterogeneous Information Network.
A novel hypergraph contrastive learning method, named AEHCL, is proposed to fully capture abnormal event patterns.
arXiv Detail & Related papers (2023-04-02T08:23:20Z) - AA-Forecast: Anomaly-Aware Forecast for Extreme Events [25.89754218631525]
Time series models often deal with extreme events and anomalies, both prevalent in real-world datasets.
We propose an anomaly-aware forecast framework that leverages the previously seen effects of anomalies to improve its prediction accuracy.
Specifically, the framework automatically extracts anomalies and incorporates them through an attention mechanism to increase its accuracy for future extreme events.
arXiv Detail & Related papers (2022-08-21T17:51:46Z) - Causal Knowledge Guided Societal Event Forecasting [24.437437565689393]
We introduce a deep learning framework that integrates causal effect estimation into event forecasting.
Two robust learning modules, including a feature reweighting module and an approximate loss, are introduced to enable prior knowledge injection.
arXiv Detail & Related papers (2021-12-10T17:41:02Z) - When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting [70.54920804222031]
Most existing forecasting models disregard uncertainty quantification, resulting in mis-calibrated predictions.
Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations.
We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP.
arXiv Detail & Related papers (2021-06-07T18:31:47Z) - Novel Techniques to Assess Predictive Systems and Reduce Their Alarm Burden [0.0]
We introduce an improved performance assessment technique ("u-metrics") using utility functions to score each prediction.
Compared to traditional performance measures, u-metrics more accurately reflect the real-world benefits and costs of a predictor operating in a workflow context.
We also describe the use of "snoozing," a method whereby predictions are suppressed for a period of time, commonly improving predictor performance.
arXiv Detail & Related papers (2021-02-10T19:05:06Z) - Enabling Counterfactual Survival Analysis with Balanced Representations [64.17342727357618]
Survival data are frequently encountered across diverse medical applications, e.g., drug development, risk profiling, and clinical trials.
We propose a theoretically grounded unified framework for counterfactual inference applicable to survival outcomes.
arXiv Detail & Related papers (2020-06-14T01:15:00Z) - Neural Conditional Event Time Models [11.920908437656413]
Event time models predict occurrence times of an event of interest based on known features.
We develop a conditional event time model that distinguishes between a) the probability of event occurrence, and b) the predicted time of occurrence.
Results demonstrate superior event occurrence and event time predictions on synthetic data, medical events (MIMIC-III), and social media posts.
arXiv Detail & Related papers (2020-04-03T05:08:13Z)