Stable Prediction of Adverse Events in Medical Time-Series Data
- URL: http://arxiv.org/abs/2510.14286v1
- Date: Thu, 16 Oct 2025 04:16:54 GMT
- Title: Stable Prediction of Adverse Events in Medical Time-Series Data
- Authors: Mayank Keoliya, Seewon Choi, Rajeev Alur, Mayur Naik, Eric Wong,
- Abstract summary: Early event prediction (EEP) systems continuously estimate a patient's imminent risk to support clinical decision-making. We introduce CAREBench, an EEP benchmark that evaluates deployability using multi-modal inputs (tabular EHR, ECG waveforms, and clinical text). We propose a stability metric that quantifies short-term variability in per-patient risk and penalizes abrupt oscillations based on local Lipschitz constants.
- Score: 18.92202613147342
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Early event prediction (EEP) systems continuously estimate a patient's imminent risk to support clinical decision-making. For bedside trust, risk trajectories must be accurate and temporally stable, shifting only with new, relevant evidence. However, current benchmarks (a) ignore the stability of risk scores and (b) evaluate mainly on tabular inputs, leaving trajectory behavior untested. To address this gap, we introduce CAREBench, an EEP benchmark that evaluates deployability using multi-modal inputs (tabular EHR, ECG waveforms, and clinical text) and assesses temporal stability alongside predictive accuracy. We propose a stability metric that quantifies short-term variability in per-patient risk and penalizes abrupt oscillations based on local Lipschitz constants. CAREBench spans six prediction tasks, such as sepsis onset, and compares classical learners, deep sequence models, and zero-shot LLMs. Across tasks, existing methods, especially LLMs, struggle to jointly optimize accuracy and stability, with notably poor recall at high-precision operating points. These results highlight the need for models that produce evidence-aligned, stable trajectories to earn clinician trust in continuous monitoring settings. (Code: https://github.com/SeewonChoi/CAREBench.)
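The abstract names the two evaluation ideas (a stability penalty built on local Lipschitz constants, and recall at a high-precision operating point) without giving formulas. The sketch below is one plausible reading for illustration only, not CAREBench's definition: it assumes the local Lipschitz constant at each step is the largest |Δrisk/Δt| over a trailing window of consecutive estimates, and that a patient's instability is the mean of those constants. The names `instability` and `recall_at_precision`, the `window` parameter, and the mean aggregation are all hypothetical; the paper and repository define the actual metric.

```python
from typing import Sequence


def step_slopes(times: Sequence[float], risks: Sequence[float]) -> list[float]:
    """Absolute rate of change |dr/dt| between consecutive risk estimates."""
    if len(times) != len(risks):
        raise ValueError("times and risks must have the same length")
    slopes = []
    for m in range(1, len(risks)):
        dt = times[m] - times[m - 1]
        if dt <= 0:
            raise ValueError("times must be strictly increasing")
        slopes.append(abs(risks[m] - risks[m - 1]) / dt)
    return slopes


def local_lipschitz(times: Sequence[float], risks: Sequence[float], window: int = 3) -> list[float]:
    """Empirical local Lipschitz constant at each step (hypothetical definition).

    For sampled points, the largest |r_j - r_k| / |t_j - t_k| over a run of
    samples equals the largest consecutive slope in that run, so a sliding
    max over the last `window` slopes suffices.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    slopes = step_slopes(times, risks)
    return [max(slopes[max(0, m - window + 1): m + 1]) for m in range(len(slopes))]


def instability(times: Sequence[float], risks: Sequence[float], window: int = 3) -> float:
    """Hypothetical per-patient score: mean local Lipschitz constant (0.0 = flat)."""
    constants = local_lipschitz(times, risks, window)
    return sum(constants) / len(constants) if constants else 0.0


def recall_at_precision(labels: Sequence[int], scores: Sequence[float], min_precision: float = 0.9) -> float:
    """Best recall over all score thresholds whose precision is >= min_precision."""
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    tp = fp = 0
    best = 0.0
    i = 0
    while i < len(ranked):
        # Admit all samples tied at this score together: one threshold per distinct score.
        threshold = ranked[i][0]
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        if tp / (tp + fp) >= min_precision:
            best = max(best, tp / total_pos)
    return best


if __name__ == "__main__":
    hours = [0, 1, 2, 3, 4]
    smooth = [0.10, 0.12, 0.14, 0.16, 0.18]  # steady rise in risk
    jumpy = [0.10, 0.60, 0.15, 0.70, 0.20]   # oscillating risk
    print(instability(hours, smooth), instability(hours, jumpy))
    print(recall_at_precision([1, 1, 0, 1, 0, 0], [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]))
```

Under these assumptions the steady trajectory scores about 0.02 per hour and the oscillating one about 0.525 per hour, although both start at 0.10 and end near 0.2. The toy classifier reaches a recall of 2/3 at precision ≥ 0.9. Averaging the local constants rewards trajectories that stay smooth throughout; taking their maximum instead would flag any single spike.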
Related papers
- Diagnostics for Individual-Level Prediction Instability in Machine Learning for Healthcare [0.0]
We propose an evaluation framework that quantifies individual-level prediction instability by using two complementary diagnostics.
We apply these diagnostics to simulated data and the GUSTO-I clinical dataset.
(2026-02-27T03:42:28Z)
- Building Safe and Deployable Clinical Natural Language Processing under Temporal Leakage Constraints [39.44014654945035]
This study focuses on the system-level design choices required to build safe and deployable clinical NLP under temporal leakage constraints.
We present a lightweight auditing pipeline that integrates interpretability into the model development process to identify and suppress leakage-prone signals prior to final training.
Results show that audited models exhibit more conservative and better-calibrated probability estimates, with reduced reliance on discharge-related lexical cues.
(2026-01-24T01:46:46Z)
- Conformal Lesion Segmentation for 3D Medical Images [82.92159832699583]
We propose a risk-constrained framework that calibrates data-driven thresholds via conformalization to ensure that the test-time false negative rate (FNR) remains below a target tolerance.
We validate the statistical soundness and predictive performance of CLS on six 3D lesion segmentation (3D-LS) datasets across five backbone models, and conclude with actionable insights for deploying risk-aware segmentation in clinical practice.
(2025-10-19T08:21:00Z)
- Towards Trustworthy Vital Sign Forecasting: Leveraging Uncertainty for Prediction Intervals [32.233133404873016]
We present two methods for deriving prediction intervals (PIs) from the Reconstruction Uncertainty Estimate (RUE), an uncertainty measure well suited to vital-sign forecasting.
We evaluate these methods on two large public datasets with minute- and hour-level sampling, representing high- and low-frequency health signals.
(2025-09-01T10:03:26Z)
- Aurora: Are Android Malware Classifiers Reliable and Stable under Distribution Shift? [51.12297424766236]
AURORA is a framework for evaluating malware classifiers based on their confidence quality and operational resilience.
AURORA is complemented by a set of metrics designed to go beyond point-in-time performance.
The fragility of state-of-the-art (SOTA) frameworks across datasets with varying degrees of drift suggests the need for a return to the whiteboard.
(2025-05-28T20:22:43Z)
- Evidential time-to-event prediction with calibrated uncertainty quantification [12.446406577462069]
Time-to-event analysis provides insights into clinical prognosis and treatment recommendations.
We propose an evidential regression model specifically designed for time-to-event prediction.
We show that our model delivers both accurate and reliable performance, outperforming state-of-the-art methods.
(2024-11-12T15:06:04Z)
- SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing [67.8991481023825]
Sepsis is the leading cause of in-hospital mortality in the USA.
Existing predictive models are usually trained on high-quality data with little missing information.
For potentially high-risk patients whose predictions have low confidence due to limited observations, we propose a robust active sensing algorithm.
(2024-07-24T04:47:36Z)
- Inadequacy of common stochastic neural networks for reliable clinical decision support [0.4262974002462632]
Widespread adoption of AI for medical decision-making is still hindered by ethical and safety-related concerns.
Common deep learning approaches, however, tend toward overconfidence under data shift.
This study investigates their actual reliability in clinical applications.
(2024-01-24T18:49:30Z)
- HypUC: Hyperfine Uncertainty Calibration with Gradient-boosted Corrections for Reliable Regression on Imbalanced Electrocardiograms [3.482894964998886]
We propose HypUC, a framework for imbalanced probabilistic regression in medical time series.
HypUC is evaluated on a large, diverse, real-world dataset of ECGs collected from millions of patients.
(2023-11-23T06:17:31Z)
- Towards Reliable Medical Image Segmentation by Modeling Evidential Calibrated Uncertainty [57.023423137202485]
Concerns regarding the reliability of medical image segmentation persist among clinicians.
We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks.
By leveraging subjective logic theory, we explicitly model probability and uncertainty for medical image segmentation.
(2023-01-01T05:02:46Z)
- When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting [70.54920804222031]
Most existing forecasting models disregard uncertainty quantification, resulting in miscalibrated predictions.
Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations.
We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP.
(2021-06-07T18:31:47Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic steatohepatitis (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming the best state-of-the-art baseline by up to 19%.
(2020-10-22T02:28:11Z)