Building Safe and Deployable Clinical Natural Language Processing under Temporal Leakage Constraints
- URL: http://arxiv.org/abs/2602.15852v2
- Date: Thu, 19 Feb 2026 02:13:21 GMT
- Title: Building Safe and Deployable Clinical Natural Language Processing under Temporal Leakage Constraints
- Authors: Ha Na Cho, Sairam Sutari, Alexander Lopez, Hansen Bow, Kai Zheng
- Abstract summary: This study focuses on system-level design choices required to build safe and deployable clinical NLP under temporal leakage constraints. We present a lightweight auditing pipeline that integrates interpretability into the model development process to identify and suppress leakage-prone signals prior to final training. Results show that audited models exhibit more conservative and better-calibrated probability estimates, with reduced reliance on discharge-related lexical cues.
- Score: 39.44014654945035
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Clinical natural language processing (NLP) models have shown promise for supporting hospital discharge planning by leveraging narrative clinical documentation. However, note-based models are particularly vulnerable to temporal and lexical leakage, where documentation artifacts encode future clinical decisions and inflate apparent predictive performance. Such behavior poses substantial risks for real-world deployment, where overconfident or temporally invalid predictions can disrupt clinical workflows and compromise patient safety. This study focuses on system-level design choices required to build safe and deployable clinical NLP under temporal leakage constraints. We present a lightweight auditing pipeline that integrates interpretability into the model development process to identify and suppress leakage-prone signals prior to final training. Using next-day discharge prediction after elective spine surgery as a case study, we evaluate how auditing affects predictive behavior, calibration, and safety-relevant trade-offs. Results show that audited models exhibit more conservative and better-calibrated probability estimates, with reduced reliance on discharge-related lexical cues. These findings emphasize that deployment-ready clinical NLP systems should prioritize temporal validity, calibration, and behavioral robustness over optimistic performance.
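The auditing idea described in the abstract can be sketched in a few lines: score each token's association with the outcome label, flag high-scoring tokens that belong to a leakage-prone lexicon (e.g. discharge-related cues), and mask them before final training. The sketch below is an illustrative reconstruction, not the authors' actual pipeline; the toy notes, lexicon, attribution proxy, and threshold are all assumptions.

```python
# Illustrative leakage audit: log-odds attribution over a bag-of-words,
# then masking of flagged lexicon tokens. Not the paper's actual pipeline.
from collections import Counter
import math

def token_log_odds(notes, labels, smoothing=1.0):
    """Per-token log-odds of the positive class, a simple attribution proxy."""
    pos, neg = Counter(), Counter()
    for note, y in zip(notes, labels):
        (pos if y == 1 else neg).update(set(note.split()))
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    vocab = set(pos) | set(neg)
    return {
        t: math.log((pos[t] + smoothing) / (n_pos + 2 * smoothing))
           - math.log((neg[t] + smoothing) / (n_neg + 2 * smoothing))
        for t in vocab
    }

def audit_and_mask(notes, labels, leakage_lexicon, threshold=1.0):
    """Mask lexicon tokens whose attribution magnitude exceeds the threshold."""
    scores = token_log_odds(notes, labels)
    flagged = {t for t in leakage_lexicon if abs(scores.get(t, 0.0)) > threshold}
    masked = [" ".join("[MASK]" if t in flagged else t for t in note.split())
              for note in notes]
    return masked, flagged

# Hypothetical mini-corpus; 1 = discharged the next day.
notes = [
    "discharge order written patient ambulating well",
    "pain controlled discharge planned for tomorrow",
    "fever persists wound drainage noted overnight",
    "remains on iv antibiotics monitoring continues",
]
labels = [1, 1, 0, 0]
lexicon = {"discharge", "disposition", "home"}

masked_notes, flagged = audit_and_mask(notes, labels, lexicon)
```

After masking, the model is retrained on `masked_notes`, forcing it to rely on clinical state rather than documentation artifacts that encode the decision itself.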
Related papers
- An Empirical Analysis of Calibration and Selective Prediction in Multimodal Clinical Condition Classification [11.640422721732756]
We empirically evaluate the reliability of uncertainty-based selective prediction in multilabel clinical condition classification. We find that selective prediction can substantially degrade performance despite strong standard evaluation metrics. This failure is driven by severe class-dependent miscalibration, whereby models assign high uncertainty to correct predictions and low uncertainty to incorrect ones.
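The failure mode described above is easy to reproduce: if a model is confident precisely when it is wrong, confidence-thresholded abstention removes the correct predictions first. The sketch below uses hypothetical probability vectors to show selective accuracy falling below unselective accuracy; it is a toy illustration, not the paper's evaluation.

```python
# Toy demonstration: miscalibrated confidence makes selective prediction
# filter out the correct answers and keep the wrong ones.
def selective_predict(probs, threshold=0.8):
    """Return (predicted class, abstained?) per probability vector."""
    out = []
    for p in probs:
        conf = max(p)
        out.append((p.index(conf), conf < threshold))
    return out

def selective_accuracy(probs, labels, threshold=0.8):
    """Coverage and accuracy over the non-abstained subset."""
    decisions = selective_predict(probs, threshold)
    kept = [(pred, y) for (pred, abstain), y in zip(decisions, labels)
            if not abstain]
    coverage = len(kept) / len(labels)
    acc = sum(pred == y for pred, y in kept) / len(kept) if kept else 0.0
    return coverage, acc

# Confident on the wrong cases, unsure on the right ones (hypothetical).
probs  = [[0.95, 0.05], [0.90, 0.10], [0.55, 0.45], [0.40, 0.60]]
labels = [1, 1, 0, 1]

cov, acc = selective_accuracy(probs, labels, threshold=0.8)
```

Without abstention this model gets 2 of 4 correct; with abstention at 0.8, only the two confidently wrong cases survive, so selective accuracy drops to zero at 50% coverage.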
arXiv Detail & Related papers (2026-03-03T08:16:44Z)
- Calibrated Bayesian Deep Learning for Explainable Decision Support Systems Based on Medical Imaging [6.826979426009301]
It is imperative that models quantify uncertainty in a manner that correlates with prediction correctness, allowing clinicians to identify unreliable outputs for further review. The present paper proposes a generalizable probabilistic optimization framework grounded in Bayesian deep learning. Specifically, a novel Confidence-Uncertainty Boundary Loss (CUB-Loss) is introduced that imposes penalties on high-certainty errors and low-certainty correct predictions. The proposed framework is validated on three distinct medical imaging tasks: automatic screening of pneumonia, diabetic retinopathy detection, and identification of skin lesions.
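One plausible reading of the penalty idea (the paper's exact CUB-Loss formulation is not given in this summary) is an extra term, on top of cross-entropy, that charges confident mistakes proportionally to their confidence and correct answers proportionally to their lack of confidence. The function names and weighting below are hypothetical.

```python
# Hypothetical confidence-uncertainty boundary penalty: confident errors
# and under-confident correct answers both cost extra. Not the paper's
# exact loss, only an illustration of the stated design goal.
import math

def cub_penalty(confidence, is_correct, weight=1.0):
    """Penalty grows with confidence when wrong, with uncertainty when right."""
    if is_correct:
        return weight * (1.0 - confidence)   # correct but unsure
    return weight * confidence               # wrong but sure

def total_loss(probs, labels, weight=0.5):
    """Mean cross-entropy plus the illustrative boundary penalty."""
    loss = 0.0
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        loss += -math.log(p[y]) + cub_penalty(conf, pred == y, weight)
    return loss / len(labels)
```

Under this shape, a 0.9-confidence error is penalized more than a 0.6-confidence one, and a 0.6-confidence correct answer more than a 0.9-confidence one, which is the correctness-correlated uncertainty behavior the summary describes.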
arXiv Detail & Related papers (2026-02-12T14:03:41Z)
- Stable Prediction of Adverse Events in Medical Time-Series Data [18.92202613147342]
Early event prediction (EEP) systems continuously estimate a patient's imminent risk to support clinical decision-making. We introduce CAREBench, an EEP benchmark that evaluates deployability using multi-modal inputs: tabular EHR, ECG waveforms, and clinical text. We propose a stability metric that quantifies short-term variability in per-patient risk and penalizes abrupt oscillations based on local-Lipschitz constants.
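The local-Lipschitz idea can be sketched as the largest step-to-step change in a patient's risk trajectory per unit time: a smooth trajectory scores low, an oscillating one scores high. This is a minimal sketch of the concept, assuming hourly predictions; CAREBench's exact metric may differ.

```python
# Discrete local-Lipschitz estimate over one patient's risk trajectory:
# max |risk change| / |time change| across consecutive predictions.
# Illustrative only; not CAREBench's exact stability metric.
def local_lipschitz(risks, times):
    """Largest per-unit-time jump between consecutive risk estimates."""
    worst = 0.0
    for (r0, t0), (r1, t1) in zip(zip(risks, times),
                                  zip(risks[1:], times[1:])):
        if t1 > t0:
            worst = max(worst, abs(r1 - r0) / (t1 - t0))
    return worst

hours   = [0, 1, 2, 3]
smooth  = [0.10, 0.12, 0.15, 0.18]   # gradual drift
jittery = [0.10, 0.60, 0.05, 0.70]   # abrupt oscillation

stable_score   = local_lipschitz(smooth, hours)
unstable_score = local_lipschitz(jittery, hours)
```

A benchmark can then penalize models whose trajectories produce large values of this constant, since abrupt per-patient risk swings erode clinician trust even when aggregate discrimination metrics look strong.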
arXiv Detail & Related papers (2025-10-16T04:16:54Z)
- Evidential time-to-event prediction with calibrated uncertainty quantification [12.446406577462069]
Time-to-event analysis provides insights into clinical prognosis and treatment recommendations. We propose an evidential regression model specifically designed for time-to-event prediction. We show that our model delivers both accurate and reliable performance, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2024-11-12T15:06:04Z)
- SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing [67.8991481023825]
Sepsis is the leading cause of in-hospital mortality in the USA.
Existing predictive models are usually trained on high-quality data with little missing information.
For the potential high-risk patients with low confidence due to limited observations, we propose a robust active sensing algorithm.
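The active-sensing idea can be sketched simply: for patients whose risk estimate is low-confidence, rank the unobserved variables by an expected uncertainty reduction and request the most informative ones. The sketch below assumes those reduction scores are precomputed; SepsisLab's actual algorithm and scoring are not reproduced here.

```python
# Hedged sketch of active sensing: acquire measurements only for
# low-confidence patients, highest expected uncertainty reduction first.
# Patient data and scores are hypothetical.
def active_sensing(patients, conf_threshold=0.7, budget=1):
    """Return {patient_id: measurements to acquire} for uncertain patients."""
    plan = {}
    for pid, info in patients.items():
        if info["confidence"] < conf_threshold:
            ranked = sorted(info["candidates"].items(), key=lambda kv: -kv[1])
            plan[pid] = [name for name, _ in ranked[:budget]]
    return plan

patients = {
    "p1": {"confidence": 0.55,  # uncertain: worth sensing
           "candidates": {"lactate": 0.30, "wbc": 0.12, "map": 0.05}},
    "p2": {"confidence": 0.92,  # confident: no extra measurements needed
           "candidates": {"lactate": 0.10, "wbc": 0.02}},
}

plan = active_sensing(patients)
```

Restricting acquisition to low-confidence patients keeps the measurement burden bounded while targeting exactly the cases where limited observations make the risk estimate unreliable.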
arXiv Detail & Related papers (2024-07-24T04:47:36Z)
- Improving Robustness and Reliability in Medical Image Classification with Latent-Guided Diffusion and Nested-Ensembles [4.249986624493547]
Once deployed, medical image analysis methods are often faced with unexpected image corruptions and noise perturbations. LaDiNE is a novel ensemble learning method combining the robustness of Vision Transformers with diffusion-based generative models. Experiments on tuberculosis chest X-rays and melanoma skin cancer datasets demonstrate that LaDiNE achieves superior performance compared to a wide range of state-of-the-art methods.
arXiv Detail & Related papers (2023-10-24T15:53:07Z)
- Towards Reliable Medical Image Segmentation by Modeling Evidential Calibrated Uncertainty [57.023423137202485]
Concerns regarding the reliability of medical image segmentation persist among clinicians. We introduce DEviS, an easily implementable foundational model that seamlessly integrates into various medical image segmentation networks. By leveraging subjective logic theory, we explicitly model probability and uncertainty for medical image segmentation.
arXiv Detail & Related papers (2023-01-01T05:02:46Z)
- When in Doubt: Neural Non-Parametric Uncertainty Quantification for Epidemic Forecasting [70.54920804222031]
Most existing forecasting models disregard uncertainty quantification, resulting in miscalibrated predictions.
Recent works in deep neural models for uncertainty-aware time-series forecasting also have several limitations.
We model the forecasting task as a probabilistic generative process and propose a functional neural process model called EPIFNP.
arXiv Detail & Related papers (2021-06-07T18:31:47Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)
- Bayesian prognostic covariate adjustment [59.75318183140857]
Historical data about disease outcomes can be integrated into the analysis of clinical trials in many ways.
We build on existing literature that uses prognostic scores from a predictive model to increase the efficiency of treatment effect estimates.
arXiv Detail & Related papers (2020-12-24T05:19:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.