One Loss to Rule Them All: Marked Time-to-Event for Structured EHR Foundation Models
- URL: http://arxiv.org/abs/2602.00541v1
- Date: Sat, 31 Jan 2026 06:15:46 GMT
- Title: One Loss to Rule Them All: Marked Time-to-Event for Structured EHR Foundation Models
- Authors: Zilin Jing, Vincent Jeanselme, Yuta Kobayashi, Simon A. Lee, Chao Pang, Aparajita Kashyap, Yanwei Li, Xinzhuo Jiang, Shalmali Joshi
- Abstract summary: We propose ORA, a marked time-to-event pretraining objective that jointly models event timing and associated measurements. Our results suggest a broader takeaway: pretraining objectives that account for EHR structure are critical for expanding downstream capabilities and generalizability.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Clinical events captured in Electronic Health Records (EHR) are irregularly sampled and may consist of a mixture of discrete events and numerical measurements, such as laboratory values or treatment dosages. The sequential nature of EHR, analogous to natural language, has motivated the use of next-token prediction to train prior EHR Foundation Models (FMs) over events. However, this training fails to capture the full structure of EHR. We propose ORA, a marked time-to-event pretraining objective that jointly models event timing and associated measurements. Across multiple datasets, downstream tasks, and model architectures, this objective consistently yields more generalizable representations than next-token prediction and pretraining losses that ignore continuous measurements. Importantly, the proposed objective yields improvements beyond traditional classification evaluation, including better regression and time-to-event prediction. Beyond introducing a new family of FMs, our results suggest a broader takeaway: pretraining objectives that account for EHR structure are critical for expanding downstream capabilities and generalizability.
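
This listing gives only the high-level description of ORA. As a hedged sketch of what a marked time-to-event objective can look like, the PyTorch snippet below combines an exponential time-to-next-event likelihood, a categorical likelihood over event types, and a masked Gaussian likelihood for the associated numeric measurement. All names and the specific parametric choices here are illustrative assumptions, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def marked_tte_loss(log_rate, type_logits, value_mean, value_logvar,
                    dt, event_type, value, value_mask):
    """Hypothetical marked time-to-event loss (not the paper's exact ORA objective).

    log_rate:     (B,) predicted log-intensity of the next event
    type_logits:  (B, K) logits over K discrete event types
    value_mean:   (B,) predicted mean of the numeric measurement
    value_logvar: (B,) predicted log-variance of the measurement
    dt:           (B,) observed time until the next event
    event_type:   (B,) observed event-type index
    value:        (B,) observed measurement (valid where value_mask == 1)
    value_mask:   (B,) 1 if the event carries a numeric measurement, else 0
    """
    # Exponential time-to-event NLL: -log p(dt) = -log_rate + rate * dt
    nll_time = -log_rate + torch.exp(log_rate) * dt
    # Categorical NLL over event types (the "mark")
    nll_type = F.cross_entropy(type_logits, event_type, reduction="none")
    # Gaussian NLL for the continuous measurement, masked out when absent
    nll_value = 0.5 * (value_logvar
                       + (value - value_mean) ** 2 / torch.exp(value_logvar))
    nll_value = nll_value * value_mask
    return (nll_time + nll_type + nll_value).mean()
```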
Related papers
- Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling [51.78972657142583]
We introduce Timer-S1, a strong Mixture-of-Experts (MoE) time series foundation model with 8.3B total parameters, 0.75B activated parameters for each token, and a context length of 11.5K.
To overcome the scalability bottleneck in existing pre-trained time series foundation models, we perform Serial Scaling in three dimensions.
arXiv Detail & Related papers (2026-03-05T04:13:57Z) - A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting [81.73338008264115]
Current approaches for time series forecasting, whether in the time or frequency domain, predominantly use deep learning models based on linear layers or transformers.
We propose FIRE, a unified frequency domain decomposition framework that provides a mathematical abstraction for diverse types of time series.
FIRE consistently outperforms state-of-the-art models on long-term forecasting benchmarks.
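
FIRE's mathematical abstraction is not spelled out in this summary. The NumPy sketch below shows only the generic idea behind frequency-domain decomposition, splitting a series into band-limited components that sum back to the original signal; the function name and the equal-width band split are assumptions for illustration.

```python
import numpy as np

def frequency_decompose(x, k=4):
    """Generic frequency-domain decomposition (illustrative; not FIRE's
    actual formulation). Splits a 1-D series into k band-limited
    components whose sum reconstructs the original signal."""
    spec = np.fft.rfft(x)
    edges = np.linspace(0, len(spec), k + 1, dtype=int)
    comps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = np.zeros_like(spec)
        band[lo:hi] = spec[lo:hi]        # keep one frequency band
        comps.append(np.fft.irfft(band, n=len(x)))
    return comps  # sum(comps) recovers x up to float error
```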
arXiv Detail & Related papers (2025-10-11T09:59:25Z) - Building the EHR Foundation Model via Next Event Prediction [5.378917071184147]
Next Event Prediction (NEP) is a framework that enhances Large Language Models' temporal reasoning.
NEP explicitly models disease progression patterns and causal relationships.
Our analyses reveal dual benefits: state-of-the-art prediction accuracy combined with clinically interpretable attention patterns.
arXiv Detail & Related papers (2025-09-29T23:27:51Z) - Foundation Models for Clinical Records at Health System Scale [40.88151645546234]
We present a novel generative pretraining strategy for sequential EHR data using next-visit event prediction.
Our model learns to autoregressively generate various tokenized clinical events for the next visit based on patient history.
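
As a hedged illustration of next-visit event prediction under one plausible reading of this summary, the sketch below computes a teacher-forced cross-entropy over the tokens of the next visit, conditioned on the patient-history tokens. The `model` interface and tensor layout are assumptions, not the paper's API.

```python
import torch
import torch.nn.functional as F

def next_visit_nll(model, history_tokens, next_visit_tokens):
    """Illustrative next-visit objective: autoregressive cross-entropy
    over the next visit's tokens given the full history. `model` is
    assumed to map a (B, T) token tensor to (B, T, V) logits."""
    # Teacher forcing: feed history plus all but the last next-visit token.
    inputs = torch.cat([history_tokens, next_visit_tokens[:, :-1]], dim=1)
    logits = model(inputs)                               # (B, T, V)
    # Positions from the end of the history onward predict visit tokens.
    pred = logits[:, history_tokens.size(1) - 1:, :]     # (B, N, V)
    return F.cross_entropy(pred.reshape(-1, pred.size(-1)),
                           next_visit_tokens.reshape(-1))
```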
arXiv Detail & Related papers (2025-07-01T08:52:33Z) - Towards Data-Efficient Pretraining for Atomic Property Prediction [51.660835328611626]
We show that pretraining on a task-relevant dataset can match or surpass large-scale pretraining.
We introduce the Chemical Similarity Index (CSI), a novel metric inspired by computer vision's Fréchet Inception Distance.
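
The Fréchet Inception Distance that CSI draws on fits a Gaussian to each set of embedded samples and compares the two fits. A minimal version of that computation is sketched below; CSI's actual featurization and any chemistry-specific adjustments are not given in this listing.

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between two feature sets under a Gaussian fit,
    the quantity FID is built on. feats_a, feats_b: (N, D) arrays."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(cov_a @ cov_b)
    if np.iscomplexobj(covmean):   # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a + cov_b - 2.0 * covmean))
```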
arXiv Detail & Related papers (2025-02-16T11:46:23Z) - Beyond Scaling: Measuring and Predicting the Upper Bound of Knowledge Retention in Language Model Pre-Training [68.94373533768501]
We model knowledge retention, the capacity of a pre-trained language model to memorize factual information from its corpus, and introduce a principled method to estimate it prior to training.
We propose Size-dependent Mutual Information (SMI), an information-theoretic predictor that integrates knowledge frequency, knowledge specificity, and model size to forecast closed-book question answering (QA) accuracy.
arXiv Detail & Related papers (2025-02-06T13:23:53Z) - Context Clues: Evaluating Long Context Models for Clinical Prediction Tasks on EHRs [20.08838976147805]
We present the first systematic evaluation of the effect of context length on modeling EHR data.
We find that longer context models improve predictive performance.
For clinical applications, however, model performance alone is insufficient.
arXiv Detail & Related papers (2024-12-09T21:58:27Z) - Evidential time-to-event prediction with calibrated uncertainty quantification [12.446406577462069]
Time-to-event analysis provides insights into clinical prognosis and treatment recommendations.
We propose an evidential regression model specifically designed for time-to-event prediction.
We show that our model delivers both accurate and reliable performance, outperforming state-of-the-art methods.
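
The summary does not give the model's likelihood. As background, the sketch below implements the Normal-Inverse-Gamma negative log-likelihood from deep evidential regression (Amini et al., 2020), a standard starting point for evidential regression; the paper's calibrated time-to-event variant is not reproduced here.

```python
import math
import torch

def nig_nll(y, gamma, nu, alpha, beta):
    """Normal-Inverse-Gamma evidential regression NLL (Amini et al., 2020).

    gamma: predicted mean; nu > 0: virtual evidence for the mean;
    alpha > 1, beta > 0: Inverse-Gamma parameters for the variance.
    """
    omega = 2.0 * beta * (1.0 + nu)
    return (0.5 * (math.log(math.pi) - torch.log(nu))
            - alpha * torch.log(omega)
            + (alpha + 0.5) * torch.log((y - gamma) ** 2 * nu + omega)
            + torch.lgamma(alpha)
            - torch.lgamma(alpha + 0.5))
```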
arXiv Detail & Related papers (2024-11-12T15:06:04Z) - Impact of Noisy Supervision in Foundation Model Learning [91.56591923244943]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - Examining the Effect of Pre-training on Time Series Classification [21.38211396933795]
This study investigates how pre-training affects the subsequent fine-tuning process.
We conducted a thorough examination of 150 classification datasets.
We find that pre-training can only help improve the optimization process for models that fit the data poorly.
Adding more pre-training data does not improve generalization, but it can strengthen the advantage of pre-training on the original data volume.
arXiv Detail & Related papers (2023-09-11T06:26:57Z) - Towards Out-of-Distribution Sequential Event Prediction: A Causal Treatment [72.50906475214457]
The goal of sequential event prediction is to estimate the next event based on a sequence of historical events.
In practice, next-event prediction models are trained on sequential data collected at a single point in time, leaving them vulnerable to distribution shift.
We propose a framework with hierarchical branching structures for learning context-specific representations.
arXiv Detail & Related papers (2022-10-24T07:54:13Z) - Multi-axis Attentive Prediction for Sparse Event Data: An Application to Crime Prediction [16.654369376687296]
We present a purely attentional approach to extract both short-term dynamics and long-term semantics of event propagation through two observation angles.
The proposed contrastive learning objective significantly enhances MAPSED's ability to capture the semantics and dynamics of events.
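
The contrastive objective is only named in this summary. For orientation, the sketch below is a generic InfoNCE loss of the kind commonly used for such objectives, not MAPSED's exact formulation.

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Generic InfoNCE contrastive loss over paired embeddings.
    z1, z2: (B, D) tensors where row i of z1 and z2 form a positive pair."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature            # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)        # match pairs on the diagonal
```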
arXiv Detail & Related papers (2021-10-05T02:38:46Z) - SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models [48.07469930813923]
This work aims to forecast the demand for healthcare services by predicting the number of patient visits to healthcare facilities.
We introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data.
Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics.
arXiv Detail & Related papers (2021-08-31T08:23:56Z) - Improving Event Duration Prediction via Time-aware Pre-training [90.74988936678723]
We introduce two effective models for duration prediction.
One model predicts the range/unit in which the duration value falls (R-pred); the other predicts the exact duration value (E-pred).
Our best model, E-pred, substantially outperforms previous work and captures duration information more accurately than R-pred.
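
As a minimal sketch of the two prediction styles this entry contrasts (hidden size, range count, and class name are assumptions), R-pred can be a classifier over duration ranges/units while E-pred regresses the exact value:

```python
import torch.nn as nn

class DurationHeads(nn.Module):
    """Hypothetical heads for the two duration-prediction styles."""
    def __init__(self, hidden=768, n_ranges=10):
        super().__init__()
        self.r_pred = nn.Linear(hidden, n_ranges)  # range/unit classifier
        self.e_pred = nn.Linear(hidden, 1)         # exact-value regressor

    def forward(self, h):
        return self.r_pred(h), self.e_pred(h).squeeze(-1)
```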
arXiv Detail & Related papers (2020-11-05T01:52:11Z)