Mind the Missing: Variable-Aware Representation Learning for Irregular EHR Time Series using Large Language Models
- URL: http://arxiv.org/abs/2509.22121v1
- Date: Fri, 26 Sep 2025 09:44:16 GMT
- Title: Mind the Missing: Variable-Aware Representation Learning for Irregular EHR Time Series using Large Language Models
- Authors: Jeong Eul Kwon, Joo Heung Yoon, Hyo Kyung Lee
- Abstract summary: VITAL is a variable-aware, large language model (LLM) based framework tailored for learning from irregularly sampled physiological time series. It reprograms vital signs into the language space, enabling the LLM to capture temporal context and reason over missing values. It maintains robust performance under high levels of missingness, which is prevalent in real-world clinical scenarios.
- Score: 0.6554326244334866
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Irregular sampling and high missingness are intrinsic challenges in modeling time series derived from electronic health records (EHRs), where clinical variables are measured at uneven intervals depending on workflow and intervention timing. To address this, we propose VITAL, a variable-aware, large language model (LLM) based framework tailored for learning from irregularly sampled physiological time series. VITAL differentiates between two distinct types of clinical variables: vital signs, which are frequently recorded and exhibit temporal patterns, and laboratory tests, which are measured sporadically and lack temporal structure. It reprograms vital signs into the language space, enabling the LLM to capture temporal context and reason over missing values through explicit encoding. In contrast, laboratory variables are embedded either using representative summary values or a learnable [Not measured] token, depending on their availability. Extensive evaluations on benchmark datasets from PhysioNet demonstrate that VITAL outperforms state-of-the-art methods designed for irregular time series. Furthermore, it maintains robust performance under high levels of missingness, which is prevalent in real-world clinical scenarios where key variables are often unavailable.
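The laboratory-variable handling described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `embed_labs`, the per-variable linear map (`weight`, `bias`), and the choice of the mean as the "representative summary value" are all assumptions made here for illustration. In a trained model the `not_measured` vector would be a learnable parameter; here it is a fixed array.

```python
import numpy as np

def embed_labs(labs, weight, bias, not_measured):
    """Embed each laboratory variable as one vector.

    labs:         (V, T) array, NaN where a lab was not measured.
    weight, bias: (V, D) hypothetical per-variable linear map.
    not_measured: (D,) stand-in for a learnable [Not measured] token.
    """
    observed = ~np.isnan(labs)
    counts = observed.sum(axis=1)
    # Mean over observed entries only (assumed summary statistic).
    sums = np.where(observed, labs, 0.0).sum(axis=1)
    means = sums / np.maximum(counts, 1)
    value_emb = means[:, None] * weight + bias
    # Variables with no observation at all fall back to the token.
    measured = counts > 0
    return np.where(measured[:, None], value_emb, not_measured[None, :])

labs = np.array([[1.0, np.nan, 3.0],
                 [np.nan, np.nan, np.nan]])
emb = embed_labs(labs,
                 weight=np.ones((2, 4)),
                 bias=np.zeros((2, 4)),
                 not_measured=np.full(4, -1.0))
```

In this toy input the first variable has two observations and is embedded from their mean, while the second has none and receives the [Not measured] vector.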
Related papers
- Structured Temporal Causality for Interpretable Multivariate Time Series Anomaly Detection [1.6111818380407035]
OracleAD is an unsupervised framework for time series anomaly detection. Anomalies are identified using a dual scoring mechanism based on prediction error and deviation from the Stable Latent Structure. OracleAD achieves state-of-the-art results across multiple real-world datasets and evaluation protocols.
arXiv Detail & Related papers (2025-10-18T13:53:41Z)
- ProMedTS: A Self-Supervised, Prompt-Guided Multimodal Approach for Integrating Medical Text and Time Series [27.70300880284899]
Large language models (LLMs) have shown remarkable performance in vision-language tasks, but their application in the medical field remains underexplored. We introduce ProMedTS, a novel self-supervised multimodal framework that employs prompt-guided learning to unify data types. We evaluate ProMedTS on disease diagnosis tasks using real-world datasets, and the results demonstrate that our method consistently outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2025-02-19T07:56:48Z)
- CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis [50.56875995511431]
We introduce a Cross-Modal Temporal Pattern Discovery (CTPD) framework, designed to efficiently extract meaningful cross-modal temporal patterns from multimodal EHR data. Our approach introduces shared initial temporal pattern representations which are refined using slot attention to generate temporal semantic embeddings.
arXiv Detail & Related papers (2024-11-01T15:54:07Z)
- EMIT- Event-Based Masked Auto Encoding for Irregular Time Series [9.903108445512576]
Irregular time series, where data points are recorded at uneven intervals, are prevalent in healthcare settings.
This variability, which reflects critical fluctuations in patient health, is essential for informed clinical decision-making.
Existing self-supervised learning research on irregular time series often relies on generic pretext tasks like forecasting.
This paper proposes a novel pretraining framework, EMIT, an event-based masking for irregular time series.
arXiv Detail & Related papers (2024-09-25T02:05:32Z)
- Scalable Numerical Embeddings for Multivariate Time Series: Enhancing Healthcare Data Representation Learning [6.635084843592727]
We propose SCAlable Numerical Embedding (SCANE), a novel framework that treats each feature value as an independent token.
SCANE regularizes the traits of distinct feature embeddings and enhances representational learning through a scalable embedding mechanism.
We develop the Scalable nUMerical eMbeddIng Transformer (SUMMIT), which is engineered to deliver precise predictive outputs for MTS characterized by prevalent missing entries.
arXiv Detail & Related papers (2024-05-26T13:06:45Z)
- Graph Spatiotemporal Process for Multivariate Time Series Anomaly Detection with Missing Values [67.76168547245237]
We introduce a novel framework called GST-Pro, which utilizes a graph spatiotemporal process and an anomaly scorer to detect anomalies.
Our experimental results show that the GST-Pro method can effectively detect anomalies in time series data and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-11T10:10:16Z)
- Time-LLM: Time Series Forecasting by Reprogramming Large Language Models [110.20279343734548]
Time series forecasting holds significant importance in many real-world dynamic systems.
We present Time-LLM, a reprogramming framework to repurpose large language models for time series forecasting.
Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models.
arXiv Detail & Related papers (2023-10-03T01:31:25Z)
- Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series [3.635056427544418]
We propose a new self-supervised learning method for clinical time series data.
Our method is agnostic to the specific form of loss function used at each level.
We evaluate our method on two real-world clinical datasets.
arXiv Detail & Related papers (2023-07-20T14:49:58Z)
- Correlation-aware Spatial-Temporal Graph Learning for Multivariate Time-series Anomaly Detection [67.60791405198063]
We propose a correlation-aware spatial-temporal graph learning (termed CST-GL) for time series anomaly detection.
CST-GL explicitly captures the pairwise correlations via a multivariate time series correlation learning module.
A novel anomaly scoring component is further integrated into CST-GL to estimate the degree of an anomaly in a purely unsupervised manner.
arXiv Detail & Related papers (2023-07-17T11:04:27Z)
- T-Phenotype: Discovering Phenotypes of Predictive Temporal Patterns in Disease Progression [82.85825388788567]
We develop a novel temporal clustering method, T-Phenotype, to discover phenotypes of predictive temporal patterns from labeled time-series data.
We show that T-Phenotype achieves the best phenotype discovery performance over all the evaluated baselines.
arXiv Detail & Related papers (2023-02-24T13:30:35Z)
- Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations [84.42837346400151]
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare.
Existing causal inference approaches consider regular, discrete-time intervals between observations and treatment decisions.
We propose a controllable simulation environment based on a model of tumor growth for a range of scenarios.
arXiv Detail & Related papers (2022-06-16T17:15:15Z)
- Self-supervised Transformer for Multivariate Clinical Time-Series with Missing Values [7.9405251142099464]
We present the STraTS (Self-supervised Transformer for Time-Series) model.
It treats time-series as a set of observation triplets instead of using the traditional dense matrix representation.
It shows better prediction performance than state-of-the-art methods for mortality prediction, especially when labeled data is limited.
arXiv Detail & Related papers (2021-07-29T19:39:39Z)
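The observation-triplet representation mentioned in the STraTS summary above can be illustrated with a short sketch. It assumes a dense (time x variable) matrix with NaN marking missing entries and turns it into a list of (time, variable, value) triplets. The helper name `to_triplets` and the toy variable names are hypothetical and do not come from the paper.

```python
import numpy as np

def to_triplets(times, dense, names):
    """Convert a dense (T, V) matrix with NaN for missing entries
    into a list of (time, variable_name, value) triplets."""
    # Indices of observed cells, in row-major (time-first) order.
    t_idx, v_idx = np.nonzero(~np.isnan(dense))
    return [(float(times[t]), names[v], float(dense[t, v]))
            for t, v in zip(t_idx, v_idx)]

times = np.array([0.0, 1.5])
dense = np.array([[80.0, np.nan],
                  [np.nan, 1.2]])
triplets = to_triplets(times, dense, ["HR", "Lactate"])
```

Only observed cells produce triplets, so the sequence length tracks the number of measurements and not the size of the time grid, and no imputation is needed.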