Related papers: Mind the Missing: Variable-Aware Representation Learning for Irregular EHR Time Series using Large Language Models

URL: http://arxiv.org/abs/2509.22121v1
Date: Fri, 26 Sep 2025 09:44:16 GMT
Title: Mind the Missing: Variable-Aware Representation Learning for Irregular EHR Time Series using Large Language Models
Authors: Jeong Eul Kwon, Joo Heung Yoon, Hyo Kyung Lee
Abstract summary: VITAL is a variable-aware, large language model (LLM) based framework tailored for learning from irregularly sampled physiological time series. It reprograms vital signs into the language space, enabling the LLM to capture temporal context and reason over missing values. It maintains robust performance under high levels of missingness, which is prevalent in real-world clinical scenarios.
Score: 0.6554326244334866
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Irregular sampling and high missingness are intrinsic challenges in modeling time series derived from electronic health records (EHRs), where clinical variables are measured at uneven intervals depending on workflow and intervention timing. To address this, we propose VITAL, a variable-aware, large language model (LLM) based framework tailored for learning from irregularly sampled physiological time series. VITAL differentiates between two distinct types of clinical variables: vital signs, which are frequently recorded and exhibit temporal patterns, and laboratory tests, which are measured sporadically and lack temporal structure. It reprograms vital signs into the language space, enabling the LLM to capture temporal context and reason over missing values through explicit encoding. In contrast, laboratory variables are embedded either using representative summary values or a learnable [Not measured] token, depending on their availability. Extensive evaluations on benchmark datasets from PhysioNet demonstrate that VITAL outperforms state-of-the-art methods designed for irregular time series. Furthermore, it maintains robust performance under high levels of missingness, which is prevalent in real-world clinical scenarios where key variables are often unavailable.
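The laboratory-variable handling described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `embed_labs`, the per-variable projection `value_proj`, the use of the median as the "representative summary value", and the fixed list standing in for the learnable [Not measured] token are all assumptions made for the example.

```python
import math
import statistics


def embed_labs(lab_series, value_proj, not_measured_token):
    """Return one embedding vector per laboratory variable.

    lab_series: list of per-variable value lists, NaN where not measured.
    value_proj: hypothetical per-variable projection weights (list of lists).
    not_measured_token: stand-in for a learnable [Not measured] embedding.
    """
    embeddings = []
    for series, proj in zip(lab_series, value_proj):
        observed = [v for v in series if not math.isnan(v)]
        if observed:
            # Assumption: the median serves as the representative summary value.
            summary = statistics.median(observed)
            embeddings.append([summary * w for w in proj])
        else:
            # Variable never measured in this window: use the shared token.
            embeddings.append(list(not_measured_token))
    return embeddings


nan = float("nan")
labs = [
    [nan, 1.0, nan, 3.0],  # measured twice
    [nan, nan, nan, nan],  # never measured
]
value_proj = [[0.5, -1.0, 2.0], [1.0, 1.0, 1.0]]
not_measured_token = [0.1, 0.2, 0.3]

embeddings = embed_labs(labs, value_proj, not_measured_token)
print(embeddings)  # [[1.0, -2.0, 4.0], [0.1, 0.2, 0.3]]
```

In a trained model the projection and the token would be learned parameters; plain lists are used here only to keep the sketch self-contained.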

Related papers

Structured Temporal Causality for Interpretable Multivariate Time Series Anomaly Detection [1.6111818380407035]
OracleAD is an unsupervised framework for time series anomaly detection. Anomalies are identified using a dual scoring mechanism based on prediction error and deviation from the Stable Latent Structure. OracleAD achieves state-of-the-art results across multiple real-world datasets and evaluation protocols.
arXiv Detail & Related papers (2025-10-18T13:53:41Z)
ProMedTS: A Self-Supervised, Prompt-Guided Multimodal Approach for Integrating Medical Text and Time Series [27.70300880284899]
Large language models (LLMs) have shown remarkable performance in vision-grained tasks, but their application in the medical field remains underexplored. We introduce ProMedTS, a novel self-supervised multimodal framework that employs prompt-guided learning to unify data types. We evaluate ProMedTS on disease diagnosis tasks using real-world datasets, and the results demonstrate that our method consistently outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2025-02-19T07:56:48Z)
CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis [50.56875995511431]
We introduce a Cross-Modal Temporal Pattern Discovery (CTPD) framework, designed to efficiently extract meaningful cross-modal temporal patterns from multimodal EHR data. Our approach introduces shared initial temporal pattern representations which are refined using slot attention to generate temporal semantic embeddings.
arXiv Detail & Related papers (2024-11-01T15:54:07Z)
EMIT - Event-Based Masked Auto Encoding for Irregular Time Series [9.903108445512576]
Irregular time series, where data points are recorded at uneven intervals, are prevalent in healthcare settings. This variability, which reflects critical fluctuations in patient health, is essential for informed clinical decision-making. Existing self-supervised learning research on irregular time series often relies on generic pretext tasks like forecasting. This paper proposes a novel pretraining framework, EMIT, an event-based masking approach for irregular time series.
arXiv Detail & Related papers (2024-09-25T02:05:32Z)
Scalable Numerical Embeddings for Multivariate Time Series: Enhancing Healthcare Data Representation Learning [6.635084843592727]
We propose SCAlable Numerical Embedding (SCANE), a novel framework that treats each feature value as an independent token. SCANE regularizes the traits of distinct feature embeddings and enhances representational learning through a scalable embedding mechanism. We develop the nUMerical eMbeddIng Transformer (SUMMIT), which is engineered to deliver precise predictive outputs for MTS characterized by prevalent missing entries.
arXiv Detail & Related papers (2024-05-26T13:06:45Z)
Graph Spatiotemporal Process for Multivariate Time Series Anomaly Detection with Missing Values [67.76168547245237]
We introduce a novel framework called GST-Pro, which utilizes a graph spatiotemporal process and anomaly scorer to detect anomalies. Our experimental results show that the GST-Pro method can effectively detect anomalies in time series data and outperforms state-of-the-art methods.
arXiv Detail & Related papers (2024-01-11T10:10:16Z)
Time-LLM: Time Series Forecasting by Reprogramming Large Language Models [110.20279343734548]
Time series forecasting holds significant importance in many real-world dynamic systems. We present Time-LLM, a reprogramming framework to repurpose large language models for time series forecasting. Time-LLM is a powerful time series learner that outperforms state-of-the-art, specialized forecasting models.
arXiv Detail & Related papers (2023-10-03T01:31:25Z)
Sequential Multi-Dimensional Self-Supervised Learning for Clinical Time Series [3.635056427544418]
We propose a new self-supervised learning method for clinical time series data. Our method is agnostic to the specific form of loss function used at each level. We evaluate our method on two real-world clinical datasets.
arXiv Detail & Related papers (2023-07-20T14:49:58Z)
Correlation-aware Spatial-Temporal Graph Learning for Multivariate Time-series Anomaly Detection [67.60791405198063]
We propose a correlation-aware spatial-temporal graph learning (termed CST-GL) for time series anomaly detection. CST-GL explicitly captures the pairwise correlations via a multivariate time series correlation learning module. A novel anomaly scoring component is further integrated into CST-GL to estimate the degree of an anomaly in a purely unsupervised manner.
arXiv Detail & Related papers (2023-07-17T11:04:27Z)
T-Phenotype: Discovering Phenotypes of Predictive Temporal Patterns in Disease Progression [82.85825388788567]
We develop a novel temporal clustering method, T-Phenotype, to discover phenotypes of predictive temporal patterns from labeled time-series data. We show that T-Phenotype achieves the best phenotype discovery performance over all the evaluated baselines.
arXiv Detail & Related papers (2023-02-24T13:30:35Z)
Continuous-Time Modeling of Counterfactual Outcomes Using Neural Controlled Differential Equations [84.42837346400151]
Estimating counterfactual outcomes over time has the potential to unlock personalized healthcare. Existing causal inference approaches consider regular, discrete-time intervals between observations and treatment decisions. We propose a controllable simulation environment based on a model of tumor growth for a range of scenarios.
arXiv Detail & Related papers (2022-06-16T17:15:15Z)
Self-supervised Transformer for Multivariate Clinical Time-Series with Missing Values [7.9405251142099464]
We present the STraTS (Self-supervised Transformer for Time-Series) model. It treats time-series as a set of observation triplets instead of using the traditional dense matrix representation. It shows better prediction performance than state-of-the-art methods for mortality prediction, especially when labeled data is limited.
arXiv Detail & Related papers (2021-07-29T19:39:39Z)
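
The "observation triplets" idea mentioned in the STraTS entry above can be illustrated with a small sketch. It is only an illustration under stated assumptions, not code from that paper: the helper name `to_triplets`, the variable names, and the use of NaN to mark unobserved cells in the dense matrix are all hypothetical.

```python
import math


def to_triplets(times, variable_names, dense):
    """Convert a dense (time x variable) matrix with NaN gaps into
    a list of (time, variable, value) triplets for observed cells only."""
    triplets = []
    for t, row in zip(times, dense):
        for name, value in zip(variable_names, row):
            if not math.isnan(value):
                triplets.append((t, name, value))
    return triplets


nan = float("nan")
times = [0.0, 0.5, 2.0]  # irregular timestamps, in hours
variable_names = ["heart_rate", "lactate"]
dense = [
    [82.0, nan],
    [nan, 1.9],
    [90.0, nan],
]

triplets = to_triplets(times, variable_names, dense)
print(triplets)
# [(0.0, 'heart_rate', 82.0), (0.5, 'lactate', 1.9), (2.0, 'heart_rate', 90.0)]
```

Half of the dense matrix is empty here, while the triplet list stores only the three actual measurements, so no imputation or padding is needed for the missing cells.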

This list is automatically generated from the titles and abstracts of the papers on this site.