Generative Foundation Model for Structured and Unstructured Electronic Health Records
- URL: http://arxiv.org/abs/2508.16054v1
- Date: Fri, 22 Aug 2025 03:05:09 GMT
- Title: Generative Foundation Model for Structured and Unstructured Electronic Health Records
- Authors: Sonish Sivarajkumar, Hang Zhang, Yuelyu Ji, Maneesh Bilalpur, Xizhi Wu, Chenyu Li, Min Gu Kwak, Shyam Visweswaran, Yanshan Wang,
- Abstract summary: Generative Deep Patient (GDP) is a multimodal foundation model that encodes structured EHR time-series via a CNN-Transformer encoder and fuses it with unstructured EHRs through cross-modal attention into a LLaMA-based decoder.<n>GDP demonstrated superior performance on MIMIC-IV: heart failure AUROC = 0.923, type 2 diabetes AUROC = 0.817, and 30-day readmission AUROC = 0.627.
- Score: 10.687198380096314
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Electronic health records (EHRs) are rich clinical data sources but complex repositories of patient data, spanning structured elements (demographics, vitals, lab results, codes), unstructured clinical notes and other modalities of data. Harnessing this heterogeneity is critical for improving patient outcomes. Recent advances in large language models (LLMs) have enabled foundation models that can learn from multiple data modalities and support clinical tasks. However, most current approaches simply serialize numeric EHR data into text, which risks losing temporal and quantitative detail. We introduce Generative Deep Patient (GDP), a multimodal foundation model that natively encodes structured EHR time-series via a CNN-Transformer encoder and fuses it with unstructured EHRs through cross-modal attention into a LLaMA-based decoder. GDP is trained in two stages: (1) generative pretraining, where it learns to produce clinical narratives from raw patient timelines while also performing masked feature prediction (MFP) and next time-step prediction (NTP) to capture temporal dynamics; and (2) multi-task fine-tuning for clinically meaningful predictions (e.g., heart failure, type 2 diabetes, 30-day readmission). In clinical prediction, GDP demonstrated superior performance on MIMIC-IV: heart failure AUROC = 0.923, type 2 diabetes AUROC = 0.817, and 30-day readmission AUROC = 0.627. For narrative generation, GDP achieved ROUGE-L = 0.135 and BERTScore-F1 = 0.545. In a blinded human evaluation, GDP-Instruct scored highest on faithfulness, fluency, and overall clinical utility, suggesting reduced hospital documentation workload without sacrificing accuracy. Our results demonstrate that a single multimodal foundation model can both predict clinically actionable events and generate high-quality clinical narratives. Furthermore, GDP's flexible architecture can be extended to additional modalities.
Related papers
- CURENet: Combining Unified Representations for Efficient Chronic Disease Prediction [24.569877750738286]
We present CURENet, a multimodal model that integrates unstructured clinical notes, lab tests, and patients' time-series data.<n>CURENet has been capable of capturing the intricate interaction between different forms of clinical data and creating a more reliable predictive model for chronic illnesses.
arXiv Detail & Related papers (2025-11-14T15:52:22Z) - Integrating Genomics into Multimodal EHR Foundation Models [56.31910745104141]
This paper introduces an innovative EHR foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality.<n>The framework aims to learn complex relationships between clinical data and genetic predispositions.<n>This approach is pivotal for unlocking new insights into disease prediction, proactive health management, risk stratification, and personalized treatment strategies.
arXiv Detail & Related papers (2025-10-24T15:56:40Z) - Enhancing mortality prediction in cardiac arrest ICU patients through meta-modeling of structured clinical data from MIMIC-IV [0.0]
This study develops and evaluates machine learning models that integrate structured clinical data and unstructured information.<n>We used LASSO and XGBoost for feature selection, followed by a logistic regression trained on the top features identified by both models.<n>The final logistic regression model, which combined structured and textual input, achieved an AUC of 0.918, compared to 0.753 when using structured data alone, a relative improvement 22%.
arXiv Detail & Related papers (2025-10-20T20:56:45Z) - Improving Hospital Risk Prediction with Knowledge-Augmented Multimodal EHR Modeling [14.3674176608249]
We introduce a unified framework that seamlessly integrates structured and unstructured data for clinical risk prediction.<n>A fine-tuned Large Language Model (LLM) extracts task-relevant information from clinical notes.<n>The second stage combines both unstructured representations and features derived from the structured data to generate the final predictions.
arXiv Detail & Related papers (2025-08-04T01:03:16Z) - Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models [70.64969663547703]
AdaCVD is an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank.<n>It addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data.
arXiv Detail & Related papers (2025-05-30T14:42:02Z) - EMERGE: Enhancing Multimodal Electronic Health Records Predictive Modeling with Retrieval-Augmented Generation [22.94521527609479]
EMERGE is a Retrieval-Augmented Generation (RAG) driven framework to enhance multimodal EHR predictive modeling.<n>We extract entities from time-series data and clinical notes by prompting Large Language Models (LLMs) and align them with professional PrimeKG.<n>The extracted knowledge is then used to generate task-relevant summaries of patients' health statuses.
arXiv Detail & Related papers (2024-05-27T10:53:15Z) - TRIALSCOPE: A Unifying Causal Framework for Scaling Real-World Evidence Generation with Biomedical Language Models [21.437563965711004]
We present TRIALSCOPE, a framework designed to generate robust real-world evidence from observational data at scale.<n>The framework was shown to automatically curate high-quality structured patient data, expanding the dataset and incorporating key patient attributes only available in unstructured form.<n>We were also able to show that TRIALSCOPE could reproduce results of lung and pancreatic cancer clinical trials from the extracted real world data.
arXiv Detail & Related papers (2023-11-02T15:15:47Z) - Clinical Deterioration Prediction in Brazilian Hospitals Based on
Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD)
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv Detail & Related papers (2022-12-17T23:29:14Z) - Foresight -- Deep Generative Modelling of Patient Timelines using
Electronic Health Records [46.024501445093755]
Temporal modelling of medical history can be used to forecast and simulate future events, estimate risk, suggest alternative diagnoses or forecast complications.
We present Foresight, a novel GPT3-based pipeline that uses NER+L tools (i.e. MedCAT) to convert document text into structured, coded concepts.
arXiv Detail & Related papers (2022-12-13T19:06:00Z) - SANSformers: Self-Supervised Forecasting in Electronic Health Records
with Attention-Free Models [48.07469930813923]
This work aims to forecast the demand for healthcare services, by predicting the number of patient visits to healthcare facilities.
We introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data.
Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics.
arXiv Detail & Related papers (2021-08-31T08:23:56Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD)
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z) - BiteNet: Bidirectional Temporal Encoder Network to Predict Medical
Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey.
An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys.
We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z) - Med7: a transferable clinical natural language processing model for
electronic health records [6.935142529928062]
We introduce a named-entity recognition model for clinical natural language processing.
The model is trained to recognise seven categories: drug names, route, frequency, dosage, strength, form, duration.
We evaluate the transferability of the developed model using the data from the Intensive Care Unit in the US to secondary care mental health records (CRIS) in the UK.
arXiv Detail & Related papers (2020-03-03T00:55:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.