Foundation Model of Electronic Medical Records for Adaptive Risk Estimation
- URL: http://arxiv.org/abs/2502.06124v4
- Date: Tue, 05 Aug 2025 18:10:15 GMT
- Title: Foundation Model of Electronic Medical Records for Adaptive Risk Estimation
- Authors: Pawel Renc, Michal K. Grzeszczyk, Nassim Oufattole, Deirdre Goode, Yugang Jia, Szymon Bieganski, Matthew B. A. McDermott, Jaroslaw Was, Anthony E. Samir, Jonathan W. Cunningham, David W. Bates, Arkadiusz Sitek,
- Abstract summary: ETHOS is a versatile framework for developing a wide range of applications.<n>ARES uses ETHOS to compute dynamic, personalized risk probabilities for clinician-defined critical events.<n>ARES also features a personalized explainability module that highlights key clinical factors influencing risk estimates.
- Score: 6.248030496243407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hospitals struggle to predict critical outcomes. Traditional early warning systems, like NEWS and MEWS, rely on static variables and fixed thresholds, limiting their adaptability, accuracy, and personalization. We previously developed the Enhanced Transformer for Health Outcome Simulation (ETHOS), an AI model that tokenizes patient health timelines (PHTs) from EHRs and uses transformer-based architectures to predict future PHTs. ETHOS is a versatile framework for developing a wide range of applications. In this work, we develop the Adaptive Risk Estimation System (ARES) that leverages ETHOS to compute dynamic, personalized risk probabilities for clinician-defined critical events. ARES also features a personalized explainability module that highlights key clinical factors influencing risk estimates. We evaluated ARES using the MIMIC-IV v2.2 dataset together with its Emergency Department (ED) extension and benchmarked performance against both classical early warning systems and contemporary machine learning models. The entire dataset was tokenized resulting in 285,622 PHTs, comprising over 360 million tokens. ETHOS outperformed benchmark models in predicting hospital admissions, ICU admissions, and prolonged stays, achieving superior AUC scores. Its risk estimates were robust across demographic subgroups, with calibration curves confirming model reliability. The explainability module provided valuable insights into patient-specific risk factors. ARES, powered by ETHOS, advances predictive healthcare AI by delivering dynamic, real-time, personalized risk estimation with patient-specific explainability. Although our results are promising, the clinical impact remains uncertain. Demonstrating ARES's true utility in real-world settings will be the focus of our future work. We release the source code to facilitate future research.
Related papers
- Adaptable Cardiovascular Disease Risk Prediction from Heterogeneous Data using Large Language Models [70.64969663547703]
AdaCVD is an adaptable CVD risk prediction framework built on large language models extensively fine-tuned on over half a million participants from the UK Biobank.<n>It addresses key clinical challenges across three dimensions: it flexibly incorporates comprehensive yet variable patient information; it seamlessly integrates both structured data and unstructured text; and it rapidly adapts to new patient populations using minimal additional data.
arXiv Detail & Related papers (2025-05-30T14:42:02Z) - Evaluation of the impact of expert knowledge: How decision support scores impact the effectiveness of automatic knowledge-driven feature engineering (aKDFE) [0.8272083537040182]
Adverse Drug Events (ADEs) pose significant healthcare challenges, impacting patient safety and costs.
This study evaluates automatic Knowledge-Driven Feature Engineering (aKDFE) for improved ADE prediction from Electronic Health Record (EHR) data.
We investigated how incorporating domain-specific ADE risk scores for prolonged heart QT interval affects prediction performance using EHR data and medication handling events.
arXiv Detail & Related papers (2025-04-08T11:34:38Z) - Machine Learning Solutions Integrated in an IoT Healthcare Platform for Heart Failure Risk Stratification [0.16863755729554883]
The management of chronic Heart Failure (HF) presents significant challenges in modern healthcare.<n>We present a predictive model founded on Machine Learning (ML) techniques to identify patients at HF risk.
arXiv Detail & Related papers (2025-04-07T14:07:05Z) - BioSerenity-E1: a self-supervised EEG model for medical applications [0.0]
BioSerenity-E1 is a family of self-supervised foundation models for clinical EEG applications.<n>It combines spectral tokenization with masked prediction to achieve state-of-the-art performance across relevant diagnostic tasks.
arXiv Detail & Related papers (2025-03-13T13:42:46Z) - Explainable AI for Classifying UTI Risk Groups Using a Real-World Linked EHR and Pathology Lab Dataset [0.47517735516852333]
We leverage a linked EHR dataset to characterize urinary tract infections (UTIs)<n>We introduce a UTI risk estimation framework informed by clinical expertise to estimate UTI risk across individual patient timelines.<n>Our findings reveal differences in clinical and demographic predictors across risk groups.
arXiv Detail & Related papers (2024-11-26T18:10:51Z) - DeLLiriuM: A large language model for delirium prediction in the ICU using structured EHR [1.4699314771635081]
Delirium is an acute confusional state that has been shown to affect up to 31% of patients in the intensive care unit (ICU)
We develop and validate DeLLiriuM on ICU admissions from 104,303 patients pertaining to 195 hospitals across three large databases.
arXiv Detail & Related papers (2024-10-22T18:56:31Z) - AIPatient: Simulating Patients with EHRs and LLM Powered Agentic Workflow [33.8495939261319]
We develop an advanced simulated patient system with AIPatient Knowledge Graph (AIPatient KG) as the input and Reasoning Retrieval-Augmented Generation (Reasoning RAG) as the generation backbone.
Reasoning RAG leverages six LLM powered agents spanning tasks including retrieval, KG query generation, abstraction, checker, rewrite, and summarization.
Our system also presents high readability (median Flesch Reading Ease 77.23; median Flesch Kincaid Grade 5.6), robustness (ANOVA F-value 0.6126, p>0.1), and stability (ANOVA F-value 0.782, p>0.1)
arXiv Detail & Related papers (2024-09-27T17:17:15Z) - SepsisLab: Early Sepsis Prediction with Uncertainty Quantification and Active Sensing [67.8991481023825]
Sepsis is the leading cause of in-hospital mortality in the USA.
Existing predictive models are usually trained on high-quality data with few missing information.
For the potential high-risk patients with low confidence due to limited observations, we propose a robust active sensing algorithm.
arXiv Detail & Related papers (2024-07-24T04:47:36Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - Contrastive Learning-based Imputation-Prediction Networks for
In-hospital Mortality Risk Modeling using EHRs [9.578930989075035]
This paper presents a contrastive learning-based imputation-prediction network for predicting in-hospital mortality risks using EHR data.
Our approach introduces graph analysis-based patient stratification modeling in the imputation process to group similar patients.
Experiments on two real-world EHR datasets show that our approach outperforms the state-of-the-art approaches in both imputation and prediction tasks.
arXiv Detail & Related papers (2023-08-19T03:24:34Z) - Neurological Prognostication of Post-Cardiac-Arrest Coma Patients Using
EEG Data: A Dynamic Survival Analysis Framework with Competing Risks [4.487368901635044]
We propose a framework for neurological prognostication of post-cardiac-arrest comatose patients using EEG data.
Our framework uses any dynamic survival analysis model that supports competing risks in the form of estimating patient-level cumulative incidence functions.
We demonstrate our framework by benchmarking three existing dynamic survival analysis models that support competing risks on a real dataset of 922 patients.
arXiv Detail & Related papers (2023-08-17T03:46:23Z) - Clinical Deterioration Prediction in Brazilian Hospitals Based on
Artificial Neural Networks and Tree Decision Models [56.93322937189087]
An extremely boosted neural network (XBNet) is used to predict clinical deterioration (CD)
The XGBoost model obtained the best results in predicting CD among Brazilian hospitals' data.
arXiv Detail & Related papers (2022-12-17T23:29:14Z) - Generating Synthetic Mixed-type Longitudinal Electronic Health Records
for Artificial Intelligent Applications [9.374416143268892]
generative adversarial network (GAN) entitled EHR-M-GAN which synthesizes textitmixed-type timeseries EHR data.
We have validated EHR-M-GAN on three publicly-available intensive care unit databases with records from a total of 141,488 unique patients.
arXiv Detail & Related papers (2021-12-22T17:17:34Z) - SANSformers: Self-Supervised Forecasting in Electronic Health Records
with Attention-Free Models [48.07469930813923]
This work aims to forecast the demand for healthcare services, by predicting the number of patient visits to healthcare facilities.
We introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data.
Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics.
arXiv Detail & Related papers (2021-08-31T08:23:56Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - Clinical Outcome Prediction from Admission Notes using Self-Supervised
Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z) - A Knowledge Distillation Ensemble Framework for Predicting Short and
Long-term Hospitalisation Outcomes from Electronic Health Records Data [5.844828229178025]
Existing outcome prediction models suffer from a low recall of infrequent positive outcomes.
We present a highly-scalable and robust machine learning framework to automatically predict adversity represented by mortality and ICU admission.
arXiv Detail & Related papers (2020-11-18T15:56:28Z) - UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced
Data [81.00385374948125]
We present UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD)
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to $19%$ over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z) - Hemogram Data as a Tool for Decision-making in COVID-19 Management:
Applications to Resource Scarcity Scenarios [62.997667081978825]
COVID-19 pandemics has challenged emergency response systems worldwide, with widespread reports of essential services breakdown and collapse of health care structure.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
Proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.