Related papers: PRISM: Leveraging Prototype Patient Representations with Feature-Missing-Aware Calibration for EHR Data Sparsity Mitigation

URL: http://arxiv.org/abs/2309.04160v5
Date: Mon, 27 May 2024 10:44:17 GMT
Title: PRISM: Leveraging Prototype Patient Representations with Feature-Missing-Aware Calibration for EHR Data Sparsity Mitigation
Authors: Yinghao Zhu, Zixiang Wang, Long He, Shiyun Xie, Xiaochen Zheng, Liantao Ma, Chengwei Pan,
Abstract summary: PRISM is a framework that indirectly imputes data by leveraging prototype representations of similar patients. PRISM also includes a feature confidence module, which evaluates the reliability of each feature considering missing statuses.
Score: 7.075420686441701
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Electronic Health Records (EHRs) contain a wealth of patient data; however, the sparsity of EHR data often presents significant challenges for predictive modeling. Conventional imputation methods inadequately distinguish between real and imputed data, leading to potential inaccuracies in patient representations. To address these issues, we introduce PRISM, a framework that indirectly imputes data by leveraging prototype representations of similar patients, thus ensuring compact representations that preserve patient information. PRISM also includes a feature confidence learner module, which evaluates the reliability of each feature considering missing statuses. Additionally, PRISM introduces a new patient similarity metric that accounts for feature confidence, avoiding overreliance on imprecise imputed values. Our extensive experiments on the MIMIC-III, MIMIC-IV, PhysioNet Challenge 2012, and eICU datasets demonstrate PRISM's superior performance in predicting in-hospital mortality and 30-day readmission tasks, showcasing its effectiveness in handling EHR data sparsity. For the sake of reproducibility and further research, we have made the code publicly available at https://github.com/yhzhu99/PRISM.
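
Illustration (not taken from the paper): the abstract describes a patient similarity metric that accounts for feature confidence. The sketch below shows one simple way such an idea could work, assuming each feature carries a confidence in [0, 1] (for example, 1.0 for an observed value and a lower value for an imputed one). The function name, the product weighting, and the use of cosine similarity are all assumptions made for illustration; PRISM's actual metric is defined in the paper and its repository.

```python
import math


def confidence_weighted_similarity(x, y, conf_x, conf_y):
    """Cosine similarity in which each feature is weighted by the product
    of the two patients' confidences for that feature.

    Hypothetical helper for illustration only; not PRISM's metric.
    """
    if not (len(x) == len(y) == len(conf_x) == len(conf_y)):
        raise ValueError("all inputs must have the same length")

    # A feature counts fully only if it is trusted for both patients.
    weights = [cx * cy for cx, cy in zip(conf_x, conf_y)]

    dot = sum(w * a * b for w, a, b in zip(weights, x, y))
    norm_x = math.sqrt(sum(w * a * a for w, a in zip(weights, x)))
    norm_y = math.sqrt(sum(w * b * b for w, b in zip(weights, y)))

    # No trusted, non-zero features in common: report no similarity.
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0
    return dot / (norm_x * norm_y)


# Two patients who agree on the observed features; the third feature
# of patient_b is an unreliable imputed value.
patient_a = [1.0, 2.0, 10.0]
patient_b = [1.0, 2.0, -10.0]

trusting = confidence_weighted_similarity(
    patient_a, patient_b, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]
)
cautious = confidence_weighted_similarity(
    patient_a, patient_b, [1.0, 1.0, 1.0], [1.0, 1.0, 0.0]
)
print(trusting, cautious)
```

When the imputed feature is trusted fully, it dominates the comparison and the two patients look dissimilar; when its confidence is set to zero, only the observed features contribute and the patients look nearly identical. This is the kind of overreliance on imputed values that the abstract says the metric is meant to avoid.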

Related papers

Beyond Traditional Diagnostics: Transforming Patient-Side Information into Predictive Insights with Knowledge Graphs and Prototypes [55.310195121276074]
We propose a Knowledge graph-enhanced, Prototype-aware, and Interpretable (KPI) framework to predict diseases. It integrates structured and trusted medical knowledge into a unified disease knowledge graph, constructs clinically meaningful disease prototypes, and employs contrastive learning to enhance predictive accuracy. It provides clinically valid explanations that closely align with patient narratives, highlighting its practical value for patient-centered healthcare delivery.
arXiv Detail & Related papers (2025-12-09T05:37:54Z)
Semi-supervised Clustering Through Representation Learning of Large-scale EHR Data [5.591260685112265]
SCORE is a semi-supervised representation learning framework that captures multi-domain disease profiles through patient embeddings. To handle the computational challenges of large-scale data, it introduces a hybrid Expectation-Maximization (EM) and Gaussian Variational Approximation (GVA) algorithm. Our analysis shows that incorporating unlabeled data enhances accuracy and reduces sensitivity to label scarcity.
arXiv Detail & Related papers (2025-05-27T05:20:17Z)
Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD. It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation. We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
SMART: Towards Pre-trained Missing-Aware Model for Patient Health Status Prediction [15.136747790595217]
We propose a Self-Supervised Missing-Aware RepresenTation Learning approach for patient health status prediction. By adopting missing-aware attentions and focusing on learning higher-order representations, SMART promotes better generalization and robustness to missing data. We validate the effectiveness of SMART through extensive experiments on six EHR tasks, demonstrating its superiority over state-of-the-art methods.
arXiv Detail & Related papers (2024-05-15T02:19:34Z)
Differentially Private Distributed Inference [2.4401219403555814]
Healthcare centers collaborating on clinical trials must balance knowledge sharing with safeguarding sensitive patient data. We address this challenge by using differential privacy (DP) to control information leakage. Agents update belief statistics via log-linear rules, and DP noise provides plausible deniability and rigorous performance guarantees.
arXiv Detail & Related papers (2024-02-13T01:38:01Z)
Learnable Prompt as Pseudo-Imputation: Reassessing the Necessity of Traditional EHR Data Imputation in Downstream Clinical Prediction [16.638760651750744]
Existing deep learning training protocols require the use of statistical information or imputation models to reconstruct missing values. This paper introduces Learnable Prompt as Pseudo Imputation (PAI) as a new training protocol. PAI no longer introduces any imputed data but constructs a learnable prompt to model the implicit preferences of the downstream model for missing values.
arXiv Detail & Related papers (2024-01-30T07:19:36Z)
IGNITE: Individualized GeNeration of Imputations in Time-series Electronic health records [7.451873794596469]
We propose a novel deep-learning model that learns the underlying patient dynamics to generate personalized values conditioned on an individual's demographic characteristics and treatments. Our proposed model, IGNITE, utilises a conditional dual-variational autoencoder augmented with dual-stage attention to generate missing values for an individual. We show that IGNITE outperforms state-of-the-art approaches in missing data reconstruction and task prediction.
arXiv Detail & Related papers (2024-01-09T07:57:21Z)
MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion. It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space. It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
Large Language Models for Healthcare Data Augmentation: An Example on Patient-Trial Matching [49.78442796596806]
We propose an innovative privacy-aware data augmentation approach for patient-trial matching (LLM-PTM). Our experiments demonstrate a 7.32% average improvement in performance using the proposed LLM-PTM method, and the generalizability to new data is improved by 12.12%.
arXiv Detail & Related papers (2023-03-24T03:14:00Z)
Sequential Diagnosis Prediction with Transformer and Ontological Representation [35.88195694025553]
We propose an end-to-end robust transformer-based model called SETOR to handle irregular intervals between a patient's visits, using the admission timestamps and length of stay of each visit. Experiments conducted on two real-world healthcare datasets show that our sequential diagnosis prediction model SETOR achieves better predictive results than previous state-of-the-art approaches.
arXiv Detail & Related papers (2021-09-07T13:09:55Z)
SANSformers: Self-Supervised Forecasting in Electronic Health Records with Attention-Free Models [48.07469930813923]
This work aims to forecast the demand for healthcare services, by predicting the number of patient visits to healthcare facilities. We introduce SANSformer, an attention-free sequential model designed with specific inductive biases to cater for the unique characteristics of EHR data. Our results illuminate the promising potential of tailored attention-free models and self-supervised pretraining in refining healthcare utilization predictions across various patient demographics.
arXiv Detail & Related papers (2021-08-31T08:23:56Z)
UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model. UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data. We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD). UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and outperforms the best of various state-of-the-art baselines by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
BiteNet: Bidirectional Temporal Encoder Network to Predict Medical Outcomes [53.163089893876645]
We propose a novel self-attention mechanism that captures the contextual dependency and temporal relationships within a patient's healthcare journey. An end-to-end bidirectional temporal encoder network (BiteNet) then learns representations of the patient's journeys. We have evaluated the effectiveness of our methods on two supervised prediction and two unsupervised clustering tasks with a real-world EHR dataset.
arXiv Detail & Related papers (2020-09-24T00:42:36Z)
Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of breakdowns in essential services and the collapse of health care structures. This work describes a machine learning model derived from hemogram exam data collected from symptomatic patients. The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)

This list is automatically generated from the titles and abstracts of the papers on this site.