Deep Stable Representation Learning on Electronic Health Records
- URL: http://arxiv.org/abs/2209.01321v1
- Date: Sat, 3 Sep 2022 04:10:45 GMT
- Title: Deep Stable Representation Learning on Electronic Health Records
- Authors: Yingtao Luo, Zhaocheng Liu and Qiang Liu
- Abstract summary: Causal Healthcare Embedding (CHE) aims at eliminating the spurious statistical relationship by removing the dependencies between diagnoses and procedures.
Our proposed CHE method can be used as a flexible plug-and-play module that can enhance existing deep learning models on EHR.
- Score: 8.256340233221112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models have achieved promising disease prediction performance
on the Electronic Health Records (EHR) of patients. However, most models
developed under the I.I.D. hypothesis fail to consider the agnostic
distribution shifts, diminishing the generalization ability of deep learning
models to Out-Of-Distribution (OOD) data. In this setting, spurious statistical
correlations that may change in different environments will be exploited, which
can cause sub-optimal performance of deep learning models. The unstable
correlation between procedures and diagnoses that exists in the training
distribution can cause spurious correlation between historical EHR and future
diagnosis. To address this problem, we propose to use a causal representation
learning method called Causal Healthcare Embedding (CHE). CHE aims at
eliminating the spurious statistical relationship by removing the dependencies
between diagnoses and procedures. We introduce the Hilbert-Schmidt Independence
Criterion (HSIC) to measure the degree of independence between the embedded
diagnosis and procedure features. Based on causal view analyses, we apply a
sample weighting technique to remove such spurious relationships for the
stable learning of EHR across different environments. Moreover, our proposed
CHE method can be used as a flexible plug-and-play module that can enhance
existing deep learning models on EHR. Extensive experiments on two public
datasets and five state-of-the-art baselines unequivocally show that CHE can
improve the prediction accuracy of deep learning models on out-of-distribution
data by a large margin. In addition, the interpretability study shows that CHE
could successfully leverage causal structures to reflect a more reasonable
contribution of historical records for predictions.
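The abstract names the Hilbert-Schmidt Independence Criterion (HSIC) as the measure of dependence between embedded diagnosis and procedure features. As a rough illustration only, the sketch below computes the standard biased empirical HSIC estimator with Gaussian kernels on synthetic vectors. The function names, the fixed kernel bandwidth `sigma`, and the random stand-in "embeddings" are assumptions made for this example; it does not reproduce the paper's sample-weighting scheme or its model.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of x."""
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative values
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    K = rbf_kernel(x, sigma)
    L = rbf_kernel(y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


# Synthetic stand-ins for embedded features (assumption: not real EHR data).
rng = np.random.default_rng(0)
diag = rng.normal(size=(200, 4))                    # "diagnosis" embeddings
proc_indep = rng.normal(size=(200, 4))              # independent "procedures"
proc_dep = diag + 0.1 * rng.normal(size=(200, 4))   # strongly dependent ones

hsic_indep = hsic(diag, proc_indep)
hsic_dep = hsic(diag, proc_dep)
print(f"HSIC (independent): {hsic_indep:.5f}")
print(f"HSIC (dependent):   {hsic_dep:.5f}")
```

A larger value indicates stronger statistical dependence, so the dependent pair should score higher than the independent pair. In a method such as the one described, a quantity of this kind would be driven toward zero rather than merely reported.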
Related papers
- Considerations for Distribution Shift Robustness of Diagnostic Models in Healthcare [10.393967785465536]
In the domain of applied ML for health, it is common to predict $Y$ from $X$ without considering further information about the patient.
In this work, we highlight a data generating mechanism common to healthcare settings and discuss how recent theoretical results from the causality literature can be applied to build robust predictive models.
arXiv Detail & Related papers (2024-10-25T14:13:09Z)
- Towards Interpretable End-Stage Renal Disease (ESRD) Prediction: Utilizing Administrative Claims Data with Explainable AI Techniques [6.417777780911223]
This study explores the potential of utilizing administrative claims data, combined with advanced machine learning and deep learning techniques, to predict the progression of Chronic Kidney Disease (CKD) to End-Stage Renal Disease (ESRD).
We analyze a comprehensive, 10-year dataset provided by a major health insurance organization to develop prediction models for multiple observation windows using traditional machine learning methods such as Random Forest and XGBoost as well as deep learning approaches such as Long Short-Term Memory (LSTM) networks.
Our findings demonstrate that the LSTM model, particularly with a 24-month observation window, exhibits superior performance in predicting ESRD progression,
arXiv Detail & Related papers (2024-09-18T16:03:57Z)
- Deep State-Space Generative Model For Correlated Time-to-Event Predictions [54.3637600983898]
We propose a deep latent state-space generative model to capture the interactions among different types of correlated clinical events.
Our method also uncovers meaningful insights about the latent correlations among mortality and different types of organ failures.
arXiv Detail & Related papers (2024-07-28T02:42:36Z)
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
- Learnable Prompt as Pseudo-Imputation: Reassessing the Necessity of Traditional EHR Data Imputation in Downstream Clinical Prediction [16.638760651750744]
Existing deep learning training protocols require the use of statistical information or imputation models to reconstruct missing values.
This paper introduces Learnable Prompt as Pseudo Imputation (PAI) as a new training protocol.
PAI no longer introduces any imputed data but constructs a learnable prompt to model the implicit preferences of the downstream model for missing values.
arXiv Detail & Related papers (2024-01-30T07:19:36Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection, up to 0.609 in PR-AUC for NASH detection, and outperforms various state-of-the-art baselines by up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Bidirectional Representation Learning from Transformers using Multimodal Electronic Health Record Data to Predict Depression [11.1492931066686]
We present a temporal deep learning model to perform bidirectional representation learning on EHR sequences to predict depression.
The model generated the highest increases of precision-recall area under the curve (PRAUC) from 0.70 to 0.76 in depression prediction compared to the best baseline model.
arXiv Detail & Related papers (2020-09-26T17:56:37Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential services breaking down and health care structures collapsing.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.