Deep Stable Representation Learning on Electronic Health Records
- URL: http://arxiv.org/abs/2209.01321v1
- Date: Sat, 3 Sep 2022 04:10:45 GMT
- Title: Deep Stable Representation Learning on Electronic Health Records
- Authors: Yingtao Luo, Zhaocheng Liu and Qiang Liu
- Abstract summary: Causal Healthcare Embedding (CHE) aims at eliminating the spurious statistical relationship by removing the dependencies between diagnoses and procedures.
Our proposed CHE method can be used as a flexible plug-and-play module that can enhance existing deep learning models on EHR.
- Score: 8.256340233221112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models have achieved promising disease prediction performance
on the Electronic Health Records (EHR) of patients. However, most models
developed under the I.I.D. hypothesis fail to consider the agnostic
distribution shifts, diminishing the generalization ability of deep learning
models to Out-Of-Distribution (OOD) data. In this setting, spurious statistical
correlations that may change in different environments will be exploited, which
can cause sub-optimal performance of deep learning models. The unstable
correlation between procedures and diagnoses that exists in the training
distribution can cause a spurious correlation between historical EHR and future
diagnosis. To address this problem, we propose to use a causal representation
learning method called Causal Healthcare Embedding (CHE). CHE aims at
eliminating the spurious statistical relationship by removing the dependencies
between diagnoses and procedures. We introduce the Hilbert-Schmidt Independence
Criterion (HSIC) to measure the degree of independence between the embedded
diagnosis and procedure features. Based on causal view analyses, we apply a
sample weighting technique to remove such spurious relationships for the
stable learning of EHR across different environments. Moreover, our proposed
CHE method can be used as a flexible plug-and-play module that can enhance
existing deep learning models on EHR. Extensive experiments on two public
datasets and five state-of-the-art baselines unequivocally show that CHE can
improve the prediction accuracy of deep learning models on out-of-distribution
data by a large margin. In addition, the interpretability study shows that CHE
could successfully leverage causal structures to reflect a more reasonable
contribution of historical records for predictions.
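As an illustration of the independence measure named in the abstract, the following is a minimal sketch of the standard biased empirical HSIC estimator, HSIC = tr(KHLH) / (n - 1)^2, using Gaussian (RBF) kernels. It is not the authors' implementation: the function names `rbf_kernel` and `hsic`, the kernel choice, the bandwidth `sigma`, and the random arrays standing in for embedded diagnosis and procedure features are all assumptions made for this example. The sample weighting step that CHE builds on top of HSIC is not shown.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Gaussian kernel matrix for the rows of x (shape: n_samples x n_features)."""
    sq_norms = np.sum(x ** 2, axis=1, keepdims=True)
    # Pairwise squared Euclidean distances; clip tiny negatives from round-off.
    sq_dists = np.maximum(sq_norms + sq_norms.T - 2.0 * x @ x.T, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between paired samples x and y."""
    n = x.shape[0]
    k = rbf_kernel(x, sigma)
    l = rbf_kernel(y, sigma)
    h = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(k @ h @ l @ h) / (n - 1) ** 2


# Hypothetical stand-ins for embedded features (not real EHR data).
rng = np.random.default_rng(0)
diagnosis_emb = rng.normal(size=(200, 4))
procedure_independent = rng.normal(size=(200, 4))
procedure_dependent = diagnosis_emb + 0.1 * rng.normal(size=(200, 4))

print("HSIC, independent features:", hsic(diagnosis_emb, procedure_independent))
print("HSIC, dependent features:  ", hsic(diagnosis_emb, procedure_dependent))
```

A value near zero suggests the two feature sets are close to independent under the chosen kernels, while larger values indicate dependence. In this sketch the dependent pair should score noticeably higher than the independent pair.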
Related papers
- Self-Explaining Hypergraph Neural Networks for Diagnosis Prediction [45.89562183034469]
Existing deep learning diagnosis prediction models with intrinsic interpretability often assign attention weights to every past diagnosis or hospital visit.
We introduce SHy, a self-explaining hypergraph neural network model, designed to offer personalized, concise and faithful explanations.
SHy captures higher-order disease interactions and extracts distinct temporal phenotypes as personalized explanations.
arXiv Detail & Related papers (2025-02-15T06:33:02Z)
- Testing and Improving the Robustness of Amortized Bayesian Inference for Cognitive Models [0.5223954072121659]
Contaminant observations and outliers often cause problems when estimating the parameters of cognitive models.
In this study, we test and improve the robustness of parameter estimation using amortized Bayesian inference.
The proposed method is straightforward and practical to implement and has a broad applicability in fields where outlier detection or removal is challenging.
arXiv Detail & Related papers (2024-12-29T21:22:24Z)
- Deep State-Space Generative Model For Correlated Time-to-Event Predictions [54.3637600983898]
We propose a deep latent state-space generative model to capture the interactions among different types of correlated clinical events.
Our method also uncovers meaningful insights about the latent correlations among mortality and different types of organ failures.
arXiv Detail & Related papers (2024-07-28T02:42:36Z)
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
- Learnable Prompt as Pseudo-Imputation: Reassessing the Necessity of Traditional EHR Data Imputation in Downstream Clinical Prediction [16.638760651750744]
Existing deep learning training protocols require the use of statistical information or imputation models to reconstruct missing values.
This paper introduces Learnable Prompt as Pseudo Imputation (PAI) as a new training protocol.
PAI no longer introduces any imputed data but constructs a learnable prompt to model the implicit preferences of the downstream model for missing values.
arXiv Detail & Related papers (2024-01-30T07:19:36Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, and it outperforms various state-of-the-art baselines, with a gain of up to 19% over the best baseline.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Bidirectional Representation Learning from Transformers using Multimodal Electronic Health Record Data to Predict Depression [11.1492931066686]
We present a temporal deep learning model to perform bidirectional representation learning on EHR sequences to predict depression.
The model generated the highest increases of precision-recall area under the curve (PRAUC) from 0.70 to 0.76 in depression prediction compared to the best baseline model.
arXiv Detail & Related papers (2020-09-26T17:56:37Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of breakdowns in essential services and the collapse of health care structures.
This work describes a machine learning model derived from hemogram exam data collected from symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.