Deep Stable Representation Learning on Electronic Health Records
- URL: http://arxiv.org/abs/2209.01321v1
- Date: Sat, 3 Sep 2022 04:10:45 GMT
- Title: Deep Stable Representation Learning on Electronic Health Records
- Authors: Yingtao Luo, Zhaocheng Liu and Qiang Liu
- Abstract summary: Causal Healthcare Embedding (CHE) aims at eliminating the spurious statistical relationship by removing the dependencies between diagnoses and procedures.
Our proposed CHE method can be used as a flexible plug-and-play module that can enhance existing deep learning models on EHR.
- Score: 8.256340233221112
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep learning models have achieved promising disease prediction performance
on the Electronic Health Records (EHR) of patients. However, most models
developed under the I.I.D. hypothesis fail to consider the agnostic
distribution shifts, diminishing the generalization ability of deep learning
models to Out-Of-Distribution (OOD) data. In this setting, spurious statistical
correlations that may change in different environments will be exploited, which
can cause sub-optimal performance of deep learning models. The unstable
correlation between procedures and diagnoses that exists in the training
distribution can cause spurious correlation between historical EHR and future
diagnosis. To address this problem, we propose to use a causal representation
learning method called Causal Healthcare Embedding (CHE). CHE aims at
eliminating the spurious statistical relationship by removing the dependencies
between diagnoses and procedures. We introduce the Hilbert-Schmidt Independence
Criterion (HSIC) to measure the degree of independence between the embedded
diagnosis and procedure features. Based on causal view analyses, we apply a
sample weighting technique to remove such spurious relationships for the
stable learning of EHR across different environments. Moreover, our proposed
CHE method can be used as a flexible plug-and-play module that can enhance
existing deep learning models on EHR. Extensive experiments on two public
datasets and five state-of-the-art baselines unequivocally show that CHE can
improve the prediction accuracy of deep learning models on out-of-distribution
data by a large margin. In addition, the interpretability study shows that CHE
could successfully leverage causal structures to reflect a more reasonable
contribution of historical records for predictions.
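As a rough illustration of the independence measure named in the abstract, the sketch below computes a biased empirical HSIC estimate between two sets of embeddings. It is not the authors' implementation: the Gaussian kernel, the median-heuristic bandwidth, the function names `rbf_kernel` and `hsic`, and the synthetic "diagnosis" and "procedure" embeddings are all assumptions made for the example. The sample weighting step described in the abstract is not shown.

```python
import numpy as np


def rbf_kernel(x, sigma=None):
    """Gaussian kernel matrix for the rows of x, shape (n, d)."""
    sq = np.sum(x ** 2, axis=1)
    # Pairwise squared distances, clipped at zero against round-off.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (x @ x.T), 0.0)
    if sigma is None:
        # Median heuristic for the bandwidth (an assumption of this sketch).
        positive = d2[d2 > 0]
        sigma = np.sqrt(0.5 * np.median(positive)) if positive.size else 1.0
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(x, y):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    K = rbf_kernel(x)
    L = rbf_kernel(y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2


# Synthetic stand-ins for embedded diagnosis and procedure features.
rng = np.random.default_rng(0)
diag_emb = rng.normal(size=(200, 4))
proc_dependent = diag_emb + 0.1 * rng.normal(size=(200, 4))
proc_independent = rng.normal(size=(200, 4))

hsic_dependent = hsic(diag_emb, proc_dependent)
hsic_independent = hsic(diag_emb, proc_independent)
print("HSIC, dependent pair:  ", hsic_dependent)
print("HSIC, independent pair:", hsic_independent)
```

A larger value indicates stronger statistical dependence between the two feature sets, so a method that aims to decorrelate diagnoses and procedures would try to push this quantity toward zero.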
Related papers
- Establishing Truly Causal Relationship Between Whole Slide Image Predictions and Diagnostic Evidence Subregions in Deep Learning [3.783430751544095]
Multiple Instance Learning (MIL) has gained significant attention due to its ability to be trained using only slide-level diagnostic labels.
Previous MIL research has primarily focused on enhancing feature aggregators for globally analyzing WSIs, but overlooks a causal relationship in diagnosis.
We propose Causal Inference Multiple Instance Learning (CI-MIL) to establish the truly causal relationship between model predictions and diagnostic evidence regions.
arXiv Detail & Related papers (2024-07-24T11:00:08Z)
- Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z)
- Unified Uncertainty Estimation for Cognitive Diagnosis Models [70.46998436898205]
We propose a unified uncertainty estimation approach for a wide range of cognitive diagnosis models.
We decompose the uncertainty of diagnostic parameters into data aspect and model aspect.
Our method is effective and can provide useful insights into the uncertainty of cognitive diagnosis.
arXiv Detail & Related papers (2024-03-09T13:48:20Z)
- Learnable Prompt as Pseudo-Imputation: Reassessing the Necessity of Traditional EHR Data Imputation in Downstream Clinical Prediction [16.638760651750744]
Existing deep learning training protocols require the use of statistical information or imputation models to reconstruct missing values.
This paper introduces Learnable Prompt as Pseudo Imputation (PAI) as a new training protocol.
PAI no longer introduces any imputed data but constructs a learnable prompt to model the implicit preferences of the downstream model for missing values.
arXiv Detail & Related papers (2024-01-30T07:19:36Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- MMLN: Leveraging Domain Knowledge for Multimodal Diagnosis [10.133715767542386]
We propose a knowledge-driven and data-driven framework for lung disease diagnosis.
We formulate diagnosis rules according to authoritative clinical medicine guidelines and learn the weights of rules from text data.
A multimodal fusion consisting of text and image data is designed to infer the marginal probability of lung disease.
arXiv Detail & Related papers (2022-02-09T04:12:30Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- UNITE: Uncertainty-based Health Risk Prediction Leveraging Multi-sourced Data [81.00385374948125]
We present the UNcertaInTy-based hEalth risk prediction (UNITE) model.
UNITE provides accurate disease risk prediction and uncertainty estimation leveraging multi-sourced health data.
We evaluate UNITE on real-world disease risk prediction tasks: nonalcoholic fatty liver disease (NASH) and Alzheimer's disease (AD).
UNITE achieves up to 0.841 in F1 score for AD detection and up to 0.609 in PR-AUC for NASH detection, outperforming the best of various state-of-the-art baselines by up to 19%.
arXiv Detail & Related papers (2020-10-22T02:28:11Z)
- Bidirectional Representation Learning from Transformers using Multimodal Electronic Health Record Data to Predict Depression [11.1492931066686]
We present a temporal deep learning model to perform bidirectional representation learning on EHR sequences to predict depression.
The model generated the highest increases of precision-recall area under the curve (PRAUC) from 0.70 to 0.76 in depression prediction compared to the best baseline model.
arXiv Detail & Related papers (2020-09-26T17:56:37Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential services breaking down and health care structures collapsing.
This work describes a machine learning model derived from hemogram exam data performed in symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)