Handling Non-ignorably Missing Features in Electronic Health Records
Data Using Importance-Weighted Autoencoders
- URL: http://arxiv.org/abs/2101.07357v2
- Date: Fri, 5 Feb 2021 20:05:41 GMT
- Title: Handling Non-ignorably Missing Features in Electronic Health Records
Data Using Importance-Weighted Autoencoders
- Authors: David K. Lim, Naim U. Rashid, Junier B. Oliva, Joseph G. Ibrahim
- Abstract summary: We propose a novel extension of VAEs called Importance-Weighted Autoencoders (IWAEs) to flexibly handle Missing Not At Random patterns in the Physionet data.
Our proposed method models the missingness mechanism using an embedded neural network, eliminating the need to specify the exact form of the missingness mechanism a priori.
- Score: 8.518166245293703
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Electronic Health Records (EHRs) are commonly used to investigate
relationships between patient health information and outcomes. Deep learning
methods are emerging as powerful tools to learn such relationships, given the
characteristic high dimension and large sample size of EHR datasets. The
Physionet 2012 Challenge involves an EHR dataset pertaining to 12,000 ICU
patients, where researchers investigated the relationships between clinical
measurements and in-hospital mortality. However, the prevalence and complexity
of missing data in the Physionet data present significant challenges for the
application of deep learning methods, such as Variational Autoencoders (VAEs).
Although a rich literature exists regarding the treatment of missing data in
traditional statistical models, it is unclear how this extends to deep learning
architectures. To address these issues, we propose a novel extension of VAEs
called Importance-Weighted Autoencoders (IWAEs) to flexibly handle Missing Not
At Random (MNAR) patterns in the Physionet data. Our proposed method models the
missingness mechanism using an embedded neural network, eliminating the need to
specify the exact form of the missingness mechanism a priori. We show that the
use of our method leads to more realistic imputed values relative to the
state-of-the-art, as well as significant differences in fitted downstream
models for mortality.
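As background, the importance-weighted bound that gives IWAEs their name can be sketched in a few lines. This is a generic illustration of the standard IWAE objective (log-mean-exp of K importance weights), not the authors' implementation; the toy log-weights below are made up for demonstration.

```python
import math

def log_mean_exp(log_ws):
    """Numerically stable log(1/K * sum_k exp(log_w_k))."""
    m = max(log_ws)
    return m + math.log(sum(math.exp(lw - m) for lw in log_ws)) - math.log(len(log_ws))

def iwae_bound(log_ws):
    """K-sample importance-weighted lower bound on log p(x),
    given log_w_k = log p(x, z_k) - log q(z_k | x)."""
    return log_mean_exp(log_ws)

def elbo(log_ws):
    """Standard ELBO estimated by averaging the same log-weights."""
    return sum(log_ws) / len(log_ws)

# Toy log importance weights for K = 5 posterior samples (hypothetical values).
log_w = [-3.2, -2.8, -4.1, -3.0, -2.5]
# By Jensen's inequality, the IWAE bound is at least as tight as the ELBO.
assert iwae_bound(log_w) >= elbo(log_w)
```

In the paper's MNAR setting the log-weights would additionally include a term from the learned missingness-mechanism network, but the log-mean-exp structure of the bound is unchanged.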
Related papers
- Fine-tuning -- a Transfer Learning approach [0.22344294014777952]
Analysis of Electronic Health Records (EHRs) is often hampered by the abundance of missing data in this valuable resource.
Existing deep imputation methods rely on end-to-end pipelines that incorporate both imputation and downstream analyses.
This paper explores the development of a modular, deep learning-based imputation and classification pipeline.
arXiv Detail & Related papers (2024-11-06T14:18:23Z) - InVAErt networks for amortized inference and identifiability analysis of lumped parameter hemodynamic models [0.0]
In this study, we use inVAErt networks, a neural network-based, data-driven framework for enhanced digital twin analysis of stiff dynamical systems.
We demonstrate the flexibility and effectiveness of inVAErt networks in the context of physiological inversion of a six-compartment lumped parameter hemodynamic model from synthetic data to real data with missing components.
arXiv Detail & Related papers (2024-08-15T17:07:40Z) - Synthesizing Multimodal Electronic Health Records via Predictive Diffusion Models [69.06149482021071]
We propose a novel EHR data generation model called EHRPD.
It is a diffusion-based model designed to predict the next visit based on the current one while also incorporating time interval estimation.
We conduct experiments on two public datasets and evaluate EHRPD from fidelity, privacy, and utility perspectives.
arXiv Detail & Related papers (2024-06-20T02:20:23Z) - Learnable Prompt as Pseudo-Imputation: Reassessing the Necessity of
Traditional EHR Data Imputation in Downstream Clinical Prediction [16.638760651750744]
Existing deep learning training protocols require the use of statistical information or imputation models to reconstruct missing values.
This paper introduces Learnable Prompt as Pseudo Imputation (PAI) as a new training protocol.
PAI no longer introduces any imputed data but constructs a learnable prompt to model the implicit preferences of the downstream model for missing values.
arXiv Detail & Related papers (2024-01-30T07:19:36Z) - IGNITE: Individualized GeNeration of Imputations in Time-series
Electronic health records [7.451873794596469]
We propose a novel deep-learning model that learns the underlying patient dynamics to generate personalized values conditioned on an individual's demographic characteristics and treatments.
Our proposed model, IGNITE, utilises a conditional dual-variational autoencoder augmented with dual-stage attention to generate missing values for an individual.
We show that IGNITE outperforms state-of-the-art approaches in missing data reconstruction and task prediction.
arXiv Detail & Related papers (2024-01-09T07:57:21Z) - MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - Medical data wrangling with sequential variational autoencoders [5.9207487081080705]
This paper proposes to model medical data records with heterogeneous data types and bursty missing data using sequential variational autoencoders (VAEs).
We show that the resulting Shi-VAE achieves the best performance on both metrics, with lower computational complexity than the GP-VAE model.
arXiv Detail & Related papers (2021-03-12T10:59:26Z) - Select-ProtoNet: Learning to Select for Few-Shot Disease Subtype
Prediction [55.94378672172967]
We focus on the few-shot disease subtype prediction problem, identifying subgroups of similar patients.
We introduce meta learning techniques to develop a new model, which can extract the common experience or knowledge from interrelated clinical tasks.
Our new model is built upon a carefully designed meta-learner, called the Prototypical Network, which is a simple yet effective meta-learning machine for few-shot image classification.
arXiv Detail & Related papers (2020-09-02T02:50:30Z) - A Systematic Approach to Featurization for Cancer Drug Sensitivity
Predictions with Deep Learning [49.86828302591469]
We train >35,000 neural network models, sweeping over common featurization techniques.
We found RNA-seq features to be highly redundant and informative even with subsets larger than 128 features.
arXiv Detail & Related papers (2020-04-30T20:42:17Z) - Learning Dynamic and Personalized Comorbidity Networks from Event Data
using Deep Diffusion Processes [102.02672176520382]
Comorbid diseases co-occur and progress via complex temporal patterns that vary among individuals.
In electronic health records we can observe the different diseases a patient has, but can only infer the temporal relationship between each co-morbid condition.
We develop deep diffusion processes to model "dynamic comorbidity networks".
arXiv Detail & Related papers (2020-01-08T15:47:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.