A Hamiltonian Monte Carlo Model for Imputation and Augmentation of
Healthcare Data
- URL: http://arxiv.org/abs/2103.02349v1
- Date: Wed, 3 Mar 2021 11:57:42 GMT
- Authors: Narges Pourshahrokhi, Samaneh Kouchaki, Kord M. Kober, Christine
Miaskowski, Payam Barnaghi
- Score: 0.6719751155411076
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Missing values exist in nearly all clinical studies because data for a variable or question are not collected or not available. Inadequate handling of missing values can lead to biased results and loss of statistical power in analysis. Existing models usually do not consider privacy concerns or do not utilise the inherent correlations across multiple features to impute the missing values. In healthcare applications, we are usually confronted with high-dimensional and sometimes small-sample-size datasets that need more effective augmentation or imputation techniques. Moreover, imputation and augmentation are traditionally conducted separately, even though imputing missing values and augmenting data can significantly improve generalisation and avoid bias in machine learning models. A Bayesian approach to imputing missing values and creating augmented samples in high-dimensional healthcare data is proposed in this work. We propose folded Hamiltonian Monte Carlo (F-HMC) with Bayesian inference as a more practical approach to processing cross-dimensional relations: a random walk and Hamiltonian dynamics are applied to adapt the posterior distribution and generate large-scale samples. The proposed method is applied to a cancer symptom assessment dataset and shown to enrich data quality in terms of precision, accuracy, recall, F1 score, and a propensity metric.
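The abstract names the sampler without showing its mechanics. Below is a minimal sketch of plain Hamiltonian Monte Carlo with a leapfrog integrator, drawing posterior samples for two missing features of a record. The Gaussian toy target, step size, and trajectory length are illustrative assumptions; the paper's folded (F-HMC) combination with random-walk proposals is not reproduced here.

```python
# Minimal sketch of vanilla Hamiltonian Monte Carlo for posterior sampling of
# missing entries. Illustrates the HMC building block only; the paper's F-HMC
# additionally mixes in random-walk proposals, which is not reproduced here.
import numpy as np

rng = np.random.default_rng(0)

def hmc_sample(log_prob, grad_log_prob, x0, n_samples=1000,
               step_size=0.1, n_leapfrog=20):
    """Draw samples from exp(log_prob) using the leapfrog integrator."""
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)                  # resample momentum
        x_new, p_new = x.copy(), p.copy()
        p_new += 0.5 * step_size * grad_log_prob(x_new)   # half momentum step
        for _ in range(n_leapfrog - 1):
            x_new += step_size * p_new                    # full position step
            p_new += step_size * grad_log_prob(x_new)     # full momentum step
        x_new += step_size * p_new
        p_new += 0.5 * step_size * grad_log_prob(x_new)   # final half step
        # Metropolis correction keeps the chain exact despite discretisation.
        h_old = -log_prob(x) + 0.5 * p @ p
        h_new = -log_prob(x_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

# Toy posterior over two missing features: a correlated Gaussian whose mean
# and covariance would, in practice, come from the observed features.
mu = np.array([0.5, -1.0])
cov_inv = np.linalg.inv(np.array([[1.0, 0.6], [0.6, 1.0]]))
log_prob = lambda x: -0.5 * (x - mu) @ cov_inv @ (x - mu)
grad_log_prob = lambda x: -cov_inv @ (x - mu)

draws = hmc_sample(log_prob, grad_log_prob, x0=np.zeros(2))
print("posterior mean estimate:", draws[200:].mean(axis=0))  # burn-in dropped
```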
Related papers
- Handling Overlapping Asymmetric Datasets -- A Twice Penalized P-Spline Approach [0.40964539027092917]
This research aims to develop a new method that can model the smaller cohort against a particular response.
We find our twice penalized approach offers an enhanced fit over a linear B-Spline and once penalized P-Spline approximation.
Applied to a real-life dataset on a person's risk of developing Non-Alcoholic Steatohepatitis, we see an improvement in model fit of over 65%.
arXiv Detail & Related papers (2023-11-17T12:41:07Z)
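For context, here is a minimal sketch of the ordinary once penalized P-spline this entry compares against: a cubic B-spline basis with a second-order difference penalty, fitted by penalized least squares. The knot layout and smoothing parameter are illustrative assumptions; the paper's second penalty is not reproduced.

```python
# Once penalized P-spline: B-spline basis + difference penalty on coefficients.
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, x.size)    # toy response

k = 3                                                       # cubic splines
t = np.r_[[0.0] * k, np.linspace(0.0, 1.0, 20), [1.0] * k]  # clamped knots
n = len(t) - k - 1                                          # number of basis fns

# Design matrix: evaluate each basis function via a unit coefficient vector.
B = np.column_stack([BSpline(t, np.eye(n)[i], k)(x) for i in range(n)])
D = np.diff(np.eye(n), n=2, axis=0)                         # difference operator
lam = 1.0                                                   # smoothing strength
coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)    # penalized LS fit
print("residual std:", np.std(y - B @ coef).round(3))
```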
- Towards frugal unsupervised detection of subtle abnormalities in medical imaging [0.0]
Anomaly detection in medical imaging is a challenging task in contexts where abnormalities are not annotated.
We investigate mixtures of probability distributions whose versatility has been widely recognized.
This online approach is illustrated on the challenging detection of subtle abnormalities in MR brain scans for the follow-up of newly diagnosed Parkinsonian patients.
arXiv Detail & Related papers (2023-09-04T07:44:54Z)
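A minimal sketch of the mixture idea: fit a Gaussian mixture to unannotated "normal" data and flag test samples with unusually low likelihood. scikit-learn's batch EM stands in for the paper's frugal online updates, and the data and threshold below are illustrative assumptions.

```python
# Mixture-based anomaly scoring: low likelihood under a fitted GMM = abnormal.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(2)
normal = rng.normal(0.0, 1.0, size=(1000, 4))               # healthy features
test = np.vstack([rng.normal(0.0, 1.0, size=(95, 4)),       # 95 normals
                  rng.normal(3.0, 1.0, size=(5, 4))])       # 5 anomalies

gmm = GaussianMixture(n_components=3, random_state=0).fit(normal)
threshold = np.quantile(gmm.score_samples(normal), 0.01)    # 1% false alarms
flags = gmm.score_samples(test) < threshold
print("flagged indices:", np.where(flags)[0])               # expect mostly 95..99
```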
- Robust self-healing prediction model for high dimensional data [0.685316573653194]
This work proposes a robust self-healing (RSH) hybrid prediction model.
It uses the data in its entirety, removing errors and inconsistencies rather than discarding any data.
The proposed method is compared with several existing high-performing models and the results are analyzed.
arXiv Detail & Related papers (2022-10-04T17:55:50Z)
- MissDAG: Causal Discovery in the Presence of Missing Data with Continuous Additive Noise Models [78.72682320019737]
We develop a general method, which we call MissDAG, to perform causal discovery from data with incomplete observations.
MissDAG maximizes the expected likelihood of the visible part of observations under the expectation-maximization framework.
We demonstrate the flexibility of MissDAG for incorporating various causal discovery algorithms and its efficacy through extensive simulations and real data experiments.
arXiv Detail & Related papers (2022-05-27T09:59:46Z)
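A minimal sketch of the expectation-maximisation principle this entry builds on, shown for a plain multivariate Gaussian rather than the paper's additive noise models: the E-step fills in expected sufficient statistics of the missing entries given the observed ones, and the M-step re-estimates parameters to maximise the expected likelihood of the visible part.

```python
# EM for a Gaussian with missing entries: maximise the visible-part likelihood.
import numpy as np

rng = np.random.default_rng(3)
true_mu = np.array([1.0, -1.0, 0.0])
true_cov = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.3], [0.2, 0.3, 1.0]])
X = rng.multivariate_normal(true_mu, true_cov, size=500)
X[rng.uniform(size=X.shape) < 0.2] = np.nan     # 20% missing at random

mu, cov = np.nanmean(X, axis=0), np.eye(3)
for _ in range(50):
    means, seconds = [], []
    for row in X:
        m = np.isnan(row)                       # missingness pattern
        x, C = row.copy(), np.zeros((3, 3))
        # E-step: condition the Gaussian on the observed coordinates.
        coef = cov[np.ix_(m, ~m)] @ np.linalg.inv(cov[np.ix_(~m, ~m)])
        x[m] = mu[m] + coef @ (row[~m] - mu[~m])
        C[np.ix_(m, m)] = cov[np.ix_(m, m)] - coef @ cov[np.ix_(~m, m)]
        means.append(x)
        seconds.append(np.outer(x, x) + C)      # E[x x^T] for this row
    mu = np.mean(means, axis=0)                 # M-step updates
    cov = np.mean(seconds, axis=0) - np.outer(mu, mu)
print("estimated mean:", mu.round(2))
```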
- X-model: Improving Data Efficiency in Deep Learning with A Minimax Model [78.55482897452417]
We aim to improve data efficiency for both classification and regression setups in deep learning.
To harness the power of both worlds, we propose a novel X-model.
X-model plays a minimax game between the feature extractor and task-specific heads.
arXiv Detail & Related papers (2021-10-09T13:56:48Z)
- Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train inference from inputs containing missing values without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
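The paper's deep-learning estimators are beyond a snippet, but the underlying quantity is exact for Bayesian linear models. Below is a sketch of marginal-likelihood (evidence) based model selection without any validation data, choosing a polynomial degree; the prior and noise precisions are illustrative assumptions.

```python
# Exact log marginal likelihood of Bayesian polynomial regression; the degree
# with the highest evidence is selected without a validation split.
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1.0, 1.0, 30)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, 30)   # toy data, noise std 0.2

def log_evidence(x, y, degree, alpha=1e-2, beta=25.0):
    """Evidence for y = poly(x) with prior N(0, alpha^-1 I), noise beta^-1."""
    Phi = np.vander(x, degree + 1, increasing=True)
    A = alpha * np.eye(degree + 1) + beta * Phi.T @ Phi
    m = beta * np.linalg.solve(A, Phi.T @ y)       # posterior mean weights
    E = 0.5 * beta * np.sum((y - Phi @ m) ** 2) + 0.5 * alpha * m @ m
    return (0.5 * (degree + 1) * np.log(alpha) + 0.5 * len(y) * np.log(beta)
            - E - 0.5 * np.linalg.slogdet(A)[1] - 0.5 * len(y) * np.log(2 * np.pi))

for d in range(1, 8):
    print(d, round(log_evidence(x, y, d), 2))      # evidence peaks, then drops
```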
- Medical data wrangling with sequential variational autoencoders [5.9207487081080705]
This paper proposes to model medical data records with heterogeneous data types and bursty missing data using sequential variational autoencoders (VAEs).
We show that Shi-VAE achieves the best performance on both metrics, with lower computational complexity than the GP-VAE model.
arXiv Detail & Related papers (2021-03-12T10:59:26Z)
- Efficient Causal Inference from Combined Observational and Interventional Data through Causal Reductions [68.6505592770171]
Unobserved confounding is one of the main challenges when estimating causal effects.
We propose a novel causal reduction method that replaces an arbitrary number of possibly high-dimensional latent confounders with a single latent confounder.
We propose a learning algorithm to estimate the parameterized reduced model jointly from observational and interventional data.
arXiv Detail & Related papers (2021-03-08T14:29:07Z)
- A random shuffle method to expand a narrow dataset and overcome the associated challenges in a clinical study: a heart failure cohort example [50.591267188664666]
The aim of this study was to design a random shuffle method that enhances the cardinality of a heart failure (HF) dataset while remaining statistically legitimate.
The proposed random shuffle method was able to enhance the HF dataset cardinality circa 10 times, and circa 21 times when followed by a random repeated-measures approach.
arXiv Detail & Related papers (2020-12-12T10:59:38Z)
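A minimal sketch of one plausible reading of the idea (the paper's exact shuffling scheme and statistical-legitimacy checks are not reproduced): expand the table by stacking copies whose columns are independently permuted, which preserves each variable's marginal distribution but breaks cross-variable correlations, something any legitimate use must account for.

```python
# Random-shuffle expansion of a narrow clinical table (illustrative scheme).
import numpy as np

rng = np.random.default_rng(4)
cohort = np.array([[63, 1, 38.0],                     # toy HF rows:
                   [71, 0, 41.5],                     # age, sex, ejection fraction
                   [58, 1, 35.2],
                   [66, 0, 44.1]])

def random_shuffle_expand(data, factor=10):
    """Return `factor` stacked copies with each column permuted per copy."""
    copies = []
    for _ in range(factor):
        c = data.copy()
        for j in range(c.shape[1]):
            c[:, j] = c[rng.permutation(len(c)), j]   # permute one variable
        copies.append(c)
    return np.vstack(copies)

expanded = random_shuffle_expand(cohort)
print(expanded.shape)                                 # (40, 3): ~10x cardinality
```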
- VAEs in the Presence of Missing Data [6.397263087026567]
We develop a novel latent variable model of a corruption process which generates missing data, and derive a corresponding tractable evidence lower bound (ELBO).
Our model is straightforward to implement, can handle both missing completely at random (MCAR) and missing not at random (MNAR) data, scales to high-dimensional inputs, and gives both the VAE encoder and decoder access to indicator variables for whether a data element is missing or not.
On the MNIST and SVHN datasets we demonstrate improved marginal log-likelihood of observed data and better missing data imputation, compared to existing approaches.
arXiv Detail & Related papers (2020-06-09T14:40:00Z)
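A minimal PyTorch sketch of the masked-ELBO idea described above: the reconstruction term is summed only over observed entries, and the missingness indicators are fed to both encoder and decoder. The architecture sizes and the Bernoulli likelihood are illustrative assumptions, not the paper's exact corruption model.

```python
# VAE whose ELBO ignores missing entries; mask = 1 means observed.
import torch
import torch.nn as nn

class MaskedVAE(nn.Module):
    def __init__(self, d=784, h=256, z=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(2 * d, h), nn.ReLU(), nn.Linear(h, 2 * z))
        self.dec = nn.Sequential(nn.Linear(z + d, h), nn.ReLU(), nn.Linear(h, d))

    def forward(self, x, mask):
        # Both networks see the missingness indicators.
        stats = self.enc(torch.cat([x * mask, mask], dim=1))
        mu, logvar = stats.chunk(2, dim=1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterize
        logits = self.dec(torch.cat([z, mask], dim=1))
        # Bernoulli reconstruction term restricted to observed entries.
        rec = -(nn.functional.binary_cross_entropy_with_logits(
            logits, x, reduction="none") * mask).sum(dim=1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)
        return -(rec - kl).mean()                              # negative ELBO

model = MaskedVAE()
x = torch.rand(16, 784).round()                # toy binarized image batch
mask = (torch.rand(16, 784) > 0.3).float()     # ~30% of entries missing
loss = model(x, mask)
loss.backward()
print(float(loss))
```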