A random shuffle method to expand a narrow dataset and overcome the
associated challenges in a clinical study: a heart failure cohort example
- URL: http://arxiv.org/abs/2012.06784v1
- Date: Sat, 12 Dec 2020 10:59:38 GMT
- Title: A random shuffle method to expand a narrow dataset and overcome the
associated challenges in a clinical study: a heart failure cohort example
- Authors: Lorenzo Fassina, Alessandro Faragli, Francesco Paolo Lo Muzio,
Sebastian Kelle, Carlo Campana, Burkert Pieske, Frank Edelmann, Alessio
Alogna
- Abstract summary: The aim of this study was to design a random shuffle method to enhance the cardinality of an HF dataset while remaining statistically legitimate.
The proposed random shuffle method was able to enhance the HF dataset cardinality circa 10 times and circa 21 times when followed by a random repeated-measures approach.
- Score: 50.591267188664666
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Heart failure (HF) affects at least 26 million people worldwide, so
predicting adverse events in HF patients represents a major target of clinical
data science. However, achieving large sample sizes sometimes represents a
challenge due to difficulties in patient recruiting and long follow-up times,
increasing the problem of missing data. To overcome the issue of a narrow
dataset cardinality (in a clinical dataset, the cardinality is the number of
patients in that dataset), population-enhancing algorithms are therefore
crucial. The aim of this study was to design a random shuffle method to enhance
the cardinality of an HF dataset while remaining statistically legitimate,
without the need for specific hypotheses or regression models. The cardinality
enhancement was validated against an established random repeated-measures
method with regard to its correctness in predicting clinical conditions and
endpoints. In particular, machine learning and regression models were employed
to highlight the benefits of the enhanced datasets. The proposed random shuffle
method was able to enhance the HF dataset cardinality (711 patients before
dataset preprocessing) circa 10 times and circa 21 times when followed by a
random repeated-measures approach. We believe that the random shuffle method
could be used in the cardiovascular field and in other data science problems
when missing data and the narrow dataset cardinality represent an issue.
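The abstract does not spell out the shuffle algorithm itself. One plausible reading, sketched below purely as an illustration (the function name, the per-feature column permutation, and the toy cohort are assumptions, not the authors' published procedure), is to enlarge the cohort by appending copies of the dataset in which each feature column is independently permuted across patients, so that every feature's marginal distribution is preserved while new synthetic rows are created:

```python
import numpy as np

def shuffle_augment(data: np.ndarray, n_copies: int, seed: int = 0) -> np.ndarray:
    """Enlarge a (patients x features) array by appending shuffled copies.

    Each appended copy independently permutes every feature column across
    patients, preserving each feature's marginal distribution while producing
    new synthetic rows. This is a hypothetical reading of a "random shuffle"
    augmentation, not the paper's exact method.
    """
    rng = np.random.default_rng(seed)
    copies = [data]
    for _ in range(n_copies):
        shuffled = np.column_stack(
            [rng.permutation(data[:, j]) for j in range(data.shape[1])]
        )
        copies.append(shuffled)
    return np.vstack(copies)

# Toy example: 711 patients (the paper's pre-preprocessing cardinality),
# 5 synthetic features, enlarged circa 10x via 9 extra shuffled copies.
cohort = np.random.default_rng(1).normal(size=(711, 5))
augmented = shuffle_augment(cohort, n_copies=9)
print(augmented.shape)  # (7110, 5)
```

Note that independent column shuffling destroys between-feature correlations in the synthetic rows, which is one reason such a method would need the statistical validation the abstract describes.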
Related papers
- SeqRisk: Transformer-augmented latent variable model for improved survival prediction with longitudinal data [4.1476925904032464]
We propose SeqRisk, a method that combines variational autoencoder (VAE) or longitudinal VAE (LVAE) with a transformer encoder and Cox proportional hazards module for risk prediction.
We demonstrate that SeqRisk performs competitively compared to existing approaches on both simulated and real-world datasets.
arXiv Detail & Related papers (2024-09-19T12:35:25Z)
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Time-dependent Iterative Imputation for Multivariate Longitudinal Clinical Data [0.0]
Time-Dependent Iterative imputation offers a practical solution for imputing time-series data.
When applied to a cohort consisting of more than 500,000 patient observations, our approach outperformed state-of-the-art imputation methods.
arXiv Detail & Related papers (2023-04-16T16:10:49Z)
- Statistical and Computational Phase Transitions in Group Testing [73.55361918807883]
We study the group testing problem where the goal is to identify a set of k infected individuals carrying a rare disease.
We consider two different simple random procedures for assigning individuals tests.
arXiv Detail & Related papers (2022-06-15T16:38:50Z)
- Practical Challenges in Differentially-Private Federated Survival Analysis of Medical Data [57.19441629270029]
In this paper, we take advantage of the inherent properties of neural networks to federate the process of training of survival analysis models.
In the realistic setting of small medical datasets and only a few data centers, this noise makes it harder for the models to converge.
We propose DPFed-post which adds a post-processing stage to the private federated learning scheme.
arXiv Detail & Related papers (2022-02-08T10:03:24Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- A Hamiltonian Monte Carlo Model for Imputation and Augmentation of Healthcare Data [0.6719751155411076]
Missing values exist in nearly all clinical studies because data for a variable or question are not collected or not available.
Existing models usually do not consider privacy concerns or do not utilise the inherent correlations across multiple features to impute the missing values.
A Bayesian approach to imputing missing values and creating augmented samples in high-dimensional healthcare data is proposed in this work.
arXiv Detail & Related papers (2021-03-03T11:57:42Z) - TadGAN: Time Series Anomaly Detection Using Generative Adversarial
Networks [73.01104041298031]
TadGAN is an unsupervised anomaly detection approach built on Generative Adversarial Networks (GANs).
To capture the temporal correlations of time series, we use LSTM Recurrent Neural Networks as base models for Generators and Critics.
To demonstrate the performance and generalizability of our approach, we test several anomaly scoring techniques and report the best-suited one.
arXiv Detail & Related papers (2020-09-16T15:52:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.