Time-dependent Iterative Imputation for Multivariate Longitudinal
Clinical Data
- URL: http://arxiv.org/abs/2304.07821v1
- Date: Sun, 16 Apr 2023 16:10:49 GMT
- Title: Time-dependent Iterative Imputation for Multivariate Longitudinal
Clinical Data
- Authors: Omer Noy and Ron Shamir
- Abstract summary: Time-Dependent Iterative Imputation (TDI) offers a practical solution for imputing time-series data.
When applied to a cohort of more than 500,000 patient observations, our approach outperformed state-of-the-art imputation methods.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Missing data is a major challenge in clinical research. In electronic medical
records, often a large fraction of the values in laboratory tests and vital
signs are missing. The missingness can lead to biased estimates and limit our
ability to draw conclusions from the data. Additionally, many machine learning
algorithms can only be applied to complete datasets. A common solution is data
imputation, the process of filling in the missing values. However, some of the
popular imputation approaches perform poorly on clinical data. We developed a
simple new approach, Time-Dependent Iterative Imputation (TDI), which offers a
practical solution for imputing time-series data. It addresses both
multivariate and longitudinal data by integrating forward-filling with an
Iterative Imputer. The integration employs a patient-, variable-, and
observation-specific dynamic weighting strategy based on the clinical patterns
of the data, including missing rates and measurement frequencies. We tested TDI
on randomly masked clinical datasets. When applied to a cohort consisting of
more than 500,000 patient observations from MIMIC III, our approach
outperformed state-of-the-art imputation methods for 25 out of 30 clinical
variables, with an overall root-mean-square error (RMSE) of 0.63, compared to
0.85 for SoftImpute, the second-best method. We also used the MIMIC III and
COVID-19 inpatient datasets to perform prediction tasks. Importantly, these
tests demonstrated that TDI imputation can lead to improved risk prediction.
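The abstract names TDI's ingredients (per-patient forward-filling, an Iterative Imputer, and a dynamic weight driven by missing rates and measurement frequency) but not the weighting formula itself. The sketch below is a minimal, hypothetical Python rendering of that blend using pandas and scikit-learn; the exponential time-decay weight, the `decay` parameter, and the column names are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of a TDI-style blend of forward-filling and
# scikit-learn's IterativeImputer. The decay weight is an assumption.
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer


def tdi_impute(df, id_col="patient_id", time_col="hour", decay=0.1):
    """Impute numeric features and return the imputed feature frame.

    Assumes rows are sorted by time within each patient and that every
    feature column has at least one observed value.
    """
    feats = [c for c in df.columns if c not in (id_col, time_col)]
    X = df[feats].astype(float)
    missing_rate = X.isna().mean()  # per-variable fraction of missing values

    # Longitudinal estimate: last observation carried forward per patient.
    locf = X.groupby(df[id_col], sort=False).ffill()

    # Cross-sectional estimate: MICE-style chained-equations imputation.
    mice = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(X),
                        index=X.index, columns=feats)

    out = X.copy()
    for c in feats:
        # Time elapsed since this variable was last measured for the patient.
        seen_at = df[time_col].where(X[c].notna())
        dt = df[time_col] - seen_at.groupby(df[id_col], sort=False).ffill()

        # Illustrative dynamic weight: trust the carried-forward value more
        # when it is recent and the variable is measured frequently.
        w = (np.exp(-decay * dt) * (1.0 - missing_rate[c])).fillna(0.0)

        locf_c = locf[c].fillna(mice[c])  # no earlier value: fall back to MICE
        out[c] = X[c].fillna(w * locf_c + (1.0 - w) * mice[c])
    return out
```

A toy evaluation in the spirit of the paper's masking test might hide a fraction of the observed cells, impute, and score RMSE on the hidden cells:

```python
# Hypothetical masking test: hide 20% of observed cells, impute, score RMSE.
feats = [c for c in df.columns if c not in ("patient_id", "hour")]
rng = np.random.default_rng(0)
hide = df[feats].notna() & (rng.random(df[feats].shape) < 0.2)
masked = df.copy()
masked[feats] = df[feats].mask(hide)
imputed = tdi_impute(masked)
rmse = float(np.sqrt(((imputed - df[feats])[hide] ** 2).mean().mean()))
```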
Related papers
- An Efficient Contrastive Unimodal Pretraining Method for EHR Time Series Data [35.943089444017666]
We propose an efficient method of contrastive pretraining tailored for long clinical time-series data.
Our model demonstrates the ability to impute missing measurements, providing clinicians with deeper insights into patient conditions.
arXiv Detail & Related papers (2024-10-11T19:05:25Z)
- TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [57.067409211231244]
This paper presents meticulously curated AI-ready datasets covering multi-modal data (e.g., drug molecules, disease codes, text, categorical/numerical features) and 8 crucial prediction challenges in clinical trial design.
We provide basic validation methods for each task to ensure the datasets' usability and reliability.
We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z)
- Multimodal Pretraining of Medical Time Series and Notes [45.89025874396911]
Deep learning models show promise in extracting meaningful patterns, but they require extensive labeled data.
We propose a novel approach employing self-supervised pretraining, focusing on the alignment of clinical measurements and notes.
In downstream tasks, including in-hospital mortality prediction and phenotyping, our model outperforms baselines in settings where only a fraction of the data is labeled.
arXiv Detail & Related papers (2023-12-11T21:53:40Z)
- Leveraging Unlabelled Data in Multiple-Instance Learning Problems for Improved Detection of Parkinsonian Tremor in Free-Living Conditions [80.88681952022479]
We introduce a new method for combining semi-supervised learning with multiple-instance learning.
We show that by leveraging the unlabelled data of 454 subjects we can achieve large performance gains in per-subject tremor detection.
arXiv Detail & Related papers (2023-04-29T12:25:10Z)
- Deep Imputation of Missing Values in Time Series Health Data: A Review with Benchmarking [0.0]
This survey performs six data-centric experiments to benchmark state-of-the-art deep imputation methods on five time series health data sets.
Deep learning methods that jointly perform cross-sectional (across variables) and longitudinal (across time) imputations of missing values in time series data yield statistically better data quality than traditional imputation methods.
arXiv Detail & Related papers (2023-02-10T16:03:36Z)
- Unsupervised pre-training of graph transformers on patient population graphs [48.02011627390706]
We propose a graph-transformer-based network to handle heterogeneous clinical data.
We show the benefit of our pre-training method in a self-supervised and a transfer learning setting.
arXiv Detail & Related papers (2022-07-21T16:59:09Z)
- MURAL: An Unsupervised Random Forest-Based Embedding for Electronic Health Record Data [59.26381272149325]
We present an unsupervised random forest for representing data with disparate variable types.
MURAL forests consist of a set of decision trees where node-splitting variables are chosen at random; a toy illustration of such random-split trees appears after this list.
We show that using our approach, we can visualize and classify data more accurately than competing approaches.
arXiv Detail & Related papers (2021-11-19T22:02:21Z)
- Sequential Diagnosis Prediction with Transformer and Ontological Representation [35.88195694025553]
We propose SETOR, an end-to-end robust transformer-based model that handles irregular intervals between a patient's visits using admission timestamps and the length of stay of each visit.
Experiments on two real-world healthcare datasets show that our sequential diagnosis prediction model SETOR achieves better predictive results than previous state-of-the-art approaches.
arXiv Detail & Related papers (2021-09-07T13:09:55Z)
- A random shuffle method to expand a narrow dataset and overcome the associated challenges in a clinical study: a heart failure cohort example [50.591267188664666]
The aim of this study was to design a random shuffle method that enhances the cardinality of a heart failure (HF) dataset while remaining statistically legitimate.
The proposed random shuffle method enhanced the HF dataset cardinality roughly 10-fold, and roughly 21-fold when followed by a random repeated-measures approach.
arXiv Detail & Related papers (2020-12-12T10:59:38Z)
- Longitudinal modeling of MS patient trajectories improves predictions of disability progression [2.117653457384462]
This work addresses the task of optimally extracting information from longitudinal patient data in the real-world setting.
We show that with machine learning methods suited to modeling patient trajectories, we can predict the disability progression of patients over a two-year horizon.
Compared to the models available in the literature, this work uses the most complete patient history for MS disease progression prediction.
arXiv Detail & Related papers (2020-11-09T20:48:00Z)
- Hemogram Data as a Tool for Decision-making in COVID-19 Management: Applications to Resource Scarcity Scenarios [62.997667081978825]
The COVID-19 pandemic has challenged emergency response systems worldwide, with widespread reports of essential-service breakdowns and the collapse of health care structures.
This work describes a machine learning model derived from hemogram exams performed on symptomatic patients.
The proposed models can predict COVID-19 qRT-PCR results in symptomatic individuals with high accuracy, sensitivity, and specificity.
arXiv Detail & Related papers (2020-05-10T01:45:03Z)
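Of the summaries above, the MURAL entry is concrete enough to illustrate with code: an unsupervised forest whose trees choose their node-splitting variables at random. The sketch below is a toy in plain NumPy, not the actual MURAL algorithm; it shows one random-split tree producing leaf ids that a forest-style embedding could consume.

```python
# Toy random-split tree (not MURAL itself): each internal node picks a
# splitting variable at random and cuts at a random observed value.
import numpy as np

def random_split_tree(X, rng, depth=5, min_leaf=10):
    """Build one unsupervised tree; returns (assign_fn, n_leaves)."""
    n, d = X.shape
    if depth == 0 or n < 2 * min_leaf:
        return (lambda rows: np.zeros(len(rows), dtype=int)), 1
    j = int(rng.integers(d))          # node-splitting variable chosen at random
    thr = float(rng.choice(X[:, j]))  # random observed value as the cut point
    left_mask = X[:, j] <= thr
    if left_mask.all() or not left_mask.any():
        return (lambda rows: np.zeros(len(rows), dtype=int)), 1
    left_fn, nl = random_split_tree(X[left_mask], rng, depth - 1, min_leaf)
    right_fn, nr = random_split_tree(X[~left_mask], rng, depth - 1, min_leaf)

    def assign(rows):
        m = rows[:, j] <= thr
        out = np.empty(len(rows), dtype=int)
        out[m] = left_fn(rows[m])
        out[~m] = nl + right_fn(rows[~m])  # offset right leaves past left ones
        return out

    return assign, nl + nr

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
assign, n_leaves = random_split_tree(X, rng)
leaf_ids = assign(X)  # one tree's leaf assignment; a forest would stack many
```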
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.