An Extensive Data Processing Pipeline for MIMIC-IV
- URL: http://arxiv.org/abs/2204.13841v1
- Date: Fri, 29 Apr 2022 01:09:38 GMT
- Title: An Extensive Data Processing Pipeline for MIMIC-IV
- Authors: Mehak Gupta, Brennan Gallamoza, Nicolas Cutrona, Pranjal Dhakal,
Raphael Poulain, Rahmatollah Beheshti
- Abstract summary: We provide an end-to-end, fully customizable pipeline to extract, clean, and pre-process data.
The pipeline also supports prediction and evaluation on the fourth version of the MIMIC dataset (MIMIC-IV) for ICU and non-ICU-related clinical time-series prediction tasks.
- Score: 0.20326203100766121
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: An increasing amount of research is being devoted to applying machine
learning methods to electronic health record (EHR) data for various clinical
tasks. This growing area of research has exposed the limitation of
accessibility of EHR datasets for all, as well as the reproducibility of
different modeling frameworks. One reason for these limitations is the lack of
standardized pre-processing pipelines. MIMIC is a freely available EHR dataset
in a raw format that has been used in numerous studies. The absence of
standardized pre-processing steps serves as a major barrier to the wider
adoption of the dataset. It also leads to different cohorts being used in
downstream tasks, limiting the ability to compare the results among similar
studies. Contrasting studies also use various distinct performance metrics,
which can greatly reduce the ability to compare model results. In this work, we
provide an end-to-end fully customizable pipeline to extract, clean, and
pre-process data; and to predict and evaluate the fourth version of the MIMIC
dataset (MIMIC-IV) for ICU and non-ICU-related clinical time-series prediction
tasks.
Related papers
- MEDS-Tab: Automated tabularization and baseline methods for MEDS datasets [2.8209943093430443]
This work is powered by complementary advances in core data standardization through the MEDS framework.
We dramatically simplify and accelerate this process of scalably featurizing irregularly sampled time-series data.
This system will greatly enhance the reliability, scalability, and ease of development of powerful ML solutions for health problems across diverse datasets and clinical settings.
arXiv Detail & Related papers (2024-10-31T20:36:37Z)
- Meta-Learners for Partially-Identified Treatment Effects Across Multiple Environments [67.80453452949303]
Estimating the conditional average treatment effect (CATE) from observational data is relevant for many applications such as personalized medicine.
Here, we focus on the widespread setting where the observational data come from multiple environments.
We propose different model-agnostic learners (so-called meta-learners) to estimate the bounds that can be used in combination with arbitrary machine learning models.
arXiv Detail & Related papers (2024-06-04T16:31:43Z)
- EMERGE: Integrating RAG for Improved Multimodal EHR Predictive Modeling [22.94521527609479]
EMERGE is a Retrieval-Augmented Generation driven framework aimed at enhancing multimodal EHR predictive modeling.
Our approach extracts entities from both time-series data and clinical notes by prompting Large Language Models.
The extracted knowledge is then used to generate task-relevant summaries of patients' health statuses.
arXiv Detail & Related papers (2024-05-27T10:53:15Z)
- Towards Precision Healthcare: Robust Fusion of Time Series and Image Data [8.579651833717763]
We introduce a new method that uses two separate encoders, one for each type of data, allowing the model to understand complex patterns in both visual and time-based information.
We also deal with imbalanced datasets and use an uncertainty loss function, yielding improved results.
Our experiments show that our method is effective in improving multimodal deep learning for clinical applications.
arXiv Detail & Related papers (2024-05-24T11:18:13Z)
- Convolutional Monge Mapping Normalization for learning on sleep data [63.22081662149488]
We propose a new method called Convolutional Monge Mapping Normalization (CMMN).
CMMN filters the signals to adapt their power spectral density (PSD) to a Wasserstein barycenter estimated on training data.
Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent of the neural network architecture.
arXiv Detail & Related papers (2023-05-30T08:24:01Z)
- Automated Medical Coding on MIMIC-III and MIMIC-IV: A Critical Review and Replicability Study [60.56194508762205]
We reproduce, compare, and analyze state-of-the-art automated medical coding machine learning models.
We show that several models underperform due to weak configurations, poorly sampled train-test splits, and insufficient evaluation.
We present the first comprehensive results on the newly released MIMIC-IV dataset using the reproduced models.
arXiv Detail & Related papers (2023-04-21T11:54:44Z)
- Time Associated Meta Learning for Clinical Prediction [78.99422473394029]
We propose a novel time associated meta learning (TAML) method to make effective predictions at multiple future time points.
To address the sparsity problem after task splitting, TAML employs a temporal information sharing strategy to augment the number of positive samples.
We demonstrate the effectiveness of TAML on multiple clinical datasets, where it consistently outperforms a range of strong baselines.
arXiv Detail & Related papers (2023-03-05T03:54:54Z)
- Unsupervised Pre-Training on Patient Population Graphs for Patient-Level Predictions [48.02011627390706]
Pre-training has shown success in different areas of machine learning, such as Computer Vision (CV), Natural Language Processing (NLP) and medical imaging.
In this paper, we apply unsupervised pre-training to heterogeneous, multi-modal EHR data for patient outcome prediction.
We find that our proposed graph based pre-training method helps in modeling the data at a population level.
arXiv Detail & Related papers (2022-03-23T17:59:45Z)
- HiRID-ICU-Benchmark -- A Comprehensive Machine Learning Benchmark on High-resolution ICU Data [0.8418021941792283]
We aim to provide a benchmark covering a large spectrum of ICU-related tasks.
Using the HiRID dataset, we define multiple clinically relevant tasks developed in collaboration with clinicians.
We provide an in-depth analysis of current state-of-the-art sequence modeling methods, highlighting some limitations of deep learning approaches for this type of data.
arXiv Detail & Related papers (2021-11-16T15:06:42Z)
- Deep neural networks approach to microbial colony detection -- a comparative analysis [52.77024349608834]
This study investigates the performance of three deep learning approaches for object detection on the AGAR dataset.
The achieved results may serve as a benchmark for future experiments.
arXiv Detail & Related papers (2021-08-23T12:06:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.