Predictive Modeling of ICU Healthcare-Associated Infections from
Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling
Approach
- URL: http://arxiv.org/abs/2005.03582v1
- Date: Thu, 7 May 2020 16:13:12 GMT
- Authors: Fernando Sánchez-Hernández, Juan Carlos Ballesteros-Herráez,
Mohamed S. Kraiem, Mercedes Sánchez-Barba and María N. Moreno-García
- Abstract summary: This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making directed at reducing the incidence rate of infections.
- Score: 55.41644538483948
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Early detection of patients vulnerable to infections acquired in the hospital
environment is a challenge in current health systems given the impact that such
infections have on patient mortality and healthcare costs. This work is focused
on both the identification of risk factors and the prediction of
healthcare-associated infections in intensive-care units by means of
machine-learning methods. The aim is to support decision making directed at
reducing the incidence rate of infections. In this field, it is necessary to
deal with the problem of building reliable classifiers from imbalanced
datasets. We propose a clustering-based undersampling strategy to be used in
combination with ensemble classifiers. A comparative study with data from 4616
patients was conducted in order to validate our proposal. We applied several
single and ensemble classifiers both to the original dataset and to data
preprocessed by means of different resampling methods. The results were
analyzed by means of classic and recent metrics specifically designed for
imbalanced data classification. They revealed that the proposal is more
efficient in comparison with other approaches.
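The abstract does not spell out how the clustering-based undersampling works, so the following is only a generic sketch of the idea and not the authors' procedure. It assumes the majority class is clustered with plain k-means into as many clusters as there are minority samples, and that the majority point nearest each centroid is kept. The names `kmeans` and `cluster_undersample`, the choice of k, and the synthetic data are all illustrative assumptions.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns the centroids."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct data points.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: mean of each cluster; empty clusters keep their centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_undersample(X, y, majority=0, seed=0):
    """Keep one representative majority sample per cluster (hypothetical helper)."""
    maj = [x for x, t in zip(X, y) if t == majority]
    rest = [(x, t) for x, t in zip(X, y) if t != majority]
    # Assumption: aim for roughly as many majority as minority samples.
    k = min(len(rest), len(maj))
    centroids = kmeans(maj, k, seed=seed)
    # Nearest real majority point to each centroid; a set removes duplicates,
    # so slightly fewer than k majority samples may be kept.
    keep = sorted(
        {min(range(len(maj)), key=lambda i: math.dist(maj[i], c)) for c in centroids}
    )
    X_bal = [maj[i] for i in keep] + [x for x, _ in rest]
    y_bal = [majority] * len(keep) + [t for _, t in rest]
    return X_bal, y_bal


# Synthetic, purely illustrative data: 200 majority vs. 20 minority samples.
rng = random.Random(42)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
X += [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(20)]
y = [0] * 200 + [1] * 20

X_bal, y_bal = cluster_undersample(X, y)
print("majority kept:", y_bal.count(0), "minority kept:", y_bal.count(1))
```

The balanced `X_bal`, `y_bal` would then be passed to whichever ensemble classifier is being evaluated. Keeping real samples near the centroids, rather than the centroids themselves, is one design choice among several.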
Related papers
- Enhancing Performance for Highly Imbalanced Medical Data via Data Regularization in a Federated Learning Setting [6.22153888560487]
The goal of the proposed method is to enhance model performance for cardiovascular disease prediction.
The method is evaluated across four datasets for cardiovascular disease prediction, which are scattered across different clients.
arXiv Detail & Related papers (2024-05-30T19:15:38Z)
- Few-shot learning for COVID-19 Chest X-Ray Classification with Imbalanced Data: An Inter vs. Intra Domain Study [49.5374512525016]
Medical image datasets are essential for training models used in computer-aided diagnosis, treatment planning, and medical research.
Some challenges are associated with these datasets, including variability in data distribution, data scarcity, and transfer learning issues when using models pre-trained from generic images.
We propose a methodology based on Siamese neural networks in which a series of techniques are integrated to mitigate the effects of data scarcity and distribution imbalance.
arXiv Detail & Related papers (2024-01-18T16:59:27Z)
- Counterfactual Data Augmentation with Contrastive Learning [27.28511396131235]
We introduce a model-agnostic data augmentation method that imputes the counterfactual outcomes for a selected subset of individuals.
We use contrastive learning to learn a representation space and a similarity measure such that in the learned representation space close individuals identified by the learned similarity measure have similar potential outcomes.
This property ensures reliable imputation of counterfactual outcomes for the individuals with close neighbors from the alternative treatment group.
arXiv Detail & Related papers (2023-11-07T00:36:51Z)
- An AI-Guided Data Centric Strategy to Detect and Mitigate Biases in Healthcare Datasets [32.25265709333831]
We generate a data-centric, model-agnostic, task-agnostic approach to evaluate dataset bias by investigating the relationship between how easily different groups are learned at small sample sizes (AEquity).
We then apply a systematic analysis of AEq values across subpopulations to identify and mitigate manifestations of racial bias in two known cases in healthcare.
AEq is a novel and broadly applicable metric that can be applied to advance equity by diagnosing and remediating bias in healthcare datasets.
arXiv Detail & Related papers (2023-11-06T17:08:41Z)
- Density-Aware Personalized Training for Risk Prediction in Imbalanced Medical Data [89.79617468457393]
Training models with imbalance rate (class density discrepancy) may lead to suboptimal prediction.
We propose a framework for training models for this imbalance issue.
We demonstrate our model's improved performance in real-world medical datasets.
arXiv Detail & Related papers (2022-07-23T00:39:53Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Adversarial Sample Enhanced Domain Adaptation: A Case Study on Predictive Modeling with Electronic Health Records [57.75125067744978]
We propose a data augmentation method to facilitate domain adaptation.
Adversarially generated samples are used during domain adaptation.
Results confirm the effectiveness of our method and the generality on different tasks.
arXiv Detail & Related papers (2021-01-13T03:20:20Z)
- Mixture Model Framework for Traumatic Brain Injury Prognosis Using Heterogeneous Clinical and Outcome Data [3.7363119896212478]
We develop a method for modeling large heterogeneous data types relevant to TBI.
The model is trained on a dataset encompassing a variety of data types, including demographics, blood-based biomarkers, and imaging findings.
It is used to stratify patients into distinct groups in an unsupervised learning setting.
arXiv Detail & Related papers (2020-12-22T19:31:03Z)
- On the Importance of Diversity in Re-Sampling for Imbalanced Data and Rare Events in Mortality Risk Models [0.0]
The Surgical Outcome Risk Tool (SORT) is one of the tools developed to predict mortality risk throughout the entire period for major elective in-patient surgeries in the UK.
In this study, we enhance the original SORT prediction model by addressing the class imbalance within the dataset.
Our proposed method investigates the application of diversity-based selection on top of common re-sampling techniques.
arXiv Detail & Related papers (2020-12-15T09:45:35Z)
- Semi-supervised Medical Image Classification with Relation-driven Self-ensembling Model [71.80319052891817]
We present a relation-driven semi-supervised framework for medical image classification.
It exploits the unlabeled data by encouraging the prediction consistency of given input under perturbations.
Our method outperforms many state-of-the-art semi-supervised learning methods on both single-label and multi-label image classification scenarios.
arXiv Detail & Related papers (2020-05-15T06:57:54Z)