On the Importance of Diversity in Re-Sampling for Imbalanced Data and
Rare Events in Mortality Risk Models
- URL: http://arxiv.org/abs/2012.09645v1
- Date: Tue, 15 Dec 2020 09:45:35 GMT
- Authors: Yuxuan (Diana) Yang, Hadi Akbarzadeh Khorshidi, Uwe Aickelin, Aditi
Nevgi, Elif Ekinci
- Abstract summary: The Surgical Outcome Risk Tool (SORT) is one of the tools developed to predict mortality risk throughout the entire perioperative period for major elective in-patient surgeries in the UK.
In this study, we enhance the original SORT prediction model (UK SORT) by addressing the class imbalance within the dataset.
Our proposed method investigates the application of diversity-based selection on top of common re-sampling techniques.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Surgical risk increases significantly when patients present with comorbid
conditions. This has resulted in the creation of numerous risk stratification
tools with the objective of formulating associated surgical risk to assist both
surgeons and patients in decision-making. The Surgical Outcome Risk Tool (SORT)
is one of the tools developed to predict mortality risk throughout the entire
perioperative period for major elective in-patient surgeries in the UK. In this
study, we enhance the original SORT prediction model (UK SORT) by addressing
the class imbalance within the dataset. Our proposed method investigates the
application of diversity-based selection on top of common re-sampling
techniques to enhance the classifier's capability in detecting minority
(mortality) events. Diversity amongst training datasets is an essential factor
in ensuring re-sampled data keeps an accurate depiction of the
minority/majority class region, thereby solving the generalization problem of
mainstream sampling approaches. We incorporate the use of the Solow-Polasky
measure as a drop-in functionality to evaluate diversity, with the addition of
greedy algorithms to identify and discard subsets that share the most
similarity. Additionally, through empirical experiments, we show that a
classifier trained on the diversity-based dataset outperforms the original
classifier on ten external datasets. Our diversity-based re-sampling method
elevates the performance of the UK SORT algorithm by 1.4%.
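The diversity-based selection described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes Euclidean distances, a hypothetical scale parameter `theta`, and a simple backward-elimination greedy loop that drops one sample at a time (the abstract speaks of discarding similar subsets, so treat this as a simplified stand-in). The function names `solow_polasky` and `greedy_diverse_subset` are made up for illustration. The Solow-Polasky measure itself is the sum of the entries of the inverse of the matrix M with M_ij = exp(-theta * d_ij).

```python
import numpy as np


def solow_polasky(points, theta=1.0):
    """Solow-Polasky diversity: sum of all entries of M^-1, M_ij = exp(-theta * d_ij).

    Assumes Euclidean distance; theta is an illustrative scale parameter.
    """
    pts = np.asarray(points, dtype=float)
    diffs = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))
    m = np.exp(-theta * dist)
    # The pseudo-inverse keeps the measure defined when samples are exact duplicates.
    return float(np.sum(np.linalg.pinv(m)))


def greedy_diverse_subset(points, keep, theta=1.0):
    """Greedily drop samples until `keep` remain, returning the kept indices.

    At each step, remove the sample whose removal leaves the most diverse
    remainder, i.e. the sample that contributes least to diversity.
    """
    pts = np.asarray(points, dtype=float)
    idx = list(range(len(pts)))
    while len(idx) > keep:
        drop_pos = max(
            range(len(idx)),
            key=lambda p: solow_polasky(pts[idx[:p] + idx[p + 1:]], theta),
        )
        idx.pop(drop_pos)
    return idx


if __name__ == "__main__":
    # Two near-duplicate samples plus two distinct ones (toy data, not from the paper).
    samples = [[0.0, 0.0], [0.01, 0.0], [5.0, 5.0], [10.0, 0.0]]
    print(greedy_diverse_subset(samples, keep=3))
```

In a re-sampling pipeline, a routine like this would presumably run on the over- or under-sampled data so that near-duplicate samples are pruned before the classifier is trained. Re-evaluating the measure for every candidate removal is expensive, so the sketch only suits small sets.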
Related papers
- MedDiffusion: Boosting Health Risk Prediction via Diffusion-based Data
Augmentation [58.93221876843639]
This paper introduces a novel, end-to-end diffusion-based risk prediction model, named MedDiffusion.
It enhances risk prediction performance by creating synthetic patient data during training to enlarge sample space.
It discerns hidden relationships between patient visits using a step-wise attention mechanism, enabling the model to automatically retain the most vital information for generating high-quality data.
arXiv Detail & Related papers (2023-10-04T01:36:30Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
- Ambiguous Medical Image Segmentation using Diffusion Models [60.378180265885945]
We introduce a single diffusion model-based approach that produces multiple plausible outputs by learning a distribution over group insights.
Our proposed model generates a distribution of segmentation masks by leveraging the inherent sampling process of diffusion.
Comprehensive results show that our proposed approach outperforms existing state-of-the-art ambiguous segmentation networks.
arXiv Detail & Related papers (2023-04-10T17:58:22Z)
- Self-Certifying Classification by Linearized Deep Assignment [65.0100925582087]
We propose a novel class of deep predictors for classifying metric data on graphs within PAC-Bayes risk certification paradigm.
Building on the recent PAC-Bayes literature and data-dependent priors, this approach enables learning posterior distributions on the hypothesis space.
arXiv Detail & Related papers (2022-01-26T19:59:14Z)
- A Deep Variational Approach to Clustering Survival Data [5.871238645229228]
We introduce a novel probabilistic approach to cluster survival data in a variational deep clustering setting.
Our proposed method employs a deep generative model to uncover the underlying distribution of both the explanatory variables and the potentially censored survival times.
arXiv Detail & Related papers (2021-06-10T14:10:25Z)
- Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z)
- Deep Cox Mixtures for Survival Regression [11.64579638651557]
We describe a new approach for survival analysis regression models, based on learning mixtures of Cox regressions to model individual survival distributions.
We perform experiments on multiple real world datasets, and look at the mortality rates of patients across ethnicity and gender.
arXiv Detail & Related papers (2021-01-16T22:41:22Z)
- WRSE -- a non-parametric weighted-resolution ensemble for predicting
individual survival distributions in the ICU [0.251657752676152]
Dynamic assessment of mortality risk in the intensive care unit (ICU) can be used to stratify patients, inform about treatment effectiveness or serve as part of an early-warning system.
We show competitive results with state-of-the-art probabilistic models, while greatly reducing training time by factors of 2-9x.
arXiv Detail & Related papers (2020-11-02T10:13:59Z)
- Predictive Modeling of ICU Healthcare-Associated Infections from
Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling
Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making addressed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z)