Mitigating Health Data Poverty: Generative Approaches versus Resampling
  for Time-series Clinical Data
- URL: http://arxiv.org/abs/2210.13958v2
- Date: Wed, 26 Oct 2022 07:38:36 GMT
- Authors: Raffaele Marchesi, Nicolo Micheletti, Giuseppe Jurman, Venet Osmani
- Abstract summary: Augmenting the minority class using resampling (such as SMOTE) is a widely used approach due to the simplicity of the algorithms.
We show that our approach is better at both generating authentic data of the minority class and remaining within the original distribution of the real data.
- Score: 0.2867517731896504
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Several approaches have been developed to mitigate algorithmic bias stemming
from health data poverty, where minority groups are underrepresented in
training datasets. Augmenting the minority class using resampling (such as
SMOTE) is a widely used approach due to the simplicity of the algorithms.
However, these algorithms decrease data variability and may introduce
correlations between samples, which has motivated the use of generative
approaches based on generative adversarial networks (GANs). Generating
high-dimensional, time-series, authentic data that provides wide distribution
coverage of the real data remains a challenging task for both resampling and
GAN-based approaches. In this work, we propose the CA-GAN architecture, which
addresses some of the shortcomings of current approaches, and we provide a
detailed comparison with both SMOTE and WGAN-GP*, using a high-dimensional,
time-series, real dataset of 3343 hypotensive Caucasian and Black patients.
We show that our approach is better both at generating authentic data of the
minority class and at remaining within the original distribution of the real
data.
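To make the resampling critique above concrete, here is a minimal sketch of SMOTE-style interpolation applied to time-series records. It assumes each patient record is a fixed-length array of shape (timesteps, features) that is flattened before the nearest-neighbour search; the function smote_time_series and its parameters are hypothetical illustrations of the textbook SMOTE procedure, not the authors' experimental code and not CA-GAN. Because every synthetic sample is a convex combination of two real minority samples, it cannot leave the per-coordinate range spanned by the real minority data, which is one way to see why resampling limits variability and ties new samples closely to existing ones.

```python
import numpy as np


def smote_time_series(x_minority, n_new, k=5, seed=None):
    """Hypothetical SMOTE-style oversampler for fixed-length time series.

    x_minority: array of shape (n_samples, n_timesteps, n_features).
    Assumption: each series is flattened to one vector, and Euclidean
    distance on those vectors defines the nearest neighbours.
    """
    rng = np.random.default_rng(seed)
    n, t, f = x_minority.shape
    if n < 2:
        raise ValueError("need at least two minority samples")
    flat = x_minority.reshape(n, t * f)
    k = min(k, n - 1)

    # Pairwise Euclidean distances between the flattened minority series.
    diff = flat[:, None, :] - flat[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # random real anchor sample
    pick = rng.integers(0, k, size=n_new)   # which of its k neighbours to use
    nbr = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation weight in [0, 1)

    # Each synthetic vector lies on the segment between anchor and neighbour.
    synthetic = flat[base] + gap * (flat[nbr] - flat[base])
    return synthetic.reshape(n_new, t, f)
```

For example, with x = np.random.default_rng(0).normal(size=(20, 48, 7)), calling smote_time_series(x, n_new=30, seed=1) returns an array of shape (30, 48, 7). For ordinary 2-D feature matrices, the SMOTE class in the imbalanced-learn library (via fit_resample) implements the standard algorithm.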

Related papers
        - Regression Augmentation With Data-Driven Segmentation [0.0]
 Imbalanced regression arises when the target distribution is skewed, causing models to focus on dense regions and struggle with underrepresented (minority) samples.
We propose a fully data-driven GAN-based augmentation framework that uses Mahalanobis-Gaussian Mixture Modeling (GMM) to automatically identify minority samples.
 arXiv (2025-08-02T18:12:11Z)
- A Novel Double Pruning method for Imbalanced Data using Information Entropy and Roulette Wheel Selection for Breast Cancer Diagnosis [2.8661021832561757]
 The SMOTEBoost method generates synthetic data to balance the dataset, but it may overlook crucial overlapping regions near the decision boundary.
This paper proposes RE-SMOTEBoost, an enhanced version of SMOTEBoost, designed to overcome these limitations.
It incorporates a filtering mechanism based on information entropy to reduce noise and borderline cases and to improve the quality of generated data.
 arXiv (2025-03-15T19:34:15Z)
- Improving SMOTE via Fusing Conditional VAE for Data-adaptive Noise Filtering [0.5735035463793009]
 We introduce a framework to enhance the SMOTE algorithm using Variational Autoencoders (VAEs).
Our approach systematically quantifies the density of data points in a low-dimensional latent space using the VAE, simultaneously incorporating information on class labels and classification difficulty.
 Empirical studies on several imbalanced datasets show that this simple process improves the conventional SMOTE algorithm over deep learning models.
 arXiv (2024-05-30T07:06:02Z)
- Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
 Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers.
We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes.
We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
 arXiv (2023-08-28T18:48:34Z)
- Local Learning Matters: Rethinking Data Heterogeneity in Federated
  Learning [61.488646649045215]
 Federated learning (FL) is a promising strategy for performing privacy-preserving, distributed learning with a network of clients (i.e., edge devices).
 arXiv (2021-11-28T19:03:39Z)
- Deceive D: Adaptive Pseudo Augmentation for GAN Training with Limited
  Data [125.7135706352493]
 Generative adversarial networks (GANs) typically require ample data for training in order to synthesize high-fidelity images.
Recent studies have shown that training GANs with limited data remains formidable due to discriminator overfitting.
This paper introduces a novel strategy called Adaptive Pseudo Augmentation (APA) to encourage healthy competition between the generator and the discriminator.
 arXiv (2021-11-12T18:13:45Z)
- Categorical EHR Imputation with Generative Adversarial Nets [11.171712535005357]
 We propose a simple yet effective approach that is based on previous work on GANs for data imputation.
We show that our imputation approach largely improves the prediction accuracy, compared to more traditional data imputation approaches.
 arXiv (2021-08-03T18:50:26Z)
- Lung Cancer Risk Estimation with Incomplete Data: A Joint Missing
  Imputation Perspective [5.64530854079352]
 We address imputation of missing data by modeling the joint distribution of multi-modal data.
Motivated by partial bidirectional generative adversarial net (PBiGAN), we propose a new Conditional PBiGAN (C-PBiGAN) method.
C-PBiGAN achieves significant improvements in lung cancer risk estimation compared with representative imputation methods.
 arXiv (2021-07-25T20:15:16Z)
- Improving Generative Adversarial Networks with Local Coordinate Coding [150.24880482480455]
 Generative adversarial networks (GANs) have shown remarkable success in generating realistic data from some predefined prior distribution.
In practice, semantic information might be represented by some latent distribution learned from data.
We propose an LCCGAN model with local coordinate coding (LCC) to improve the performance of generating data.
 arXiv (2020-07-28T09:17:50Z)
- Minority Oversampling for Imbalanced Time Series Classification [7.695093197007146]
 This paper proposes a structure-preserving Oversampling method for High-dimensional Imbalanced Time-series classification (OHIT).
 Experimental results on several publicly available time-series datasets demonstrate the superiority of OHIT against the state-of-the-art oversampling algorithms.
 arXiv (2020-04-14T09:20:12Z)
- Inclusive GAN: Improving Data and Minority Coverage in Generative Models [101.67587566218928]
 We formalize the problem of minority inclusion as one of data coverage.
We then propose to improve data coverage by harmonizing adversarial training with reconstructive generation.
We develop an extension that allows explicit control over the minority subgroups that the model should ensure to include.
 arXiv (2020-04-07T13:31:33Z)
- Unsupervised Domain Adaptation in Person re-ID via k-Reciprocal
  Clustering and Large-Scale Heterogeneous Environment Synthesis [76.46004354572956]
 We introduce an unsupervised domain adaptation approach for person re-identification.
 Experimental results show that the proposed ktCUDA and SHRED approach achieves an average improvement of +5.7 mAP in re-identification performance.
 arXiv (2020-01-14T17:43:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.