Enhancing Performance for Highly Imbalanced Medical Data via Data Regularization in a Federated Learning Setting
- URL: http://arxiv.org/abs/2405.20430v1
- Date: Thu, 30 May 2024 19:15:38 GMT
- Title: Enhancing Performance for Highly Imbalanced Medical Data via Data Regularization in a Federated Learning Setting
- Authors: Georgios Tsoumplekas, Ilias Siniosoglou, Vasileios Argyriou, Ioannis D. Moscholios, Panagiotis Sarigiannidis,
- Abstract summary: The goal of the proposed method is to enhance model performance for cardiovascular disease prediction.
The method is evaluated across four datasets for cardiovascular disease prediction, which are scattered across different clients.
- Score: 6.22153888560487
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increased availability of medical data has significantly impacted healthcare by enabling the application of machine / deep learning approaches in various instances. However, medical datasets are usually small and scattered across multiple providers, suffer from high class-imbalance, and are subject to stringent data privacy constraints. In this paper, the application of a data regularization algorithm, suitable for learning under high class-imbalance, in a federated learning setting is proposed. Specifically, the goal of the proposed method is to enhance model performance for cardiovascular disease prediction by tackling the class-imbalance that typically characterizes datasets used for this purpose, as well as by leveraging patient data available in different nodes of a federated ecosystem without compromising their privacy and enabling more resource sensitive allocation. The method is evaluated across four datasets for cardiovascular disease prediction, which are scattered across different clients, achieving improved performance. Meanwhile, its robustness under various hyperparameter settings, as well as its ability to adapt to different resource allocation scenarios, is verified.
Related papers
- Multi-Source Conformal Inference Under Distribution Shift [41.701790856201036]
We consider the problem of obtaining distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources.
We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations.
We propose a data-adaptive strategy to upweight informative data sources for efficiency gain and downweight non-informative data sources for bias reduction.
arXiv Detail & Related papers (2024-05-15T13:33:09Z) - Learnable Weight Initialization for Volumetric Medical Image Segmentation [66.3030435676252]
We propose a learnable weight-based hybrid medical image segmentation approach.
Our approach is easy to integrate into any hybrid model and requires no external training data.
Experiments on multi-organ and lung cancer segmentation tasks demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2023-06-15T17:55:05Z) - Exploratory Analysis of Federated Learning Methods with Differential
Privacy on MIMIC-III [0.7349727826230862]
Federated learning methods offer the possibility of training machine learning models on privacy-sensitive data sets.
We present an evaluation of the impact of different federation and differential privacy techniques when training models on the open-source MIMIC-III dataset.
arXiv Detail & Related papers (2023-02-08T17:27:44Z) - Density-Aware Personalized Training for Risk Prediction in Imbalanced
Medical Data [89.79617468457393]
Training models with imbalance rate (class density discrepancy) may lead to suboptimal prediction.
We propose a framework for training models for this imbalance issue.
We demonstrate our model's improved performance in real-world medical datasets.
arXiv Detail & Related papers (2022-07-23T00:39:53Z) - Practical Challenges in Differentially-Private Federated Survival
Analysis of Medical Data [57.19441629270029]
In this paper, we take advantage of the inherent properties of neural networks to federate the process of training of survival analysis models.
In the realistic setting of small medical datasets and only a few data centers, this noise makes it harder for the models to converge.
We propose DPFed-post which adds a post-processing stage to the private federated learning scheme.
arXiv Detail & Related papers (2022-02-08T10:03:24Z) - FedSLD: Federated Learning with Shared Label Distribution for Medical
Image Classification [6.0088002781256185]
We propose Federated Learning with Shared Label Distribution (FedSLD) for classification tasks.
FedSLD adjusts the contribution of each data sample to the local objective during optimization given knowledge of the distribution.
Our results show that FedSLD achieves better convergence performance than the compared leading FL optimization algorithms.
arXiv Detail & Related papers (2021-10-15T21:38:25Z) - Personalized Federated Learning with Clustering: Non-IID Heart Rate
Variability Data Application [0.1465840097113565]
We propose Personalized Federated Cluster Models, a hierarchical clustering-based FL process, to predict Major Depressive Disorder severity from Heart Rate Variability.
By allowing clients to receive more personalized model, we address problems caused by non-IID data, showing an accuracy increase in severity prediction.
arXiv Detail & Related papers (2021-08-04T08:24:23Z) - An Experimental Study of Data Heterogeneity in Federated Learning
Methods for Medical Imaging [8.984706828657814]
Federated learning enables multiple institutions to collaboratively train machine learning models on their local data in a privacy-preserving way.
We investigate the deleterious impact of a taxonomy of data heterogeneity regimes on federated learning methods, including quantity skew, label distribution skew, and imaging acquisition skew.
We present several mitigation strategies to overcome performance drops from data heterogeneity, including weighted average for data quantity skew, weighted loss and batch normalization averaging for label distribution skew.
arXiv Detail & Related papers (2021-07-18T05:47:48Z) - Bootstrapping Your Own Positive Sample: Contrastive Learning With
Electronic Health Record Data [62.29031007761901]
This paper proposes a novel contrastive regularized clinical classification model.
We introduce two unique positive sampling strategies specifically tailored for EHR data.
Our framework yields highly competitive experimental results in predicting the mortality risk on real-world COVID-19 EHR data.
arXiv Detail & Related papers (2021-04-07T06:02:04Z) - Predictive Modeling of ICU Healthcare-Associated Infections from
Imbalanced Data. Using Ensembles and a Clustering-Based Undersampling
Approach [55.41644538483948]
This work is focused on both the identification of risk factors and the prediction of healthcare-associated infections in intensive-care units.
The aim is to support decision making addressed at reducing the incidence rate of infections.
arXiv Detail & Related papers (2020-05-07T16:13:12Z) - Self-Training with Improved Regularization for Sample-Efficient Chest
X-Ray Classification [80.00316465793702]
We present a deep learning framework that enables robust modeling in challenging scenarios.
Our results show that using 85% lesser labeled data, we can build predictive models that match the performance of classifiers trained in a large-scale data setting.
arXiv Detail & Related papers (2020-05-03T02:36:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.