Related papers: An Experimental Study of Data Heterogeneity in Federated Learning Methods for Medical Imaging

An Experimental Study of Data Heterogeneity in Federated Learning Methods for Medical Imaging

URL: http://arxiv.org/abs/2107.08371v1
Date: Sun, 18 Jul 2021 05:47:48 GMT
Title: An Experimental Study of Data Heterogeneity in Federated Learning Methods for Medical Imaging
Authors: Liangqiong Qu, Niranjan Balachandar and Daniel L Rubin
Abstract summary: Federated learning enables multiple institutions to collaboratively train machine learning models on their local data in a privacy-preserving way. We investigate the deleterious impact of a taxonomy of data heterogeneity regimes on federated learning methods, including quantity skew, label distribution skew, and imaging acquisition skew. We present several mitigation strategies to overcome performance drops from data heterogeneity, including weighted average for data quantity skew, weighted loss and batch normalization averaging for label distribution skew.
Score: 8.984706828657814
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Federated learning enables multiple institutions to collaboratively train machine learning models on their local data in a privacy-preserving way. However, its distributed nature often leads to significant heterogeneity in data distributions across institutions. In this paper, we investigate the deleterious impact of a taxonomy of data heterogeneity regimes on federated learning methods, including quantity skew, label distribution skew, and imaging acquisition skew. We show that the performance degrades with the increasing degrees of data heterogeneity. We present several mitigation strategies to overcome performance drops from data heterogeneity, including weighted average for data quantity skew, weighted loss and batch normalization averaging for label distribution skew. The proposed optimizations to federated learning methods improve their capability of handling heterogeneity across institutions, which provides valuable guidance for the deployment of federated learning in real clinical applications.

Related papers

Robust Molecular Property Prediction via Densifying Scarce Labeled Data [51.55434084913129]
In drug discovery, compounds most critical for advancing research often lie beyond the training set.<n>We propose a novel meta-learning-based approach that leverages unlabeled data to interpolate between in-distribution (ID) and out-of-distribution (OOD) data.<n>We demonstrate significant performance gains on challenging real-world datasets.
arXiv Detail & Related papers (2025-06-13T15:27:40Z)
Federated Impression for Learning with Distributed Heterogeneous Data [19.50235109938016]
Federated learning (FL) provides a paradigm that can learn from distributed datasets across clients without requiring them to share data. In FL, sub-optimal convergence is common among data from different health centers due to the variety in data collection protocols and patient demographics across centers. We propose FedImpres which alleviates catastrophic forgetting by restoring synthetic data that represents the global information as federated impression.
arXiv Detail & Related papers (2024-09-11T15:37:52Z)
Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data. Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z)
Cross-feature Contrastive Loss for Decentralized Deep Learning on Heterogeneous Data [8.946847190099206]
We present a novel approach for decentralized learning on heterogeneous data. Cross-features for a pair of neighboring agents are the features obtained from the data of an agent with respect to the model parameters of the other agent. Our experiments show that the proposed method achieves superior performance (0.2-4% improvement in test accuracy) compared to other existing techniques for decentralized learning on heterogeneous data.
arXiv Detail & Related papers (2023-10-24T14:48:23Z)
FedSym: Unleashing the Power of Entropy for Benchmarking the Algorithms for Federated Learning [1.4656078321003647]
Federated learning (FL) is a decentralized machine learning approach where independent learners process data privately. We study the currently popular data partitioning techniques and visualize their main disadvantages. We propose a method that leverages entropy and symmetry to construct 'the most challenging' and controllable data distributions.
arXiv Detail & Related papers (2023-10-11T18:39:08Z)
Tackling Diverse Minorities in Imbalanced Classification [80.78227787608714]
Imbalanced datasets are commonly observed in various real-world applications, presenting significant challenges in training classifiers. We propose generating synthetic samples iteratively by mixing data samples from both minority and majority classes. We demonstrate the effectiveness of our proposed framework through extensive experiments conducted on seven publicly available benchmark datasets.
arXiv Detail & Related papers (2023-08-28T18:48:34Z)
DRFLM: Distributionally Robust Federated Learning with Inter-client Noise via Local Mixup [58.894901088797376]
federated learning has emerged as a promising approach for training a global model using data from multiple organizations without leaking their raw data. We propose a general framework to solve the above two challenges simultaneously. We provide comprehensive theoretical analysis including robustness analysis, convergence analysis, and generalization ability.
arXiv Detail & Related papers (2022-04-16T08:08:29Z)
Non-IID data and Continual Learning processes in Federated Learning: A long road ahead [58.720142291102135]
Federated Learning is a novel framework that allows multiple devices or institutions to train a machine learning model collaboratively while preserving their data private. In this work, we formally classify data statistical heterogeneity and review the most remarkable learning strategies that are able to face it. At the same time, we introduce approaches from other machine learning frameworks, such as Continual Learning, that also deal with data heterogeneity and could be easily adapted to the Federated Learning settings.
arXiv Detail & Related papers (2021-11-26T09:57:11Z)
FedSLD: Federated Learning with Shared Label Distribution for Medical Image Classification [6.0088002781256185]
We propose Federated Learning with Shared Label Distribution (FedSLD) for classification tasks. FedSLD adjusts the contribution of each data sample to the local objective during optimization given knowledge of the distribution. Our results show that FedSLD achieves better convergence performance than the compared leading FL optimization algorithms.
arXiv Detail & Related papers (2021-10-15T21:38:25Z)
Learning Bias-Invariant Representation by Cross-Sample Mutual Information Minimization [77.8735802150511]
We propose a cross-sample adversarial debiasing (CSAD) method to remove the bias information misused by the target task. The correlation measurement plays a critical role in adversarial debiasing and is conducted by a cross-sample neural mutual information estimator. We conduct thorough experiments on publicly available datasets to validate the advantages of the proposed method over state-of-the-art approaches.
arXiv Detail & Related papers (2021-08-11T21:17:02Z)
Handling Data Heterogeneity with Generative Replay in Collaborative Learning for Medical Imaging [21.53220262343254]
We present a novel generative replay strategy to address the challenge of data heterogeneity in collaborative learning methods. A primary model learns the desired task, and an auxiliary "generative replay model" either synthesizes images that closely resemble the input images or helps extract latent variables. The generative replay strategy is flexible to use, can either be incorporated into existing collaborative learning methods to improve their capability of handling data heterogeneity across institutions, or be used as a novel and individual collaborative learning framework (termed FedReplay) to reduce communication cost.
arXiv Detail & Related papers (2021-06-24T17:39:55Z)
On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of risk and thereof gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation. We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)

This list is automatically generated from the titles and abstracts of the papers in this site.