The Impact of Data Distribution on Fairness and Robustness in Federated
Learning
- URL: http://arxiv.org/abs/2112.01274v1
- Date: Mon, 29 Nov 2021 22:04:50 GMT
- Title: The Impact of Data Distribution on Fairness and Robustness in Federated
Learning
- Authors: Mustafa Safa Ozdayi and Murat Kantarcioglu
- Abstract summary: Federated Learning (FL) is a distributed machine learning protocol that allows a set of agents to collaboratively train a model without sharing their datasets.
In this work, we look at how variations in local data distributions affect the fairness and the properties of the trained models.
- Score: 7.209240273742224
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Learning (FL) is a distributed machine learning protocol that
allows a set of agents to collaboratively train a model without sharing their
datasets. This makes FL particularly suitable for settings where data privacy
is desired. However, it has been observed that the performance of FL is closely
related to the similarity of the local data distributions of agents.
Particularly, as the data distributions of agents differ, the accuracy of the
trained models drop. In this work, we look at how variations in local data
distributions affect the fairness and the robustness properties of the trained
models in addition to the accuracy. Our experimental results indicate that, the
trained models exhibit higher bias, and become more susceptible to attacks as
local data distributions differ. Importantly, the degradation in the fairness,
and robustness can be much more severe than the accuracy. Therefore, we reveal
that small variations that have little impact on the accuracy could still be
important if the trained model is to be deployed in a fairness/security
critical context.
Related papers
- FedDistill: Global Model Distillation for Local Model De-Biasing in Non-IID Federated Learning [10.641875933652647]
Federated Learning (FL) is a novel approach that allows for collaborative machine learning.
FL faces challenges due to non-uniformly distributed (non-iid) data across clients.
This paper introduces FedDistill, a framework enhancing the knowledge transfer from the global model to local models.
arXiv Detail & Related papers (2024-04-14T10:23:30Z) - Federated Skewed Label Learning with Logits Fusion [23.062650578266837]
Federated learning (FL) aims to collaboratively train a shared model across multiple clients without transmitting their local data.
We propose FedBalance, which corrects the optimization bias among local models by calibrating their logits.
Our method can gain 13% higher average accuracy compared with state-of-the-art methods.
arXiv Detail & Related papers (2023-11-14T14:37:33Z) - Class-Balancing Diffusion Models [57.38599989220613]
Class-Balancing Diffusion Models (CBDM) are trained with a distribution adjustment regularizer as a solution.
Our method benchmarked the generation results on CIFAR100/CIFAR100LT dataset and shows outstanding performance on the downstream recognition task.
arXiv Detail & Related papers (2023-04-30T20:00:14Z) - Chasing Fairness Under Distribution Shift: A Model Weight Perturbation
Approach [72.19525160912943]
We first theoretically demonstrate the inherent connection between distribution shift, data perturbation, and model weight perturbation.
We then analyze the sufficient conditions to guarantee fairness for the target dataset.
Motivated by these sufficient conditions, we propose robust fairness regularization (RFR)
arXiv Detail & Related papers (2023-03-06T17:19:23Z) - Striving for data-model efficiency: Identifying data externalities on
group performance [75.17591306911015]
Building trustworthy, effective, and responsible machine learning systems hinges on understanding how differences in training data and modeling decisions interact to impact predictive performance.
We focus on a particular type of data-model inefficiency, in which adding training data from some sources can actually lower performance evaluated on key sub-groups of the population.
Our results indicate that data-efficiency is a key component of both accurate and trustworthy machine learning.
arXiv Detail & Related papers (2022-11-11T16:48:27Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Accuracy on the Line: On the Strong Correlation Between
Out-of-Distribution and In-Distribution Generalization [89.73665256847858]
We show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts.
Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet.
We also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS.
arXiv Detail & Related papers (2021-07-09T19:48:23Z) - Predicting with Confidence on Unseen Distributions [90.68414180153897]
We connect domain adaptation and predictive uncertainty literature to predict model accuracy on challenging unseen distributions.
We find that the difference of confidences (DoC) of a classifier's predictions successfully estimates the classifier's performance change over a variety of shifts.
We specifically investigate the distinction between synthetic and natural distribution shifts and observe that despite its simplicity DoC consistently outperforms other quantifications of distributional difference.
arXiv Detail & Related papers (2021-07-07T15:50:18Z) - Improving Accuracy of Federated Learning in Non-IID Settings [11.908715869667445]
Federated Learning (FL) is a decentralized machine learning protocol that allows a set of participating agents to collaboratively train a model without sharing their data.
It has been observed that the performance of FL is closely tied with the local data distributions of agents.
In this work, we identify four simple techniques that can improve the performance of trained models without incurring any additional communication overhead to FL.
arXiv Detail & Related papers (2020-10-14T21:02:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.