Using Synthetic Data to Mitigate Unfairness and Preserve Privacy through Single-Shot Federated Learning
- URL: http://arxiv.org/abs/2409.09532v1
- Date: Sat, 14 Sep 2024 21:04:11 GMT
- Title: Using Synthetic Data to Mitigate Unfairness and Preserve Privacy through Single-Shot Federated Learning
- Authors: Chia-Yuan Wu, Frank E. Curtis, Daniel P. Robinson
- Abstract summary: We propose a strategy that promotes fair predictions across clients without the need to pass information between the clients and server.
We then pass each client's synthetic dataset to the server, the collection of which is used to train the server model.
- Score: 6.516872951510096
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To address unfairness issues in federated learning (FL), contemporary approaches typically use frequent model parameter updates and transmissions between the clients and server. In such a process, client-specific information (e.g., local dataset size or data-related fairness metrics) must be sent to the server to compute, e.g., aggregation weights. All of this results in high transmission costs and the potential leakage of client information. As an alternative, we propose a strategy that promotes fair predictions across clients without the need to pass information between the clients and server iteratively and prevents client data leakage. For each client, we first use their local dataset to obtain a synthetic dataset by solving a bilevel optimization problem that addresses unfairness concerns during the learning process. We then pass each client's synthetic dataset to the server, the collection of which is used to train the server model using conventional machine learning techniques (that do not take fairness metrics into account). Thus, we eliminate the need to handle fairness-specific aggregation weights while preserving client privacy. Our approach requires only a single communication between the clients and the server, thus making it computationally cost-effective, able to maintain privacy, and able to ensure fairness. We present empirical evidence to demonstrate the advantages of our approach. The results illustrate that our method effectively uses synthetic data as a means to mitigate unfairness and preserve client privacy.
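The single-communication flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' method: the client-side generator here (`make_synthetic`, a hypothetical name) simply samples from per-class Gaussians fit to local data, standing in for the fairness-aware bilevel optimization the paper actually solves. The function names, the logistic-regression server model, and all parameters are likewise assumptions chosen for illustration.

```python
import numpy as np

def make_synthetic(X, y, n_per_class, rng):
    """Hypothetical stand-in for the client-side step.

    The paper obtains synthetic data by solving a fairness-aware bilevel
    problem; this placeholder only samples from per-class Gaussians fit
    to the client's local data.
    """
    Xs, ys = [], []
    for label in np.unique(y):
        Xc = X[y == label]
        mean, std = Xc.mean(axis=0), Xc.std(axis=0) + 1e-6
        Xs.append(rng.normal(mean, std, size=(n_per_class, X.shape[1])))
        ys.append(np.full(n_per_class, label))
    return np.vstack(Xs), np.concatenate(ys)

def train_logreg(X, y, lr=0.1, steps=500):
    """Conventional logistic regression by gradient descent (no fairness terms)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def single_shot_round(client_data, n_per_class=50, seed=0):
    """Each client sends one synthetic dataset; the server trains on the union."""
    rng = np.random.default_rng(seed)
    # Client side: only synthetic data leaves each client, exactly once.
    synthetic = [make_synthetic(X, y, n_per_class, rng) for X, y in client_data]
    # Server side: pool the synthetic datasets and train a standard model.
    X_all = np.vstack([Xs for Xs, _ in synthetic])
    y_all = np.concatenate([ys for _, ys in synthetic])
    return train_logreg(X_all, y_all), (X_all, y_all)
```

Calling `single_shot_round` with a list of `(X, y)` pairs, one per client, returns the server model parameters and the pooled synthetic data. The point of the sketch is the communication pattern: the server never sees raw client data, dataset sizes, or fairness metrics, and no aggregation weights are computed.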
Related papers
- ACCESS-FL: Agile Communication and Computation for Efficient Secure Aggregation in Stable Federated Learning Networks [26.002975401820887]
Federated Learning (FL) is a distributed learning framework designed for privacy-aware applications.
Traditional FL approaches risk exposing sensitive client data when plain model updates are transmitted to the server.
Google's Secure Aggregation (SecAgg) protocol addresses this threat by employing a double-masking technique.
We propose ACCESS-FL, a communication-and-computation-efficient secure aggregation method.
arXiv Detail & Related papers (2024-09-03T09:03:38Z)
- FedBayes: A Zero-Trust Federated Learning Aggregation to Defend Against Adversarial Attacks [1.689369173057502]
Federated learning has created a decentralized method to train a machine learning model without needing direct access to client data.
Malicious clients are able to corrupt the global model and degrade performance across all clients within a federation.
Our novel aggregation method, FedBayes, mitigates the effect of a malicious client by calculating the probabilities of a client's model weights.
arXiv Detail & Related papers (2023-12-04T21:37:50Z)
- Utilizing Free Clients in Federated Learning for Focused Model Enhancement [9.370655190768163]
Federated Learning (FL) is a distributed machine learning approach to learn models on decentralized heterogeneous data.
We present FedALIGN (Federated Adaptive Learning with Inclusion of Global Needs) to address this challenge.
arXiv Detail & Related papers (2023-10-06T18:23:40Z)
- Client-specific Property Inference against Secure Aggregation in Federated Learning [52.8564467292226]
Federated learning has become a widely used paradigm for collaboratively training a common model among different participants.
Many attacks have shown that it is still possible to infer sensitive information such as membership, property, or outright reconstruction of participant data.
We show that simple linear models can effectively capture client-specific properties only from the aggregated model updates.
arXiv Detail & Related papers (2023-03-07T14:11:01Z)
- DYNAFED: Tackling Client Data Heterogeneity with Global Dynamics [60.60173139258481]
Local training on non-iid distributed data results in a deflected local optimum.
A natural solution is to gather all client data onto the server, such that the server has a global view of the entire data distribution.
In this paper, we put forth an idea to collect and leverage global knowledge on the server without hindering data privacy.
arXiv Detail & Related papers (2022-11-20T06:13:06Z)
- Straggler-Resilient Personalized Federated Learning [55.54344312542944]
Federated learning allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions.
We develop a novel algorithmic procedure with theoretical speedup guarantees that simultaneously handles two of these hurdles.
Our method relies on ideas from representation learning theory to find a global common representation using all clients' data and learn a user-specific set of parameters leading to a personalized solution for each client.
arXiv Detail & Related papers (2022-06-05T01:14:46Z)
- Federated Multi-Target Domain Adaptation [99.93375364579484]
Federated learning methods enable us to train machine learning models on distributed user data while preserving its privacy.
We consider a more practical scenario where the distributed client data is unlabeled, and a centralized labeled dataset is available on the server.
We propose an effective DualAdapt method to address the new challenges.
arXiv Detail & Related papers (2021-08-17T17:53:05Z)
- Towards Fair Federated Learning with Zero-Shot Data Augmentation [123.37082242750866]
Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data.
We propose a novel federated learning system that employs zero-shot data augmentation on under-represented data to mitigate statistical heterogeneity and encourage more uniform accuracy performance across clients in federated networks.
We study two variants of this scheme, Fed-ZDAC (federated learning with zero-shot data augmentation at the clients) and Fed-ZDAS (federated learning with zero-shot data augmentation at the server).
arXiv Detail & Related papers (2021-04-27T18:23:54Z)
- Toward Understanding the Influence of Individual Clients in Federated Learning [52.07734799278535]
Federated learning allows clients to jointly train a global model without sending their private data to a central server.
We define a new notion called Influence, quantify this influence over parameters, and propose an effective and efficient model to estimate this metric.
arXiv Detail & Related papers (2020-12-20T14:34:36Z)
- Differentially Private Secure Multi-Party Computation for Federated Learning in Financial Applications [5.50791468454604]
Federated learning enables a population of clients, working with a trusted server, to collaboratively learn a shared machine learning model.
This reduces the risk of exposing sensitive data, but it is still possible to reverse engineer information about a client's private data set from communicated model parameters.
We present a privacy-preserving federated learning protocol to a non-specialist audience, demonstrate it using logistic regression on a real-world credit card fraud data set, and evaluate it using an open-source simulation platform.
arXiv Detail & Related papers (2020-10-12T17:16:27Z)
- Shuffled Model of Federated Learning: Privacy, Communication and Accuracy Trade-offs [30.58690911428577]
We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements.
We develop (optimal) communication-efficient schemes for private mean estimation for several $\ell_p$ spaces.
We demonstrate that one can get the same privacy, optimization-performance operating point developed in recent methods that use full-precision communication.
arXiv Detail & Related papers (2020-08-17T09:41:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.