Using Synthetic Data to Mitigate Unfairness and Preserve Privacy in Collaborative Machine Learning
- URL: http://arxiv.org/abs/2409.09532v2
- Date: Thu, 23 Jan 2025 14:39:05 GMT
- Title: Using Synthetic Data to Mitigate Unfairness and Preserve Privacy in Collaborative Machine Learning
- Authors: Chia-Yuan Wu, Frank E. Curtis, Daniel P. Robinson
- Abstract summary: Collaborative machine learning enables multiple clients to train a global model collaboratively.
To preserve privacy in such settings, a common technique is to utilize frequent updates and transmissions of model parameters.
We propose a two-stage strategy that promotes fair predictions, prevents client-data leakage, and reduces communication costs.
- Score: 6.516872951510096
- License:
- Abstract: In distributed computing environments, collaborative machine learning enables multiple clients to train a global model collaboratively. To preserve privacy in such settings, a common technique is to utilize frequent updates and transmissions of model parameters. However, this results in high communication costs between the clients and the server. To tackle unfairness concerns in distributed environments, client-specific information (e.g., local dataset size or data-related fairness metrics) must be sent to the server to compute algorithmic quantities (e.g., aggregation weights), which leads to a potential leakage of client information. To address these challenges, we propose a two-stage strategy that promotes fair predictions, prevents client-data leakage, and reduces communication costs in certain scenarios without the need to pass information between clients and server iteratively. In the first stage, for each client, we use its local dataset to obtain a synthetic dataset by solving a bilevel optimization problem that aims to ensure that the ultimate global model yields fair predictions. In the second stage, we apply a method with differential privacy guarantees to the synthetic dataset from the first stage to obtain a second synthetic dataset. We then pass each client's second-stage synthetic dataset to the server, the collection of which is used to train the server model using conventional machine learning techniques (that no longer need to take fairness metrics or privacy into account). Thus, we eliminate the need to handle fairness-specific aggregation weights while preserving client privacy. Our approach requires only a single communication between the clients and the server (thus making it communication cost-effective), maintains data privacy, and promotes fairness. We present empirical evidence to demonstrate the advantages of our approach.
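The single-communication data flow described in the abstract can be sketched as follows. This is a minimal illustration only, not the authors' method: the abstract does not specify the bilevel solver or the differentially private mechanism, so both stages are replaced by loudly labeled placeholders (random subsampling for stage 1, clipping plus Laplace noise on features for stage 2). All function names, parameters (`n_syn`, `clip`, `scale`), and the least-squares server model are assumptions introduced for this example, and no formal privacy or fairness guarantee is claimed for the placeholder code.

```python
import numpy as np

def stage1_synthesize(X, y, n_syn, rng):
    # PLACEHOLDER for stage 1. The paper solves a bilevel optimization
    # problem here; plain subsampling only marks where that step goes and
    # is neither fairness-promoting nor privacy-preserving.
    idx = rng.choice(len(X), size=n_syn, replace=False)
    return X[idx].copy(), y[idx].copy()

def stage2_privatize(X_syn, y_syn, clip, scale, rng):
    # PLACEHOLDER for stage 2. Clip features to [-clip, clip] and add
    # Laplace noise; calibrating `scale` to a formal guarantee (and
    # handling labels) is outside the scope of this sketch.
    X_clipped = np.clip(X_syn, -clip, clip)
    noise = rng.laplace(loc=0.0, scale=scale, size=X_clipped.shape)
    return X_clipped + noise, y_syn

def client_payload(X, y, n_syn, clip, scale, seed):
    # Everything in this function runs on the client; only the returned
    # second-stage dataset is sent to the server, and it is sent once.
    rng = np.random.default_rng(seed)
    X1, y1 = stage1_synthesize(X, y, n_syn, rng)
    return stage2_privatize(X1, y1, clip, scale, rng)

def server_train(payloads):
    # The server pools all payloads and fits a conventional model
    # (ordinary least squares with an intercept), with no fairness-specific
    # aggregation weights and no privacy logic.
    X = np.vstack([p[0] for p in payloads])
    y = np.concatenate([p[1] for p in payloads])
    X_aug = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return w

# Two hypothetical clients with random local data (3 features each).
data_rng = np.random.default_rng(0)
clients = [(data_rng.normal(size=(50, 3)), data_rng.normal(size=50))
           for _ in range(2)]
payloads = [client_payload(X, y, n_syn=20, clip=3.0, scale=0.1, seed=i)
            for i, (X, y) in enumerate(clients)]
w = server_train(payloads)  # 3 feature weights plus 1 intercept
```

The point of the sketch is the shape of the protocol: each client calls `client_payload` once, and the server never sees local dataset sizes, fairness metrics, or model updates, only the second-stage synthetic datasets.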
Related papers
- Fair Federated Data Clustering through Personalization: Bridging the Gap between Diverse Data Distributions [2.7905216619150344]
We introduce the idea of personalization in federated clustering. The goal is to strike a balance between achieving a lower clustering cost and, at the same time, achieving a uniform cost across clients.
We propose p-FClus, which addresses these goals in a single round of communication between server and clients.
arXiv Detail & Related papers (2024-07-05T07:10:26Z) - Personalized federated learning based on feature fusion [2.943623084019036]
Federated learning enables distributed clients to collaborate on training while storing their data locally to protect client privacy.
We propose a personalized federated learning approach called pFedPM.
In our process, we replace traditional gradient uploading with feature uploading, which helps reduce communication costs and allows for heterogeneous client models.
arXiv Detail & Related papers (2024-06-24T12:16:51Z) - Efficient Cross-Domain Federated Learning by MixStyle Approximation [0.3277163122167433]
We introduce a privacy-preserving, resource-efficient Federated Learning concept for client adaptation in hardware-constrained environments.
Our approach includes server model pre-training on source data and subsequent fine-tuning on target data via low-end clients.
Preliminary results indicate that our method reduces computational and transmission costs while maintaining competitive performance on downstream tasks.
arXiv Detail & Related papers (2023-12-12T08:33:34Z) - Utilizing Free Clients in Federated Learning for Focused Model Enhancement [9.370655190768163]
Federated Learning (FL) is a distributed machine learning approach to learn models on decentralized heterogeneous data.
We present FedALIGN (Federated Adaptive Learning with Inclusion of Global Needs) to address this challenge.
arXiv Detail & Related papers (2023-10-06T18:23:40Z) - DYNAFED: Tackling Client Data Heterogeneity with Global Dynamics [60.60173139258481]
Local training on non-iid distributed data results in a deflected local optimum.
A natural solution is to gather all client data onto the server, such that the server has a global view of the entire data distribution.
In this paper, we put forth an idea to collect and leverage global knowledge on the server without hindering data privacy.
arXiv Detail & Related papers (2022-11-20T06:13:06Z) - Optimizing Server-side Aggregation For Robust Federated Learning via Subspace Training [80.03567604524268]
Non-IID data distribution across clients and poisoning attacks are two main challenges in real-world federated learning systems.
We propose SmartFL, a generic approach that optimizes the server-side aggregation process.
We provide theoretical analyses of the convergence and generalization capacity for SmartFL.
arXiv Detail & Related papers (2022-11-10T13:20:56Z) - Straggler-Resilient Personalized Federated Learning [55.54344312542944]
Federated learning allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions.
We develop a novel algorithmic procedure with theoretical speedup guarantees that simultaneously handles two of these hurdles.
Our method relies on ideas from representation learning theory to find a global common representation using all clients' data and learn a user-specific set of parameters leading to a personalized solution for each client.
arXiv Detail & Related papers (2022-06-05T01:14:46Z) - Federated Multi-Target Domain Adaptation [99.93375364579484]
Federated learning methods enable us to train machine learning models on distributed user data while preserving its privacy.
We consider a more practical scenario where the distributed client data is unlabeled, and a centralized labeled dataset is available on the server.
We propose an effective DualAdapt method to address the new challenges.
arXiv Detail & Related papers (2021-08-17T17:53:05Z) - Towards Fair Federated Learning with Zero-Shot Data Augmentation [123.37082242750866]
Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data.
We propose a novel federated learning system that employs zero-shot data augmentation on under-represented data to mitigate statistical heterogeneity and encourage more uniform accuracy performance across clients in federated networks.
We study two variants of this scheme, Fed-ZDAC (federated learning with zero-shot data augmentation at the clients) and Fed-ZDAS (federated learning with zero-shot data augmentation at the server).
arXiv Detail & Related papers (2021-04-27T18:23:54Z) - Toward Understanding the Influence of Individual Clients in Federated Learning [52.07734799278535]
Federated learning allows clients to jointly train a global model without sending their private data to a central server.
We define a new notion called Influence, quantify this influence over parameters, and propose an effective and efficient model to estimate this metric.
arXiv Detail & Related papers (2020-12-20T14:34:36Z) - Shuffled Model of Federated Learning: Privacy, Communication and Accuracy Trade-offs [30.58690911428577]
We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements.
We develop (optimal) communication-efficient schemes for private mean estimation for several $\ell_p$ spaces.
We demonstrate that one can get the same privacy, optimization-performance operating point developed in recent methods that use full-precision communication.
arXiv Detail & Related papers (2020-08-17T09:41:04Z)
This list is automatically generated from the titles and abstracts of the papers in this site.