Related papers: Using Synthetic Data to Mitigate Unfairness and Preserve Privacy in Collaborative Machine Learning

URL: http://arxiv.org/abs/2409.09532v2
Date: Thu, 23 Jan 2025 14:39:05 GMT
Title: Using Synthetic Data to Mitigate Unfairness and Preserve Privacy in Collaborative Machine Learning
Authors: Chia-Yuan Wu, Frank E. Curtis, Daniel P. Robinson
Abstract summary: Collaborative machine learning enables multiple clients to train a global model collaboratively. To preserve privacy in such settings, a common technique is to utilize frequent updates and transmissions of model parameters. We propose a two-stage strategy that promotes fair predictions, prevents client-data leakage, and reduces communication costs.
Score: 6.516872951510096
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In distributed computing environments, collaborative machine learning enables multiple clients to train a global model collaboratively. To preserve privacy in such settings, a common technique is to utilize frequent updates and transmissions of model parameters. However, this results in high communication costs between the clients and the server. To tackle unfairness concerns in distributed environments, client-specific information (e.g., local dataset size or data-related fairness metrics) must be sent to the server to compute algorithmic quantities (e.g., aggregation weights), which leads to a potential leakage of client information. To address these challenges, we propose a two-stage strategy that promotes fair predictions, prevents client-data leakage, and reduces communication costs in certain scenarios without the need to pass information between clients and server iteratively. In the first stage, for each client, we use its local dataset to obtain a synthetic dataset by solving a bilevel optimization problem that aims to ensure that the ultimate global model yields fair predictions. In the second stage, we apply a method with differential privacy guarantees to the synthetic dataset from the first stage to obtain a second synthetic dataset. We then pass each client's second-stage synthetic dataset to the server, the collection of which is used to train the server model using conventional machine learning techniques (that no longer need to take fairness metrics or privacy into account). Thus, we eliminate the need to handle fairness-specific aggregation weights while preserving client privacy. Our approach requires only a single communication between the clients and the server (thus making it communication cost-effective), maintains data privacy, and promotes fairness. We present empirical evidence to demonstrate the advantages of our approach.

Related papers

ACCESS-FL: Agile Communication and Computation for Efficient Secure Aggregation in Stable Federated Learning Networks [26.002975401820887]
Federated Learning (FL) is a distributed learning framework designed for privacy-aware applications. Traditional FL approaches risk exposing sensitive client data when plain model updates are transmitted to the server. Google's Secure Aggregation (SecAgg) protocol addresses this threat by employing a double-masking technique. We propose ACCESS-FL, a communication-and-computation-efficient secure aggregation method.
arXiv Detail & Related papers (2024-09-03T09:03:38Z)
Fair Federated Data Clustering through Personalization: Bridging the Gap between Diverse Data Distributions [2.7905216619150344]
We introduce the idea of personalization in federated clustering. The goal is to balance achieving a lower clustering cost with achieving a uniform cost across clients. We propose p-FClus, which addresses these goals in a single round of communication between server and clients.
arXiv Detail & Related papers (2024-07-05T07:10:26Z)
Personalized federated learning based on feature fusion [2.943623084019036]
Federated learning enables distributed clients to collaborate on training while storing their data locally to protect client privacy. We propose a personalized federated learning approach called pFedPM. In our process, we replace traditional gradient uploading with feature uploading, which helps reduce communication costs and allows for heterogeneous client models.
arXiv Detail & Related papers (2024-06-24T12:16:51Z)
Efficient Cross-Domain Federated Learning by MixStyle Approximation [0.3277163122167433]
We introduce a privacy-preserving, resource-efficient Federated Learning concept for client adaptation in hardware-constrained environments. Our approach includes server model pre-training on source data and subsequent fine-tuning on target data via low-end clients. Preliminary results indicate that our method reduces computational and transmission costs while maintaining competitive performance on downstream tasks.
arXiv Detail & Related papers (2023-12-12T08:33:34Z)
FedBayes: A Zero-Trust Federated Learning Aggregation to Defend Against Adversarial Attacks [1.689369173057502]
Federated learning has created a decentralized method to train a machine learning model without needing direct access to client data. Malicious clients are able to corrupt the global model and degrade performance across all clients within a federation. Our novel aggregation method, FedBayes, mitigates the effect of a malicious client by calculating the probabilities of a client's model weights.
arXiv Detail & Related papers (2023-12-04T21:37:50Z)
Utilizing Free Clients in Federated Learning for Focused Model Enhancement [9.370655190768163]
Federated Learning (FL) is a distributed machine learning approach to learn models on decentralized heterogeneous data. We present FedALIGN (Federated Adaptive Learning with Inclusion of Global Needs) to address this challenge.
arXiv Detail & Related papers (2023-10-06T18:23:40Z)
Client-specific Property Inference against Secure Aggregation in Federated Learning [52.8564467292226]
Federated learning has become a widely used paradigm for collaboratively training a common model among different participants. Many attacks have shown that it is still possible to infer sensitive information such as membership, property, or outright reconstruction of participant data. We show that simple linear models can effectively capture client-specific properties only from the aggregated model updates.
arXiv Detail & Related papers (2023-03-07T14:11:01Z)
DYNAFED: Tackling Client Data Heterogeneity with Global Dynamics [60.60173139258481]
Local training on non-iid distributed data results in a deflected local optimum. A natural solution is to gather all client data onto the server, such that the server has a global view of the entire data distribution. In this paper, we put forth an idea to collect and leverage global knowledge on the server without hindering data privacy.
arXiv Detail & Related papers (2022-11-20T06:13:06Z)
Optimizing Server-side Aggregation For Robust Federated Learning via Subspace Training [80.03567604524268]
Non-IID data distribution across clients and poisoning attacks are two main challenges in real-world federated learning systems. We propose SmartFL, a generic approach that optimizes the server-side aggregation process. We provide theoretical analyses of the convergence and generalization capacity for SmartFL.
arXiv Detail & Related papers (2022-11-10T13:20:56Z)
Straggler-Resilient Personalized Federated Learning [55.54344312542944]
Federated learning allows training models from samples distributed across a large network of clients while respecting privacy and communication restrictions. We develop a novel algorithmic procedure with theoretical speedup guarantees that simultaneously handles two of these hurdles. Our method relies on ideas from representation learning theory to find a global common representation using all clients' data and learn a user-specific set of parameters leading to a personalized solution for each client.
arXiv Detail & Related papers (2022-06-05T01:14:46Z)
Federated Multi-Target Domain Adaptation [99.93375364579484]
Federated learning methods enable us to train machine learning models on distributed user data while preserving its privacy. We consider a more practical scenario where the distributed client data is unlabeled, and a centralized labeled dataset is available on the server. We propose an effective DualAdapt method to address the new challenges.
arXiv Detail & Related papers (2021-08-17T17:53:05Z)
Towards Fair Federated Learning with Zero-Shot Data Augmentation [123.37082242750866]
Federated learning has emerged as an important distributed learning paradigm, where a server aggregates a global model from many client-trained models while having no access to the client data. We propose a novel federated learning system that employs zero-shot data augmentation on under-represented data to mitigate statistical heterogeneity and encourage more uniform accuracy performance across clients in federated networks. We study two variants of this scheme, Fed-ZDAC (federated learning with zero-shot data augmentation at the clients) and Fed-ZDAS (federated learning with zero-shot data augmentation at the server).
arXiv Detail & Related papers (2021-04-27T18:23:54Z)
Toward Understanding the Influence of Individual Clients in Federated Learning [52.07734799278535]
Federated learning allows clients to jointly train a global model without sending their private data to a central server. We define a new notion called Influence, quantify this influence over parameters, and propose an effective and efficient model to estimate this metric.
arXiv Detail & Related papers (2020-12-20T14:34:36Z)
Differentially Private Secure Multi-Party Computation for Federated Learning in Financial Applications [5.50791468454604]
Federated learning enables a population of clients, working with a trusted server, to collaboratively learn a shared machine learning model. This reduces the risk of exposing sensitive data, but it is still possible to reverse engineer information about a client's private data set from communicated model parameters. We present a privacy-preserving federated learning protocol to a non-specialist audience, demonstrate it using logistic regression on a real-world credit card fraud data set, and evaluate it using an open-source simulation platform.
arXiv Detail & Related papers (2020-10-12T17:16:27Z)
Shuffled Model of Federated Learning: Privacy, Communication and Accuracy Trade-offs [30.58690911428577]
We consider a distributed empirical risk minimization (ERM) optimization problem with communication efficiency and privacy requirements. We develop (optimal) communication-efficient schemes for private mean estimation for several $\ell_p$ spaces. We demonstrate that one can get the same privacy, optimization-performance operating point developed in recent methods that use full-precision communication.
arXiv Detail & Related papers (2020-08-17T09:41:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.