An Interactive Framework for Implementing Privacy-Preserving Federated Learning: Experiments on Large Language Models
- URL: http://arxiv.org/abs/2502.08008v2
- Date: Fri, 14 Feb 2025 18:52:34 GMT
- Title: An Interactive Framework for Implementing Privacy-Preserving Federated Learning: Experiments on Large Language Models
- Authors: Kasra Ahmadi, Rouzbeh Behnia, Reza Ebrahimi, Mehran Mozaffari Kermani, Jeremiah Birrell, Jason Pacheco, Attila A. Yavuz
- Abstract summary: Federated learning (FL) enhances privacy by keeping user data on local devices.
Recent attacks have demonstrated that updates shared by users during training can reveal significant information about their data.
We propose a framework that integrates a human entity as a privacy practitioner to determine an optimal trade-off between the model's privacy and utility.
- Score: 7.539653242367701
- License:
- Abstract: Federated learning (FL) enhances privacy by keeping user data on local devices. However, emerging attacks have demonstrated that the updates shared by users during training can reveal significant information about their data. This has greatly hindered the adoption of FL methods for training robust AI models in sensitive applications. Differential Privacy (DP) is considered the gold standard for safeguarding user data. However, DP guarantees are highly conservative, reflecting worst-case assumptions. This can result in overestimating privacy needs, which may compromise the model's accuracy. Additionally, interpreting these privacy guarantees has proven challenging across contexts, and factors such as the number of training iterations, the data distribution, and specific application requirements add further complexity. In this work, we propose a framework that integrates a human entity as a privacy practitioner to determine an optimal trade-off between the model's privacy and utility. Our framework is the first to address the variable memory requirement of existing DP methods in FL settings where resource-limited devices (e.g., cell phones) can participate. To support such settings, we adopt a recent DP method with fixed memory usage to ensure scalable private FL. We evaluated the proposed framework by fine-tuning a BERT-based language model on the GLUE benchmark (a common approach in the literature), leveraging the new accountant and employing diverse data partitioning strategies to mimic real-world conditions. As a result, we achieved stable memory usage, with an average accuracy reduction of 1.33% for $\epsilon = 10$ and 1.9% for $\epsilon = 6$, compared to the state-of-the-art DP accountant, which does not support fixed memory usage.
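The evaluation described in the abstract builds on the standard differentially private training recipe: clip each example's gradient, average, and add Gaussian noise, while an accountant tracks the spent $\epsilon$ budget. The sketch below is a minimal, illustrative NumPy version of that recipe on a toy logistic-regression model; it is not the authors' BERT/GLUE pipeline, interactive practitioner loop, or fixed-memory accountant, and the clipping bound and noise multiplier are arbitrary placeholders.

```python
# Minimal DP-SGD sketch (NumPy). Illustrates the generic recipe the paper's
# evaluation builds on: per-example gradient clipping + Gaussian noise.
# The toy logistic-regression "model" and fixed noise multiplier are
# placeholders, NOT the authors' BERT/GLUE setup or fixed-memory accountant.
import numpy as np

rng = np.random.default_rng(0)

def per_example_grads(w, X, y):
    # Gradient of the logistic loss for each example separately (shape: n x d).
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return (p - y)[:, None] * X

def dp_sgd_step(w, X, y, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    grads = per_example_grads(w, X, y)
    # Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    grads = grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Average, then add Gaussian noise calibrated to the clipping bound.
    noisy_mean = grads.mean(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm / len(X), size=w.shape)
    return w - lr * noisy_mean

# Toy data: 256 examples, 16 features.
X = rng.normal(size=(256, 16))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(16)
for _ in range(100):
    w = dp_sgd_step(w, X, y)
print("final accuracy:", ((X @ w > 0) == y.astype(bool)).mean())
```

In the paper's setting, each client would run steps like this locally before sharing a privatized update, and the accountant would translate the chosen noise multiplier and number of steps into the reported $\epsilon$ values.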
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvements in forgetting error compared to the state-of-the-art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - DP$^2$-FedSAM: Enhancing Differentially Private Federated Learning Through Personalized Sharpness-Aware Minimization [8.022417295372492]
Federated learning (FL) is a distributed machine learning approach that allows multiple clients to collaboratively train a model without sharing their raw data.
To prevent sensitive information from being inferred through the model updates shared in FL, differentially private federated learning (DPFL) has been proposed.
DPFL ensures formal and rigorous privacy protection in FL by clipping and adding random noise to the shared model updates.
We propose DP$^2$-FedSAM: Differentially Private and Personalized Federated Learning with Sharpness-Aware Minimization.
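For context, the clip-and-noise step that DPFL methods build on can be written in a few lines. The sketch below is a generic, illustrative server-side aggregation (client count, clipping bound, and noise multiplier are arbitrary); it is not DP$^2$-FedSAM itself, which adds personalization and sharpness-aware minimization on top of this step.

```python
# Generic DPFL aggregation sketch: clip each client's model update, then add
# Gaussian noise at the server. Illustrative only; NOT DP^2-FedSAM, which adds
# personalization and sharpness-aware minimization on top of this step.
import numpy as np

rng = np.random.default_rng(0)

def private_aggregate(client_updates, clip_norm=1.0, noise_multiplier=1.0):
    clipped = []
    for u in client_updates:
        scale = min(1.0, clip_norm / (np.linalg.norm(u) + 1e-12))
        clipped.append(u * scale)           # bound each client's contribution
    mean = np.mean(clipped, axis=0)
    sigma = noise_multiplier * clip_norm / len(client_updates)
    return mean + rng.normal(0.0, sigma, size=mean.shape)  # Gaussian mechanism

# Example: 10 clients, 1000-dimensional model updates.
updates = [rng.normal(size=1000) for _ in range(10)]
new_global_delta = private_aggregate(updates)
```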
arXiv Detail & Related papers (2024-09-20T16:49:01Z) - PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data.
The transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates.
We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
arXiv Detail & Related papers (2024-07-12T03:18:08Z) - Mind the Privacy Unit! User-Level Differential Privacy for Language Model Fine-Tuning [62.224804688233]
Differential privacy (DP) offers a promising solution by ensuring models are 'almost indistinguishable' with or without any particular privacy unit.
We study user-level DP motivated by applications where it is necessary to ensure uniform privacy protection across users.
arXiv Detail & Related papers (2024-06-20T13:54:32Z) - Privacy Amplification for the Gaussian Mechanism via Bounded Support [64.86780616066575]
Data-dependent privacy accounting frameworks such as per-instance differential privacy (pDP) and Fisher information loss (FIL) confer fine-grained privacy guarantees for individuals in a fixed training dataset.
We propose simple modifications of the Gaussian mechanism with bounded support, showing that they amplify privacy guarantees under data-dependent accounting.
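As a rough illustration of the bounded-support idea (not this paper's exact construction or its privacy analysis), the output of a Gaussian mechanism can be rectified into a fixed interval; the bounds, sensitivity, and noise scale below are arbitrary.

```python
# Sketch of a Gaussian mechanism whose output has bounded support, here via
# rectification (clamping). Bounds, sensitivity, and sigma are illustrative
# assumptions, not the parameters analysed in the paper.
import numpy as np

rng = np.random.default_rng(0)

def rectified_gaussian_mechanism(value, sensitivity=1.0, sigma=2.0,
                                 lower=0.0, upper=10.0):
    noisy = value + rng.normal(0.0, sigma * sensitivity)
    return float(np.clip(noisy, lower, upper))  # output support is [lower, upper]

print(rectified_gaussian_mechanism(3.7))
```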
arXiv Detail & Related papers (2024-03-07T21:22:07Z) - Probing the Transition to Dataset-Level Privacy in ML Models Using an Output-Specific and Data-Resolved Privacy Profile [23.05994842923702]
We study a privacy metric that quantifies the extent to which a model trained on a dataset using a Differential Privacy mechanism is "covered" by each of the distributions resulting from training on neighboring datasets.
We show that the privacy profile can be used to probe an observed transition to indistinguishability that takes place in the neighboring distributions as $\epsilon$ decreases.
arXiv Detail & Related papers (2023-06-27T20:39:07Z) - FedLAP-DP: Federated Learning by Sharing Differentially Private Loss Approximations [53.268801169075836]
We propose FedLAP-DP, a novel privacy-preserving approach for federated learning.
A formal privacy analysis demonstrates that FedLAP-DP incurs the same privacy costs as typical gradient-sharing schemes.
Our approach presents a faster convergence speed compared to typical gradient-sharing methods.
arXiv Detail & Related papers (2023-02-02T12:56:46Z) - How Much Privacy Does Federated Learning with Secure Aggregation Guarantee? [22.7443077369789]
Federated learning (FL) has attracted growing interest for enabling privacy-preserving machine learning on data stored at multiple users.
While data never leaves users' devices, privacy still cannot be guaranteed since significant computations on users' training data are shared in the form of trained local models.
Secure Aggregation (SA) has been developed as a framework to preserve privacy in FL.
arXiv Detail & Related papers (2022-08-03T18:44:17Z) - Federated Learning with Local Differential Privacy: Trade-offs between Privacy, Utility, and Communication [22.171647103023773]
Federated learning (FL) allows training on massive amounts of data privately thanks to its decentralized structure.
We consider Gaussian mechanisms to preserve local differential privacy (LDP) of user data in the FL model with SGD.
Our results guarantee a significantly larger utility and a smaller transmission rate as compared to existing privacy accounting methods.
arXiv Detail & Related papers (2021-02-09T10:04:18Z) - Federated Learning with Sparsification-Amplified Privacy and Adaptive Optimization [27.243322019117144]
Federated learning (FL) enables distributed agents to collaboratively learn a centralized model without sharing their raw data with each other.
We propose a new FL framework with sparsification-amplified privacy.
Our approach integrates random sparsification with gradient perturbation on each agent to amplify the privacy guarantee.
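A minimal sketch of the combination described here, with an arbitrary sparsification rate and noise scale rather than the paper's calibrated parameters:

```python
# Sketch of sparsification + gradient perturbation on a single agent:
# randomly keep a fraction of coordinates, zero the rest, and add Gaussian
# noise to the surviving entries. Rates and scales are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

def sparsified_private_gradient(grad, keep_fraction=0.1, noise_std=0.5):
    mask = rng.random(grad.shape) < keep_fraction   # random sparsification
    noisy = grad + rng.normal(0.0, noise_std, size=grad.shape)
    return np.where(mask, noisy, 0.0)               # transmit only kept coords

g = rng.normal(size=1000)
print("nonzero coordinates sent:", np.count_nonzero(sparsified_private_gradient(g)))
```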
arXiv Detail & Related papers (2020-08-01T20:22:57Z) - Differentially Private Federated Learning with Laplacian Smoothing [72.85272874099644]
Federated learning aims to protect data privacy by collaboratively learning a model without sharing private data among users.
An adversary may still be able to infer the private training data by attacking the released model.
Differential privacy provides a statistical protection against such attacks at the price of significantly degrading the accuracy or utility of the trained models.
arXiv Detail & Related papers (2020-05-01T04:28:38Z)