Analysis of Privacy Leakage in Federated Large Language Models
- URL: http://arxiv.org/abs/2403.04784v1
- Date: Sat, 2 Mar 2024 20:25:38 GMT
- Title: Analysis of Privacy Leakage in Federated Large Language Models
- Authors: Minh N. Vu, Truc Nguyen, Tre' R. Jeter, My T. Thai
- Abstract summary: We present a privacy analysis of Federated Learning (FL) when used for training Large Language Models (LLMs).
In particular, we design two active membership inference attacks with guaranteed theoretical success rates to assess the privacy leakages of various adapted FL configurations.
Our theoretical findings are translated into practical attacks, revealing substantial privacy vulnerabilities in popular LLMs, including BERT, RoBERTa, DistilBERT, and OpenAI's GPTs.
- Score: 18.332535398635027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid adoption of Federated Learning (FL) as the training and tuning protocol for applications utilizing Large Language Models (LLMs), recent research highlights the need for significant modifications to FL to accommodate the large scale of LLMs. While substantial adjustments to the protocol have been introduced in response, a comprehensive privacy analysis of the adapted FL protocol is currently lacking. To address this gap, our work provides an extensive privacy analysis of FL when used for training LLMs, from both theoretical and practical perspectives. In particular, we design two active membership inference attacks with guaranteed theoretical success rates to assess the privacy leakage of various adapted FL configurations. Our theoretical findings are translated into practical attacks, revealing substantial privacy vulnerabilities in popular LLMs, including BERT, RoBERTa, DistilBERT, and OpenAI's GPTs, across multiple real-world language datasets. Additionally, we conduct thorough experiments to evaluate the privacy leakage of these models when data is protected by state-of-the-art differential privacy (DP) mechanisms.
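As a concrete illustration of the threat model, the sketch below implements a passive, loss-threshold membership inference test against a trained model, with a toy logistic regression standing in for the model FL returns. This is not the paper's active attack (which crafts malicious parameters to obtain provable success rates); it only demonstrates the underlying signal that training members tend to incur lower loss than non-members. The data, model, and threshold choice are all illustrative assumptions.

```python
# Minimal loss-threshold membership inference sketch (NOT the paper's
# active attack): members of the training set tend to have lower loss.
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, lr=0.5, epochs=200):
    """Plain logistic regression, standing in for the model FL returns."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def per_sample_loss(w, X, y):
    p = np.clip(1.0 / (1.0 + np.exp(-X @ w)), 1e-9, 1 - 1e-9)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# Synthetic "private" member data and disjoint non-member data.
X_mem = rng.normal(size=(200, 20)); y_mem = (X_mem[:, 0] > 0).astype(float)
X_non = rng.normal(size=(200, 20)); y_non = (X_non[:, 0] > 0).astype(float)

w = train_logreg(X_mem, y_mem)

# Threshold at the median loss over a mixed pool, then guess "member"
# whenever a sample's loss falls below the threshold.
tau = np.median(per_sample_loss(w, np.vstack([X_mem, X_non]),
                                np.concatenate([y_mem, y_non])))
tpr = (per_sample_loss(w, X_mem, y_mem) < tau).mean()
fpr = (per_sample_loss(w, X_non, y_non) < tau).mean()
print(f"member hit rate {tpr:.2f} vs non-member false-alarm rate {fpr:.2f}")
```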
Related papers
- Privacy Attack in Federated Learning is Not Easy: An Experimental Study [5.065947993017158]
Federated learning (FL) is an emerging distributed machine learning paradigm proposed for privacy preservation.
Recent studies have indicated that FL cannot entirely guarantee privacy protection.
It remains uncertain whether privacy attacks on FL are effective in realistic federated environments.
arXiv Detail & Related papers (2024-09-28T10:06:34Z)
- Re-Evaluating Privacy in Centralized and Decentralized Learning: An Information-Theoretical and Empirical Study [4.7773230870500605]
Decentralized Federated Learning (DFL) has garnered attention for its robustness and scalability, and is often assumed to improve privacy as well.
Recent work by Pasquini et al. challenges this view, demonstrating that DFL does not inherently improve privacy against empirical attacks.
arXiv Detail & Related papers (2024-09-21T23:05:50Z)
- Convergent Differential Privacy Analysis for General Federated Learning: the $f$-DP Perspective [57.35402286842029]
Federated learning (FL) is an efficient collaborative training paradigm with a focus on local privacy.
Differential privacy (DP) is a classical approach to capturing and ensuring the reliability of privacy protections.
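For context on the kind of mechanism an f-DP analysis tracks across FL rounds, here is a generic DP-SGD-style clip-and-noise step; this is a standard Gaussian-mechanism sketch rather than code from the paper, and all constants are arbitrary.

```python
import numpy as np

def privatize_update(per_example_grads, clip_norm=1.0, noise_mult=1.1, seed=0):
    """Clip each per-example gradient, average, then add Gaussian noise."""
    rng = np.random.default_rng(seed)
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    mean = np.mean(clipped, axis=0)
    sigma = noise_mult * clip_norm / len(per_example_grads)
    return mean + rng.normal(0.0, sigma, size=mean.shape)

grads = [np.random.default_rng(i).normal(size=10) for i in range(32)]
print(privatize_update(grads))
```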
arXiv Detail & Related papers (2024-08-28T08:22:21Z)
- LLM-PBE: Assessing Data Privacy in Large Language Models [111.58198436835036]
Large Language Models (LLMs) have become integral to numerous domains, significantly advancing applications in data management, mining, and analysis.
Despite the critical nature of this issue, no existing literature offers a comprehensive assessment of data privacy risks in LLMs.
Our paper introduces LLM-PBE, a toolkit crafted specifically for the systematic evaluation of data privacy risks in LLMs.
arXiv Detail & Related papers (2024-08-23T01:37:29Z)
- PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data.
The transmitted model updates can leak sensitive user information, and the lack of central control over local training leaves the global model susceptible to malicious manipulation of model updates.
We develop a general framework, PriRoAgg, that utilizes Lagrange coded computing and distributed zero-knowledge proofs to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
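PriRoAgg itself relies on Lagrange coded computing and distributed zero-knowledge proofs, which are beyond a short sketch. The toy below instead shows the much simpler pairwise additive-masking form of secure aggregation, only to convey the aggregated-privacy goal: the server learns the sum of updates but no individual update.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim = 4, 6
updates = rng.normal(size=(n_clients, dim))  # each client's private update

# Every pair (i, j), i < j, agrees on a shared random mask; client i adds
# it and client j subtracts it, so all masks cancel in the aggregate.
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_clients) for j in range(i + 1, n_clients)}

masked = updates.copy()
for (i, j), m in masks.items():
    masked[i] += m
    masked[j] -= m

# The server sees only masked updates, yet recovers the exact sum.
assert np.allclose(masked.sum(axis=0), updates.sum(axis=0))
print(masked.sum(axis=0))
```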
arXiv Detail & Related papers (2024-07-12T03:18:08Z)
- Locally Differentially Private In-Context Learning [8.659575019965152]
Large pretrained language models (LLMs) have shown surprising in-context learning (ICL) ability.
This paper proposes a locally differentially private framework for in-context learning (LDP-ICL).
Viewing in-context learning in Transformers as a form of implicit gradient descent, we analyze the trade-off between privacy and utility in such LDP-ICL.
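As a toy illustration of the local DP guarantee underlying LDP-ICL (not the paper's mechanism, which operates on in-context demonstrations), here is classic randomized response on binary labels, with the standard debiasing step:

```python
import numpy as np

def randomized_response(label: int, epsilon: float, rng) -> int:
    """Report the true bit with probability e^eps / (e^eps + 1), else flip it."""
    p_true = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    return label if rng.random() < p_true else 1 - label

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=10_000)
noisy = np.array([randomized_response(int(b), 1.0, rng) for b in labels])

# Debias the aggregate: E[noisy] = (2p - 1) * q + (1 - p) for true rate q.
p = np.exp(1.0) / (np.exp(1.0) + 1.0)
estimate = (noisy.mean() - (1 - p)) / (2 * p - 1)
print(f"true rate {labels.mean():.3f}, debiased estimate {estimate:.3f}")
```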
arXiv Detail & Related papers (2024-05-07T06:05:43Z)
- Federated Learning with Reduced Information Leakage and Computation [17.069452700698047]
Federated learning (FL) is a distributed learning paradigm that allows multiple decentralized clients to collaboratively learn a common model without sharing local data.
This paper introduces Upcycled-FL, a strategy that applies a first-order approximation at every even round of model update.
Under this strategy, half of the FL updates incur no information leakage and require much less computation and transmission cost.
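The exact Upcycled-FL update rule is given in the paper; the sketch below only illustrates the alternating structure under an assumed extrapolation rule: odd rounds compute a real, data-dependent gradient, while even rounds reuse the previous motion of the iterates and never touch the data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
w_true = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=256)

def data_gradient(w):
    """A real, data-dependent gradient (this is where leakage can occur)."""
    return 2 * X.T @ (X @ w - y) / len(y)

w_prev = np.zeros(10)
w = np.zeros(10)
for t in range(1, 41):
    if t % 2 == 1:
        step = -0.05 * data_gradient(w)  # odd round: touches private data
    else:
        step = w - w_prev                # even round: first-order reuse of the
                                         # previous motion; no fresh data access
    w_prev, w = w, w + step
print(f"parameter error: {np.linalg.norm(w - w_true):.3f}")
```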
arXiv Detail & Related papers (2023-10-10T06:22:06Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
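A hypothetical sketch of what such contrastive instruction-tuning data might look like; the schema, field names, and example content are assumptions, not PrivacyMind's actual format:

```python
# Illustrative positive/negative instruction-tuning pairs: the positive
# example rewards answering with context-appropriate information, the
# negative example rewards refusing to disclose private details.
def make_pair(context, question, public_answer):
    positive = {
        "instruction": f"Context: {context}\nAnswer without revealing personal data.",
        "input": question,
        "output": public_answer,
    }
    negative = {
        "instruction": f"Context: {context}\nAnswer without revealing personal data.",
        "input": "What is the patient's home address?",
        "output": "I can't share personally identifiable information.",
    }
    return positive, negative

pos, neg = make_pair(
    context="Visit note: patient treated for a sprained ankle.",
    question="What was the patient treated for?",
    public_answer="A sprained ankle.",
)
print(pos, neg, sep="\n")
```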
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- Towards Building the Federated GPT: Federated Instruction Tuning [66.7900343035733]
This paper introduces Federated Instruction Tuning (FedIT) as the learning framework for the instruction tuning of large language models (LLMs).
We demonstrate that by exploiting the heterogeneous and diverse sets of instructions on clients' devices, FedIT improves the performance of LLMs compared to centralized training with only limited local instructions.
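The aggregation pattern behind federated instruction tuning is FedAvg over client-local fine-tuning. The sketch below shows that pattern with a linear-model stand-in for the LLM; FedIT's actual models, optimizers, and instruction formats are in the paper.

```python
import numpy as np

def local_finetune(w, client_data, lr=0.1, steps=5):
    """Stand-in for a few steps of local instruction tuning on one client."""
    X, y = client_data
    for _ in range(steps):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)  # least-squares gradient
    return w

rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 8)), rng.normal(size=50)) for _ in range(4)]

w_global = np.zeros(8)
for _round in range(10):
    local = [local_finetune(w_global.copy(), c) for c in clients]
    w_global = np.mean(local, axis=0)  # FedAvg: average the client models
print(w_global)
```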
arXiv Detail & Related papers (2023-05-09T17:42:34Z)
- Do Gradient Inversion Attacks Make Federated Learning Unsafe? [70.0231254112197]
Federated learning (FL) allows the collaborative training of AI models without needing to share raw data.
Recent works on the inversion of deep neural networks from model gradients raised concerns about the security of FL in preventing the leakage of training data.
In this work, we show that the attacks presented in the literature are impractical in real FL use cases, and we provide a new baseline attack.
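For reference, the basic mechanism such attacks rely on fits in a few lines: optimize a dummy input so its gradient matches the gradient the server observed. This is a minimal single-example sketch (real attacks target deep networks, batched updates, and unknown labels):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(16, 4)
loss_fn = torch.nn.CrossEntropyLoss()
params = tuple(model.parameters())

# The "private" example and the gradient a server would observe in FL.
x_true = torch.randn(1, 16)
y_true = torch.tensor([2])  # label assumed known to the attacker, for simplicity
true_grads = torch.autograd.grad(loss_fn(model(x_true), y_true), params)

# The attacker optimizes a dummy input until its gradient matches.
x_dummy = torch.randn(1, 16, requires_grad=True)
opt = torch.optim.Adam([x_dummy], lr=0.1)
for _ in range(300):
    opt.zero_grad()
    dummy_grads = torch.autograd.grad(
        loss_fn(model(x_dummy), y_true), params, create_graph=True)
    match = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    match.backward()
    opt.step()

# The error should shrink toward zero as the gradients are matched.
print("reconstruction error:", (x_dummy - x_true).norm().item())
```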
arXiv Detail & Related papers (2022-02-14T18:33:12Z)
- Understanding the Interplay between Privacy and Robustness in Federated Learning [15.673448030003788]
Federated Learning (FL) is emerging as a promising paradigm of privacy-preserving machine learning.
Recent works highlighted several privacy and robustness weaknesses in FL.
It is still not clear how local differential privacy (LDP) affects adversarial robustness in FL.
arXiv Detail & Related papers (2021-06-13T16:01:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.