Analysis of Privacy Leakage in Federated Large Language Models
- URL: http://arxiv.org/abs/2403.04784v1
- Date: Sat, 2 Mar 2024 20:25:38 GMT
- Title: Analysis of Privacy Leakage in Federated Large Language Models
- Authors: Minh N. Vu, Truc Nguyen, Tre' R. Jeter, My T. Thai,
- Abstract summary: We study the privacy analysis of Federated Learning (FL) when used for training Large Language Models (LLMs)
In particular, we design two active membership inference attacks with guaranteed theoretical success rates to assess the privacy leakages of various adapted FL configurations.
Our theoretical findings are translated into practical attacks, revealing substantial privacy vulnerabilities in popular LLMs, including BERT, RoBERTa, DistilBERT, and OpenAI's GPTs.
- Score: 18.332535398635027
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the rapid adoption of Federated Learning (FL) as the training and tuning protocol for applications utilizing Large Language Models (LLMs), recent research highlights the need for significant modifications to FL to accommodate the large-scale of LLMs. While substantial adjustments to the protocol have been introduced as a response, comprehensive privacy analysis for the adapted FL protocol is currently lacking. To address this gap, our work delves into an extensive examination of the privacy analysis of FL when used for training LLMs, both from theoretical and practical perspectives. In particular, we design two active membership inference attacks with guaranteed theoretical success rates to assess the privacy leakages of various adapted FL configurations. Our theoretical findings are translated into practical attacks, revealing substantial privacy vulnerabilities in popular LLMs, including BERT, RoBERTa, DistilBERT, and OpenAI's GPTs, across multiple real-world language datasets. Additionally, we conduct thorough experiments to evaluate the privacy leakage of these models when data is protected by state-of-the-art differential privacy (DP) mechanisms.
Related papers
- PriRoAgg: Achieving Robust Model Aggregation with Minimum Privacy Leakage for Federated Learning [49.916365792036636]
Federated learning (FL) has recently gained significant momentum due to its potential to leverage large-scale distributed user data.
The transmitted model updates can potentially leak sensitive user information, and the lack of central control of the local training process leaves the global model susceptible to malicious manipulations on model updates.
We develop a general framework PriRoAgg, utilizing Lagrange coded computing and distributed zero-knowledge proof, to execute a wide range of robust aggregation algorithms while satisfying aggregated privacy.
arXiv Detail & Related papers (2024-07-12T03:18:08Z) - Locally Differentially Private In-Context Learning [8.659575019965152]
Large pretrained language models (LLMs) have shown surprising In-Context Learning (ICL) ability.
This paper proposes a locally differentially private framework of in-context learning (LDP-ICL)
Considering the mechanisms of in-context learning in Transformers by gradient descent, we provide an analysis of the trade-off between privacy and utility in such LDP-ICL.
arXiv Detail & Related papers (2024-05-07T06:05:43Z) - Federated Learning: A Cutting-Edge Survey of the Latest Advancements and Applications [6.042202852003457]
Federated learning (FL) is a technique for developing robust machine learning (ML) models.
To protect user privacy, FL requires users to send model updates rather than transmitting large quantities of raw and potentially confidential data.
This survey provides a comprehensive analysis and comparison of the most recent FL algorithms.
arXiv Detail & Related papers (2023-10-08T19:54:26Z) - Large Language Models Can Be Good Privacy Protection Learners [53.07930843882592]
We introduce Privacy Protection Language Models (PPLM), a novel paradigm for fine-tuning language models.
Our work offers a theoretical analysis for model design and delves into various techniques such as corpus curation, penalty-based unlikelihood in training loss, and instruction-based tuning.
In particular, instruction tuning with both positive and negative examples, stands out as a promising method, effectively protecting private data while enhancing the model's knowledge.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - FederatedScope-LLM: A Comprehensive Package for Fine-tuning Large
Language Models in Federated Learning [70.38817963253034]
This paper first discusses these challenges of federated fine-tuning LLMs, and introduces our package FS-LLM as a main contribution.
We provide comprehensive federated parameter-efficient fine-tuning algorithm implementations and versatile programming interfaces for future extension in FL scenarios.
We conduct extensive experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in FL settings.
arXiv Detail & Related papers (2023-09-01T09:40:36Z) - Towards Building the Federated GPT: Federated Instruction Tuning [66.7900343035733]
This paper introduces Federated Instruction Tuning (FedIT) as the learning framework for the instruction tuning of large language models (LLMs)
We demonstrate that by exploiting the heterogeneous and diverse sets of instructions on the client's end with FedIT, we improved the performance of LLMs compared to centralized training with only limited local instructions.
arXiv Detail & Related papers (2023-05-09T17:42:34Z) - Towards Interpretable Federated Learning [19.764172768506132]
Federated learning (FL) enables multiple data owners to build machine learning models collaboratively without exposing their private local data.
It is important to balance the need for performance, privacy-preservation and interpretability, especially in mission critical applications such as finance and healthcare.
We conduct comprehensive analysis of the representative IFL approaches, the commonly adopted performance evaluation metrics, and promising directions towards building versatile IFL techniques.
arXiv Detail & Related papers (2023-02-27T02:06:18Z) - When Federated Learning Meets Pre-trained Language Models'
Parameter-Efficient Tuning Methods [22.16636947999123]
We introduce various parameter-efficient tuning (PETuning) methods into federated learning.
Specifically, we provide a holistic empirical study of representative PLMs tuning methods in FL.
Overall communication overhead can be significantly reduced by locally tuning and globally aggregating lightweight model parameters.
arXiv Detail & Related papers (2022-12-20T06:44:32Z) - Unraveling the Connections between Privacy and Certified Robustness in
Federated Learning Against Poisoning Attacks [68.20436971825941]
Federated learning (FL) provides an efficient paradigm to jointly train a global model leveraging data from distributed users.
Several studies have shown that FL is vulnerable to poisoning attacks.
To protect the privacy of local users, FL is usually trained in a differentially private way.
arXiv Detail & Related papers (2022-09-08T21:01:42Z) - Do Gradient Inversion Attacks Make Federated Learning Unsafe? [70.0231254112197]
Federated learning (FL) allows the collaborative training of AI models without needing to share raw data.
Recent works on the inversion of deep neural networks from model gradients raised concerns about the security of FL in preventing the leakage of training data.
In this work, we show that these attacks presented in the literature are impractical in real FL use-cases and provide a new baseline attack.
arXiv Detail & Related papers (2022-02-14T18:33:12Z) - Understanding the Interplay between Privacy and Robustness in Federated
Learning [15.673448030003788]
Federated Learning (FL) is emerging as a promising paradigm of privacy-preserving machine learning.
Recent works highlighted several privacy and robustness weaknesses in FL.
It is still not clear how LDP affects adversarial robustness in FL.
arXiv Detail & Related papers (2021-06-13T16:01:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.