DEPN: Detecting and Editing Privacy Neurons in Pretrained Language
Models
- URL: http://arxiv.org/abs/2310.20138v2
- Date: Tue, 5 Dec 2023 16:14:24 GMT
- Title: DEPN: Detecting and Editing Privacy Neurons in Pretrained Language
Models
- Authors: Xinwei Wu, Junzhuo Li, Minghui Xu, Weilong Dong, Shuangzhi Wu, Chao
Bian, Deyi Xiong
- Abstract summary: Large language models pretrained on a huge amount of data capture rich knowledge and information in the training data.
Previous studies have revealed that pretrained language models memorize and regurgitate training data, which brings the risk of data leakage.
We propose a framework DEPN to Detect and Edit Privacy Neurons in pretrained language models.
- Score: 46.04803661300974
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Large language models pretrained on a huge amount of data capture rich
knowledge and information in the training data. The ability of data
memorization and regurgitation in pretrained language models, revealed in
previous studies, brings the risk of data leakage. In order to effectively
reduce these risks, we propose a framework DEPN to Detect and Edit Privacy
Neurons in pretrained language models, partially inspired by knowledge neurons
and model editing. In DEPN, we introduce a novel method, termed the privacy
neuron detector, to locate neurons associated with private information, and
then edit these detected privacy neurons by setting their activations to zero.
Furthermore, we propose a privacy neuron aggregator to dememorize private
information in a batch processing manner. Experimental results show that our
method can significantly and efficiently reduce the exposure of private data
leakage without deteriorating the performance of the model. Additionally, we
empirically demonstrate the relationship between model memorization and privacy
neurons from multiple perspectives, including model size, training time,
prompts, and privacy neuron distribution, illustrating the robustness of our
approach.
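As a concrete illustration of the editing step described above, the sketch below zeroes out the activations of detected feed-forward neurons in a BERT-style model via forward hooks. It is a minimal sketch under stated assumptions, not the paper's implementation: the model name, layer indices, and the `privacy_neurons` mapping are hypothetical placeholders, and the attribution-based privacy neuron detector that would produce them is not reproduced here.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

model = BertForMaskedLM.from_pretrained("bert-base-uncased")
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Hypothetical output of a privacy neuron detector:
# {layer index: indices into that layer's FFN intermediate activations}
privacy_neurons = {3: [17, 902], 9: [451]}

def make_zeroing_hook(neuron_ids):
    """Return a forward hook that zeroes the given FFN neuron activations."""
    def hook(module, inputs, output):
        output = output.clone()          # avoid editing the module's output in place
        output[..., neuron_ids] = 0.0    # suppress the detected privacy neurons
        return output                    # returned tensor replaces the original output
    return hook

handles = []
for layer_idx, neuron_ids in privacy_neurons.items():
    ffn = model.bert.encoder.layer[layer_idx].intermediate  # FFN up-projection + GELU
    handles.append(ffn.register_forward_hook(make_zeroing_hook(neuron_ids)))

# Query the edited model as usual; the targeted private completions should be
# suppressed while overall model behaviour is largely preserved.
inputs = tokenizer("Alice's phone number is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Remove the hooks to restore the unedited model.
for handle in handles:
    handle.remove()
```

Per the abstract, the paper's detector attributes private predictions to individual neurons (in the spirit of knowledge neurons), and the privacy neuron aggregator merges the neuron sets detected for a batch of private samples so that the edit is applied once over the whole batch.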
Related papers
- Pseudo-Probability Unlearning: Towards Efficient and Privacy-Preserving Machine Unlearning [59.29849532966454]
We propose Pseudo-Probability Unlearning (PPU), a novel method that enables models to forget data in a privacy-preserving manner.
Our method achieves over 20% improvement in forgetting error compared to the state of the art.
arXiv Detail & Related papers (2024-11-04T21:27:06Z) - Privacy Backdoors: Enhancing Membership Inference through Poisoning Pre-trained Models [112.48136829374741]
In this paper, we unveil a new vulnerability: the privacy backdoor attack.
When a victim fine-tunes a backdoored model, their training data will be leaked at a significantly higher rate than if they had fine-tuned a typical model.
Our findings highlight a critical privacy concern within the machine learning community and call for a reevaluation of safety protocols in the use of open-source pre-trained models.
arXiv Detail & Related papers (2024-04-01T16:50:54Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - Mitigating Data Scarcity for Large Language Models [7.259279261659759]
In recent years, pretrained neural language models (PNLMs) have taken the field of natural language processing by storm.
Data scarcity is common in specialized domains, such as medicine, and in low-resource languages that are underexplored by AI research.
In this dissertation, we focus on mitigating data scarcity using data augmentation and neural ensemble learning techniques.
arXiv Detail & Related papers (2023-02-03T15:17:53Z) - Reconstructing Training Data from Model Gradient, Provably [68.21082086264555]
We reconstruct the training samples from a single gradient query at a randomly chosen parameter value.
As a provable attack that reveals sensitive training data, our findings suggest potential severe threats to privacy.
arXiv Detail & Related papers (2022-12-07T15:32:22Z) - Secure & Private Federated Neuroimaging [17.946206585229675]
Federated Learning enables distributed training of neural network models over multiple data sources without sharing data.
Each site trains the neural network over its private data for some time, then shares the neural network parameters with a Federation Controller.
Our Federated Learning architecture, MetisFL, provides strong security and privacy.
arXiv Detail & Related papers (2022-05-11T03:36:04Z) - Measuring Unintended Memorisation of Unique Private Features in Neural
Networks [15.174895411434026]
We show that neural networks unintentionally memorise unique features even when they occur only once in training data.
An example of a unique feature is a person's name that is accidentally present on a training image.
arXiv Detail & Related papers (2022-02-16T14:39:05Z) - Training Data Leakage Analysis in Language Models [6.843491191969066]
We introduce a methodology for identifying user content in the training data that could be leaked under a strong and realistic threat model.
We propose two metrics to quantify user-level data leakage by measuring a model's ability to produce unique sentence fragments within training data.
arXiv Detail & Related papers (2021-01-14T00:57:32Z) - Robustness Threats of Differential Privacy [70.818129585404]
We experimentally demonstrate that networks trained with differential privacy can, in some settings, be even more vulnerable than their non-private counterparts.
We study how the main ingredients of differentially private neural networks training, such as gradient clipping and noise addition, affect the robustness of the model.
arXiv Detail & Related papers (2020-12-14T18:59:24Z) - Learning identifiable and interpretable latent models of
high-dimensional neural activity using pi-VAE [10.529943544385585]
We propose a method that integrates key ingredients from latent models and traditional neural encoding models.
Our method, pi-VAE, is inspired by recent progress on identifiable variational auto-encoders.
We validate pi-VAE using synthetic data, and apply it to analyze neurophysiological datasets from rat hippocampus and macaque motor cortex.
arXiv Detail & Related papers (2020-11-09T22:00:38Z)