Learnable Privacy Neurons Localization in Language Models
- URL: http://arxiv.org/abs/2405.10989v1
- Date: Thu, 16 May 2024 08:11:08 GMT
- Title: Learnable Privacy Neurons Localization in Language Models
- Authors: Ruizhe Chen, Tianxiang Hu, Yang Feng, Zuozhu Liu
- Abstract summary: We introduce a method for pinpointing PII-sensitive neurons (privacy neurons) within Large Language Models (LLMs).
Our method employs learnable binary weight masks, trained adversarially, to localize the specific neurons that account for the memorization of PII in LLMs.
We validate the method's potential for PII risk mitigation by deactivating the localized privacy neurons.
- Score: 19.984172475680182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Concerns that Large Language Models (LLMs) memorize and disclose private information, particularly Personally Identifiable Information (PII), have become prominent within the community. Many efforts have been made to mitigate these privacy risks, yet the mechanism through which LLMs memorize PII remains poorly understood. To bridge this gap, we introduce a method for pinpointing PII-sensitive neurons (privacy neurons) within LLMs. Our method employs learnable binary weight masks, trained adversarially, to localize the specific neurons that account for the memorization of PII. Our investigations reveal that PII is memorized by a small subset of neurons spread across all layers, and that these neurons exhibit PII specificity. Furthermore, we validate the potential for PII risk mitigation by deactivating the localized privacy neurons. Both quantitative and qualitative experiments demonstrate the effectiveness of our neuron localization algorithm.
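The core mechanism described above is a binary mask over hidden units whose entries are learned so that the masked model forgets PII while retaining general language ability. Below is a minimal PyTorch sketch of this idea, assuming a Hugging Face style causal LM whose forward pass returns a `.loss`; the straight-through relaxation, the loss weighting, and names such as `MaskedLinear` and `mask_step` are illustrative assumptions, not the authors' implementation.
```python
# Minimal sketch of learnable binary weight masks for privacy-neuron
# localization. All names and loss weightings here are illustrative
# assumptions, not the paper's released code.
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Wraps a frozen linear layer with a learnable per-neuron binary mask."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.linear = linear
        for p in self.linear.parameters():
            p.requires_grad = False  # the base model stays frozen
        # One logit per output neuron; sigmoid(0) = 0.5 at initialization.
        self.mask_logits = nn.Parameter(torch.zeros(linear.out_features))

    def binary_mask(self) -> torch.Tensor:
        probs = torch.sigmoid(self.mask_logits)
        hard = (probs > 0.5).float()
        # Straight-through estimator: hard 0/1 values in the forward pass,
        # sigmoid gradients in the backward pass.
        return hard + probs - probs.detach()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) * self.binary_mask()

def mask_step(model, pii_batch, utility_batch, optimizer, lam=1e-4):
    """One adversarial-style update: raise the loss on PII sequences,
    keep the loss on ordinary text low, and mask as few neurons as
    possible. The exact objective is a plausible stand-in."""
    pii_loss = model(**pii_batch).loss       # memorized PII: push this up
    util_loss = model(**utility_batch).loss  # general text: keep this down
    sparsity = sum((1.0 - torch.sigmoid(m.mask_logits)).sum()
                   for m in model.modules() if isinstance(m, MaskedLinear))
    loss = -pii_loss + util_loss + lam * sparsity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return pii_loss.item(), util_loss.item()

# Usage: wrap the target layers, then optimize only the mask logits, e.g.
#   opt = torch.optim.Adam([m.mask_logits for m in model.modules()
#                           if isinstance(m, MaskedLinear)], lr=1e-2)
```
After training, neurons whose mask probability falls below 0.5 are the localized privacy neurons; deactivating them then amounts to keeping the hard 0/1 mask applied at inference time.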
Related papers
- Analyzing Key Neurons in Large Language Models [14.69046890281591]
Large Language Models (LLMs) possess vast amounts of knowledge within their parameters, prompting research into methods for locating and editing this knowledge.
We introduce Neuron-Inverse Cluster Attribution (NA-ICA), a novel architecture-agnostic framework capable of identifying key neurons in LLMs.
arXiv Detail & Related papers (2024-06-16T09:36:32Z)
- Too Big to Fail: Larger Language Models are Disproportionately Resilient to Induction of Dementia-Related Linguistic Anomalies [7.21603206617401]
We show that larger GPT-2 models require a disproportionately larger share of attention heads to be masked/ablated before they display a comparable magnitude of degradation.
These results suggest that the attention mechanism in transformer models may present an analogue to the notions of cognitive and brain reserve.
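Hugging Face's GPT-2 implementation exposes a `head_mask` argument, so the basic masking experiment can be approximated as below; the random head selection and the single-sentence loss probe are illustrative simplifications, not the paper's protocol.
```python
# Sketch: ablate a growing fraction of GPT-2 attention heads and track
# how the language-modeling loss degrades. Illustrative setup only.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")
tok = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

enc = tok("The quick brown fox jumps over the lazy dog.", return_tensors="pt")
n_layers, n_heads = model.config.n_layer, model.config.n_head

for frac in (0.0, 0.25, 0.5, 0.75):
    head_mask = torch.ones(n_layers, n_heads)
    n_ablate = int(frac * n_layers * n_heads)
    # Ablate a random subset of heads (the paper's selection may differ).
    flat = torch.randperm(n_layers * n_heads)[:n_ablate]
    head_mask.view(-1)[flat] = 0.0
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"], head_mask=head_mask)
    print(f"masked {frac:.0%} of heads -> loss {out.loss.item():.3f}")
```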
arXiv Detail & Related papers (2024-06-05T00:31:50Z)
- Understanding Privacy Risks of Embeddings Induced by Large Language Models [75.96257812857554]
Large language models show early signs of artificial general intelligence but struggle with hallucinations.
One promising solution is to store external knowledge as embeddings, aiding LLMs in retrieval-augmented generation.
Recent studies have experimentally shown that pre-trained language models can partially reconstruct the original text from its embeddings.
arXiv Detail & Related papers (2024-04-25T13:10:48Z)
- Large Language Models are Parallel Multilingual Learners [50.098518799536144]
In this study, we reveal an in-context learning capability of multilingual large language models (LLMs).
By translating the input to several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities.
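PiM is essentially a prompt-construction scheme: render the same input in several languages, then ask the model once. A minimal sketch, assuming a hypothetical `translate(text, lang)` helper backed by any MT system:
```python
# Sketch of Parallel Input in Multiple Languages (PiM) prompting.
# `translate` is a hypothetical helper, not an API from the paper.
from typing import Callable, List

def build_pim_prompt(question: str,
                     languages: List[str],
                     translate: Callable[[str, str], str]) -> str:
    """Concatenate the same input rendered in several languages,
    then ask the model to answer once."""
    parallel = [f"[{lang}] {translate(question, lang)}" for lang in languages]
    return "\n".join(parallel) + "\nAnswer:"

# Example usage with a dummy translator (plug in a real MT system here):
prompt = build_pim_prompt("What is the capital of France?",
                          ["en", "fr", "de"],
                          translate=lambda text, lang: text)
print(prompt)
```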
arXiv Detail & Related papers (2024-03-14T03:33:46Z)
- Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.
We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs.
Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
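LAPE can be sketched directly from its name: estimate, per language, the probability that each neuron activates, then score neurons by the entropy of that distribution over languages, with low entropy indicating language specificity. The sketch below assumes pre-collected activations per language; the tensor shapes and the activation threshold are illustrative choices.
```python
# Sketch of language activation probability entropy (LAPE) scoring.
import torch
from typing import Dict

def lape_scores(acts_by_lang: Dict[str, torch.Tensor],
                eps: float = 1e-8) -> torch.Tensor:
    """acts_by_lang maps language -> (num_tokens, num_neurons) activations
    collected from one layer. Returns one entropy score per neuron."""
    # Activation probability per language: fraction of tokens with act > 0.
    probs = torch.stack([(a > 0).float().mean(dim=0)
                         for a in acts_by_lang.values()])  # (L, N)
    dist = probs / (probs.sum(dim=0, keepdim=True) + eps)  # normalize over languages
    return -(dist * torch.log(dist + eps)).sum(dim=0)      # (N,) entropies

# Neurons with the lowest entropy are the language-specific candidates:
#   scores = lape_scores(acts); candidates = scores.argsort()[:k]
```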
arXiv Detail & Related papers (2024-02-26T09:36:05Z)
- Unmemorization in Large Language Models via Self-Distillation and Deliberate Imagination [58.36408867180233]
Large Language Models (LLMs) struggle with crucial issues of privacy violation and unwanted exposure of sensitive data.
We introduce a novel approach termed deliberate imagination in the context of LLM unlearning.
Our results demonstrate the usefulness of this approach across different models and sizes, and also with parameter-efficient fine-tuning.
arXiv Detail & Related papers (2024-02-15T16:21:14Z)
- DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models [46.04803661300974]
Large language models pretrained on huge amounts of data capture rich knowledge and information from their training corpora.
As previous studies have revealed, pretrained language models can memorize and regurgitate training data, which brings the risk of data leakage.
We propose DEPN, a framework to Detect and Edit Privacy Neurons in pretrained language models.
arXiv Detail & Related papers (2023-10-31T03:09:36Z)
- Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons [20.56154830853632]
This paper delves into the complex task of understanding how factual knowledge is stored in multilingual language models.
We introduce the Architecture-adapted Multilingual Integrated Gradients method, which successfully localizes knowledge neurons more precisely.
We also conduct an in-depth exploration of knowledge neurons, leading to two important discoveries: language-independent knowledge neurons and degenerate knowledge neurons.
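The paper's architecture-adapted multilingual variant is more involved, but its backbone, integrated gradients over neuron activations, is standard: interpolate from a zero baseline to the observed activations and accumulate gradients of the answer score along the path. In the sketch below, `forward_from_acts` is a hypothetical hook that recomputes the model's output score from a patched activation vector.
```python
# Sketch of plain integrated-gradients attribution for neuron activations.
import torch

def integrated_gradients(forward_from_acts, acts: torch.Tensor,
                         steps: int = 20) -> torch.Tensor:
    """forward_from_acts maps an activation tensor to a scalar score
    (e.g., log-probability of the correct answer token). Returns one
    attribution per neuron."""
    baseline = torch.zeros_like(acts)
    total_grad = torch.zeros_like(acts)
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        point = (baseline + alpha * (acts - baseline)).requires_grad_(True)
        score = forward_from_acts(point)
        grad, = torch.autograd.grad(score, point)
        total_grad += grad
    return (acts - baseline) * total_grad / steps  # Riemann-sum approximation
```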
arXiv Detail & Related papers (2023-08-25T06:26:05Z)
- ShadowNet for Data-Centric Quantum System Learning [188.683909185536]
We propose a data-centric learning paradigm combining the strengths of neural-network protocols and classical shadows.
Capitalizing on the generalization power of neural networks, this paradigm can be trained offline and excel at predicting previously unseen systems.
We present the instantiation of our paradigm in quantum state tomography and direct fidelity estimation tasks and conduct numerical analysis up to 60 qubits.
arXiv Detail & Related papers (2023-08-22T09:11:53Z)
- Quantifying Association Capabilities of Large Language Models and Its Implications on Privacy Leakage [28.385083741414213]
This paper delves into the association capabilities of language models, aiming to uncover the factors that influence their proficiency in associating information.
Our study reveals that as models scale up, their capacity to associate entities/information intensifies, particularly when target pairs demonstrate shorter co-occurrence distances or higher co-occurrence frequencies.
Despite the proportion of accurately predicted PII being relatively small, LLMs still demonstrate the capability to predict specific instances of email addresses and phone numbers when provided with appropriate prompts.
arXiv Detail & Related papers (2023-05-22T04:30:35Z)