Learnable Privacy Neurons Localization in Language Models
- URL: http://arxiv.org/abs/2405.10989v1
- Date: Thu, 16 May 2024 08:11:08 GMT
- Title: Learnable Privacy Neurons Localization in Language Models
- Authors: Ruizhe Chen, Tianxiang Hu, Yang Feng, Zuozhu Liu
- Abstract summary: We introduce a pioneering method for pinpointing PII-sensitive neurons (privacy neurons) within Large Language Models (LLMs).
Our method employs learnable binary weight masks to localize, through adversarial training, the specific neurons that account for the memorization of PII in LLMs.
We further validate the potential for PII risk mitigation by deactivating the localized privacy neurons.
- Score: 19.984172475680182
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Concerns that Large Language Models (LLMs) memorize and disclose private information, particularly Personally Identifiable Information (PII), have become prominent within the community. Many efforts have been made to mitigate these privacy risks. However, the mechanism through which LLMs memorize PII remains poorly understood. To bridge this gap, we introduce a pioneering method for pinpointing PII-sensitive neurons (privacy neurons) within LLMs. Our method employs learnable binary weight masks to localize, through adversarial training, the specific neurons that account for the memorization of PII in LLMs. Our investigation reveals that PII is memorized by a small subset of neurons across all layers, and that these neurons exhibit PII specificity. Furthermore, we validate the potential for PII risk mitigation by deactivating the localized privacy neurons. Both quantitative and qualitative experiments demonstrate the effectiveness of our neuron localization algorithm.
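The abstract describes the localization procedure only at a high level, so the following is a minimal PyTorch sketch of how learnable binary weight masks might be trained to surface privacy neurons. The names (NeuronMask, localization_loss), the straight-through relaxation, and the exact objective (raising the loss on PII sequences while switching off as few neurons as possible) are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of learnable binary neuron masks for privacy-neuron
# localization. NeuronMask, localization_loss, and the objective below are
# illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn


class NeuronMask(nn.Module):
    """Learnable, approximately binary mask over one layer's hidden units."""

    def __init__(self, hidden_size: int, init_logit: float = 3.0):
        super().__init__()
        # Large positive logits mean every neuron starts switched on.
        self.logits = nn.Parameter(torch.full((hidden_size,), init_logit))

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        probs = torch.sigmoid(self.logits)
        # Straight-through estimator: hard 0/1 mask forward, soft gradients backward.
        hard = (probs > 0.5).float()
        mask = hard + probs - probs.detach()
        return hidden_states * mask

    def fraction_off(self) -> torch.Tensor:
        # Expected fraction of neurons switched off; used as a sparsity regularizer.
        return 1.0 - torch.sigmoid(self.logits).mean()


def localization_loss(model, masks, pii_batch, lambda_sparse: float = 0.1):
    """Adversarial-style objective: the masks learn to switch off the few neurons
    whose removal most increases the loss on memorized PII sequences."""
    outputs = model(**pii_batch)          # masks assumed to be hooked into the model
    pii_loss = outputs.loss               # next-token loss on the PII spans
    off_fraction = torch.stack([m.fraction_off() for m in masks]).mean()
    # Maximize the PII loss (hence the minus sign) while switching off
    # as few neurons as possible.
    return -pii_loss + lambda_sparse * off_fraction
```

Under these assumptions, neurons whose mask entries converge to zero would be reported as privacy neurons, and the deactivation experiment mentioned in the abstract would correspond to keeping those entries at zero during inference.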
Related papers
- How does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective [64.79894853375478]
We propose a new finer-grained neuron identification algorithm, which detects language neurons (including language-specific neurons and language-related neurons) and language-agnostic neurons. Based on the distributional characteristics of different types of neurons, we divide the LLMs' internal process for multilingual inference into four parts. We systematically analyze the models before and after alignment with a focus on different types of neurons.
arXiv Detail & Related papers (2025-05-27T17:59:52Z) - Unveiling Knowledge Utilization Mechanisms in LLM-based Retrieval-Augmented Generation [77.10390725623125]
Retrieval-augmented generation (RAG) is widely employed to expand LLMs' knowledge scope. Since RAG has shown promise in knowledge-intensive tasks like open-domain question answering, its broader application to complex tasks and intelligent assistants has further advanced its utility. We present a systematic investigation of the intrinsic mechanisms by which RAG models integrate internal (parametric) and external (retrieved) knowledge.
arXiv Detail & Related papers (2025-05-17T13:13:13Z) - PrivacyScalpel: Enhancing LLM Privacy via Interpretable Feature Intervention with Sparse Autoencoders [8.483679748399037]
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing but pose privacy risks by memorizing and leaking Personally Identifiable Information (PII).
Existing mitigation strategies, such as differential privacy and neuron-level interventions, often degrade model utility or fail to effectively prevent leakage.
We introduce PrivacyScalpel, a novel privacy-preserving framework that leverages interpretability techniques to identify and mitigate PII leakage while maintaining performance.
arXiv Detail & Related papers (2025-03-14T09:31:01Z) - Modality-Aware Neuron Pruning for Unlearning in Multimodal Large Language Models [24.30356626130181]
Training generative models such as Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) on massive datasets can lead them to memorize and inadvertently reveal sensitive information, raising ethical and privacy concerns.
We propose Modality Aware Neuron Unlearning (MANU), a novel unlearning framework for MLLMs designed to selectively clip neurons based on their relative importance to the targeted forget data, curated for different modalities.
arXiv Detail & Related papers (2025-02-21T19:54:46Z) - One Mind, Many Tongues: A Deep Dive into Language-Agnostic Knowledge Neurons in Large Language Models [19.58983929459173]
Large language models (LLMs) have learned vast amounts of factual knowledge through self-supervised pre-training on large-scale corpora.
LLMs have also demonstrated excellent multilingual capabilities, enabling them to express the learned knowledge in multiple languages.
arXiv Detail & Related papers (2024-11-26T13:03:49Z) - Identifying Query-Relevant Neurons in Large Language Models for Long-Form Texts [14.69046890281591]
We introduce a novel architecture-agnostic framework capable of identifying query-relevant neurons in large language models.
We show potential applications of our detected neurons in knowledge editing and neuron-based prediction.
arXiv Detail & Related papers (2024-06-16T09:36:32Z) - Sharing Matters: Analysing Neurons Across Languages and Tasks in LLMs [70.3132264719438]
We aim to fill the research gap by examining how neuron activation is shared across tasks and languages.
We classify neurons into four distinct categories based on their responses to a specific input across different languages.
Our analysis reveals the following insights: (i) the patterns of neuron sharing are significantly affected by the characteristics of tasks and examples; (ii) neuron sharing does not fully correspond with language similarity; (iii) shared neurons play a vital role in generating responses, especially those shared across all languages.
arXiv Detail & Related papers (2024-06-13T16:04:11Z) - Understanding Privacy Risks of Embeddings Induced by Large Language Models [75.96257812857554]
Large language models show early signs of artificial general intelligence but struggle with hallucinations.
One promising solution is to store external knowledge as embeddings, aiding LLMs in retrieval-augmented generation.
Recent studies experimentally showed that the original text can be partially reconstructed from text embeddings by pre-trained language models.
arXiv Detail & Related papers (2024-04-25T13:10:48Z) - Revealing the Parallel Multilingual Learning within Large Language Models [50.098518799536144]
In this study, we reveal an in-context learning capability of multilingual large language models (LLMs).
By translating the input to several languages, we provide Parallel Input in Multiple Languages (PiM) to LLMs, which significantly enhances their comprehension abilities.
arXiv Detail & Related papers (2024-03-14T03:33:46Z) - Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models [117.20416338476856]
Large language models (LLMs) demonstrate remarkable multilingual capabilities without being pre-trained on specially curated multilingual parallel corpora.
We propose a novel detection method, language activation probability entropy (LAPE), to identify language-specific neurons within LLMs.
Our findings indicate that LLMs' proficiency in processing a particular language is predominantly due to a small subset of neurons.
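For concreteness, the following is a minimal sketch of how the LAPE score mentioned above could be computed, assuming it is the entropy of each neuron's activation probabilities normalized across languages; the function name, normalization, and thresholding are reconstructions from the one-line description, not that paper's released code.

```python
# Hypothetical sketch of language activation probability entropy (LAPE);
# details are assumptions, not the cited paper's implementation.
import torch


def lape_scores(activation_probs: torch.Tensor, eps: float = 1e-12) -> torch.Tensor:
    """activation_probs: [num_languages, num_neurons], where entry (l, j) is the
    empirical probability that neuron j fires (activation > 0) on language l."""
    # Normalize each neuron's activation probabilities into a distribution over languages.
    dist = activation_probs / (activation_probs.sum(dim=0, keepdim=True) + eps)
    # Entropy over languages: low entropy => the neuron fires mostly for one language.
    return -(dist * (dist + eps).log()).sum(dim=0)


# Toy usage: neurons with the lowest LAPE are treated as language-specific.
probs = torch.rand(8, 4096)                      # 8 languages, 4096 FFN neurons
language_specific = torch.topk(-lape_scores(probs), k=100).indices
```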
arXiv Detail & Related papers (2024-02-26T09:36:05Z) - DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models [46.04803661300974]
Large language models pretrained on a huge amount of data capture rich knowledge and information in the training data.
As revealed in previous studies, pretrained language models can memorize and regurgitate training data, which brings the risk of data leakage.
We propose a framework DEPN to Detect and Edit Privacy Neurons in pretrained language models.
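The "edit" step in a detect-and-edit pipeline like this is often realized by zeroing the activations of the detected neurons. Below is a minimal, hypothetical sketch of that mechanism using a PyTorch forward hook; the helper name deactivate_neurons and the toy layer are illustrative, not DEPN's released code.

```python
# Hypothetical sketch: deactivating detected privacy neurons by zeroing their
# activations with a forward hook (illustrative only).
import torch
import torch.nn as nn


def deactivate_neurons(layer: nn.Module, neuron_ids: list[int]):
    """Register a hook that zeroes the given hidden units in this layer's output."""
    idx = torch.tensor(neuron_ids)

    def hook(module, inputs, output):
        output = output.clone()
        output[..., idx] = 0.0               # switch the localized neurons off
        return output

    return layer.register_forward_hook(hook)


# Toy usage: zero out three units of a feed-forward projection.
ffn = nn.Linear(16, 32)
handle = deactivate_neurons(ffn, [3, 7, 19])
out = ffn(torch.randn(2, 16))
assert torch.all(out[..., [3, 7, 19]] == 0)
handle.remove()                              # restore the original behavior
```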
arXiv Detail & Related papers (2023-10-31T03:09:36Z) - Journey to the Center of the Knowledge Neurons: Discoveries of Language-Independent Knowledge Neurons and Degenerate Knowledge Neurons [20.56154830853632]
This paper delves into the complex task of understanding how factual knowledge is stored in multilingual language models.
We introduce the Architecture-adapted Multilingual Integrated Gradients method, which successfully localizes knowledge neurons more precisely.
We also conduct an in-depth exploration of knowledge neurons, leading to the following two important discoveries.
arXiv Detail & Related papers (2023-08-25T06:26:05Z) - ShadowNet for Data-Centric Quantum System Learning [188.683909185536]
We propose a data-centric learning paradigm combining the strength of neural-network protocols and classical shadows.
Capitalizing on the generalization power of neural networks, this paradigm can be trained offline and excel at predicting previously unseen systems.
We present the instantiation of our paradigm in quantum state tomography and direct fidelity estimation tasks and conduct numerical analysis up to 60 qubits.
arXiv Detail & Related papers (2023-08-22T09:11:53Z) - Quantifying Association Capabilities of Large Language Models and Its Implications on Privacy Leakage [28.385083741414213]
This paper delves into the association capabilities of language models, aiming to uncover the factors that influence their proficiency in associating information.
Our study reveals that as models scale up, their capacity to associate entities/information intensifies, particularly when target pairs demonstrate shorter co-occurrence distances or higher co-occurrence frequencies.
Despite the proportion of accurately predicted PII being relatively small, LLMs still demonstrate the capability to predict specific instances of email addresses and phone numbers when provided with appropriate prompts.
arXiv Detail & Related papers (2023-05-22T04:30:35Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.