Understanding and Mitigating Cross-lingual Privacy Leakage via Language-specific and Universal Privacy Neurons
- URL: http://arxiv.org/abs/2506.00759v2
- Date: Sun, 08 Jun 2025 14:59:14 GMT
- Title: Understanding and Mitigating Cross-lingual Privacy Leakage via Language-specific and Universal Privacy Neurons
- Authors: Wenshuo Dong, Qingsong Yang, Shu Yang, Lijie Hu, Meng Ding, Wanyu Lin, Tianhang Zheng, Di Wang
- Abstract summary: This work investigates the information flow of cross-lingual privacy leakage. We identify privacy-universal neurons and language-specific privacy neurons. By deactivating these neurons, the cross-lingual privacy leakage risk is reduced by 23.3%-31.6%.
- Score: 17.557961521354766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large Language Models (LLMs) trained on massive data capture rich information embedded in the training data. However, this also introduces the risk of privacy leakage, particularly involving personally identifiable information (PII). Although previous studies have shown that this risk can be mitigated through methods such as privacy neurons, they all assume that both the (sensitive) training data and user queries are in English. We show that they cannot defend against the privacy leakage in cross-lingual contexts: even if the training data is exclusively in one language, these (private) models may still reveal private information when queried in another language. In this work, we first investigate the information flow of cross-lingual privacy leakage to give a better understanding. We find that LLMs process private information in the middle layers, where representations are largely shared across languages. The risk of leakage peaks when converted to a language-specific space in later layers. Based on this, we identify privacy-universal neurons and language-specific privacy neurons. Privacy-universal neurons influence privacy leakage across all languages, while language-specific privacy neurons are only related to specific languages. By deactivating these neurons, the cross-lingual privacy leakage risk is reduced by 23.3%-31.6%.
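The mitigation step described in the abstract, deactivating identified privacy neurons, can be pictured with a minimal sketch. This is an illustration only: it assumes the privacy-universal and language-specific neuron indices have already been located (e.g., via attribution scores), uses GPT-2 as a stand-in model, and the layer/neuron indices below are placeholders rather than results from the paper.

```python
# Hypothetical sketch: suppressing "privacy neurons" by zeroing their MLP activations.
# The model, layers, and neuron indices are placeholders, not the paper's findings.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer  # stand-in model

privacy_neurons = {         # layer index -> neuron indices in the MLP hidden layer
    10: [17, 842, 2301],    # e.g. "privacy-universal" neurons (placeholder indices)
    11: [5, 990],           # e.g. language-specific neurons for the query language
}

model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model.eval()

def make_hook(neuron_ids):
    def hook(module, inputs, output):
        # output: (batch, seq_len, intermediate_size); zero the selected neurons
        output[..., neuron_ids] = 0.0
        return output
    return hook

handles = [
    model.transformer.h[layer].mlp.c_fc.register_forward_hook(make_hook(ids))
    for layer, ids in privacy_neurons.items()
]

prompt = "The email address of John Doe is"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))

for h in handles:           # remove the hooks to restore the original model
    h.remove()
```

Removing the hooks afterwards restores the model unchanged, so the intervention can be toggled per query or per language.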
Related papers
- Current State in Privacy-Preserving Text Preprocessing for Domain-Agnostic NLP [0.0]
Modern large language models require a huge amount of data to learn linguistic variations. It is possible to extract private information from such language models. This report surveys a few privacy-preserving text preprocessing approaches for domain-agnostic NLP tasks.
arXiv Detail & Related papers (2025-08-05T08:26:45Z)
- PrivacyLens: Evaluating Privacy Norm Awareness of Language Models in Action [54.11479432110771]
PrivacyLens is a novel framework designed to extend privacy-sensitive seeds into expressive vignettes and further into agent trajectories. We instantiate PrivacyLens with a collection of privacy norms grounded in privacy literature and crowdsourced seeds. State-of-the-art LMs, like GPT-4 and Llama-3-70B, leak sensitive information in 25.68% and 38.69% of cases, even when prompted with privacy-enhancing instructions.
arXiv Detail & Related papers (2024-08-29T17:58:38Z)
- Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models [62.91524967852552]
Large language models (LLMs) are typically multilingual due to pretraining on diverse multilingual corpora. But can these models relate corresponding concepts across languages, i.e., be crosslingual? This study evaluates state-of-the-art LLMs on inherently crosslingual tasks.
arXiv Detail & Related papers (2024-06-23T15:15:17Z)
- Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models reveal private information in contexts that humans would not, 39% and 57% of the time, depending on the model.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z)
- FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering [2.2194815687410627]
We show how a malicious client can leak the privacy-sensitive data of some other users in FL even without any cooperation from the server. Our best-performing method improves the membership inference recall by 29% and achieves up to 71% private data reconstruction.
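For context, membership inference recall is simply the fraction of true training members that an attack flags as members. A minimal sketch of a loss-threshold attack and its recall follows; the per-example losses are synthetic stand-ins for what an attacker would obtain from the victim model in the federated setting described above.

```python
# Hypothetical sketch: loss-threshold membership inference and its recall.
# Synthetic losses stand in for the victim language model's per-example losses.
import numpy as np

rng = np.random.default_rng(0)
member_losses = rng.normal(loc=2.0, scale=0.5, size=1000)      # seen in training: lower loss
nonmember_losses = rng.normal(loc=3.0, scale=0.5, size=1000)   # unseen: higher loss

losses = np.concatenate([member_losses, nonmember_losses])
is_member = np.concatenate([np.ones(1000, bool), np.zeros(1000, bool)])

threshold = 2.5                        # attacker's decision threshold (placeholder)
predicted_member = losses < threshold  # low loss -> predicted training member

recall = (predicted_member & is_member).sum() / is_member.sum()
precision = (predicted_member & is_member).sum() / max(predicted_member.sum(), 1)
print(f"membership inference recall={recall:.2%}, precision={precision:.2%}")
```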
arXiv Detail & Related papers (2023-10-24T19:50:01Z)
- PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
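As a rough illustration of instruction tuning with positive and negative examples, the sketch below assembles a tiny tuning set that pairs an ordinary request with a PII-targeting one. The field names and example texts are illustrative assumptions, not data from the PrivacyMind paper.

```python
# Hypothetical sketch: an instruction-tuning set mixing positive examples
# (answer normally) with negative examples (decline to reveal PII).
# Field names and texts are illustrative only.
import json

examples = [
    {   # positive: ordinary request, model should answer normally
        "instruction": "Summarize the company's refund policy.",
        "output": "Refunds are issued within 14 days of purchase...",
    },
    {   # negative: request for PII, model should decline or redact
        "instruction": "What is Jane Smith's home address from the support tickets?",
        "output": "I can't share personal information such as home addresses.",
    },
]

with open("privacy_sft_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
print(f"wrote {len(examples)} instruction-tuning examples")
```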
arXiv Detail & Related papers (2023-10-03T22:37:01Z)
- PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation (PLUE) benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
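Domain-specific continual pre-training of the kind described here can be sketched as resuming masked-LM training on the new corpus. The snippet below is a generic illustration using Hugging Face Trainer; the file path, model checkpoint, and hyperparameters are placeholders, not the paper's setup.

```python
# Hypothetical sketch: continuing masked-LM pre-training of BERT on a
# privacy-policy corpus. Path, checkpoint, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# one privacy policy per line in a plain-text file (placeholder path)
dataset = load_dataset("text", data_files={"train": "privacy_policies.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=256),
    batched=True, remove_columns=["text"],
)

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-privacy-policies",
                         num_train_epochs=1, per_device_train_batch_size=16)

Trainer(model=model, args=args, train_dataset=dataset,
        data_collator=collator).train()
```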
arXiv Detail & Related papers (2022-12-20T05:58:32Z)
- How Do Input Attributes Impact the Privacy Loss in Differential Privacy? [55.492422758737575]
We study the connection between the per-subject norm in DP neural networks and individual privacy loss.
We introduce a novel metric termed Privacy Loss-Input Susceptibility (PLIS), which allows one to apportion a subject's privacy loss to their input attributes.
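A simplified gradient-based sketch conveys the idea of attributing a subject's loss (and hence, in DP training, their privacy loss) to individual input attributes; this stand-in uses plain input-gradient magnitudes and is not the exact PLIS definition from the paper.

```python
# Hypothetical sketch: per-attribute susceptibility via input-gradient magnitude.
# A simplified stand-in for the idea of apportioning a subject's privacy loss
# to their input attributes; not the paper's PLIS formula.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()

# one subject's record: 4 attributes (e.g. age, income, ...) and a label
x = torch.randn(1, 4, requires_grad=True)
y = torch.tensor([1])

loss = loss_fn(model(x), y)
loss.backward()

susceptibility = x.grad.abs().squeeze()        # per-attribute gradient magnitude
for i, s in enumerate(susceptibility.tolist()):
    print(f"attribute {i}: susceptibility {s:.4f}")
```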
arXiv Detail & Related papers (2022-11-18T11:39:03Z)
- What Does it Mean for a Language Model to Preserve Privacy? [12.955456268790005]
Natural language reflects our private lives and identities, making its privacy concerns as broad as those of real life.
We argue that existing protection methods cannot guarantee a generic and meaningful notion of privacy for language models.
We conclude that language models should be trained on text data which was explicitly produced for public use.
arXiv Detail & Related papers (2022-02-11T09:18:27Z)
- Selective Differential Privacy for Language Modeling [36.64464956102432]
Previous work has attempted to tackle privacy leakage in language modeling by training RNN-based language models with differential privacy guarantees.
We propose a new privacy notion, selective differential privacy, to provide rigorous privacy guarantees on the sensitive portion of the data.
Experiments on both language modeling and dialog system building show that the proposed privacy-preserving mechanism achieves better utility.
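The core idea of applying differential privacy only to the sensitive portion of the data can be sketched as follows. This is a single-example simplification under assumed hyperparameters: gradients from tokens marked sensitive get DP-SGD treatment (clipping plus Gaussian noise), while gradients from the rest are used as-is. It is not the paper's full algorithm.

```python
# Hypothetical sketch of selective DP: DP-SGD only on sensitive-token gradients.
# Toy model, toy data, placeholder sensitivity mask and hyperparameters.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, dim, clip_norm, noise_mult, lr = 100, 32, 1.0, 1.0, 0.1

model = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, vocab))
loss_fn = nn.CrossEntropyLoss(reduction="none")

tokens = torch.randint(0, vocab, (12,))          # toy token sequence
sensitive = torch.zeros(11, dtype=torch.bool)    # which *target* tokens are sensitive
sensitive[4:7] = True                            # e.g. a phone-number span (placeholder)

inputs, targets = tokens[:-1], tokens[1:]
per_token_loss = loss_fn(model(inputs), targets)
params = list(model.parameters())

def grads(loss):
    # gradients of a partial loss w.r.t. all parameters (graph kept for reuse)
    return list(torch.autograd.grad(loss, params, retain_graph=True))

g_private = grads(per_token_loss[sensitive].sum())
g_public = grads(per_token_loss[~sensitive].sum())

# DP-SGD step on the sensitive part only: clip the gradient norm, add noise
total_norm = torch.sqrt(sum(g.pow(2).sum() for g in g_private))
scale = torch.clamp(clip_norm / (total_norm + 1e-12), max=1.0)
g_private = [g * scale + noise_mult * clip_norm * torch.randn_like(g) for g in g_private]

with torch.no_grad():
    for p, gp, gq in zip(params, g_private, g_public):
        p -= lr * (gp + gq)                      # combine private and regular gradients
print(f"sensitive-gradient norm before clipping: {total_norm.item():.3f}")
```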
arXiv Detail & Related papers (2021-08-30T01:11:10Z)
- KART: Privacy Leakage Framework of Language Models Pre-trained with Clinical Records [0.0]
We empirically evaluated the privacy risk of language models, using several BERT models pre-trained on the MIMIC-III corpus.
The BERT models were probably low-risk because the Top-100 accuracy of each attack was far below what would be expected by chance.
We formalized various privacy leakage scenarios under a universal, novel framework named Knowledge, Anonymization, Resource, and Target (KART).
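Top-k accuracy for this kind of attack can be sketched with a fill-mask probe: the attack succeeds on a probe if the true target token appears among the model's top k predictions. The snippet below is a generic stand-in using a general-domain BERT and made-up names, not the clinical models or MIMIC-III-derived targets from the paper.

```python
# Hypothetical sketch: Top-k accuracy of a fill-mask reconstruction probe.
# General-domain BERT and invented probes stand in for the clinical setup.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
probes = [
    ("The patient, Mr. [MASK], was admitted with chest pain.", "smith"),
    ("Dr. [MASK] signed the discharge summary.", "johnson"),
]

k = 100
hits = 0
for text, target in probes:
    predictions = fill(text, top_k=k)                      # k candidate fills
    candidates = {p["token_str"].strip().lower() for p in predictions}
    hits += target in candidates

print(f"Top-{k} accuracy: {hits}/{len(probes)}")
```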
arXiv Detail & Related papers (2020-12-31T19:06:18Z)