A Study of Privacy-preserving Language Modeling Approaches
- URL: http://arxiv.org/abs/2508.15421v1
- Date: Thu, 21 Aug 2025 10:22:40 GMT
- Title: A Study of Privacy-preserving Language Modeling Approaches
- Authors: Pritilata Saha, Abhirup Sinha
- Abstract summary: Preserving privacy in language models has become a crucial area of research. This work provides a comprehensive study of privacy-preserving language modeling approaches. The outcomes of this study contribute to the ongoing research on privacy-preserving language modeling.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent developments in language modeling have increased the use of language models in various applications and domains. Language models, often trained on sensitive data, can memorize and disclose this information under privacy attacks, raising concerns about protecting individuals' privacy rights. Preserving privacy in language models has become a crucial area of research, as privacy is one of the fundamental human rights. Despite its significance, understanding of how much privacy risk these language models pose and how it can be mitigated is still limited. This research addresses that gap by providing a comprehensive study of privacy-preserving language modeling approaches. The study gives an in-depth overview of these approaches, highlights their strengths, and investigates their limitations. The outcomes of this study contribute to the ongoing research on privacy-preserving language modeling, providing valuable insights and outlining future research directions.
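To make the kind of privacy attack alluded to above concrete, the following is a minimal sketch of a loss-based membership inference check: sequences memorized during training tend to receive unusually low loss, which an attacker can exploit. The model name ("gpt2"), candidate texts, and threshold are illustrative assumptions and are not taken from the paper.

```python
# Minimal sketch of a loss-based membership inference check (illustrative only;
# the model name, candidate texts, and threshold are assumptions, not from the paper).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # hypothetical target model, used purely for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sequence_loss(text: str) -> float:
    """Average per-token negative log-likelihood the model assigns to `text`."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, labels=inputs["input_ids"])
    return outputs.loss.item()

# Intuition behind many membership inference attacks: text seen (and memorized)
# during training tends to receive markedly lower loss than comparable unseen text.
candidate = "Alice Example's social security number is 123-45-6789."
reference = "A generic sentence the model is unlikely to have memorized verbatim."

THRESHOLD = 0.5  # illustrative margin; real attacks calibrate this on reference data
if sequence_loss(candidate) < sequence_loss(reference) - THRESHOLD:
    print("Low loss: candidate may have been memorized (potential privacy leak).")
else:
    print("No strong evidence of memorization for this candidate.")
```

In practice such checks are calibrated against many reference sequences rather than a single one, but the sketch captures the core signal that privacy-preserving training methods aim to suppress.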
Related papers
- Assessing Visual Privacy Risks in Multimodal AI: A Novel Taxonomy-Grounded Evaluation of Vision-Language Models [55.23884055923282]
We introduce a comprehensive, multi-level Visual Privacy taxonomy. We evaluate the capabilities of several state-of-the-art Vision-Language Models.
arXiv Detail & Related papers (2025-09-28T12:04:54Z) - Privacy-Preserving Large Language Models: Mechanisms, Applications, and Future Directions [0.0]
This survey explores the landscape of privacy-preserving mechanisms tailored for large language models. We examine their efficacy in addressing key privacy challenges, such as membership inference and model inversion attacks. By synthesizing state-of-the-art approaches and future trends, this paper provides a foundation for developing robust, privacy-preserving large language models.
arXiv Detail & Related papers (2024-12-09T00:24:09Z) - Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions [12.451936012379319]
Large Language Models (LLMs) represent a significant advancement in artificial intelligence, finding applications across various domains. Their reliance on massive internet-sourced datasets for training brings notable privacy issues. Certain application-specific scenarios may require fine-tuning these models on private data.
arXiv Detail & Related papers (2024-08-10T05:41:19Z) - A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures [50.987594546912725]
Despite a growing corpus of research in AI privacy and explainability, little attention has been paid to privacy-preserving model explanations.
This article presents the first thorough survey about privacy attacks on model explanations and their countermeasures.
arXiv Detail & Related papers (2024-03-31T12:44:48Z) - Exploring the Privacy Protection Capabilities of Chinese Large Language Models [19.12726985060863]
We have devised a three-tiered progressive framework for evaluating privacy in language systems.
Our primary objective is to comprehensively evaluate the sensitivity of large language models to private information.
Our observations indicate that existing Chinese large language models universally show privacy protection shortcomings.
arXiv Detail & Related papers (2024-03-27T02:31:54Z) - Can LLMs Keep a Secret? Testing Privacy Implications of Language Models via Contextual Integrity Theory [82.7042006247124]
We show that even the most capable AI models, such as GPT-4 and ChatGPT, reveal private information in contexts that humans would not, 39% and 57% of the time, respectively.
Our work underscores the immediate need to explore novel inference-time privacy-preserving approaches, based on reasoning and theory of mind.
arXiv Detail & Related papers (2023-10-27T04:15:30Z) - PrivacyMind: Large Language Models Can Be Contextual Privacy Protection Learners [81.571305826793]
We introduce Contextual Privacy Protection Language Models (PrivacyMind).
Our work offers a theoretical analysis for model design and benchmarks various techniques.
In particular, instruction tuning with both positive and negative examples stands out as a promising method.
arXiv Detail & Related papers (2023-10-03T22:37:01Z) - A Unified View of Differentially Private Deep Generative Modeling [60.72161965018005]
Data with privacy concerns comes with stringent regulations that frequently prohibit data access and data sharing.
Overcoming these obstacles is key for technological progress in many real-world application scenarios that involve privacy sensitive data.
Differentially private (DP) data publishing provides a compelling solution, where only a sanitized form of the data is publicly released (a minimal sketch of this idea appears after the list below).
arXiv Detail & Related papers (2023-09-27T14:38:16Z) - PLUE: Language Understanding Evaluation Benchmark for Privacy Policies in English [77.79102359580702]
We introduce the Privacy Policy Language Understanding Evaluation benchmark, a multi-task benchmark for evaluating privacy policy language understanding.
We also collect a large corpus of privacy policies to enable privacy policy domain-specific language model pre-training.
We demonstrate that domain-specific continual pre-training offers performance improvements across all tasks.
arXiv Detail & Related papers (2022-12-20T05:58:32Z) - What Does it Mean for a Language Model to Preserve Privacy? [12.955456268790005]
Natural language reflects our private lives and identities, making its privacy concerns as broad as those of real life.
We argue that existing protection methods cannot guarantee a generic and meaningful notion of privacy for language models.
We conclude that language models should be trained on text data which was explicitly produced for public use.
arXiv Detail & Related papers (2022-02-11T09:18:27Z)
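As referenced in the entry on differentially private deep generative modeling above, DP data publishing releases only a sanitized statistic rather than the raw data. The following is a minimal sketch of that idea using the Laplace mechanism for a counting query; the dataset, query, and epsilon value are illustrative assumptions and do not come from any of the surveyed papers.

```python
# Minimal sketch of differentially private data publishing via the Laplace
# mechanism (illustrative only; records, query, and epsilon are assumptions).
import numpy as np

def laplace_count(data, predicate, epsilon: float) -> float:
    """Release a noisy count satisfying epsilon-differential privacy.

    A counting query changes by at most 1 when one record is added or removed,
    so its sensitivity is 1 and Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for record in data if predicate(record))
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Hypothetical example: publish how many records mention a sensitive attribute
# without revealing whether any specific individual's record is present.
records = [
    {"age": 34, "smoker": True},
    {"age": 51, "smoker": False},
    {"age": 29, "smoker": True},
]
print(laplace_count(records, lambda r: r["smoker"], epsilon=1.0))
```

Smaller epsilon values add more noise and give stronger privacy at the cost of accuracy; the same noise-for-privacy trade-off underlies DP training methods for language models such as DP-SGD.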