Privacy Issues in Large Language Models: A Survey
- URL: http://arxiv.org/abs/2312.06717v4
- Date: Thu, 30 May 2024 19:26:05 GMT
- Title: Privacy Issues in Large Language Models: A Survey
- Authors: Seth Neel, Peter Chang
- Abstract summary: This is the first survey of the active area of AI research that focuses on privacy issues in Large Language Models (LLMs).
We focus on work that red-teams models to highlight privacy risks, attempts to build privacy into the training or inference process, and tries to mitigate copyright issues.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This is the first survey of the active area of AI research that focuses on privacy issues in Large Language Models (LLMs). Specifically, we focus on work that red-teams models to highlight privacy risks, attempts to build privacy into the training or inference process, enables efficient data deletion from trained models to comply with existing privacy regulations, and tries to mitigate copyright issues. Our focus is on summarizing technical research that develops algorithms, proves theorems, and runs empirical evaluations. While there is an extensive body of legal and policy work addressing these challenges from a different angle, that is not the focus of our survey. Nevertheless, these works, along with recent legal developments, do inform how these technical problems are formalized, and so we discuss them briefly in Section 1. While we have made our best effort to include all the relevant work, due to the fast-moving nature of this research we may have missed some recent work. If we have missed some of your work, please contact us, as we will attempt to keep this survey relatively up to date. We are maintaining a repository with the list of papers covered in this survey and any relevant code that was publicly available at https://github.com/safr-ml-lab/survey-llm.
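As a concrete illustration of the red-teaming work the survey covers, the sketch below computes a simple loss-based membership-inference signal against a public causal language model. It is a minimal sketch only: the checkpoint (gpt2) and the candidate strings are placeholder assumptions, and practical attacks calibrate such scores against reference models or per-example thresholds rather than reading them directly.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical candidate strings; in a real audit these would be suspected
# training-set members versus freshly written non-members.
candidates = [
    "Alice Smith's phone number is 555-0142.",   # suspected member (made up)
    "The weather in a fictional town is mild.",  # presumed non-member (made up)
]

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def per_example_loss(text: str) -> float:
    """Average token-level cross-entropy the model assigns to `text`."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    return out.loss.item()

# A lower loss (higher likelihood) on a candidate string is weak evidence that
# the string, or something close to it, appeared in the training data.
for text in candidates:
    print(f"{per_example_loss(text):.3f}  {text}")
```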
Related papers
- Experimenting with Legal AI Solutions: The Case of Question-Answering for Access to Justice [32.550204238857724]
We propose a human-centric legal NLP pipeline, covering data sourcing, inference, and evaluation.
We release a dataset, LegalQA, with real and specific legal questions spanning from employment law to criminal law.
We show that retrieval-augmented generation from only 850 citations in the train set can match or outperform internet-wide retrieval.
arXiv Detail & Related papers (2024-09-12T02:40:28Z) - Privacy Risks of General-Purpose AI Systems: A Foundation for Investigating Practitioner Perspectives [47.17703009473386]
Powerful AI models have led to impressive leaps in performance across a wide range of tasks.
Privacy concerns have led to a wealth of literature covering various privacy risks and vulnerabilities of AI models.
We conduct a systematic review of these survey papers to provide a concise and usable overview of privacy risks in GPAIS.
arXiv Detail & Related papers (2024-07-02T07:49:48Z) - How the Future Works at SOUPS: Analyzing Future Work Statements and Their Impact on Usable Security and Privacy Research [9.307988641609834]
We reviewed all 27 papers from the 2019 SOUPS proceedings and analyzed their future work statements.
We find that most papers from the SOUPS 2019 proceedings include future work statements. However, they are often unspecific or ambiguous, and not always easy to find.
We conclude with recommendations for the usable security and privacy community to improve the utility of future work statements.
arXiv Detail & Related papers (2024-05-30T07:07:18Z) - A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures [50.987594546912725]
Despite a growing corpus of research in AI privacy and explainability, there is little attention to privacy-preserving model explanations.
This article presents the first thorough survey about privacy attacks on model explanations and their countermeasures.
arXiv Detail & Related papers (2024-03-31T12:44:48Z) - A Comprehensive Survey of Forgetting in Deep Learning Beyond Continual Learning [76.47138162283714]
Forgetting refers to the loss or deterioration of previously acquired information or knowledge.
Forgetting is a prevalent phenomenon observed in various other research domains within deep learning.
The survey argues that forgetting is a double-edged sword and can be beneficial and desirable in certain cases.
arXiv Detail & Related papers (2023-07-16T16:27:58Z) - Privacy Meets Explainability: A Comprehensive Impact Benchmark [4.526582372434088]
This work is the first to investigate the impact of private learning techniques on generated explanations for Deep Learning-based models.
The findings suggest that introducing privacy leads to non-negligible changes in the generated explanations.
arXiv Detail & Related papers (2022-11-08T09:20:28Z) - Why Should Adversarial Perturbations be Imperceptible? Rethink the Research Paradigm in Adversarial NLP [83.66405397421907]
We rethink the research paradigm of textual adversarial samples in security scenarios.
We first collect, process, and release a collection of security datasets, Advbench.
Next, we propose a simple rule-based method that fulfills the actual adversarial goals and simulates real-world attack methods.
arXiv Detail & Related papers (2022-10-19T15:53:36Z) - Yes-Yes-Yes: Donation-based Peer Reviewing Data Collection for ACL Rolling Review and Beyond [58.71736531356398]
We present an in-depth discussion of peer reviewing data, outline the ethical and legal desiderata for peer reviewing data collection, and propose the first continuous, donation-based data collection workflow.
We report on the ongoing implementation of this workflow at the ACL Rolling Review and deliver the first insights obtained with the newly collected data.
arXiv Detail & Related papers (2022-01-27T11:02:43Z) - Privacy in Open Search: A Review of Challenges and Solutions [0.6445605125467572]
Information retrieval (IR) is prone to privacy threats, such as attacks and unintended disclosures of documents and search history.
This work aims at highlighting and discussing open challenges for privacy in the recent literature of IR, focusing on tasks featuring user-generated text data.
arXiv Detail & Related papers (2021-10-20T18:38:48Z) - PolicyQA: A Reading Comprehension Dataset for Privacy Policies [77.79102359580702]
We present PolicyQA, a dataset that contains 25,017 reading comprehension style examples curated from an existing corpus of 115 website privacy policies.
We evaluate two existing neural QA models and perform rigorous analysis to reveal the advantages and challenges offered by PolicyQA (a minimal QA sketch follows this list).
arXiv Detail & Related papers (2020-10-06T09:04:58Z)
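To make the PolicyQA-style reading-comprehension setup concrete, here is a minimal sketch of running an extractive QA model over a privacy-policy passage. The passage, question, and checkpoint are illustrative assumptions and are not drawn from the PolicyQA dataset or the paper's evaluation.

```python
from transformers import pipeline

# Hypothetical privacy-policy excerpt and question, in the spirit of PolicyQA's
# span-extraction format; neither is taken from the actual dataset.
context = (
    "We collect your email address and approximate location when you create "
    "an account. This information is shared with analytics partners and is "
    "retained for up to 24 months after account deletion."
)
question = "How long is my information retained after account deletion?"

# Any extractive QA checkpoint works here; this one is a common default.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(question=question, context=context)

# Prints the extracted answer span and the model's confidence score.
print(result["answer"], result["score"])
```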
This list is automatically generated from the titles and abstracts of the papers on this site.