Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions
- URL: http://arxiv.org/abs/2307.03941v4
- Date: Wed, 5 Jun 2024 01:14:19 GMT
- Title: Right to be Forgotten in the Era of Large Language Models: Implications, Challenges, and Solutions
- Authors: Dawen Zhang, Pamela Finckenberg-Broman, Thong Hoang, Shidong Pan, Zhenchang Xing, Mark Staples, Xiwei Xu,
- Abstract summary: Right to be Forgotten (RTBF) was first established as the result of evolution of technology.
Recent development of Large Language Models (LLMs) pose new challenges for compliance with RTBF.
We show how to implement technical solutions for the RTBF, including use of differential privacy, machine unlearning, model editing, and guardrails.
- Score: 15.726163080180653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The Right to be Forgotten (RTBF) was first established as the result of the ruling of Google Spain SL, Google Inc. v AEPD, Mario Costeja Gonz\'alez, and was later included as the Right to Erasure under the General Data Protection Regulation (GDPR) of European Union to allow individuals the right to request personal data be deleted by organizations. Specifically for search engines, individuals can send requests to organizations to exclude their information from the query results. It was a significant emergent right as the result of the evolution of technology. With the recent development of Large Language Models (LLMs) and their use in chatbots, LLM-enabled software systems have become popular. But they are not excluded from the RTBF. Compared with the indexing approach used by search engines, LLMs store, and process information in a completely different way. This poses new challenges for compliance with the RTBF. In this paper, we explore these challenges and provide our insights on how to implement technical solutions for the RTBF, including the use of differential privacy, machine unlearning, model editing, and guardrails. With the rapid advancement of AI and the increasing need of regulating this powerful technology, learning from the case of RTBF can provide valuable lessons for technical practitioners, legal experts, organizations, and authorities.
Related papers
- Free to play: UN Trade and Development's experience with developing its own open-source Retrieval Augmented Generation Large Language Model application [0.0]
UNCTAD has explored and developed its own open-source Retrieval Augmented Generation (RAG) LLM application.
RAG makes Large Language Models aware of and more useful for the organization's domain and work.
Three libraries developed to produce the app, nlp_pipeline for document processing and statistical analysis, local_rag_llm for running a local RAG LLM, and streamlit_rag for the user interface, are publicly available on PyPI and GitHub with Dockerfiles.
arXiv Detail & Related papers (2024-06-18T14:23:54Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models [1.443696537295348]
Privacy leakage and copyright violation are still underexplored.
Our unlearning algorithms are not only data-agnostic/model-agnostic but also proven to be robust in terms of utility preservation or privacy guarantee.
arXiv Detail & Related papers (2024-03-13T18:57:30Z) - The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented
Generation (RAG) [56.67603627046346]
Retrieval-augmented generation (RAG) is a powerful technique to facilitate language model with proprietary and private data.
In this work, we conduct empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems on leaking the private retrieval database.
arXiv Detail & Related papers (2024-02-23T18:35:15Z) - Retrieval Augmented Thought Process for Private Data Handling in Healthcare [53.89406286212502]
We introduce the Retrieval-Augmented Thought Process (RATP)
RATP formulates the thought generation of Large Language Models (LLMs)
On a private dataset of electronic medical records, RATP achieves 35% additional accuracy compared to in-context retrieval-augmented generation for the question-answering task.
arXiv Detail & Related papers (2024-02-12T17:17:50Z) - Federated Fine-Tuning of LLMs on the Very Edge: The Good, the Bad, the Ugly [62.473245910234304]
This paper takes a hardware-centric approach to explore how Large Language Models can be brought to modern edge computing systems.
We provide a micro-level hardware benchmark, compare the model FLOP utilization to a state-of-the-art data center GPU, and study the network utilization in realistic conditions.
arXiv Detail & Related papers (2023-10-04T20:27:20Z) - Audit to Forget: A Unified Method to Revoke Patients' Private Data in
Intelligent Healthcare [14.22413100609926]
We developed AFS, which is able to evaluate and revoke patients' private data from pre-trained deep learning models.
We demonstrated the generality of AFS by applying it to four tasks on different datasets.
arXiv Detail & Related papers (2023-02-20T07:29:22Z) - A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often remember' the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z) - Mitigating Sovereign Data Exchange Challenges: A Mapping to Apply
Privacy- and Authenticity-Enhancing Technologies [67.34625604583208]
Authenticity Enhancing Technologies (AETs) and Privacy-Enhancing Technologies (PETs) are considered to engage in Sovereign Data Exchange (SDE)
PETs and AETs are technically complex, which impedes their adoption.
This study empirically constructs a challenge-oriented technology mapping.
arXiv Detail & Related papers (2022-06-20T08:16:42Z) - Datensouver\"anit\"at f\"ur Verbraucher:innen: Technische Ans\"atze
durch KI-basierte Transparenz und Auskunft im Kontext der DSGVO [0.0]
The EU General Data Protection Regulation guarantees comprehensive data subject rights.
Traditional approaches, such as the provision of lengthy data protection declarations, do not meet the requirements of informational self-determination.
For this purpose, the relevant transparency information is extracted in a semi-automated way, represented in a machine-readable format, and then played out via diverse channels such as virtual assistants.
arXiv Detail & Related papers (2021-12-07T18:18:19Z) - Bias in Data-driven AI Systems -- An Introductory Survey [37.34717604783343]
This survey focuses on data-driven AI, as a large part of AI is powered nowadays by (big) data and powerful Machine Learning (ML) algorithms.
If otherwise not specified, we use the general term bias to describe problems related to the gathering or processing of data that might result in prejudiced decisions on the bases of demographic features like race, sex, etc.
arXiv Detail & Related papers (2020-01-14T09:39:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.