Privacy Adhering Machine Un-learning in NLP
- URL: http://arxiv.org/abs/2212.09573v1
- Date: Mon, 19 Dec 2022 16:06:45 GMT
- Title: Privacy Adhering Machine Un-learning in NLP
- Authors: Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah, Dan Roth
- Abstract summary: In real-world industry applications, Machine Learning models are built on user data. Privacy mandates require effort in terms of both data removal and model retraining, and continuous removal of data followed by retraining does not scale. We propose \textit{Machine Unlearning} to tackle this challenge.
- Score: 66.17039929803933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Regulations introduced by General Data Protection Regulation (GDPR) in the EU
or California Consumer Privacy Act (CCPA) in the US have included provisions on
the \textit{right to be forgotten} that mandates industry applications to
remove data related to an individual from their systems. In several real world
industry applications that use Machine Learning to build models on user data,
such mandates require significant effort both in terms of data cleansing as
well as model retraining while ensuring the models do not deteriorate in
prediction quality due to removal of data. As a result, continuous removal of
data and model retraining steps do not scale if these applications receive such
requests at a very high frequency. Recently, a few researchers proposed the
idea of \textit{Machine Unlearning} to tackle this challenge. Despite the
significant importance of this task, the area of Machine Unlearning is
under-explored in Natural Language Processing (NLP) tasks. In this paper, we
explore the Unlearning framework on various GLUE tasks \cite{Wang:18}, such as,
QQP, SST and MNLI. We propose computationally efficient approaches (SISA-FC and
SISA-A) to perform \textit{guaranteed} Unlearning that provides significant
reduction in memory (90-95\%), time (100x), and space consumption (99\%) in
comparison to the baselines while keeping model performance constant.
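The SISA family of methods the abstract builds on works by sharding the training data, fitting one sub-model per shard, and aggregating their predictions; deleting a user's data then only requires retraining the single shard that contained it. Below is a minimal toy sketch of that idea. All names (`shard_of`, `train_shard`, `unlearn`) are illustrative, and the per-shard "model" is a trivial label counter standing in for the paper's SISA-FC/SISA-A transformer variants, which are not reproduced here.

```python
# Toy sketch of SISA-style guaranteed unlearning: shard the data, train one
# sub-model per shard, and retrain only the affected shard on deletion.
from collections import Counter

NUM_SHARDS = 3

def shard_of(example_id: int) -> int:
    # Deterministic assignment of each example to exactly one shard.
    return example_id % NUM_SHARDS

def train_shard(shard_data):
    # Stand-in for fitting a sub-model: here, just a label counter.
    return Counter(label for _, label in shard_data)

def predict(shard_models):
    # Aggregation step: majority vote over the per-shard predictions.
    votes = Counter()
    for model in shard_models:
        if model:
            votes[model.most_common(1)[0][0]] += 1
    return votes.most_common(1)[0][0]

# Build shards from (example_id, label) pairs.
data = [(i, "pos" if i % 2 else "neg") for i in range(12)]
shards = {s: [ex for ex in data if shard_of(ex[0]) == s]
          for s in range(NUM_SHARDS)}
models = {s: train_shard(shards[s]) for s in shards}

def unlearn(example_id: int):
    # Guaranteed deletion: drop the example and retrain ONLY its shard;
    # the other shards' models never saw it and stay untouched.
    s = shard_of(example_id)
    shards[s] = [ex for ex in shards[s] if ex[0] != example_id]
    models[s] = train_shard(shards[s])

unlearn(4)  # retrains 1 of 3 shards instead of the full model
```

Because each example influences exactly one sub-model, retraining that shard provably removes its contribution, which is what makes the unlearning "guaranteed" rather than approximate.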
Related papers
- Learning with Less: Knowledge Distillation from Large Language Models via Unlabeled Data [54.934578742209716]
In real-world NLP applications, Large Language Models (LLMs) offer promising solutions due to their extensive training on vast datasets.
LLKD is an adaptive sample selection method that incorporates signals from both the teacher and student.
Our comprehensive experiments show that LLKD achieves superior performance across various datasets with higher data efficiency.
arXiv Detail & Related papers (2024-11-12T18:57:59Z)
- CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept [5.345828824625758]
We propose a novel amortized unlearning approach using codebook features and Sparse Autoencoders (SAEs)
By leveraging a bottleneck to decompose the activation space and regulate information flow, our method efficiently unlearns targeted information while preserving the model's performance on unrelated data.
arXiv Detail & Related papers (2024-10-08T10:26:22Z)
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an efficient unlearning framework that could efficiently update LLMs without having to retrain the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z)
- Making Large Language Models Better Data Creators [22.0882632635255]
Large language models (LLMs) have advanced the state-of-the-art in NLP significantly.
Deploying them for downstream applications remains challenging due to cost, responsiveness, control, or concerns around privacy and security.
We propose a unified data creation pipeline that requires only a single format example.
arXiv Detail & Related papers (2023-10-31T01:08:34Z)
- Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the \texttt{Llama2-7b-chat} model on nine different languages from the MUST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z)
- Learn to Unlearn: A Survey on Machine Unlearning [29.077334665555316]
This article presents a review of recent machine unlearning techniques, verification mechanisms, and potential attacks.
We highlight emerging challenges and prospective research directions.
We aim for this paper to provide valuable resources for integrating privacy, equity, and resilience into ML systems.
arXiv Detail & Related papers (2023-05-12T14:28:02Z)
- A Survey of Machine Unlearning [56.017968863854186]
Recent regulations now require that, on request, private information about a user must be removed from computer systems.
ML models often "remember" the old data.
Recent works on machine unlearning have not been able to completely solve the problem.
arXiv Detail & Related papers (2022-09-06T08:51:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.