Privacy Adhering Machine Un-learning in NLP
- URL: http://arxiv.org/abs/2212.09573v1
- Date: Mon, 19 Dec 2022 16:06:45 GMT
- Title: Privacy Adhering Machine Un-learning in NLP
- Authors: Vinayshekhar Bannihatti Kumar, Rashmi Gangadharaiah, Dan Roth
- Abstract summary: Real-world industry applications use Machine Learning to build models on user data.
Mandates such as the right to be forgotten require effort in terms of both data cleansing and model retraining.
Continuous removal of data and model retraining steps do not scale.
We explore \textit{Machine Unlearning} to tackle this challenge.
- Score: 66.17039929803933
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Regulations introduced by General Data Protection Regulation (GDPR) in the EU
or California Consumer Privacy Act (CCPA) in the US have included provisions on
the \textit{right to be forgotten} that mandates industry applications to
remove data related to an individual from their systems. In several real world
industry applications that use Machine Learning to build models on user data,
such mandates require significant effort both in terms of data cleansing as
well as model retraining while ensuring the models do not deteriorate in
prediction quality due to removal of data. As a result, continuous removal of
data and model retraining steps do not scale if these applications receive such
requests at a very high frequency. Recently, a few researchers proposed the
idea of \textit{Machine Unlearning} to tackle this challenge. Despite the
significant importance of this task, the area of Machine Unlearning is
under-explored in Natural Language Processing (NLP) tasks. In this paper, we
explore the Unlearning framework on various GLUE tasks \cite{Wang:18}, such as
QQP, SST and MNLI. We propose computationally efficient approaches (SISA-FC and
SISA-A) to perform \textit{guaranteed} Unlearning, providing significant
reductions in memory (90-95\%), time (100x), and space consumption (99\%)
in comparison to the baselines while keeping model performance constant.
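The SISA-FC and SISA-A approaches above build on the SISA (Sharded, Isolated, Sliced, Aggregated) training scheme: data is partitioned into shards, each shard trains an isolated constituent model, and a deletion request retrains only the shard that held the removed example. The following is a minimal toy sketch of that idea, not the paper's implementation; the class and method names, and the per-shard majority-label "model," are illustrative assumptions.

```python
# Toy sketch of SISA-style guaranteed unlearning: one constituent model per
# data shard, so removing an example only retrains that example's shard.
# The "model" here is a trivial per-shard majority-label predictor.
from collections import Counter

class SISAEnsemble:
    def __init__(self, num_shards):
        self.shards = [[] for _ in range(num_shards)]  # lists of (example_id, label)
        self.models = [None] * num_shards

    def fit(self, examples):
        # Deterministically route each example to one shard, then train
        # every shard's constituent model in isolation.
        for ex_id, label in examples:
            self.shards[ex_id % len(self.shards)].append((ex_id, label))
        for i, shard in enumerate(self.shards):
            self.models[i] = self._train(shard)

    def _train(self, shard):
        # Toy constituent model: the shard's majority label (None if empty).
        if not shard:
            return None
        return Counter(label for _, label in shard).most_common(1)[0][0]

    def unlearn(self, ex_id):
        # Guaranteed removal: drop the example from its shard and retrain
        # that shard only -- all other constituent models are untouched.
        i = ex_id % len(self.shards)
        self.shards[i] = [(e, y) for e, y in self.shards[i] if e != ex_id]
        self.models[i] = self._train(self.shards[i])

    def predict(self):
        # Aggregate constituent models by majority vote.
        votes = [m for m in self.models if m is not None]
        return Counter(votes).most_common(1)[0][0]
```

The efficiency gain is that an unlearning request costs one shard's retraining rather than a full retrain, which is the scaling property the SISA-FC and SISA-A variants exploit.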
Related papers
- MUSE: Machine Unlearning Six-Way Evaluation for Language Models [109.76505405962783]
Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content.
We propose MUSE, a comprehensive machine unlearning evaluation benchmark.
We benchmark how effectively eight popular unlearning algorithms can unlearn Harry Potter books and news articles.
arXiv Detail & Related papers (2024-07-08T23:47:29Z) - The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z) - Dataset Condensation Driven Machine Unlearning [0.0]
Current trend in data regulation requirements and privacy-preserving machine learning has emphasized the importance of machine unlearning.
We propose new dataset condensation techniques and an innovative unlearning scheme that strikes a balance among unlearning privacy, utility, and efficiency.
We present a novel and effective approach to instrumenting machine unlearning and propose its application in defending against membership inference and model inversion attacks.
arXiv Detail & Related papers (2024-01-31T21:48:25Z) - SecureCut: Federated Gradient Boosting Decision Trees with Efficient
Machine Unlearning [10.011146979811752]
It has become imperative to enable data removal in Vertical Federated Learning (VFL) where multiple parties provide private features for model training.
In VFL, data removal, i.e., \textit{machine unlearning}, often requires removing specific features across all samples under privacy guarantees.
We propose SecureCut, a novel Gradient Boosting Decision Tree (GBDT) framework that effectively enables both \textit{instance unlearning} and \textit{feature unlearning} without the need for retraining from scratch.
arXiv Detail & Related papers (2023-11-22T05:38:53Z) - Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an efficient unlearning framework that could efficiently update LLMs without having to retrain the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z) - Making Large Language Models Better Data Creators [22.0882632635255]
Large language models (LLMs) have advanced the state-of-the-art in NLP significantly.
deploying them for downstream applications is still challenging due to cost, responsiveness, control, or concerns around privacy and security.
We propose a unified data creation pipeline that requires only a single format example.
arXiv Detail & Related papers (2023-10-31T01:08:34Z) - Simultaneous Machine Translation with Large Language Models [51.470478122113356]
We investigate the possibility of applying Large Language Models to SimulMT tasks.
We conducted experiments using the \texttt{Llama2-7b-chat} model on nine different languages from the MuST-C dataset.
The results show that the LLM outperforms dedicated MT models in terms of BLEU and LAAL metrics.
arXiv Detail & Related papers (2023-09-13T04:06:47Z) - Learn to Unlearn: A Survey on Machine Unlearning [29.077334665555316]
This article presents a review of recent machine unlearning techniques, verification mechanisms, and potential attacks.
We highlight emerging challenges and prospective research directions.
We aim for this paper to provide valuable resources for integrating privacy, equity, and resilience into ML systems.
arXiv Detail & Related papers (2023-05-12T14:28:02Z) - SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
Existing approaches, however, do not supply the procedures and pipelines needed for the actual deployment of machine learning capabilities in real production-grade systems.
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor frameworks and script language engines.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.