TOFU: A Task of Fictitious Unlearning for LLMs
- URL: http://arxiv.org/abs/2401.06121v1
- Date: Thu, 11 Jan 2024 18:57:12 GMT
- Title: TOFU: A Task of Fictitious Unlearning for LLMs
- Authors: Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter
- Abstract summary: Large language models trained on massive corpora of data from the web can reproduce sensitive or private data, raising both legal and ethical concerns.
Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training.
We present TOFU, a benchmark aimed at helping deepen our understanding of unlearning.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language models trained on massive corpora of data from the web can
memorize and reproduce sensitive or private data, raising both legal and ethical
concerns. Unlearning, or tuning models to forget information present in their
training data, provides us with a way to protect private data after training.
Although several methods exist for such unlearning, it is unclear to what
extent they result in models equivalent to those where the data to be forgotten
was never learned in the first place. To address this challenge, we present
TOFU, a Task of Fictitious Unlearning, as a benchmark aimed at helping deepen
our understanding of unlearning. We offer a dataset of 200 diverse synthetic
author profiles, each consisting of 20 question-answer pairs, and a subset of
these profiles called the forget set that serves as the target for unlearning.
We compile a suite of metrics that work together to provide a holistic picture
of unlearning efficacy. Finally, we provide a set of baseline results from
existing unlearning algorithms. Importantly, none of the baselines we consider
show effective unlearning, motivating continued efforts to develop approaches
for unlearning that effectively tune models so that they truly behave as if
they were never trained on the forget data at all.
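The abstract describes the dataset's shape (200 synthetic author profiles, each with 20 question-answer pairs) and a forget set made of a subset of those profiles. The sketch below mirrors that structure, assuming the forget set is selected as whole profiles so that no author's pairs end up split between the forget and retain sides. The helper names (`make_placeholder_profiles`, `split_by_profile`), the placeholder strings, and the `forget_fraction` parameter are hypothetical illustrations, not part of the TOFU release.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class QAPair:
    author_id: int
    question: str
    answer: str


def make_placeholder_profiles(n_authors=200, pairs_per_author=20):
    """Build dummy QA pairs with TOFU's shape (200 authors x 20 pairs).
    The strings are placeholders, not actual TOFU data."""
    return [
        QAPair(a, f"Question {q} about author {a}?", f"Answer {q} for author {a}.")
        for a in range(n_authors)
        for q in range(pairs_per_author)
    ]


def split_by_profile(pairs, forget_fraction, seed=0):
    """Hypothetical helper: pick whole author profiles for the forget set,
    so every QA pair of a forgotten author lands in the forget split."""
    authors = sorted({p.author_id for p in pairs})
    n_forget = round(len(authors) * forget_fraction)
    rng = random.Random(seed)  # seeded for a reproducible split
    forget_authors = set(rng.sample(authors, n_forget))
    forget = [p for p in pairs if p.author_id in forget_authors]
    retain = [p for p in pairs if p.author_id not in forget_authors]
    return forget, retain


pairs = make_placeholder_profiles()
forget, retain = split_by_profile(pairs, forget_fraction=0.05)
print(len(pairs), len(forget), len(retain))  # 4000 200 3800
```

Splitting by profile rather than by individual QA pair keeps the retain side free of any facts about a forgotten author, which matches the idea of a model that "was never trained on the forget data at all."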
Related papers
- RESTOR: Knowledge Recovery through Machine Unlearning (2024-10-31T20:54:35Z)
Large language models trained on web-scale corpora can memorize undesirable datapoints.
Many machine unlearning methods have been proposed that aim to 'erase' these datapoints from trained models.
We propose the RESTOR framework for machine unlearning based on the following dimensions.
- CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept (2024-10-08T10:26:22Z)
We propose a novel amortized unlearning approach using codebook features and Sparse Autoencoders (SAEs).
By leveraging a bottleneck to decompose the activation space and regulate information flow, our method efficiently unlearns targeted information while preserving the model's performance on unrelated data.
- The Frontier of Data Erasure: Machine Unlearning for Large Language Models (2024-03-23T09:26:15Z)
Large Language Models (LLMs) are foundational to AI advancements.
LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information.
Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
- Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection (2023-12-07T07:17:24Z)
Recent data-privacy laws have sparked interest in machine unlearning.
The challenge is to discard information about the 'forget' data without altering knowledge about the remaining dataset.
We adopt a projected-gradient based learning method, named Projected-Gradient Unlearning (PGU).
We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
- Unlearn What You Want to Forget: Efficient Unlearning for LLMs (2023-10-31T03:35:59Z)
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an efficient unlearning framework that can update LLMs without retraining the whole model after data removals.
- Federated Unlearning with Knowledge Distillation (2022-01-24T03:56:20Z)
Federated Learning (FL) is designed to protect the data privacy of each client during the training process.
With recent legislation on the right to be forgotten, it is essential for the FL model to be able to forget what it has learned from each client.
We propose a novel federated unlearning method to eliminate a client's contribution by subtracting the accumulated historical updates from the model.
- Machine Unlearning of Features and Labels (2021-08-26T04:42:24Z)
We propose the first scenarios for unlearning features and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
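The federated unlearning summary describes removing a client's contribution by subtracting its accumulated historical updates from the model. A minimal sketch of only that subtraction step follows, assuming the server logs each client's already-weighted update every round and that the global model is the initial parameters plus the sum of all logged updates. `UpdateLog`, `train`, and `unlearn_client` are hypothetical names, and the knowledge-distillation step suggested by the paper's title is omitted.

```python
from collections import defaultdict


def vec_add(u, v):
    """Element-wise sum of two equal-length parameter vectors."""
    return [a + b for a, b in zip(u, v)]


def vec_sub(u, v):
    """Element-wise difference of two equal-length parameter vectors."""
    return [a - b for a, b in zip(u, v)]


class UpdateLog:
    """Hypothetical server-side record of each client's accumulated updates."""

    def __init__(self, dim):
        self.totals = defaultdict(lambda: [0.0] * dim)

    def record(self, client_id, update):
        self.totals[client_id] = vec_add(self.totals[client_id], update)


def train(init_params, rounds, log):
    """Apply each round's (already weighted) client updates and log them."""
    params = list(init_params)
    for round_updates in rounds:
        for client_id, update in round_updates.items():
            log.record(client_id, update)
            params = vec_add(params, update)
    return params


def unlearn_client(params, log, client_id):
    """Subtract the client's accumulated historical updates from the model."""
    return vec_sub(params, log.totals[client_id])


log = UpdateLog(dim=2)
rounds = [
    {"a": [1.0, 2.0], "b": [0.5, -1.0]},
    {"a": [1.0, 0.0], "b": [0.5, 1.0]},
]
final_params = train([0.0, 0.0], rounds, log)
forgotten = unlearn_client(final_params, log, "a")
print(final_params, forgotten)  # [3.0, 2.0] [1.0, 0.0]
```

In this toy run the updates are fixed numbers, so subtraction returns exactly the initial parameters plus client "b"'s updates. In real training, the other clients' later updates were computed on a model that already contained the removed client's earlier contributions, so subtraction alone only approximates retraining without that client. A repair step such as distillation is meant to close that gap.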