Related papers: TOFU: A Task of Fictitious Unlearning for LLMs

TOFU: A Task of Fictitious Unlearning for LLMs

URL: http://arxiv.org/abs/2401.06121v1
Date: Thu, 11 Jan 2024 18:57:12 GMT
Title: TOFU: A Task of Fictitious Unlearning for LLMs
Authors: Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter
Abstract summary: Large language models trained on massive corpora of data from the web can reproduce sensitive or private data raising both legal and ethical concerns. Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training. We present TOFU, a benchmark aimed at helping deepen our understanding of unlearning.
Score: 99.92305790945507
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models trained on massive corpora of data from the web can memorize and reproduce sensitive or private data raising both legal and ethical concerns. Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training. Although several methods exist for such unlearning, it is unclear to what extent they result in models equivalent to those where the data to be forgotten was never learned in the first place. To address this challenge, we present TOFU, a Task of Fictitious Unlearning, as a benchmark aimed at helping deepen our understanding of unlearning. We offer a dataset of 200 diverse synthetic author profiles, each consisting of 20 question-answer pairs, and a subset of these profiles called the forget set that serves as the target for unlearning. We compile a suite of metrics that work together to provide a holistic picture of unlearning efficacy. Finally, we provide a set of baseline results from existing unlearning algorithms. Importantly, none of the baselines we consider show effective unlearning motivating continued efforts to develop approaches for unlearning that effectively tune models so that they truly behave as if they were never trained on the forget data at all.

Related papers

Not All Data Are Unlearned Equally [30.936702475759688]
We study how the success of unlearning depends on the frequency of the knowledge we want to unlearn in the pre-training data of a model. We uncover a misalignment between probability and generation-based evaluations of unlearning and show that this problem worsens as models become larger.
arXiv Detail & Related papers (2025-04-07T13:29:02Z)
RESTOR: Knowledge Recovery through Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can memorize undesirable datapoints. Many machine unlearning methods have been proposed that aim to 'erase' these datapoints from trained models. We propose the RESTOR framework for machine unlearning based on the following dimensions.
arXiv Detail & Related papers (2024-10-31T20:54:35Z)
CodeUnlearn: Amortized Zero-Shot Machine Unlearning in Language Models Using Discrete Concept [5.345828824625758]
We propose a novel amortized unlearning approach using codebook features and Sparse Autoencoders (SAEs) By leveraging a bottleneck to decompose the activation space and regulate information flow, our method efficiently unlearns targeted information while preserving the model's performance on unrelated data.
arXiv Detail & Related papers (2024-10-08T10:26:22Z)
Unlearning Personal Data from a Single Image [38.36863497458095]
Machine unlearning aims to erase data from a model as if the latter never saw them during training. Currently, no setting or benchmark exists to probe the effectiveness of unlearning methods in such scenarios. We propose One-Shot Unlearning of Personal Identities (1-SHUI) that evaluates unlearning models when the training data is not available.
arXiv Detail & Related papers (2024-07-16T10:00:54Z)
MUSE: Machine Unlearning Six-Way Evaluation for Language Models [109.76505405962783]
Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. We propose MUSE, a comprehensive machine unlearning evaluation benchmark. We benchmark how effectively eight popular unlearning algorithms can unlearn Harry Potter books and news articles.
arXiv Detail & Related papers (2024-07-08T23:47:29Z)
The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements. LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information. Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning. Challenge is to discard information about the forget'' data without altering knowledge about remaining dataset. We adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU) We provide empirically evidence to demonstrate that our unlearning method can produce models that behave similar to models retrained from scratch across various metrics even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data. This process might suffer from privacy issues and violations of data protection regulations. We propose an efficient unlearning framework that could efficiently update LLMs without having to retrain the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z)
Class-wise Federated Unlearning: Harnessing Active Forgetting with Teacher-Student Memory Generation [11.638683787598817]
We propose a neuro-inspired federated unlearning framework based on active forgetting. Our framework distinguishes itself from existing methods by utilizing new memories to overwrite old ones. Our method achieves satisfactory unlearning completeness against backdoor attacks.
arXiv Detail & Related papers (2023-07-07T03:07:26Z)
Federated Unlearning with Knowledge Distillation [9.666514931140707]
Federated Learning (FL) is designed to protect the data privacy of each client during the training process. With the recent legislation on right to be forgotten, it is crucially essential for the FL model to possess the ability to forget what it has learned from each client. We propose a novel federated unlearning method to eliminate a client's contribution by subtracting the accumulated historical updates from the model.
arXiv Detail & Related papers (2022-01-24T03:56:20Z)
Machine Unlearning of Features and Labels [72.81914952849334]
We propose first scenarios for unlearning and labels in machine learning models. Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)

This list is automatically generated from the titles and abstracts of the papers in this site.