Related papers: MUSE: Machine Unlearning Six-Way Evaluation for Language Models

URL: http://arxiv.org/abs/2407.06460v2
Date: Sun, 14 Jul 2024 20:14:02 GMT
Title: MUSE: Machine Unlearning Six-Way Evaluation for Language Models
Authors: Weijia Shi, Jaechan Lee, Yangsibo Huang, Sadhika Malladi, Jieyu Zhao, Ari Holtzman, Daogao Liu, Luke Zettlemoyer, Noah A. Smith, Chiyuan Zhang
Abstract summary: Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. We propose MUSE, a comprehensive machine unlearning evaluation benchmark. We benchmark how effectively eight popular unlearning algorithms can unlearn Harry Potter books and news articles.
Score: 109.76505405962783
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. Data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning only these datapoints (i.e., retraining with the data removed) is intractable in modern-day models. This has led to the development of many approximate unlearning algorithms. The evaluation of the efficacy of these algorithms has traditionally been narrow in scope, failing to precisely quantify the success and practicality of the algorithm from the perspectives of both the model deployers and the data owners. We address this issue by proposing MUSE, a comprehensive machine unlearning evaluation benchmark that enumerates six diverse desirable properties for unlearned models: (1) no verbatim memorization, (2) no knowledge memorization, (3) no privacy leakage, (4) utility preservation on data not intended for removal, (5) scalability with respect to the size of removal requests, and (6) sustainability over sequential unlearning requests. Using these criteria, we benchmark how effectively eight popular unlearning algorithms on 7B-parameter LMs can unlearn Harry Potter books and news articles. Our results demonstrate that most algorithms can prevent verbatim memorization and knowledge memorization to varying degrees, but only one algorithm does not lead to severe privacy leakage. Furthermore, existing algorithms fail to meet deployers' expectations because they often degrade general model utility and also cannot sustainably accommodate successive unlearning requests or large-scale content removal. Our findings identify key issues with the practicality of existing unlearning algorithms on language models, and we release our benchmark to facilitate further evaluations: muse-bench.github.io
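The abstract lists the six properties but not how each is scored. As a minimal sketch of how property (1) might be checked, assume verbatim memorization is measured as the ROUGE-L F1 overlap between a model's continuation of a forget-set prefix and the true continuation. The whitespace tokenization, the prefix and continuation lengths, and the `generate` callable are illustrative assumptions, not the benchmark's exact protocol.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)  # prev[j]: LCS length of the tokens seen so far and b[:j]
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 on whitespace tokens (no stemming, no lowercasing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def verbatim_memorization(generate, forget_texts, prefix_tokens=32, max_tokens=128):
    """Mean ROUGE-L F1 between model continuations and the true continuations.

    `generate` is a hypothetical callable (prompt: str) -> str standing in for the model.
    """
    scores = []
    for text in forget_texts:
        tokens = text.split()
        prompt = " ".join(tokens[:prefix_tokens])
        reference = " ".join(tokens[prefix_tokens:prefix_tokens + max_tokens])
        scores.append(rouge_l_f1(generate(prompt), reference))
    return sum(scores) / len(scores)
```

With this setup, a model that regurgitates the continuation (for example `verbatim_memorization(lambda p: "e f g h", ["a b c d e f g h"], prefix_tokens=4)`) scores 1.0 and one that outputs nothing scores 0.0. In practice the unlearned model's score would be compared against that of a model never trained on the forget set rather than against zero.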

Related papers

Does Machine Unlearning Truly Remove Model Knowledge? A Framework for Auditing Unlearning in LLMs [58.24692529185971]
We introduce a comprehensive auditing framework for unlearning evaluation comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods. We evaluate the effectiveness and robustness of different unlearning strategies.
arXiv Detail & Related papers (2025-05-29T09:19:07Z)
LUME: LLM Unlearning with Multitask Evaluations [106.83812472773522]
Unlearning aims to remove copyrighted, sensitive, or private content from large language models (LLMs) without a full retraining. We develop a multi-task unlearning benchmark (LUME) which features three tasks: (1) unlearn synthetically generated creative short novels, (2) unlearn synthetic biographies with sensitive information, and (3) unlearn a collection of public biographies.
arXiv Detail & Related papers (2025-02-20T23:30:45Z)
Soft Token Attacks Cannot Reliably Audit Unlearning in Large Language Models [5.807314706494602]
We show that soft token attacks (STAs) can successfully extract purportedly unlearned information from large language models (LLMs). Our work highlights the need for better evaluation baselines and more appropriate auditing tools for assessing the effectiveness of unlearning.
arXiv Detail & Related papers (2025-02-20T13:22:33Z)
FUNU: Boosting Machine Unlearning Efficiency by Filtering Unnecessary Unlearning [9.472692023087223]
We propose FUNU, a method to identify data points that lead to unnecessary unlearning. We provide a theoretical analysis of FUNU and conduct extensive experiments to validate its efficacy.
arXiv Detail & Related papers (2025-01-28T01:19:07Z)
RESTOR: Knowledge Recovery through Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can memorize undesirable datapoints. Many machine unlearning methods have been proposed that aim to 'erase' these datapoints from trained models. We propose the RESTOR framework for machine unlearning based on the following dimensions.
arXiv Detail & Related papers (2024-10-31T20:54:35Z)
Catastrophic Failure of LLM Unlearning via Quantization [36.524827594501495]
We show that applying quantization to models that have undergone unlearning can restore the "forgotten" information. We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21% of the intended forgotten knowledge in full precision.
arXiv Detail & Related papers (2024-10-21T19:28:37Z)
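The summary does not say why quantization restores forgotten knowledge. A toy illustration of one plausible mechanism, under stated assumptions: if an unlearning method (for example, one constrained to preserve utility) moves each weight by less than half a quantization step, round-to-nearest quantization maps the original and unlearned weights onto the same grid points. The weights, the update `delta`, and the per-tensor absmax scale below are hypothetical; real 4-bit schemes often use group-wise scales and calibration.

```python
def quantize(weights: list[float], scale: float, bits: int = 4) -> list[float]:
    """Symmetric round-to-nearest quantization followed by dequantization."""
    levels = 2 ** (bits - 1) - 1  # 7 positive levels for signed 4-bit integers
    codes = [max(-levels, min(levels, round(w / scale))) for w in weights]
    return [c * scale for c in codes]


# Hypothetical weights of the original (pre-unlearning) model.
original = [0.70, -0.31, 0.12, -0.58]
# Hypothetical unlearning update: small changes, as a utility-preserving method might make.
delta = [-0.02, 0.03, -0.01, 0.02]
unlearned = [w + d for w, d in zip(original, delta)]

# Per-tensor absmax scales derived from the original weights (assumed shared by both models).
scale4 = max(abs(w) for w in original) / (2 ** (4 - 1) - 1)
scale8 = max(abs(w) for w in original) / (2 ** (8 - 1) - 1)

print(unlearned != original)  # True: full-precision weights differ
print(quantize(unlearned, scale4) == quantize(original, scale4))  # True: 4-bit weights coincide
print(quantize(unlearned, scale8, bits=8) == quantize(original, scale8, bits=8))  # False
```

In this toy case the same update survives the finer 8-bit grid, illustrating why coarser quantization is more likely to erase small weight changes.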
A Closer Look at Machine Unlearning for Large Language Models [46.245404272612795]
Large language models (LLMs) may memorize sensitive or copyrighted content, raising privacy and legal concerns. We discuss several issues in machine unlearning for LLMs and provide our insights on possible approaches.
arXiv Detail & Related papers (2024-10-10T16:56:05Z)
SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation. Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
Gone but Not Forgotten: Improved Benchmarks for Machine Unlearning [0.0]
We describe and propose alternative evaluation methods for machine unlearning algorithms. We show the utility of our alternative evaluations via a series of experiments of state-of-the-art unlearning algorithms on different computer vision datasets.
arXiv Detail & Related papers (2024-05-29T15:53:23Z)
Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models [1.443696537295348]
Privacy leakage and copyright violation are still underexplored. Our unlearning algorithms are not only data-agnostic/model-agnostic but also proven to be robust in terms of utility preservation or privacy guarantee.
arXiv Detail & Related papers (2024-03-13T18:57:30Z)
TOFU: A Task of Fictitious Unlearning for LLMs [99.92305790945507]
Large language models trained on massive corpora of data from the web can reproduce sensitive or private data raising both legal and ethical concerns. Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training. We present TOFU, a benchmark aimed at helping deepen our understanding of unlearning.
arXiv Detail & Related papers (2024-01-11T18:57:12Z)
Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning. The challenge is to discard information about the "forget" data without altering knowledge about the remaining dataset. We adopt a projected-gradient-based learning method, named Projected-Gradient Unlearning (PGU). We provide empirical evidence that our unlearning method can produce models that behave similarly to models retrained from scratch across various metrics, even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z)
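The summary does not describe how PGU builds the subspace it protects. As a generic sketch of gradient projection, assume the knowledge to keep is represented by a few hypothetical retain-set gradient directions. The unlearning update is stripped of every component inside their span, so to first order the retain-set loss does not change. This is an illustration of the general technique, not PGU's exact construction.

```python
def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors: list[list[float]], eps: float = 1e-10) -> list[list[float]]:
    """Modified Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis: list[list[float]] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            coef = dot(w, b)
            w = [wi - coef * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:  # skip vectors already in the span
            basis.append([wi / norm for wi in w])
    return basis


def project_out(grad: list[float], basis: list[list[float]]) -> list[float]:
    """Remove from `grad` every component lying in span(basis)."""
    g = list(grad)
    for b in basis:
        coef = dot(g, b)
        g = [gi - coef * bi for gi, bi in zip(g, b)]
    return g


# Hypothetical retain-set gradient directions spanning the protected subspace.
retain_dirs = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
# Hypothetical gradient of the unlearning objective on the forget data.
forget_grad = [0.5, -2.0, 3.0]

update = project_out(forget_grad, orthonormal_basis(retain_dirs))
print(update)  # [0.0, 0.0, 3.0]
# To first order, a step along `update` leaves the retain-set loss unchanged.
print([dot(update, r) for r in retain_dirs])  # [0.0, 0.0]
```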
Tight Bounds for Machine Unlearning via Differential Privacy [0.7252027234425334]
We consider the so-called "right to be forgotten" by requiring that a trained model should be able to "unlearn" a number of points from the training data. We obtain tight bounds on the deletion capacity achievable by DP-based machine unlearning algorithms.
arXiv Detail & Related papers (2023-09-02T09:55:29Z)
Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first scenarios for unlearning features and labels in machine learning models. Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
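The summary states only that unlearning is realized through closed-form, influence-function-based parameter updates. A minimal sketch, assuming a two-feature ridge-regression model with hypothetical data and a second-order (Newton-style) update that deletes one training point: because the loss is quadratic, one step lands exactly on the retrained solution, whereas for non-convex models such an update is only an approximation. Unlearning features or labels would roughly correspond to swapping point k's gradient contribution for that of a corrected point instead of simply removing it.

```python
Vec = list[float]
Mat = list[list[float]]


def solve2(a: Mat, b: Vec) -> Vec:
    """Solve the 2x2 system a @ x = b with Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - b[0] * a[1][0]) / det]


def gram(xs: list[Vec], lam: float) -> Mat:
    """A = X^T X + lam * I for two-feature inputs."""
    a = [[lam, 0.0], [0.0, lam]]
    for x in xs:
        for i in range(2):
            for j in range(2):
                a[i][j] += x[i] * x[j]
    return a


def fit_ridge(xs: list[Vec], ys: Vec, lam: float) -> Vec:
    """Closed-form minimizer of sum_i (theta.x_i - y_i)^2 + lam * |theta|^2."""
    xty = [sum(x[i] * y for x, y in zip(xs, ys)) for i in range(2)]
    return solve2(gram(xs, lam), xty)


def unlearn_point(theta: Vec, xs: list[Vec], ys: Vec, k: int, lam: float) -> Vec:
    """One Newton step from theta* on the loss with training point k removed."""
    xk, yk = xs[k], ys[k]
    residual = sum(t * xi for t, xi in zip(theta, xk)) - yk
    # Reduced Hessian (the common factor 2 cancels): A - x_k x_k^T.
    a = gram(xs, lam)
    for i in range(2):
        for j in range(2):
            a[i][j] -= xk[i] * xk[j]
    # At theta*, the reduced loss has gradient -2 * residual * x_k,
    # so the Newton step is +A_reduced^{-1} (residual * x_k).
    step = solve2(a, [residual * xi for xi in xk])
    return [t + s for t, s in zip(theta, step)]


# Hypothetical training data.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
ys = [1.0, 2.0, 2.5, 0.5]
lam = 0.1

theta = fit_ridge(xs, ys, lam)
updated = unlearn_point(theta, xs, ys, k=2, lam=lam)
retrained = fit_ridge(xs[:2] + xs[3:], ys[:2] + ys[3:], lam)
print(all(abs(u - r) < 1e-9 for u, r in zip(updated, retrained)))  # True
```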

This list is automatically generated from the titles and abstracts of the papers on this site.