DUSK: Do Not Unlearn Shared Knowledge
- URL: http://arxiv.org/abs/2505.15209v3
- Date: Sat, 31 May 2025 04:26:58 GMT
- Title: DUSK: Do Not Unlearn Shared Knowledge
- Authors: Wonje Jeung, Sangyeon Yoon, Hyesoo Hong, Soeun Kim, Seungju Han, Youngjae Yu, Albert No
- Abstract summary: Machine unlearning aims to remove such 'forget' data while preserving utility and information from the 'retain' set. We introduce DUSK, a benchmark designed to evaluate unlearning methods under realistic data overlap.
- Score: 19.614306360050016
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly deployed in real-world applications, raising concerns about the unauthorized use of copyrighted or sensitive data. Machine unlearning aims to remove such 'forget' data while preserving utility and information from the 'retain' set. However, existing evaluations typically assume that forget and retain sets are fully disjoint, overlooking realistic scenarios where they share overlapping content. For instance, a news article may need to be unlearned, even though the same event, such as an earthquake in Japan, is also described factually on Wikipedia. Effective unlearning should remove the specific phrasing of the news article while preserving publicly supported facts. In this paper, we introduce DUSK, a benchmark designed to evaluate unlearning methods under realistic data overlap. DUSK constructs document sets that describe the same factual content in different styles, with some shared information appearing across all sets and other content remaining unique to each. When one set is designated for unlearning, an ideal method should remove its unique content while preserving shared facts. We define seven evaluation metrics to assess whether unlearning methods can achieve this selective removal. Our evaluation of nine recent unlearning methods reveals a key limitation: while most can remove surface-level text, they often fail to erase deeper, context-specific knowledge without damaging shared content. We release DUSK as a public benchmark to support the development of more precise and reliable unlearning techniques for real-world applications.
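To make the selective-removal goal concrete, the following is a minimal, hypothetical sketch of the kind of check DUSK formalizes with its seven metrics; the `selective_removal_report` helper, the `query_model` callable, and the example facts are placeholders, not part of the benchmark.

```python
# Hypothetical sketch of DUSK-style selective removal: after unlearning a news
# article, facts unique to that article should be gone, while facts also
# supported by the retain set (e.g. Wikipedia) should survive.

def selective_removal_report(query_model, unique_facts, shared_facts):
    """query_model: callable mapping a question string to the model's answer.
    unique_facts / shared_facts: lists of (question, expected_answer) pairs."""
    forgotten = sum(
        ans.lower() not in query_model(q).lower() for q, ans in unique_facts
    )
    preserved = sum(
        ans.lower() in query_model(q).lower() for q, ans in shared_facts
    )
    return {
        "unique_forgotten_rate": forgotten / len(unique_facts),
        "shared_preserved_rate": preserved / len(shared_facts),
    }

# Illustrative probes only: one fact unique to the forget document,
# one fact shared between the forget and retain sets.
unique_facts = [("Which outlet first reported the earthquake?", "The Daily Example")]
shared_facts = [("In which country did the earthquake occur?", "Japan")]
```

Under this reading, an ideal method pushes both rates toward 1.0; DUSK's seven metrics probe the distinction more finely, separating removal of surface-level text from removal of deeper, context-specific knowledge.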
Related papers
- Align-then-Unlearn: Embedding Alignment for LLM Unlearning [41.94295877935867]
Unlearning seeks to selectively remove specific data from trained models, such as personal information or copyrighted content.
We propose Align-then-Unlearn, a novel framework that performs unlearning in the semantic embedding space.
arXiv Detail & Related papers (2025-06-16T07:48:01Z)
- Not Every Token Needs Forgetting: Selective Unlearning to Limit Change in Utility in Large Language Model Unlearning [95.53571199301963]
Conventional unlearning approaches indiscriminately update model parameters to forget all tokens in a target document.
We propose Selective Unlearning (SU), which identifies a critical subset of tokens within the forget set that is relevant to the unwanted information.
Experiments on two benchmarks and six baseline unlearning algorithms demonstrate that SU not only achieves effective unlearning on the targeted forget data, but also significantly preserves the model's utility on the retain set.
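A minimal sketch of the token-selective idea, assuming a Hugging Face-style causal LM whose forward pass returns `.logits`; the construction of `token_mask` is left as a placeholder, and this is not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def selective_unlearning_step(model, input_ids, token_mask, optimizer):
    """One gradient-ascent step restricted to the tokens flagged in `token_mask`
    (tokens judged to carry the unwanted information). Hypothetical sketch only."""
    logits = model(input_ids).logits[:, :-1, :]   # predict token t+1 from its prefix
    targets = input_ids[:, 1:]
    mask = token_mask[:, 1:].float()              # which target positions to unlearn
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="none"
    ).view(targets.shape)
    # Maximize loss (gradient ascent) only on the selected tokens; unselected
    # tokens contribute nothing, which is what limits the change in utility.
    loss = -(per_token * mask).sum() / mask.sum().clamp(min=1)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The contrast with plain gradient-ascent baselines is the mask: tokens outside the selected subset receive zero gradient, so the update is confined to the unwanted information.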
arXiv Detail & Related papers (2025-06-01T07:36:45Z)
- WaterDrum: Watermarking for Data-centric Unlearning Metric [47.36231091296615]
Large language model (LLM) unlearning is critical in real-world applications where it is necessary to efficiently remove the influence of private, copyrighted, or harmful data belonging to some users.
This paper presents the first data-centric unlearning metric for LLMs, called WaterDrum, which exploits robust text watermarking to overcome the limitations of existing unlearning metrics.
We also introduce new benchmark datasets for LLM unlearning that contain varying levels of similar data points and can be used to rigorously evaluate unlearning algorithms using WaterDrum.
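As a toy illustration of the data-centric idea only: if the forget data carries a text watermark, residual watermark strength in post-unlearning generations indicates remaining influence of that data. The green-list detector below is a stand-in, not WaterDrum's actual watermarking scheme.

```python
# Toy stand-in for a watermark-based unlearning metric (not WaterDrum's scheme).

GREEN_TOKENS = {"moreover", "hence", "notably"}   # assumed watermark vocabulary

def watermark_strength(generations):
    """Fraction of generated words that fall in the watermark vocabulary."""
    hits = total = 0
    for text in generations:
        for word in text.lower().split():
            total += 1
            hits += word in GREEN_TOKENS
    return hits / max(total, 1)

def removal_score(gen_before, gen_after, base_rate=0.0):
    """Relative drop in watermark strength after unlearning (1.0 = fully removed)."""
    before = watermark_strength(gen_before) - base_rate
    after = watermark_strength(gen_after) - base_rate
    return 1.0 - after / before if before > 0 else 0.0
```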
arXiv Detail & Related papers (2025-05-08T08:56:46Z)
- Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models [70.78205685001168]
We investigate knowledge forgetting in large language models with a focus on its generalisation.
UGBench is the first benchmark specifically designed to assess the unlearning of in-scope implicit knowledge.
We propose PerMU, a novel probability-based unlearning paradigm.
arXiv Detail & Related papers (2025-02-27T11:03:33Z)
- MUSE: Machine Unlearning Six-Way Evaluation for Language Models [109.76505405962783]
Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content.
We propose MUSE, a comprehensive machine unlearning evaluation benchmark.
We benchmark how effectively eight popular unlearning algorithms can unlearn Harry Potter books and news articles.
arXiv Detail & Related papers (2024-07-08T23:47:29Z)
- To Forget or Not? Towards Practical Knowledge Unlearning for Large Language Models [39.39428450239399]
Large Language Models (LLMs) trained on extensive corpora inevitably retain sensitive data, such as personal privacy information and copyrighted material.
Recent advancements in knowledge unlearning involve updating LLM parameters to erase specific knowledge.
We introduce KnowUnDo to evaluate if the unlearning process inadvertently erases essential knowledge.
arXiv Detail & Related papers (2024-07-02T03:34:16Z)
- RWKU: Benchmarking Real-World Knowledge Unlearning for Large Language Models [20.944353802665965]
Large language models (LLMs) inevitably memorize sensitive, copyrighted, and harmful knowledge from the training corpus.
We propose a Real-World Knowledge Unlearning benchmark (RWKU) for LLM unlearning.
arXiv Detail & Related papers (2024-06-16T10:47:21Z)
- TOFU: A Task of Fictitious Unlearning for LLMs [99.92305790945507]
Large language models trained on massive corpora of data from the web can reproduce sensitive or private data, raising both legal and ethical concerns.
Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training.
We present TOFU, a benchmark aimed at helping deepen our understanding of unlearning.
arXiv Detail & Related papers (2024-01-11T18:57:12Z)
- Learning to Unlearn: Instance-wise Unlearning for Pre-trained Classifiers [71.70205894168039]
We consider instance-wise unlearning, the goal of which is to delete information about a set of instances from a pre-trained model.
We propose two methods that reduce forgetting on the remaining data: 1) utilizing adversarial examples to overcome forgetting at the representation-level and 2) leveraging weight importance metrics to pinpoint network parameters guilty of propagating unwanted information.
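A rough sketch of the second ingredient (weight importance) follows, using accumulated squared gradients on the forget instances as a Fisher-style stand-in for the paper's importance metric; the `guilty_parameter_masks` helper and the selection fraction are hypothetical.

```python
import torch

def importance_scores(model, forget_batches, loss_fn):
    """Accumulate squared gradients of the forget-set loss per parameter,
    a Fisher-style stand-in for a weight-importance metric."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for inputs, targets in forget_batches:
        model.zero_grad()
        loss_fn(model(inputs), targets).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2
    return scores

def guilty_parameter_masks(scores, fraction=0.01):
    """Flag the top `fraction` of parameters as carriers of the unwanted
    information; only these would be updated during unlearning."""
    # For large models, estimate the threshold from a random subset of `flat`.
    flat = torch.cat([s.flatten() for s in scores.values()])
    threshold = torch.quantile(flat, 1.0 - fraction)
    return {n: s >= threshold for n, s in scores.items()}
```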
arXiv Detail & Related papers (2023-01-27T07:53:50Z)
- Abstractive Summarization of Spoken and Written Instructions with BERT [66.14755043607776]
We present the first application of the BERTSum model to conversational language.
We generate abstractive summaries of narrated instructional videos across a wide variety of topics.
We envision this being integrated as a feature in intelligent virtual assistants, enabling them to summarize both written and spoken instructional content upon request.
arXiv Detail & Related papers (2020-08-21T20:59:34Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.