Related papers: OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models

URL: http://arxiv.org/abs/2505.04416v1
Date: Wed, 07 May 2025 13:51:42 GMT
Title: OBLIVIATE: Robust and Practical Machine Unlearning for Large Language Models
Authors: Xiaoyu Xu, Minxin Du, Qingqing Ye, Haibo Hu
Abstract summary: Large language models (LLMs) trained over extensive corpora risk memorizing sensitive, copyrighted, or toxic content. We propose OBLIVIATE, a robust unlearning framework that removes targeted data while preserving model utility. We conduct experiments on multiple datasets, including the Harry Potter series, WMDP, and TOFU.
Score: 12.848214683467297
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) trained over extensive corpora risk memorizing sensitive, copyrighted, or toxic content. To address this, we propose OBLIVIATE, a robust unlearning framework that removes targeted data while preserving model utility. The framework follows a structured process: extracting target tokens, building retain sets, and fine-tuning with a tailored loss function comprising three components -- masking, distillation, and world fact. Using low-rank adapters (LoRA), it ensures efficiency without compromising unlearning quality. We conduct experiments on multiple datasets, including the Harry Potter series, WMDP, and TOFU, using a comprehensive suite of metrics: forget quality (measured with a new document-level memorization score), model utility, and fluency. Results demonstrate its effectiveness in resisting membership inference attacks, minimizing the impact on retained data, and maintaining robustness across diverse scenarios.
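The abstract attributes OBLIVIATE's efficiency to low-rank adapters (LoRA). As a rough illustration of why that is cheap, here is a minimal sketch of the standard LoRA forward pass for one linear layer, y = W x + (alpha / r) * B (A x), where the pretrained W stays frozen and only the small factors A and B are trained. This is not the paper's code; the helper names (matvec, lora_forward, param_counts) and the pure-Python matrices are illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(W: Matrix, A: Matrix, B: Matrix, x: Vector,
                 alpha: float, r: int) -> Vector:
    """Compute y = W x + (alpha / r) * B (A x).

    W (d_out x d_in) is the frozen pretrained weight; only A (r x d_in)
    and B (d_out x r) would be updated during fine-tuning.
    """
    base = matvec(W, x)               # frozen pretrained path
    update = matvec(B, matvec(A, x))  # rank-r adapter path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def param_counts(d_out: int, d_in: int, r: int) -> Tuple[int, int]:
    """(full fine-tuning parameters, LoRA trainable parameters) for one matrix."""
    return d_out * d_in, r * (d_in + d_out)
```

In the usual LoRA initialization B starts at zero, so the adapted layer initially reproduces the pretrained one. For a 4096 x 4096 projection with r = 8, param_counts reports 65,536 trainable adapter parameters instead of 16,777,216.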

Related papers

Towards Mitigating Excessive Forgetting in LLM Unlearning via Entanglement-Aware Unlearning with Proxy Constraint [28.25159814956888]
Large language models (LLMs) are trained on massive datasets that may include private or copyrighted content. Due to growing privacy and ownership concerns, data owners may request the removal of their data from trained models. Most existing methods lack a sound forgetting boundary, causing some samples to be under-forgotten.
arXiv Detail & Related papers (2025-08-28T05:45:40Z)
ESLM: Risk-Averse Selective Language Modeling for Efficient Pretraining [53.893792844055106]
Large language model pretraining is compute-intensive, yet many tokens contribute marginally to learning, resulting in inefficiency. We introduce Selective Efficient Language Modeling, a risk-aware algorithm that improves training efficiency and distributional robustness by performing online token-level batch selection. Experiments on GPT-2 pretraining show that ESLM significantly reduces training FLOPs while maintaining or improving both perplexity and downstream performance compared to baselines.
arXiv Detail & Related papers (2025-05-26T12:23:26Z)
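The ESLM blurb describes "risk-aware" online token-level selection without giving the rule. One plausible reading, sketched below purely as an assumption, is to score each token by its loss and train only on the high-loss tail, whose mean is a simple empirical estimate of the conditional value-at-risk (CVaR) of the per-token loss. The function name select_high_risk_tokens and the alpha parameter are hypothetical and not taken from the paper.

```python
import math
from typing import List, Tuple


def select_high_risk_tokens(token_losses: List[float],
                            alpha: float) -> Tuple[List[int], float]:
    """Keep the highest-loss (1 - alpha) fraction of tokens (hypothetical rule).

    Returns the kept token positions (in original order) and their mean loss,
    a simple empirical estimate of CVaR_alpha of the per-token loss.
    """
    if not token_losses:
        raise ValueError("need at least one token loss")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    n = len(token_losses)
    # round() guards against float error, e.g. (1 - 0.7) * 10 = 3.0000000000000004
    k = max(1, math.ceil(round((1.0 - alpha) * n, 9)))
    ranked = sorted(range(n), key=lambda i: token_losses[i], reverse=True)
    keep = sorted(ranked[:k])
    cvar = sum(token_losses[i] for i in keep) / k
    return keep, cvar
```

With alpha = 0.5 only the harder half of the tokens would contribute to the gradient; alpha = 0 recovers the ordinary mean over all tokens.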
GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection [36.38245533018162]
Large Language Models (LLMs) have demonstrated strong capabilities in memorizing vast amounts of knowledge across diverse domains. Existing unlearning efforts typically fine-tune the model with resources such as forget data, retain data, and a calibration model. We propose Generation-time Unlearning via Adaptive Restriction and Detection (GUARD), a framework that enables dynamic unlearning during LLM generation.
arXiv Detail & Related papers (2025-05-19T16:26:58Z)
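GUARD works at generation time instead of fine-tuning the weights. Its adaptive restriction and detection mechanism is more involved than anything shown here. The sketch below only illustrates the general category, under our own assumptions: a greedy decoding step that masks any token whose emission would complete a forbidden token sequence, in the spirit of bad-word filtering. The function name restricted_greedy_step and the dictionary representation of logits are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def restricted_greedy_step(logits: Dict[int, float],
                           generated: Sequence[int],
                           forbidden: List[Sequence[int]]) -> int:
    """Pick the highest-scoring next token that does not complete a forbidden sequence.

    logits maps token id -> score for the next position; generated holds the
    token ids emitted so far; each forbidden entry is a non-empty token-id sequence.
    """
    masked = dict(logits)  # copy so the caller's logits are not modified
    for seq in forbidden:
        head, last = list(seq[:-1]), seq[-1]
        recent = list(generated[len(generated) - len(head):])
        if len(generated) >= len(head) and recent == head and last in masked:
            masked[last] = -math.inf  # emitting `last` would complete `seq`
    best = max(masked, key=masked.get)
    if masked[best] == -math.inf:
        raise RuntimeError("every candidate token is restricted")
    return best
```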
Tokens for Learning, Tokens for Unlearning: Mitigating Membership Inference Attacks in Large Language Models via Dual-Purpose Training [13.680205342714412]
Large language models (LLMs) have become the backbone of modern natural language processing but pose privacy concerns about leaking sensitive training data. We propose a lightweight yet effective empirical privacy defense for protecting training data of language modeling by leveraging the token-specific characteristics.
arXiv Detail & Related papers (2025-02-27T03:37:45Z)
CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP [56.199779065855004]
We introduce CLIPErase, a novel approach that disentangles and selectively forgets both visual and textual associations. Experiments on the CIFAR-100 and Flickr30K datasets demonstrate that CLIPErase effectively forgets designated associations in zero-shot tasks for multimodal samples.
arXiv Detail & Related papers (2024-10-30T17:51:31Z)
Silver Linings in the Shadows: Harnessing Membership Inference for Machine Unlearning [7.557226714828334]
We present a novel unlearning mechanism designed to remove the impact of specific data samples from a neural network. In achieving this goal, we crafted a novel loss function tailored to eliminate privacy-sensitive information from weights and activation values of the target model. Our results showcase the superior performance of our approach in terms of unlearning efficacy and latency as well as the fidelity of the primary task.
arXiv Detail & Related papers (2024-07-01T00:20:26Z)
The Frontier of Data Erasure: Machine Unlearning for Large Language Models [56.26002631481726]
Large Language Models (LLMs) are foundational to AI advancements. LLMs pose risks by potentially memorizing and disseminating sensitive, biased, or copyrighted information. Machine unlearning emerges as a cutting-edge solution to mitigate these concerns.
arXiv Detail & Related papers (2024-03-23T09:26:15Z)
DUEL: Duplicate Elimination on Active Memory for Self-Supervised Class-Imbalanced Learning [19.717868805172323]
We propose an active data filtering process during self-supervised pre-training in our novel framework, Duplicate Elimination (DUEL). This framework integrates an active memory inspired by human working memory and introduces distinctiveness information, which measures the diversity of the data in the memory. The DUEL policy, which replaces the most duplicated data with new samples, aims to enhance the distinctiveness information in the memory and thereby mitigate class imbalances.
arXiv Detail & Related papers (2024-02-14T06:09:36Z)
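The DUEL blurb says the policy "replaces the most duplicated data with new samples." A minimal sketch of such a replacement rule follows. It assumes, which the blurb does not state, that samples are feature vectors and that duplication is measured by each item's cosine similarity to its nearest neighbour in memory. All function names and the capacity parameter are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_duplicated_index(memory: List[Vector]) -> int:
    """Index of the item most similar to its nearest neighbour (ties -> lowest index)."""
    best_idx, best_score = 0, -math.inf
    for i, u in enumerate(memory):
        score = max(cosine(u, v) for j, v in enumerate(memory) if j != i)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


def update_memory(memory: List[Vector], new_sample: Vector,
                  capacity: int) -> List[Vector]:
    """Fill memory up to capacity, then overwrite the most duplicated item."""
    if capacity < 2:
        raise ValueError("capacity must be at least 2")
    if len(memory) < capacity:
        return memory + [new_sample]
    updated = list(memory)  # leave the caller's list untouched
    updated[most_duplicated_index(memory)] = new_sample
    return updated
```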
TOFU: A Task of Fictitious Unlearning for LLMs [99.92305790945507]
Large language models trained on massive corpora of data from the web can reproduce sensitive or private data raising both legal and ethical concerns. Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training. We present TOFU, a benchmark aimed at helping deepen our understanding of unlearning.
arXiv Detail & Related papers (2024-01-11T18:57:12Z)
Task-Distributionally Robust Data-Free Meta-Learning [99.56612787882334]
Data-Free Meta-Learning (DFML) aims to efficiently learn new tasks by leveraging multiple pre-trained models without requiring their original training data. For the first time, we reveal two major challenges hindering their practical deployments: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC).
arXiv Detail & Related papers (2023-11-23T15:46:54Z)
SecureCut: Federated Gradient Boosting Decision Trees with Efficient Machine Unlearning [10.011146979811752]
It has become imperative to enable data removal in Vertical Federated Learning (VFL), where multiple parties provide private features for model training. In VFL, data removal, i.e., machine unlearning, often requires removing specific features across all samples under privacy guarantees. We propose SecureCut, a novel Gradient Boosting Decision Tree (GBDT) framework that effectively enables both instance unlearning and feature unlearning without the need for retraining from scratch.
arXiv Detail & Related papers (2023-11-22T05:38:53Z)
Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data. This process might suffer from privacy issues and violations of data protection regulations. We propose an efficient unlearning framework that could efficiently update LLMs without having to retrain the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z)
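For context on what "updating LLMs without retraining" usually starts from, here is a toy sketch of the common gradient-difference unlearning baseline. It minimizes J(w) = -L_forget(w) + lam * L_retain(w), which means gradient ascent on the forget set and descent on the retain set. This is a widely used baseline, not the method of the paper above. The one-parameter linear model and all names are illustrative assumptions chosen so the gradients can be written by hand.

```python
from typing import List, Tuple

Pair = Tuple[float, float]  # (input x, target y)


def mse(w: float, data: List[Pair]) -> float:
    """Mean squared error of the toy model y_hat = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def mse_grad(w: float, data: List[Pair]) -> float:
    """d/dw of mse: mean of 2 * x * (w * x - y)."""
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)


def gradient_difference_step(w: float, forget: List[Pair], retain: List[Pair],
                             lr: float, lam: float) -> float:
    """One descent step on J(w) = -L_forget(w) + lam * L_retain(w).

    The minus sign turns this into gradient ascent on the forget loss, while
    the retain term keeps pulling w toward good fit on the kept data.
    """
    grad_j = -mse_grad(w, forget) + lam * mse_grad(w, retain)
    return w - lr * grad_j
```

In practice the ascent term is usually bounded or regularized, since unbounded ascent on the forget loss can degrade the model well beyond the targeted data.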
RelaxLoss: Defending Membership Inference Attacks without Losing Utility [68.48117818874155]
We propose a novel training framework based on a relaxed loss with a more achievable learning target. RelaxLoss is applicable to any classification model with added benefits of easy implementation and negligible overhead. Our approach consistently outperforms state-of-the-art defense mechanisms in terms of resilience against membership inference attacks (MIAs).
arXiv Detail & Related papers (2022-07-12T19:34:47Z)
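Both RelaxLoss and OBLIVIATE are evaluated against membership inference attacks. The simplest such attack thresholds the per-sample loss and predicts "member" when the loss is low, because models tend to fit their training data more tightly. The sketch below shows that generic loss-threshold attack and its accuracy. It does not reproduce either paper's evaluation protocol, and the function names and example threshold are illustrative.

```python
from typing import List


def loss_threshold_mia(losses: List[float], threshold: float) -> List[bool]:
    """Predict 'member' (True) for every sample whose loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_accuracy(member_losses: List[float], nonmember_losses: List[float],
                    threshold: float) -> float:
    """Attack accuracy; 0.5 is chance level when both sets have the same size."""
    hits = sum(loss_threshold_mia(member_losses, threshold))
    hits += sum(not p for p in loss_threshold_mia(nonmember_losses, threshold))
    return hits / (len(member_losses) + len(nonmember_losses))
```

A defense succeeds to the extent that it pushes this accuracy toward 0.5, that is, it makes member and non-member losses hard to tell apart.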

This list is automatically generated from the titles and abstracts of the papers on this site.