Unmemorization in Large Language Models via Self-Distillation and
Deliberate Imagination
- URL: http://arxiv.org/abs/2402.10052v1
- Date: Thu, 15 Feb 2024 16:21:14 GMT
- Title: Unmemorization in Large Language Models via Self-Distillation and
Deliberate Imagination
- Authors: Yijiang River Dong, Hongzhou Lin, Mikhail Belkin, Ramon Huerta, Ivan
Vuli\'c
- Abstract summary: Large Language Models (LLMs) struggle with crucial issues of privacy violation and unwanted exposure of sensitive data.
We introduce a novel approach termed deliberate imagination in the context of LLM unlearning.
Our results demonstrate the usefulness of this approach across different models and sizes, and also with parameter-efficient fine-tuning.
- Score: 58.36408867180233
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: While displaying impressive generation capabilities across many tasks, Large
Language Models (LLMs) still struggle with crucial issues of privacy violation
and unwanted exposure of sensitive data. This raises an essential question: how
should we prevent such undesired behavior of LLMs while maintaining their
strong generation and natural language understanding (NLU) capabilities? In
this work, we introduce a novel approach termed deliberate imagination in the
context of LLM unlearning. Instead of trying to forget memorized data, we
employ a self-distillation framework, guiding LLMs to deliberately imagine
alternative scenarios. As demonstrated in a wide range of experiments, the
proposed method not only effectively unlearns targeted text but also preserves
the LLMs' capabilities in open-ended generation tasks as well as in NLU tasks.
Our results demonstrate the usefulness of this approach across different models
and sizes, and also with parameter-efficient fine-tuning, offering a novel
pathway to addressing the challenges with private and sensitive data in LLM
applications.
Related papers
- Distillation Robustifies Unlearning [36.888726242192504]
We propose a scalable method that distills an unlearned model into a partially noised copy of itself.<n>At its strongest setting, UNDO matches the robustness of a model retrained from scratch with perfect data filtering.<n>We also show that UNDO robustifies unlearning on the more realistic Weapons of Mass Destruction Proxy benchmark.
arXiv Detail & Related papers (2025-06-06T17:58:54Z) - UniErase: Unlearning Token as a Universal Erasure Primitive for Language Models [54.75551043657238]
We introduce UniErase, a novel unlearning paradigm that employs learnable parametric suffix (unlearning token) to steer language models toward targeted forgetting behaviors.<n>UniErase achieves state-of-the-art (SOTA) performance across batch, sequential, and precise unlearning under fictitious and real-world knowledge settings.
arXiv Detail & Related papers (2025-05-21T15:53:28Z) - Machine Unlearning on Pre-trained Models by Residual Feature Alignment Using LoRA [15.542668474378633]
We propose a novel and efficient machine unlearning method on pre-trained models.
We leverage LoRA to decompose the model's intermediate features into pre-trained features and residual features.
The method aims to learn the zero residuals on the retained set and shifted residuals on the unlearning set.
arXiv Detail & Related papers (2024-11-13T08:56:35Z) - Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs [25.91643745340183]
Large Language Models (LLMs) have demonstrated strong reasoning and memorization capabilities via pretraining on massive textual corpora.
This poses risk of privacy and copyright violations, highlighting the need for efficient machine unlearning methods.
We propose Low-rank Knowledge Unlearning (LoKU), a novel framework that enables robust and efficient unlearning for LLMs.
arXiv Detail & Related papers (2024-08-13T04:18:32Z) - Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [49.043599241803825]
Iterative Contrastive Unlearning (ICU) framework consists of three core components.
A Knowledge Unlearning Induction module removes specific knowledge through an unlearning loss.
A Contrastive Learning Enhancement module to preserve the model's expressive capabilities against the pure unlearning goal.
And an Iterative Unlearning Refinement module that dynamically assess the unlearning extent on specific data pieces and make iterative update.
arXiv Detail & Related papers (2024-07-25T07:09:35Z) - Machine Unlearning with Minimal Gradient Dependence for High Unlearning Ratios [18.73206066109299]
Mini-Unlearning is a novel approach that capitalizes on a critical observation: unlearned parameters correlate with retrained parameters through contraction mapping.
This lightweight, scalable method significantly enhances model accuracy and strengthens resistance to membership inference attacks.
Our experiments demonstrate that Mini-Unlearning not only works under higher unlearning ratios but also outperforms existing techniques in both accuracy and security.
arXiv Detail & Related papers (2024-06-24T01:43:30Z) - Learn to Unlearn for Deep Neural Networks: Minimizing Unlearning
Interference with Gradient Projection [56.292071534857946]
Recent data-privacy laws have sparked interest in machine unlearning.
Challenge is to discard information about the forget'' data without altering knowledge about remaining dataset.
We adopt a projected-gradient based learning method, named as Projected-Gradient Unlearning (PGU)
We provide empirically evidence to demonstrate that our unlearning method can produce models that behave similar to models retrained from scratch across various metrics even when the training dataset is no longer accessible.
arXiv Detail & Related papers (2023-12-07T07:17:24Z) - Unlearn What You Want to Forget: Efficient Unlearning for LLMs [92.51670143929056]
Large language models (LLMs) have achieved significant progress from pre-training on and memorizing a wide range of textual data.
This process might suffer from privacy issues and violations of data protection regulations.
We propose an efficient unlearning framework that could efficiently update LLMs without having to retrain the whole model after data removals.
arXiv Detail & Related papers (2023-10-31T03:35:59Z) - Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization [51.34904967046097]
Continual learning seeks to overcome the challenge of catastrophic forgetting, where a model forgets previously learnt information.
We introduce a novel prior-based method that better constrains parameter growth, reducing catastrophic forgetting.
Results show that BAdam achieves state-of-the-art performance for prior-based methods on challenging single-headed class-incremental experiments.
arXiv Detail & Related papers (2023-09-15T17:10:51Z) - Meta-Learning Online Adaptation of Language Models [88.8947656843812]
Large language models encode impressively broad world knowledge in their parameters.
However, the knowledge in static language models falls out of date, limiting the model's effective "shelf life"
arXiv Detail & Related papers (2023-05-24T11:56:20Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose first scenarios for unlearning and labels in machine learning models.
Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.