ReLearn: Unlearning via Learning for Large Language Models
- URL: http://arxiv.org/abs/2502.11190v1
- Date: Sun, 16 Feb 2025 16:31:00 GMT
- Title: ReLearn: Unlearning via Learning for Large Language Models
- Authors: Haoming Xu, Ningyuan Zhao, Liming Yang, Sendong Zhao, Shumin Deng, Mengru Wang, Bryan Hooi, Nay Oo, Huajun Chen, Ningyu Zhang
- Abstract summary: We propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning.
This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation.
Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output.
- Score: 64.2802606302194
- License:
- Abstract: Current unlearning methods for large language models usually rely on reverse optimization to reduce target token probabilities. However, this paradigm disrupts subsequent token prediction, degrading model performance and linguistic coherence. Moreover, existing evaluation metrics overemphasize contextual forgetting while inadequately assessing response fluency and relevance. To address these challenges, we propose ReLearn, a data augmentation and fine-tuning pipeline for effective unlearning, along with a comprehensive evaluation framework. This framework introduces Knowledge Forgetting Rate (KFR) and Knowledge Retention Rate (KRR) to measure knowledge-level preservation, and Linguistic Score (LS) to evaluate generation quality. Our experiments show that ReLearn successfully achieves targeted forgetting while preserving high-quality output. Through mechanistic analysis, we further demonstrate how reverse optimization disrupts coherent text generation, while ReLearn preserves this essential capability. Code is available at https://github.com/zjunlp/unlearn.
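To make the contrast concrete, the sketch below compares a reverse-optimization loss (negated likelihood on a forget sample) with ReLearn-style unlearning via learning (ordinary fine-tuning on an augmented, non-sensitive answer). It is a minimal illustration, not the authors' pipeline: the model name, the question-answer strings, and the single-example losses are placeholders.

```python
# Minimal sketch contrasting reverse optimization with learning-based unlearning.
# Model name and example strings are placeholders, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; ReLearn targets larger LLMs
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

forget_qa = "Q: Where does Alice live? A: 42 Maple Street."  # hypothetical private fact
augmented_qa = "Q: Where does Alice live? A: I cannot share personal addresses."  # augmented replacement

def nll(text):
    """Standard next-token negative log-likelihood on a single example."""
    enc = tok(text, return_tensors="pt")
    out = model(input_ids=enc["input_ids"], labels=enc["input_ids"])
    return out.loss

# (a) Reverse optimization: ascend on the forget sample by negating the NLL.
#     Minimizing this pushes target-token probabilities down, which the
#     abstract argues disrupts subsequent token prediction.
loss_reverse = -nll(forget_qa)

# (b) ReLearn-style unlearning via learning: ordinary fine-tuning loss on an
#     augmented, non-sensitive answer, so generation stays fluent.
loss_relearn = nll(augmented_qa)

print(float(loss_reverse), float(loss_relearn))
```

In the full pipeline, KFR and KRR would then be measured at the knowledge level on the unlearned model's generations, and LS on their fluency and relevance.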
Related papers
- Catastrophic Failure of LLM Unlearning via Quantization [36.524827594501495]
We show that applying quantization to models that have undergone unlearning can restore the "forgotten" information.
We find that for unlearning methods with utility constraints, the unlearned model retains an average of 21% of the intended forgotten knowledge in full precision.
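This failure mode can be probed with a simple before/after check: quantize the unlearned model and see how far its behavior on forget-set inputs drifts. The sketch below uses PyTorch dynamic INT8 quantization on a toy network as a stand-in for the 4-bit LLM quantization the paper studies; the model and probe data are placeholders.

```python
# Sketch: compare a model's behavior on "forgotten" inputs before and after
# post-training quantization. A toy two-layer net stands in for an unlearned LLM.
import torch
import torch.nn as nn

torch.manual_seed(0)
unlearned = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))

# INT8 dynamic quantization of the Linear layers (the paper studies 4-bit LLM
# quantization; this only illustrates the probing procedure).
quantized = torch.ao.quantization.quantize_dynamic(
    unlearned, {nn.Linear}, dtype=torch.qint8
)

forget_probe = torch.randn(8, 16)  # placeholder for forget-set inputs
with torch.no_grad():
    logits_fp32 = unlearned(forget_probe)
    logits_int8 = quantized(forget_probe)

# If quantization noise moves outputs back toward the pre-unlearning model,
# "forgotten" answers can resurface; here we just report the drift.
print("mean |delta logit| after quantization:",
      (logits_fp32 - logits_int8).abs().mean().item())
```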
arXiv Detail & Related papers (2024-10-21T19:28:37Z)
- Policy-Gradient Training of Language Models for Ranking [29.940468096858066]
Text retrieval plays a crucial role in incorporating factual knowledge for decision making into language processing pipelines.
Current state-of-the-art text retrieval models leverage pre-trained large language models (LLMs) to achieve competitive performance.
We introduce Neural PG-RANK, a novel training algorithm that learns to rank by instantiating an LLM as a Plackett-Luce ranking policy.
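A Plackett-Luce ranking policy itself is easy to write down: items are drawn sequentially without replacement in proportion to exponentiated scores, and the log-probability of the sampled ranking is what a policy-gradient update differentiates. The sketch below uses placeholder scores rather than an LLM scorer, and a REINFORCE-style surrogate rather than the paper's exact training objective.

```python
# Sketch of a Plackett-Luce ranking policy for policy-gradient training.
# In Neural PG-RANK the scores would come from an LLM scorer; here they are
# just a learnable placeholder.
import torch

def sample_plackett_luce(scores):
    """Sample a ranking and return (permutation, log-probability of that ranking)."""
    remaining = list(range(scores.shape[0]))
    perm, log_prob = [], torch.zeros((), dtype=scores.dtype)
    while remaining:
        probs = torch.softmax(scores[remaining], dim=0)
        idx = torch.multinomial(probs, 1).item()
        log_prob = log_prob + torch.log(probs[idx])
        perm.append(remaining.pop(idx))
    return perm, log_prob

scores = torch.randn(5, requires_grad=True)  # placeholder relevance scores
ranking, logp = sample_plackett_luce(scores)
reward = 1.0                                 # e.g., an NDCG-style utility of `ranking`
loss = -reward * logp                        # REINFORCE-style policy-gradient surrogate
loss.backward()
print(ranking, scores.grad)
```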
arXiv Detail & Related papers (2023-10-06T17:55:23Z)
- Robustness-preserving Lifelong Learning via Dataset Condensation [11.83450966328136]
'Catastrophic forgetting' refers to a notorious dilemma between improving model accuracy over new data and retaining accuracy over previous data.
We propose a new memory-replay LL strategy that leverages modern bi-level optimization techniques to determine the 'coreset' of the current data.
We term the resulting LL framework 'Data-Efficient Robustness-Preserving LL' (DERPLL).
Experimental results show that DERPLL outperforms the conventional coreset-guided LL baseline.
arXiv Detail & Related papers (2023-03-07T19:09:03Z)
- Learning Large-scale Neural Fields via Context Pruned Meta-Learning [60.93679437452872]
We introduce an efficient optimization-based meta-learning technique for large-scale neural field training.
We show how gradient re-scaling at meta-test time allows the learning of extremely high-quality neural fields.
Our framework is model-agnostic, intuitive, straightforward to implement, and shows significant reconstruction improvements for a wide range of signals.
arXiv Detail & Related papers (2023-02-01T17:32:16Z)
- Offline RL for Natural Language Generation with Implicit Language Q Learning [87.76695816348027]
Large language models can be inconsistent when it comes to completing user-specified tasks.
We propose ILQL, a novel offline RL method that combines the flexible utility framework of RL with the ability of supervised learning to leverage previously collected data.
In addition to empirically validating ILQL, we present a detailed empirical analysis of situations where offline RL can be useful in natural language generation settings.
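ILQL builds on implicit Q-learning, whose characteristic ingredient is an expectile-regression value loss that avoids querying out-of-distribution actions. The snippet below is a generic sketch of that loss with placeholder tensors, not the paper's implementation.

```python
# Sketch of the expectile-regression value loss used in implicit Q-learning,
# the family of methods ILQL builds on; tensors here are placeholders.
import torch

def expectile_loss(q_values, v_values, tau=0.7):
    """Asymmetric L2 loss: push V toward an upper expectile of Q over the data."""
    diff = q_values - v_values
    weight = torch.abs(tau - (diff < 0).float())  # tau if Q > V, (1 - tau) otherwise
    return (weight * diff.pow(2)).mean()

q = torch.randn(32)                       # Q(s, a) for actions seen in the offline data
v = torch.randn(32, requires_grad=True)   # V(s) predictions to be trained
loss = expectile_loss(q, v)
loss.backward()
print(loss.item())
```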
arXiv Detail & Related papers (2022-06-05T18:38:42Z)
- Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism [65.46524775457928]
Offline reinforcement learning seeks to utilize offline/historical data to optimize sequential decision-making strategies.
We study the statistical limits of offline reinforcement learning with linear model representations.
arXiv Detail & Related papers (2022-03-11T09:00:12Z)
- Enhancing Dialogue Generation via Multi-Level Contrastive Learning [57.005432249952406]
We propose a multi-level contrastive learning paradigm to model the fine-grained quality of the responses with respect to the query.
A Rank-aware (RC) network is designed to construct the multi-level contrastive optimization objectives.
We build a Knowledge Inference (KI) component to capture the keyword knowledge from the reference during training and exploit such information to encourage the generation of informative words.
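One generic way to realize a multi-level contrastive objective of this kind is a margin ranking loss in which higher-quality responses must outscore lower-quality ones by a margin that grows with the quality gap. The scorer and quality levels below are placeholders, not the paper's RC network.

```python
# Sketch of a multi-level contrastive (ranking) objective: responses graded
# into quality levels should score higher than lower-level ones by a margin
# proportional to the level gap. Scores and levels are placeholders.
import torch

def multi_level_contrastive(scores, levels, base_margin=0.1):
    """scores[i]: match score of response i with the query; levels[i]: quality level (higher = better)."""
    loss, pairs = scores.new_zeros(()), 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if levels[i] > levels[j]:
                margin = base_margin * (levels[i] - levels[j])
                loss = loss + torch.relu(margin - (scores[i] - scores[j]))
                pairs += 1
    return loss / max(pairs, 1)

scores = torch.randn(4, requires_grad=True)  # e.g., similarities between query and responses
levels = [3, 2, 1, 0]                        # graded response quality levels
loss = multi_level_contrastive(scores, levels)
loss.backward()
print(loss.item())
```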
arXiv Detail & Related papers (2020-09-19T02:41:04Z)
- Text Generation by Learning from Demonstrations [17.549815256968877]
Current approaches to text generation largely rely on autoregressive models and maximum likelihood estimation.
We propose GOLD: an easy-to-optimize algorithm that learns from expert demonstrations by importance weighting.
According to both automatic and human evaluation, models trained by GOLD outperform those trained by MLE and policy gradient.
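The core of GOLD, replacing uniform MLE with importance-weighted learning from demonstrations, can be sketched by reweighting each demonstration token's negative log-likelihood with the model's own detached, lower-clipped probability for that token. The model, demonstration text, and clipping value below are placeholders, and the weighting is a simplified form of what the paper describes.

```python
# Sketch of importance-weighted learning from demonstrations (GOLD-style):
# each demonstration token's NLL is reweighted by the model's own detached
# probability for that token, so confidently-modelled expert tokens dominate.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")           # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

demo = "The quick brown fox jumps over the lazy dog."  # stand-in expert demonstration
ids = tok(demo, return_tensors="pt")["input_ids"]

logits = model(input_ids=ids).logits[:, :-1, :]        # predict token t+1 from prefix
targets = ids[:, 1:]
log_probs = torch.log_softmax(logits, dim=-1)
token_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

weights = token_logp.exp().detach().clamp(min=0.1)     # importance weights, lower-clipped
gold_loss = -(weights * token_logp).sum() / weights.sum()
mle_loss = -token_logp.mean()                          # standard MLE for comparison
print(float(gold_loss), float(mle_loss))
```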
arXiv Detail & Related papers (2020-09-16T17:58:37Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.