Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures
- URL: http://arxiv.org/abs/2602.03379v1
- Date: Tue, 03 Feb 2026 10:57:19 GMT
- Title: Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures
- Authors: Sangyeon Yoon, Hyesoo Hong, Wonje Jeung, Albert No
- Abstract summary: We study the phenomenon of benign relearning, in which forgotten information reemerges even from benign fine-tuning data. A common explanation attributes this effect to topical relevance, but we find this account insufficient. We introduce syntactic diversification, which paraphrases the original forget queries into heterogeneous structures prior to unlearning. This approach effectively suppresses benign relearning, accelerates forgetting, and substantially alleviates the trade-off between unlearning efficacy and model utility.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine unlearning aims to remove specific content from trained models while preserving overall performance. However, the phenomenon of benign relearning, in which forgotten information reemerges even from benign fine-tuning data, reveals that existing unlearning methods remain fundamentally fragile. A common explanation attributes this effect to topical relevance, but we find this account insufficient. Through systematic analysis, we demonstrate that syntactic similarity, rather than topicality, is the primary driver: across benchmarks, syntactically similar data consistently trigger recovery even without topical overlap, due to their alignment in representations and gradients with the forgotten content. Motivated by this insight, we introduce syntactic diversification, which paraphrases the original forget queries into heterogeneous structures prior to unlearning. This approach effectively suppresses benign relearning, accelerates forgetting, and substantially alleviates the trade-off between unlearning efficacy and model utility.
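To make the proposed remedy concrete, below is a minimal sketch of syntactic diversification as the abstract describes it: each forget query is rewritten into structurally heterogeneous variants before the unlearning objective is applied. The template set and the `diversify` helper are illustrative assumptions, not the authors' code; a real pipeline would presumably use a stronger (e.g., LLM-based) paraphraser.

```python
# Minimal sketch: syntactic diversification of forget queries (illustrative).
# The idea from the abstract: rewrite each forget query into heterogeneous
# syntactic structures before unlearning, so that no single surface pattern
# dominates the forget-set gradients.

import random

# Hypothetical paraphrase templates; an actual implementation would likely
# use an LLM paraphraser rather than fixed templates.
TEMPLATES = [
    "{q}",                                  # original interrogative form
    "Please answer the following: {q}",     # imperative framing
    "Here is a question I have: {q}",       # declarative preamble
    "Consider this carefully. {q}",         # two-sentence framing
]

def diversify(query: str, k: int = 4, seed: int = 0) -> list[str]:
    """Return up to k syntactically heterogeneous variants of a forget query."""
    rng = random.Random(seed)
    variants = [t.format(q=query) for t in TEMPLATES]
    rng.shuffle(variants)
    return variants[:k]

forget_queries = ["Who wrote the leaked internal memo?"]
forget_set = [v for q in forget_queries for v in diversify(q)]
# `forget_set` replaces the raw queries in the unlearning objective, reducing
# the representation/gradient alignment that syntactically similar benign
# data would otherwise exploit to trigger relearning.
print(forget_set)
```

The design point is that diversification happens before unlearning, so the forgetting signal is spread over many syntactic forms rather than tied to a single structure that benign fine-tuning data can later match.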
Related papers
- MeGU: Machine-Guided Unlearning with Target Feature Disentanglement [73.49657372882082]
We propose a novel framework that guides unlearning through concept-aware re-alignment. MeGU enables controlled and selective forgetting, effectively mitigating both under-unlearning and over-unlearning.
arXiv Detail & Related papers (2026-02-19T05:20:31Z) - Auditing Language Model Unlearning via Information Decomposition [68.48660428111593]
We introduce an interpretable, information-theoretic framework for auditing unlearning using Partial Information Decomposition (PID). By comparing model representations before and after unlearning, we decompose the mutual information with the forgotten data into distinct components, formalizing the notions of unlearned and residual knowledge. Our work introduces a principled, representation-level audit for unlearning, offering theoretical insight and actionable tools for safer deployment of language models.
arXiv Detail & Related papers (2026-01-21T15:51:19Z) - LLM Unlearning on Noisy Forget Sets: A Study of Incomplete, Rewritten, and Watermarked Data [69.5099112089508]
Large language models (LLMs) exhibit remarkable generative capabilities but raise ethical and security concerns by memorizing sensitive data. This work presents the first study of unlearning under perturbed or low-fidelity forget data, referred to as noisy forget sets. We find that unlearning remains surprisingly robust to perturbations, provided that core semantic signals are preserved.
arXiv Detail & Related papers (2025-10-10T05:10:49Z) - Understanding the Dilemma of Unlearning for Large Language Models [50.54260066313032]
Unlearning seeks to remove specific knowledge from large language models (LLMs). We propose unPact, an interpretable framework for unlearning via prompt attribution and contribution tracking.
arXiv Detail & Related papers (2025-09-29T12:15:19Z) - Efficient Machine Unlearning via Influence Approximation [75.31015485113993]
Influence-based unlearning has emerged as a prominent approach to estimate the impact of individual training samples on model parameters without retraining. This paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning). We introduce the Influence Approximation Unlearning algorithm for efficient machine unlearning from the incremental perspective.
arXiv Detail & Related papers (2025-07-31T05:34:27Z) - LoReUn: Data Itself Implicitly Provides Cues to Improve Machine Unlearning [33.62466543549043]
Loss-based Reweighting Unlearning (LoReUn) is a plug-and-play strategy that dynamically reweights data during the unlearning process with minimal additional computational overhead. Our approach significantly reduces the gap between existing MU methods and exact unlearning in both image classification and generation tasks.
arXiv Detail & Related papers (2025-07-30T09:12:25Z) - OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting [2.6815971241599126]
Machine unlearning seeks to remove the influence of particular data or classes from trained models to meet privacy, legal, or ethical requirements. Existing unlearning methods tend to forget only shallowly: the unlearned model merely appears to forget, adjusting only its responses. We propose a novel general-purpose unlearning algorithm, One-Point-Contraction (OPC).
arXiv Detail & Related papers (2025-07-10T13:34:02Z) - Adversarial Mixup Unlearning [16.89710766008491]
We introduce a novel approach that regularizes the unlearning process by utilizing synthesized mixup samples. At the core of our approach is a generator-unlearner framework, MixUnlearn. We show that our method significantly outperforms state-of-the-art approaches.
arXiv Detail & Related papers (2025-02-14T16:50:33Z) - Repetition In Repetition Out: Towards Understanding Neural Text Degeneration from the Data Perspective [91.14291142262262]
This work presents a straightforward and fundamental explanation from the data perspective.
Our preliminary investigation reveals a strong correlation between the degeneration issue and the presence of repetitions in training data.
Our experiments reveal that penalizing the repetitions in training data remains critical even when considering larger model sizes and instruction tuning.
arXiv Detail & Related papers (2023-10-16T09:35:42Z) - Machine Unlearning of Features and Labels [72.81914952849334]
We propose the first scenarios for unlearning features and labels in machine learning models. Our approach builds on the concept of influence functions and realizes unlearning through closed-form updates of model parameters; a minimal sketch of such an update follows this list.
arXiv Detail & Related papers (2021-08-26T04:42:24Z)
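As a companion to the influence-function entry above, here is a minimal sketch, under simplifying assumptions, of a closed-form unlearning update of the kind that entry describes: a single Newton-style step removes one sample from a ridge-regularized linear regression, where the Hessian is available exactly. The `unlearn_point` helper and all variable names are illustrative, not the authors' code.

```python
# Minimal sketch: influence-function unlearning via a closed-form update
# (illustrative; ridge-regularized linear regression with an exact Hessian).

import numpy as np

def unlearn_point(theta, X, y, idx, lam=1e-2):
    """Approximately remove sample `idx` with one Newton-style step:
    theta' = theta + (1/n) * H^{-1} * grad_loss(z_idx, theta)."""
    n, d = X.shape
    # Hessian of the average regularized squared loss.
    H = X.T @ X / n + lam * np.eye(d)
    # Gradient of the removed sample's loss at the current parameters.
    g = X[idx] * (X[idx] @ theta - y[idx])
    # Down-weighting the sample by 1/n corresponds to deleting it.
    return theta + np.linalg.solve(H, g) / n

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
# Train on all 100 points, then unlearn point 0 without retraining.
theta = np.linalg.solve(X.T @ X / 100 + 1e-2 * np.eye(5), X.T @ y / 100)
theta_minus = unlearn_point(theta, X, y, idx=0)
```

For deep models the Hessian solve is the expensive step, which is what efficiency-oriented approximations such as the Influence Approximation Unlearning entry above aim to avoid.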