MeGU: Machine-Guided Unlearning with Target Feature Disentanglement
- URL: http://arxiv.org/abs/2602.17088v1
- Date: Thu, 19 Feb 2026 05:20:31 GMT
- Title: MeGU: Machine-Guided Unlearning with Target Feature Disentanglement
- Authors: Haoyu Wang, Zhuo Huang, Xiaolong Wang, Bo Han, Zhiwei Lin, Tongliang Liu
- Abstract summary: We propose a novel framework that guides unlearning through concept-aware re-alignment. MeGU enables controlled and selective forgetting, effectively mitigating both under-unlearning and over-unlearning.
- Score: 73.49657372882082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing concern over training data privacy has elevated the "Right to be Forgotten" into a critical requirement, thereby raising the demand for effective Machine Unlearning. However, existing unlearning approaches commonly suffer from a fundamental trade-off: aggressively erasing the influence of target data often degrades model utility on retained data, while conservative strategies leave residual target information intact. In this work, the intrinsic representation properties learned during model pretraining are analyzed. It is demonstrated that semantic class concepts are entangled at the feature-pattern level, sharing associated features while preserving concept-specific discriminative components. This entanglement fundamentally limits the effectiveness of existing unlearning paradigms. Motivated by this insight, we propose Machine-Guided Unlearning (MeGU), a novel framework that guides unlearning through concept-aware re-alignment. Specifically, Multi-modal Large Language Models (MLLMs) are leveraged to explicitly determine re-alignment directions for target samples by assigning semantically meaningful perturbing labels. To improve efficiency, inter-class conceptual similarities estimated by the MLLM are encoded into a lightweight transition matrix. Furthermore, MeGU introduces a positive-negative feature noise pair to explicitly disentangle target concept influence. During finetuning, the negative noise suppresses target-specific feature patterns, while the positive noise reinforces remaining associated features and aligns them with perturbing concepts. This coordinated design enables selective disruption of target-specific representations while preserving shared semantic structures. As a result, MeGU enables controlled and selective forgetting, effectively mitigating both under-unlearning and over-unlearning.
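The abstract says that inter-class conceptual similarities estimated by the MLLM are encoded into a lightweight transition matrix, which then supplies perturbing labels for target samples. The sketch below is one plausible reading of that step, not the authors' implementation: the function names, the row normalization, the zeroed diagonal, and the similarity values are all assumptions made for illustration.

```python
import random


def build_transition_matrix(similarity):
    """Turn a class-similarity matrix into row-stochastic transition probabilities.

    Assumption: self-transitions are forbidden, so a target class is never
    relabeled as itself. The abstract does not specify this detail.
    """
    n = len(similarity)
    if n < 2:
        raise ValueError("need at least two classes to pick a perturbing label")
    matrix = []
    for i, row in enumerate(similarity):
        # Zero the diagonal and clip negative scores to zero.
        weights = [0.0 if j == i else max(float(s), 0.0) for j, s in enumerate(row)]
        total = sum(weights)
        if total == 0.0:
            # No usable similarity: fall back to uniform over the other classes.
            weights = [0.0 if j == i else 1.0 for j in range(n)]
            total = float(n - 1)
        matrix.append([w / total for w in weights])
    return matrix


def sample_perturbing_label(matrix, target_class, rng):
    """Draw a perturbing label for one sample of `target_class`."""
    return rng.choices(range(len(matrix)), weights=matrix[target_class], k=1)[0]


# Hypothetical similarity scores for three classes (say cat, dog, truck);
# in the paper's setting these would come from the MLLM.
similarity = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.2],
    [0.1, 0.1, 1.0],
]
transition = build_transition_matrix(similarity)
rng = random.Random(0)
labels = [sample_perturbing_label(transition, 0, rng) for _ in range(5)]
```

Under these assumptions the MLLM is queried once per class pair rather than once per sample, which is the kind of efficiency gain the abstract attributes to the transition matrix: relabeling a target sample reduces to a table lookup and a random draw.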
Related papers
- Distribution-Guided and Constrained Quantum Machine Unlearning [5.518378568494161]
Machine unlearning aims to remove the influence of specific training data from a learned model without full retraining. We propose a distribution-guided framework for class-level quantum machine unlearning that treats unlearning as a constrained optimization problem.
arXiv Detail & Related papers (2026-01-07T21:44:20Z)
- Beyond Memorization: Gradient Projection Enables Selective Learning in Diffusion Models [3.4064487905075294]
Memorization in large-scale text-to-image diffusion models poses significant security and intellectual property risks. We introduce a Gradient Projection Framework designed to enforce a stringent requirement of concept-level feature exclusion. Our approach establishes a new paradigm for IP-safe and privacy-preserving generative AI.
arXiv Detail & Related papers (2025-12-12T00:50:38Z)
- AUVIC: Adversarial Unlearning of Visual Concepts for Multi-modal Large Language Models [63.05306474002547]
Regulatory frameworks mandating the 'right to be forgotten' drive the need for machine unlearning. We introduce AUVIC, a novel visual concept unlearning framework for MLLMs. We show that AUVIC achieves state-of-the-art target forgetting rates while incurring minimal performance degradation on non-target concepts.
arXiv Detail & Related papers (2025-11-14T13:35:32Z)
- LLM Unlearning on Noisy Forget Sets: A Study of Incomplete, Rewritten, and Watermarked Data [69.5099112089508]
Large language models (LLMs) exhibit remarkable generative capabilities but raise ethical and security concerns by memorizing sensitive data. This work presents the first study of unlearning under perturbed or low-fidelity forget data, referred to as noisy forget sets. We find that unlearning remains surprisingly robust to perturbations, provided that core semantic signals are preserved.
arXiv Detail & Related papers (2025-10-10T05:10:49Z)
- Interpretable Few-Shot Image Classification via Prototypical Concept-Guided Mixture of LoRA Experts [79.18608192761512]
Self-Explainable Models (SEMs) rely on Prototypical Concept Learning (PCL) to make their visual recognition processes more interpretable. We propose a Few-Shot Prototypical Concept Classification framework that mitigates two key challenges under low-data regimes: parametric imbalance and representation misalignment. Our approach consistently outperforms existing SEMs by a notable margin, with 4.2%-8.7% relative gains in 5-way 5-shot classification.
arXiv Detail & Related papers (2025-06-05T06:39:43Z)
- GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection [36.38245533018162]
Large Language Models (LLMs) have demonstrated strong capabilities in memorizing vast amounts of knowledge across diverse domains. Existing unlearning efforts typically fine-tune the model with resources such as forget data, retain data, and a calibration model. We propose Generation-time Unlearning via Adaptive Restriction and Detection (GUARD), a framework that enables dynamic unlearning during LLM generation.
arXiv Detail & Related papers (2025-05-19T16:26:58Z)
- Source-Free Domain Adaptive Object Detection with Semantics Compensation [54.00183496587841]
We introduce Weak-to-strong Semantics Compensation (WSCo) for strong data augmentation. WSCo compensates for the class-relevant semantics that may be lost during strong augmentation on the fly. WSCo can be implemented as a generic plug-in, easily integrable with any existing SFOD pipelines.
arXiv Detail & Related papers (2024-10-07T23:32:06Z)
- Erasing Conceptual Knowledge from Language Models [24.63143961814566]
We introduce Erasure of Language Memory (ELM), a principled approach to concept-level unlearning. ELM operates by matching distributions defined by the model's own introspective classification capabilities. We demonstrate ELM's efficacy on biosecurity, cybersecurity, and literary domain erasure tasks.
arXiv Detail & Related papers (2024-10-03T17:59:30Z)
- Decoupling the Class Label and the Target Concept in Machine Unlearning [81.69857244976123]
Machine unlearning aims to adjust a trained model to approximate a retrained one that excludes a portion of training data.
Previous studies showed that class-wise unlearning is successful in forgetting the knowledge of a target class.
We propose a general framework, namely TARget-aware Forgetting (TARF).
arXiv Detail & Related papers (2024-06-12T14:53:30Z)
- An Information Compensation Framework for Zero-Shot Skeleton-based Action Recognition [49.45660055499103]
Zero-shot human skeleton-based action recognition aims to construct a model that can recognize actions outside the categories seen during training.
Previous research has focused on aligning sequences' visual and semantic spatial distributions.
We introduce a new loss function sampling method to obtain a tight and robust representation.
arXiv Detail & Related papers (2024-06-02T06:53:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.