Towards Benign Memory Forgetting for Selective Multimodal Large Language Model Unlearning
- URL: http://arxiv.org/abs/2511.20196v1
- Date: Tue, 25 Nov 2025 11:22:45 GMT
- Title: Towards Benign Memory Forgetting for Selective Multimodal Large Language Model Unlearning
- Authors: Zhen Zeng, Leijiang Gu, Zhangling Duan, Feng Li, Zenglin Shi, Cees G. M. Snoek, Meng Wang
- Abstract summary: Multimodal Large Language Models (MLLMs) achieve remarkable capabilities but can inadvertently memorize privacy-sensitive information. Existing unlearning methods fail to achieve benign forgetting because they often degrade the model's general image understanding performance. We propose the Sculpted Memory Forgetting Adapter (SMFA), which confines forgetting to targeted memory regions while preserving overall capabilities.
- Score: 49.274436951541425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal Large Language Models (MLLMs) achieve remarkable capabilities but can inadvertently memorize privacy-sensitive information. Although existing unlearning methods can remove such knowledge, they fail to achieve benign forgetting because they often degrade the model's general image understanding performance. To address this, we propose the Sculpted Memory Forgetting Adapter (SMFA), which confines forgetting to targeted memory regions while preserving overall capabilities. SMFA first fine-tunes the model to replace sensitive responses with refusals, yielding a memory forgetting adapter, and then applies a retaining anchor-guided masking mechanism to prevent interference with unrelated knowledge and understanding ability. To systematically evaluate selective MLLM unlearning, we introduce S-MLLMUn Bench, the first benchmark designed to jointly assess the removal of sensitive knowledge and retention of general visual understanding. Extensive experiments show that, unlike prior methods, SMFA achieves precise and controllable unlearning while maintaining the model's foundational image understanding.
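To make the two stages concrete, here is a minimal, hypothetical sketch of the SMFA recipe using a toy linear layer as a stand-in for an MLLM block. The adapter parameterization, the refusal target, the gradient-based masking rule, and the `keep_frac` threshold are all illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of SMFA's two stages on a toy linear "model".
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 16
base = nn.Linear(d, d).requires_grad_(False)   # frozen base weights
delta = nn.Parameter(torch.zeros(d, d))        # "memory forgetting adapter" (weight delta)

def forward(x, use_delta=True):
    w = base.weight + (delta if use_delta else 0)
    return x @ w.T + base.bias

# Stage 1: fine-tune the adapter so sensitive inputs map to a refusal target.
x_forget = torch.randn(8, d)                   # stand-in for privacy-sensitive queries
refusal = torch.zeros(8, d)                    # stand-in for a refusal response embedding
opt = torch.optim.Adam([delta], lr=1e-2)
for _ in range(200):
    loss = nn.functional.mse_loss(forward(x_forget), refusal)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: retaining anchor-guided masking. Keep only the adapter entries that
# matter little for "retain" anchors, zeroing those that would perturb them.
x_retain = torch.randn(8, d)                   # stand-in for general-understanding anchors
with torch.no_grad():
    y_keep = forward(x_retain, use_delta=False)
loss_retain = nn.functional.mse_loss(forward(x_retain), y_keep)
grad_retain, = torch.autograd.grad(loss_retain, delta)

keep_frac = 0.5                                # illustrative sparsity level
thresh = grad_retain.abs().flatten().quantile(keep_frac)
mask = (grad_retain.abs() <= thresh).float()   # keep entries least harmful to retention
with torch.no_grad():
    delta.mul_(mask)                           # sculpted adapter: forgets, interferes less
```

The intended design point is that the mask is derived from gradients on retaining anchors, so the entries of the forgetting adapter most likely to disturb general image understanding are zeroed while the refusal behaviour on the forget set is kept.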
Related papers
- KUDA: Knowledge Unlearning by Deviating Representation for Large Language Models [26.418820118903852]
Large language models (LLMs) acquire a large amount of knowledge through pre-training on vast and diverse corpora. LLM unlearning is a promising technique to reduce risks associated with sensitive, copyrighted, or harmful content in training data. We propose Knowledge Unlearning by Deviating representAtion (KUDA) to achieve effective unlearning at the knowledge level of LLMs.
arXiv Detail & Related papers (2026-02-22T17:16:49Z)
- MeGU: Machine-Guided Unlearning with Target Feature Disentanglement [73.49657372882082]
We propose a novel framework that guides unlearning through concept-aware re-alignment. MeGU enables controlled and selective forgetting, effectively mitigating both under-unlearning and over-unlearning.
arXiv Detail & Related papers (2026-02-19T05:20:31Z)
- Nested Learning: The Illusion of Deep Learning Architectures [57.41377373511876]
We present a new learning paradigm, called Nested Learning (NL), that coherently represents a machine learning model with a set of nested, multi-level, and/or parallel problems. We present three core contributions, showing that expressive models are in fact generalizations with deep memory and/or more powerful learning rules. We also present a new continuum for memory systems that generalizes the traditional viewpoint of long/short-term memory.
arXiv Detail & Related papers (2025-12-31T07:59:43Z)
- MLLMEraser: Achieving Test-Time Unlearning in Multimodal Large Language Models through Activation Steering [36.80441487363007]
MLLMEraser is an input-aware, training-free framework for test-time unlearning. We construct a multimodal erasure direction by contrasting adversarially perturbed, knowledge-recall image-text pairs. Experiments on LLaVA-1.5 and Qwen-2.5-VL demonstrate that MLLMEraser consistently outperforms state-of-the-art MLLM unlearning baselines.
arXiv Detail & Related papers (2025-10-05T14:20:17Z)
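A minimal, hypothetical sketch of the test-time activation-steering idea described in the MLLMEraser summary above, with a toy MLP standing in for the MLLM; the layer choice, the way the erasure direction is estimated, and the steering strength `alpha` are assumptions made only for illustration.

```python
# Hypothetical sketch of training-free activation steering for unlearning.
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 32
model = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))
layer = model[1]                               # steer the hidden activation after this layer

def hidden_acts(x):
    acts = {}
    h = layer.register_forward_hook(lambda m, i, o: acts.update(h=o.detach()))
    model(x)
    h.remove()
    return acts["h"]

# Contrast knowledge-recall inputs with adversarially perturbed versions
# (random stand-ins here) to estimate a direction tied to the target knowledge.
x_recall = torch.randn(16, d)
x_perturbed = x_recall + 0.3 * torch.randn(16, d)
direction = (hidden_acts(x_recall) - hidden_acts(x_perturbed)).mean(0)
direction = direction / direction.norm()

# At test time, project the erasure direction out of the hidden state (no training).
alpha = 1.0                                    # illustrative steering strength
def steer(module, inputs, output):
    return output - alpha * (output @ direction)[:, None] * direction

handle = layer.register_forward_hook(steer)
out = model(torch.randn(4, d))                 # responses steered away from the direction
handle.remove()
```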
- MemOS: A Memory OS for AI System [116.87568350346537]
Large Language Models (LLMs) have become an essential infrastructure for Artificial General Intelligence (AGI). Existing models mainly rely on static parameters and short-lived contextual states, limiting their ability to track user preferences or update knowledge over extended periods. MemOS is a memory operating system that treats memory as a manageable system resource.
arXiv Detail & Related papers (2025-07-04T17:21:46Z)
- SAUCE: Selective Concept Unlearning in Vision-Language Models with Sparse Autoencoders [16.551943721248108]
We introduce SAUCE, a novel method for fine-grained and selective concept unlearning in vision-language models. It first trains SAEs to capture high-dimensional, semantically rich sparse features. It then identifies the features most relevant to the target concept for unlearning. During inference, it selectively modifies these features to suppress specific concepts while preserving unrelated information.
arXiv Detail & Related papers (2025-03-16T17:32:23Z)
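A hypothetical sketch of the SAE-based selective suppression that SAUCE describes; the tiny autoencoder, the contrast-based feature selection, and the number of suppressed features are illustrative assumptions rather than the method's actual implementation.

```python
# Hypothetical sketch of sparse-autoencoder feature suppression at inference.
import torch
import torch.nn as nn

torch.manual_seed(0)
d, k = 32, 128                                 # model hidden size, SAE dictionary size
enc = nn.Linear(d, k)
dec = nn.Linear(k, d)

def sae(h):
    z = torch.relu(enc(h))                     # sparse feature activations
    return z, dec(z)

# (1) Train the SAE to reconstruct hidden states with an L1 sparsity penalty.
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)
hidden_states = torch.randn(256, d)            # stand-in for cached VLM activations
for _ in range(300):
    z, recon = sae(hidden_states)
    loss = nn.functional.mse_loss(recon, hidden_states) + 1e-3 * z.abs().mean()
    opt.zero_grad(); loss.backward(); opt.step()

# (2) Identify features most associated with the target concept by contrasting
# mean activations on concept vs. non-concept examples.
h_concept, h_other = torch.randn(32, d), torch.randn(32, d)
z_c, _ = sae(h_concept)
z_o, _ = sae(h_other)
relevance = z_c.mean(0) - z_o.mean(0)
target_feats = relevance.topk(8).indices       # illustrative number of features

# (3) At inference, zero the selected features and decode back, suppressing the
# concept while leaving the remaining (unrelated) features intact.
def suppress(h):
    z, _ = sae(h)
    z = z.clone()
    z[:, target_feats] = 0.0
    return dec(z)
```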
- Erasing Without Remembering: Implicit Knowledge Forgetting in Large Language Models [81.62767292169225]
We investigate knowledge forgetting in large language models with a focus on its generalisation. We propose PerMU, a novel probability perturbation-based unlearning paradigm. Experiments are conducted on a diverse range of datasets, including TOFU, Harry Potter, ZsRE, WMDP, and MUSE.
arXiv Detail & Related papers (2025-02-27T11:03:33Z)
- Disentangling Memory and Reasoning Ability in Large Language Models [97.26827060106581]
We propose a new inference paradigm that decomposes the complex inference process into two distinct and clear actions. Our experiment results show that this decomposition improves model performance and enhances the interpretability of the inference process.
arXiv Detail & Related papers (2024-11-20T17:55:38Z)
- Large Language Model Unlearning via Embedding-Corrupted Prompts [10.889859281637406]
We present Embedding-COrrupted (ECO) Prompts, a lightweight unlearning framework for large language models.
We enforce an unlearned state during inference by employing a prompt classifier to identify and safeguard prompts to forget.
We find that these embedding-corrupted prompts not only lead to desirable outputs that satisfy the unlearning objective but also closely approximate the output from a model that has never been trained on the data intended for forgetting.
arXiv Detail & Related papers (2024-06-12T06:56:20Z)
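A minimal sketch of the control flow suggested by the ECO Prompts summary above: classify an incoming prompt and corrupt its embeddings only when it falls in the forget scope. The classifier, the noise-based corruption rule, and the toy model are assumptions; the actual corruption used by ECO may be chosen quite differently.

```python
# Hypothetical sketch of embedding-corrupted prompting for unlearning.
import torch
import torch.nn as nn

torch.manual_seed(0)
vocab, d = 1000, 64
embed = nn.Embedding(vocab, d)                 # frozen input embeddings of the LLM
llm = nn.Linear(d, vocab)                      # stand-in for the frozen LLM body
clf = nn.Linear(d, 1)                          # lightweight prompt classifier (trained offline)

def generate(token_ids):
    e = embed(token_ids)                       # (seq, d) prompt embeddings
    score = torch.sigmoid(clf(e.mean(0)))      # does this prompt target forgotten knowledge?
    if score > 0.5:
        # Corrupt the embeddings (here: additive noise on a few dimensions) so the
        # model behaves as if it had never seen the data intended for forgetting.
        noise = torch.zeros_like(e)
        noise[:, :8] = 3.0 * torch.randn(e.size(0), 8)
        e = e + noise
    return llm(e)                              # next-token logits from the (possibly corrupted) prompt

logits = generate(torch.randint(0, vocab, (12,)))
```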
- Tuning-Free Accountable Intervention for LLM Deployment -- A Metacognitive Approach [55.613461060997004]
Large Language Models (LLMs) have catalyzed transformative advances across a spectrum of natural language processing tasks.
We propose an innovative metacognitive approach, dubbed CLEAR, to equip LLMs with capabilities for self-aware error identification and correction.
arXiv Detail & Related papers (2024-03-08T19:18:53Z)
- Towards Safer Large Language Models through Machine Unlearning [19.698620794387338]
Selective Knowledge Unlearning (SKU) is designed to eliminate harmful knowledge while preserving utility on normal prompts.
The first stage aims to identify and acquire harmful knowledge within the model, whereas the second is dedicated to removing this knowledge.
Our experiments demonstrate that SKU identifies a good balance point between removing harmful information and preserving utility.
arXiv Detail & Related papers (2024-02-15T16:28:34Z)
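A hypothetical sketch of a two-stage scheme in the spirit of the SKU summary above, interpreting the second stage as subtracting an acquired "harmful knowledge" weight delta; the toy model, the data, and the removal scale are assumptions for illustration.

```python
# Hypothetical sketch: acquire harmful knowledge in a copy, then negate the delta.
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 16
model = nn.Linear(d, d)                        # stand-in for an LLM block
acquirer = copy.deepcopy(model)                # stage-1 copy used to absorb harmful knowledge

# Stage 1: identify/acquire harmful knowledge by fitting the copy to harmful targets.
x_harm = torch.randn(8, d)
y_harm = torch.randn(8, d)                     # stand-in for harmful completions
opt = torch.optim.Adam(acquirer.parameters(), lr=1e-2)
for _ in range(200):
    loss = nn.functional.mse_loss(acquirer(x_harm), y_harm)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: remove that knowledge by negating the acquired delta, with a small
# scale so utility on normal prompts is largely preserved.
scale = 0.8                                    # illustrative removal strength
with torch.no_grad():
    for p, p_acq in zip(model.parameters(), acquirer.parameters()):
        p -= scale * (p_acq - p)               # subtract the "harmful knowledge" task vector
```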