MLLMEraser: Achieving Test-Time Unlearning in Multimodal Large Language Models through Activation Steering
- URL: http://arxiv.org/abs/2510.04217v2
- Date: Fri, 10 Oct 2025 11:47:30 GMT
- Title: MLLMEraser: Achieving Test-Time Unlearning in Multimodal Large Language Models through Activation Steering
- Authors: Chenlu Ding, Jiancan Wu, Leheng Sheng, Fan Zhang, Yancheng Yuan, Xiang Wang, Xiangnan He,
- Abstract summary: MLLMEraser is an input-aware, training-free framework for test-time unlearning.<n>We construct a multimodal erasure direction by contrasting adversarially perturbed, knowledge-recall image-text pairs.<n>Experiments on LLaVA-1.5 and Qwen-2.5-VL demonstrate that MLLMEraser consistently outperforms state-of-the-art MLLM unlearning baselines.
- Score: 36.80441487363007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal large language models (MLLMs) have demonstrated remarkable capabilities across vision-language tasks, yet their large-scale deployment raises pressing concerns about memorized private data, outdated knowledge, and harmful content. Existing unlearning approaches for MLLMs typically adapt training-based strategies such as gradient ascent or preference optimization, but these methods are computationally expensive, irreversible, and often distort retained knowledge. In this work, we propose MLLMEraser, an input-aware, training-free framework for test-time unlearning. Our approach leverages activation steering to enable dynamic knowledge erasure without parameter updates. Specifically, we construct a multimodal erasure direction by contrasting adversarially perturbed, knowledge-recall image-text pairs with knowledge-erasure counterparts, capturing both textual and visual discrepancies. To prevent unnecessary interference, we further design an input-aware steering mechanism that adaptively determines when and how the erasure direction should be applied, preserving utility on retained knowledge while enforcing forgetting on designated content. Experiments on LLaVA-1.5 and Qwen-2.5-VL demonstrate that MLLMEraser consistently outperforms state-of-the-art MLLM unlearning baselines, achieving stronger forgetting performance with lower computational cost and minimal utility degradation.
Related papers
- KUDA: Knowledge Unlearning by Deviating Representation for Large Language Models [26.418820118903852]
Large language models (LLMs) acquire a large amount of knowledge through pre-training on vast and diverse corpora.<n>LLMs unlearning is a promising technique to reduce risks associated with sensitive, copyrighted, or harmful content in training data.<n>We propose Knowledge Unlearning by Deviating representAtion (KUDA) to achieve effective unlearning at the knowledge level of LLMs.
arXiv Detail & Related papers (2026-02-22T17:16:49Z) - Unlearning Sensitive Information in Multimodal LLMs: Benchmark and Attack-Defense Evaluation [88.78166077081912]
We introduce a multimodal unlearning benchmark, UnLOK-VQA, and an attack-and-defense framework to evaluate methods for deleting specific multimodal knowledge from MLLMs.<n>Our results show multimodal attacks outperform text- or image-only ones, and that the most effective defense removes answer information from internal model states.
arXiv Detail & Related papers (2025-05-01T01:54:00Z) - CL-MoE: Enhancing Multimodal Large Language Model with Dual Momentum Mixture-of-Experts for Continual Visual Question Answering [27.812611421754482]
We propose an MLLMs-based dual momentum Mixture-of-Experts (CL-MoE) framework for continual visual question answering (VQA)<n>We integrate MLLMs with continual learning to utilize the rich commonsense knowledge in LLMs.<n>Our method achieves state-of-the-art performance on 10 VQA tasks, proving the effectiveness of our approach.
arXiv Detail & Related papers (2025-03-01T09:25:23Z) - Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering [44.008094698200026]
We introduce a novel method to enhance the adaptability of MLLMs by integrating external knowledge sources.<n>Our proposed model, Reflective LLaVA (ReflectiVA), utilizes reflective tokens to dynamically determine the need for external knowledge.<n>This ultimately enables the MLLM to manage external knowledge while preserving fluency and performance on tasks where external knowledge is not needed.
arXiv Detail & Related papers (2024-11-25T19:01:03Z) - When Machine Unlearning Meets Retrieval-Augmented Generation (RAG): Keep Secret or Forget Knowledge? [15.318301783084681]
Large language models (LLMs) can inadvertently learn and retain sensitive information and harmful content during training.
We propose a lightweight unlearning framework based on Retrieval-Augmented Generation (RAG) technology.
We evaluate our framework through extensive experiments on both open-source and closed-source models, including ChatGPT, Gemini, Llama-2-7b-chat-hf, and PaLM 2.
arXiv Detail & Related papers (2024-10-20T03:51:01Z) - RA-BLIP: Multimodal Adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training [55.54020926284334]
Multimodal Large Language Models (MLLMs) have recently received substantial interest, which shows their emerging potential as general-purpose models for various vision-language tasks.
Retrieval augmentation techniques have proven to be effective plugins for both LLMs and MLLMs.
In this study, we propose multimodal adaptive Retrieval-Augmented Bootstrapping Language-Image Pre-training (RA-BLIP), a novel retrieval-augmented framework for various MLLMs.
arXiv Detail & Related papers (2024-10-18T03:45:19Z) - MoExtend: Tuning New Experts for Modality and Task Extension [61.29100693866109]
MoExtend is an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models.
MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models.
arXiv Detail & Related papers (2024-08-07T02:28:37Z) - MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced Reranking and Noise-injected Training [9.023648972811458]
RagVL is a novel framework with knowledge-enhanced reranking and noise-injected training.
We instruction-tune the MLLM with a simple yet effective instruction template to induce its ranking ability.
For generation, we inject visual noise during training at the data and token levels to enhance the generator's robustness.
arXiv Detail & Related papers (2024-07-31T08:43:17Z) - Learn while Unlearn: An Iterative Unlearning Framework for Generative Language Models [52.40798352740857]
We introduce the Iterative Contrastive Unlearning (ICU) framework, which consists of three core components.<n>A Knowledge Unlearning Induction module targets specific knowledge for removal using an unlearning loss.<n>A Contrastive Learning Enhancement module preserves the model's expressive capabilities against the pure unlearning goal.<n>An Iterative Unlearning Refinement module dynamically adjusts the unlearning process through ongoing evaluation and updates.
arXiv Detail & Related papers (2024-07-25T07:09:35Z) - Mind the Interference: Retaining Pre-trained Knowledge in Parameter Efficient Continual Learning of Vision-Language Models [79.28821338925947]
Domain-Class Incremental Learning is a realistic but challenging continual learning scenario.
To handle these diverse tasks, pre-trained Vision-Language Models (VLMs) are introduced for their strong generalizability.
This incurs a new problem: the knowledge encoded in the pre-trained VLMs may be disturbed when adapting to new tasks, compromising their inherent zero-shot ability.
Existing methods tackle it by tuning VLMs with knowledge distillation on extra datasets, which demands heavy overhead.
We propose the Distribution-aware Interference-free Knowledge Integration (DIKI) framework, retaining pre-trained knowledge of
arXiv Detail & Related papers (2024-07-07T12:19:37Z) - Rethinking Machine Unlearning for Large Language Models [85.92660644100582]
We explore machine unlearning in the domain of large language models (LLMs)<n>This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities.
arXiv Detail & Related papers (2024-02-13T20:51:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.