Memory Self-Regeneration: Uncovering Hidden Knowledge in Unlearned Models
- URL: http://arxiv.org/abs/2510.03263v1
- Date: Fri, 26 Sep 2025 19:11:01 GMT
- Title: Memory Self-Regeneration: Uncovering Hidden Knowledge in Unlearned Models
- Authors: Agnieszka Polowczyk, Alicja Polowczyk, Joanna Waczyńska, Piotr Borycki, Przemysław Spurek
- Abstract summary: We present considerations regarding the ability of models to forget and recall knowledge. We present the MemoRa strategy, a regenerative approach supporting the effective recovery of previously lost knowledge.
- Score: 1.3654763247057877
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The impressive capability of modern text-to-image models to generate realistic visuals has come with a serious drawback: they can be misused to create harmful, deceptive, or unlawful content. This has accelerated the push for machine unlearning, a new field that seeks to selectively remove specific knowledge from a model without causing a drop in its overall performance. However, actually forgetting a given concept turns out to be extremely difficult: models exposed to adversarial prompt attacks can still generate supposedly unlearned concepts, which may be not only harmful but also illegal. In this paper, we present considerations regarding the ability of models to forget and recall knowledge, introducing the Memory Self-Regeneration task. Furthermore, we present the MemoRa strategy, a regenerative approach that supports the effective recovery of previously lost knowledge. Moreover, we propose that robustness in knowledge retrieval is a crucial yet underexplored evaluation measure for developing more robust and effective unlearning techniques. Finally, we demonstrate that forgetting occurs in two distinct ways: short-term, where concepts can be quickly recalled, and long-term, where recovery is more challenging.
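The suppress-then-recall dynamic the abstract describes can be illustrated with a toy experiment (this is not the paper's MemoRa method; the classifier, the gradient-ascent "unlearning", and the few-shot recovery step are all assumptions for illustration): a small softmax classifier "unlearns" one class by gradient ascent on its samples, and a handful of fine-tuning steps on just five examples quickly restores it, mimicking short-term forgetting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 3-class problem: well-separated Gaussian blobs.
centers = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])
y = np.repeat(np.arange(3), 100)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def grad_step(W, b, X, y, lr):
    """One cross-entropy gradient step; a negative lr performs gradient ascent."""
    p = softmax(X @ W + b)
    p[np.arange(len(y)), y] -= 1.0          # p - one_hot(y)
    W -= lr * X.T @ p / len(y)
    b -= lr * p.mean(axis=0)

def class_acc(W, b, k):
    pred = (X @ W + b).argmax(axis=1)
    return (pred[y == k] == k).mean()

W, b = np.zeros((2, 3)), np.zeros(3)
for _ in range(300):                        # train on all classes
    grad_step(W, b, X, y, lr=0.5)
acc_trained = class_acc(W, b, 2)

forget = y == 2
for _ in range(25):                         # "unlearn" class 2 via gradient ascent
    grad_step(W, b, X[forget], y[forget], lr=-0.5)
acc_unlearned = class_acc(W, b, 2)

few = np.flatnonzero(forget)[:5]            # recovery from only 5 samples
for _ in range(60):
    grad_step(W, b, X[few], y[few], lr=0.5)
acc_recovered = class_acc(W, b, 2)

print(acc_trained, acc_unlearned, acc_recovered)
```

In this toy setting the suppressed class collapses and then snaps back from a few examples, which is exactly why the abstract argues that recall robustness, not just post-unlearning accuracy, should be measured.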
Related papers
- ROKA: Robust Knowledge Unlearning against Adversaries [0.9236074230806578]
We introduce a new unlearning-induced attack model, the indirect unlearning attack, which does not require data manipulation but exploits the consequences of knowledge contamination to perturb the model's accuracy on security-critical predictions. Our work is the first to provide a theoretical guarantee for knowledge preservation during unlearning. Evaluations on various large models, including vision transformers, multi-modal models, and large language models, show that ROKA effectively unlearns targets while preserving, or even enhancing, accuracy on retained data.
arXiv Detail & Related papers (2026-02-28T03:30:39Z) - The Unseen Threat: Residual Knowledge in Machine Unlearning under Perturbed Samples [16.030881842099998]
We show that slightly perturbed versions of forget samples may still be correctly recognized by the unlearned model. We propose a fine-tuning strategy, named RURK, that penalizes the model's ability to re-recognize forget samples.
arXiv Detail & Related papers (2026-01-29T22:10:13Z) - Unconsciously Forget: Mitigating Memorization; Without Knowing What is being Memorized [41.5028352241977]
Memorizing training data can lead to legal challenges, including copyright infringement, violations of portrait rights, and trademark violations. Our work demonstrates that specific parts of the model are responsible for copyrighted content generation. By applying model pruning, we can effectively suppress the probability of generating copyrighted content without targeting specific concepts.
arXiv Detail & Related papers (2025-12-10T14:36:12Z) - Forget to Know, Remember to Use: Context-Aware Unlearning for Large Language Models [17.249936460923045]
Large language models may encode sensitive information or outdated knowledge that needs to be removed. Unlearning is an efficient alternative to full retraining, aiming to remove specific knowledge while preserving overall model utility.
arXiv Detail & Related papers (2025-10-20T15:03:45Z) - Pre-Forgettable Models: Prompt Learning as a Native Mechanism for Unlearning [9.512928441517811]
Foundation models have transformed multimedia analysis by enabling robust and transferable representations across diverse modalities and tasks. Traditional unlearning approaches, including retraining, activation editing, or distillation, are often expensive, fragile, and ill-suited for real-time or continuously evolving systems. We introduce a prompt-based learning framework that unifies knowledge acquisition and removal within a single training phase.
arXiv Detail & Related papers (2025-09-05T13:28:04Z) - Mitigating Catastrophic Forgetting and Mode Collapse in Text-to-Image Diffusion via Latent Replay [0.0]
Continual learning is fundamental to natural intelligence. "Catastrophic forgetting" occurs when learning new tasks erases previously acquired knowledge. Latent Replay enables efficient continual learning for generative AI models.
arXiv Detail & Related papers (2025-09-04T23:45:22Z) - Step-by-Step Reasoning Attack: Revealing 'Erased' Knowledge in Large Language Models [9.719371187651591]
Unlearning techniques suppress knowledge rather than erase it, leaving it beneath the surface and retrievable with the right prompts. We introduce a step-by-step reasoning-based black-box attack, Sleek, that systematically exposes unlearning failures. Of the generated adversarial prompts, 62.5% successfully retrieved forgotten Harry Potter facts from WHP-unlearned Llama, while 50% exposed unfair suppression of retained knowledge.
arXiv Detail & Related papers (2025-06-14T04:22:17Z) - Continual Unlearning for Foundational Text-to-Image Models without Generalization Erosion [56.35484513848296]
This research introduces 'continual unlearning', a novel paradigm that enables the targeted removal of multiple specific concepts from foundational generative models. We propose the Decremental Unlearning without Generalization Erosion (DUGE) algorithm, which selectively unlearns the generation of undesired concepts.
arXiv Detail & Related papers (2025-03-17T23:17:16Z) - RESTOR: Knowledge Recovery in Machine Unlearning [71.75834077528305]
Large language models trained on web-scale corpora can contain private or sensitive information. Several machine unlearning algorithms have been proposed to eliminate the effect of such datapoints. We propose the RESTOR framework for machine unlearning evaluation.
arXiv Detail & Related papers (2024-10-31T20:54:35Z) - UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI [50.61495097098296]
We revisit the paradigm in which unlearning is used for Large Language Models (LLMs).
We introduce a concept of ununlearning, where unlearned knowledge gets reintroduced in-context.
We argue that content filtering for impermissible knowledge will be required and even exact unlearning schemes are not enough for effective content regulation.
arXiv Detail & Related papers (2024-06-27T10:24:35Z) - Hiding and Recovering Knowledge in Text-to-Image Diffusion Models via Learnable Prompts [23.04942433104886]
We introduce a novel concept-hiding approach that makes unwanted concepts inaccessible to public users. Instead of erasing knowledge from the model entirely, we incorporate a learnable prompt into the cross-attention module. This enables flexible access control, ensuring that undesirable content cannot be easily generated while preserving the option to reinstate it.
arXiv Detail & Related papers (2024-03-18T23:42:04Z) - Anti-Retroactive Interference for Lifelong Learning [65.50683752919089]
We design a paradigm for lifelong learning based on meta-learning and the associative mechanism of the brain.
It tackles the problem from two aspects: extracting knowledge and memorizing knowledge.
We show theoretically that the proposed learning paradigm can make the models of different tasks converge to the same optimum.
arXiv Detail & Related papers (2022-08-27T09:27:36Z) - Learning with Recoverable Forgetting [77.56338597012927]
Learning wIth Recoverable Forgetting explicitly handles the task- or sample-specific knowledge removal and recovery.
Specifically, LIRF brings in two innovative schemes, namely knowledge deposit and withdrawal.
We conduct experiments on several datasets and demonstrate that the proposed LIRF strategy yields encouraging results with strong generalization capability.
arXiv Detail & Related papers (2022-07-17T16:42:31Z) - False Memory Formation in Continual Learners Through Imperceptible Backdoor Trigger [3.3439097577935213]
We consider the setting in which new information is sequentially presented to a continual (incremental) learning model.
We show that an intelligent adversary can introduce small amount of misinformation to the model during training to cause deliberate forgetting of a specific task or class at test time.
We demonstrate such an adversary's ability to assume control of the model by injecting "backdoor" attack samples to commonly used generative replay and regularization based continual learning approaches.
arXiv Detail & Related papers (2022-02-09T14:21:13Z) - Preserving Earlier Knowledge in Continual Learning with the Help of All Previous Feature Extractors [63.21036904487014]
Continual learning of new knowledge over time is one desirable capability for intelligent systems to recognize more and more classes of objects.
We propose a simple yet effective fusion mechanism by including all the previously learned feature extractors into the intelligent model.
Experiments on multiple classification tasks show that the proposed approach can effectively reduce the forgetting of old knowledge, achieving state-of-the-art continual learning performance.
arXiv Detail & Related papers (2021-04-28T07:49:24Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.