Related papers: Revoking Amnesia: RL-based Trajectory Optimization to Resurrect Erased Concepts in Diffusion Models

Revoking Amnesia: RL-based Trajectory Optimization to Resurrect Erased Concepts in Diffusion Models

URL: http://arxiv.org/abs/2510.03302v1
Date: Tue, 30 Sep 2025 07:46:19 GMT
Title: Revoking Amnesia: RL-based Trajectory Optimization to Resurrect Erased Concepts in Diffusion Models
Authors: Daiheng Gao, Nanxiang Jiang, Andi Zhang, Shilin Lu, Yufei Tang, Wenbo Zhou, Weiming Zhang, Zhaoxin Fan,
Abstract summary: Concept erasure techniques have been widely deployed in T2I diffusion models to prevent inappropriate content generation for safety and copyright considerations.<n> established erasure methods exhibit degraded effectiveness, raising questions about their true mechanisms.<n>We propose textbfRevAm, a trajectory optimization framework that resurrects erased concepts by dynamically steering the denoising process.
Score: 38.38751366738881
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Concept erasure techniques have been widely deployed in T2I diffusion models to prevent inappropriate content generation for safety and copyright considerations. However, as models evolve to next-generation architectures like Flux, established erasure methods (\textit{e.g.}, ESD, UCE, AC) exhibit degraded effectiveness, raising questions about their true mechanisms. Through systematic analysis, we reveal that concept erasure creates only an illusion of ``amnesia": rather than genuine forgetting, these methods bias sampling trajectories away from target concepts, making the erasure fundamentally reversible. This insight motivates the need to distinguish superficial safety from genuine concept removal. In this work, we propose \textbf{RevAm} (\underline{Rev}oking \underline{Am}nesia), an RL-based trajectory optimization framework that resurrects erased concepts by dynamically steering the denoising process without modifying model weights. By adapting Group Relative Policy Optimization (GRPO) to diffusion models, RevAm explores diverse recovery trajectories through trajectory-level rewards, overcoming local optima that limit existing methods. Extensive experiments demonstrate that RevAm achieves superior concept resurrection fidelity while reducing computational time by 10$\times$, exposing critical vulnerabilities in current safety mechanisms and underscoring the need for more robust erasure techniques beyond trajectory manipulation.

Related papers

Differential Vector Erasure: Unified Training-Free Concept Erasure for Flow Matching Models [49.10620605347065]
We propose Differential Vector Erasure (DVE), a training-free concept erasure method specifically designed for flow matching models.<n>Our key insight is that semantic concepts are implicitly encoded in the directional structure of the velocity field governing the generative flow.<n>During inference, DVE selectively removes concept-specific components by projecting the velocity field onto the differential direction, enabling precise concept suppression without affecting irrelevant semantics.
arXiv Detail & Related papers (2026-02-01T08:05:45Z)
ReLAPSe: Reinforcement-Learning-trained Adversarial Prompt Search for Erased concepts in unlearned diffusion models [12.021923446217722]
Machine unlearning is a key defense mechanism for removing unauthorized concepts from text-to-image diffusion models.<n>Existing adversarial approaches for exploiting this leakage are constrained by fundamental limitations.<n>We introduce ReLAPSe, a policy-based adversarial framework that reformulates concept restoration as a reinforcement learning problem.
arXiv Detail & Related papers (2026-01-30T21:56:50Z)
LURE: Latent Space Unblocking for Multi-Concept Reawakening in Diffusion Models [24.332916173317113]
Concept erasure aims to suppress sensitive content in diffusion models.<n>Recent studies show that erased concepts can still be reawakened, revealing vulnerabilities in erasure methods.<n>We model the generation process as an implicit function to enable a comprehensive theoretical analysis of multiple factors.
arXiv Detail & Related papers (2026-01-20T10:39:11Z)
Boosting Fidelity for Pre-Trained-Diffusion-Based Low-Light Image Enhancement via Condition Refinement [63.54516423266521]
Pre-Trained Diffusion-Based (PTDB) methods often sacrifice content fidelity to attain higher perceptual realism.<n>We propose a novel optimization strategy for conditioning in pre-trained diffusion models, enhancing fidelity while preserving realism and aesthetics.<n>Our approach is plug-and-play, seamlessly integrating into existing diffusion networks to provide more effective control.
arXiv Detail & Related papers (2025-10-20T02:40:06Z)
Erased, But Not Forgotten: Erased Rectified Flow Transformers Still Remain Unsafe Under Concept Attack [37.88516477305766]
We present ReFlux, the first concept attack method specifically designed to assess the robustness of concept erasure in the latest rectified flow-based T2I framework.<n>Our approach is motivated by the observation that existing concept erasure techniques, when applied to Flux, fundamentally rely on a phenomenon known as attention localization.
arXiv Detail & Related papers (2025-10-01T08:12:07Z)
Discrete Diffusion for Reflective Vision-Language-Action Models in Autonomous Driving [55.13109926181247]
We introduce ReflectDrive, a learning-based framework that integrates a reflection mechanism for safe trajectory generation via discrete diffusion.<n>Central to our approach is a safety-aware reflection mechanism that performs iterative self-correction without gradient.<n>Our method begins with goal-conditioned trajectory generation to model multi-modal driving behaviors.
arXiv Detail & Related papers (2025-09-24T13:35:15Z)
TRACE: Trajectory-Constrained Concept Erasure in Diffusion Models [0.0]
Concept erasure aims to remove or suppress specific concept information in a generative model.<n>Trajectory-Constrained Attentional Concept Erasure (TRACE) is a novel method to erase targeted concepts from diffusion models.<n>TRACE achieves state-of-the-art performance, outperforming recent methods such as ANT, EraseAnything, and MACE in terms of removal efficacy and output quality.
arXiv Detail & Related papers (2025-05-29T10:15:22Z)
Continual Unlearning for Foundational Text-to-Image Models without Generalization Erosion [56.35484513848296]
This research introduces continual unlearning', a novel paradigm that enables the targeted removal of multiple specific concepts from foundational generative models.<n>We propose Decremental Unlearning without Generalization Erosion (DUGE) algorithm which selectively unlearns the generation of undesired concepts.
arXiv Detail & Related papers (2025-03-17T23:17:16Z)
Rethinking the Vulnerability of Concept Erasure and a New Method [9.044763606650646]
Concept erasure (defense) methods have been developed to "unlearn" specific concepts through post-hoc finetuning.<n>Recent concept restoration (attack) methods have demonstrated that these supposedly erased concepts can be recovered using adversarially crafted prompts.<n>We introduce **RECORD**, a novel coordinate-descent-based restoration algorithm that consistently outperforms existing restoration methods by up to 17.8 times.
arXiv Detail & Related papers (2025-02-24T17:26:01Z)
Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning. To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers. The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
arXiv Detail & Related papers (2024-07-17T08:04:28Z)

This list is automatically generated from the titles and abstracts of the papers in this site.