Erased, But Not Forgotten: Erased Rectified Flow Transformers Still Remain Unsafe Under Concept Attack
- URL: http://arxiv.org/abs/2510.00635v2
- Date: Sat, 04 Oct 2025 10:33:43 GMT
- Authors: Nanxiang Jiang, Zhaoxin Fan, Enhan Kang, Daiheng Gao, Yun Zhou, Yanxia Chang, Zheng Zhu, Yeying Jin, Wenjun Wu
- Abstract summary: We present ReFlux, the first concept attack method specifically designed to assess the robustness of concept erasure in the latest rectified flow-based T2I framework. Our approach is motivated by the observation that existing concept erasure techniques, when applied to Flux, fundamentally rely on a phenomenon known as attention localization.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in text-to-image (T2I) diffusion models have enabled impressive generative capabilities, but they also raise significant safety concerns due to the potential to produce harmful or undesirable content. While concept erasure has been explored as a mitigation strategy, most existing approaches and corresponding attack evaluations are tailored to Stable Diffusion (SD) and exhibit limited effectiveness when transferred to next-generation rectified flow transformers such as Flux. In this work, we present ReFlux, the first concept attack method specifically designed to assess the robustness of concept erasure in the latest rectified flow-based T2I framework. Our approach is motivated by the observation that existing concept erasure techniques, when applied to Flux, fundamentally rely on a phenomenon known as attention localization. Building on this insight, we propose a simple yet effective attack strategy that specifically targets this property. At its core, a reverse-attention optimization strategy is introduced to effectively reactivate suppressed signals while stabilizing attention. This is further reinforced by a velocity-guided dynamic that enhances the robustness of concept reactivation by steering the flow matching process, and a consistency-preserving objective that maintains the global layout and preserves unrelated content. Extensive experiments consistently demonstrate the effectiveness and efficiency of the proposed attack method, establishing a reliable benchmark for evaluating the robustness of concept erasure strategies in rectified flow transformers.
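The abstract describes the attack's core as a reverse-attention objective that reactivates the suppressed concept while a consistency term preserves layout. As a rough illustration only (the loss form, function names, and weighting below are assumptions, not the authors' implementation), the idea can be sketched as maximizing attention mass on the erased concept token while keeping the rest of the attention map close to a reference:

```python
import numpy as np

def attention_map(q, k):
    """Row-wise softmax attention of image queries over text-token keys."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def reverse_attention_loss(q, k, concept_idx, ref_map, lam=0.1):
    """Hypothetical attack objective: the first term reactivates
    (maximizes) attention on the erased concept token; the second is a
    consistency term keeping all other attention columns close to a
    reference map, so the global layout is preserved."""
    attn = attention_map(q, k)
    reactivate = -np.log(attn[:, concept_idx] + 1e-8).mean()
    others = np.delete(attn, concept_idx, axis=1)
    ref_others = np.delete(ref_map, concept_idx, axis=1)
    consistency = np.mean((others - ref_others) ** 2)
    return reactivate + lam * consistency
```

In the actual method this objective would be minimized over an adversarial prompt embedding and, per the abstract, combined with a velocity-guidance term on the flow-matching trajectory; both are outside this sketch.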
Related papers
- EraseAnything++: Enabling Concept Erasure in Rectified Flow Transformers Leveraging Multi-Object Optimization [18.80236205171204]
EraseAnything++ is a unified framework for concept erasure in both image and video diffusion models. Our method anchors erasure on key visual representations and propagates it consistently across spatial and temporal dimensions. In the video setting, we further enhance consistency through an anchor-and-propagate mechanism that initializes erasure on reference frames and enforces it throughout subsequent transformer layers.
arXiv Detail & Related papers (2026-03-01T08:13:05Z) - The Illusion of Forgetting: Attack Unlearned Diffusion via Initial Latent Variable Optimization [51.835894707552946]
Unlearning-based defenses claim to purge Not-Safe-For-Work concepts from diffusion models (DMs). We show that unlearning partially disrupts the mapping between linguistic symbols and the underlying knowledge, which remains intact as dormant memories. We propose IVO, a concise and powerful attack framework that reactivates these dormant memories by reconstructing the broken mappings.
arXiv Detail & Related papers (2026-01-30T02:39:51Z) - ActErase: A Training-Free Paradigm for Precise Concept Erasure via Activation Patching [16.08258534688825]
We propose a novel training-free method (ActErase) for efficient concept erasure. Our method achieves state-of-the-art erasure performance, while effectively preserving the model's overall generative capability.
arXiv Detail & Related papers (2026-01-01T09:11:09Z) - Revoking Amnesia: RL-based Trajectory Optimization to Resurrect Erased Concepts in Diffusion Models [38.38751366738881]
Concept erasure techniques have been widely deployed in T2I diffusion models to prevent inappropriate content generation for safety and copyright considerations. However, established erasure methods exhibit degraded effectiveness, raising questions about their true mechanisms. We propose RevAm, a trajectory optimization framework that resurrects erased concepts by dynamically steering the denoising process.
arXiv Detail & Related papers (2025-09-30T07:46:19Z) - Stealth by Conformity: Evading Robust Aggregation through Adaptive Poisoning [5.205955684180866]
Federated Learning (FL) is a distributed learning paradigm designed to address privacy concerns. We propose Chameleon Poisoning (CHAMP), an adaptive and evasive poisoning strategy. CHAMP enables more effective and evasive poisoning, highlighting a fundamental limitation of existing robust aggregation defenses.
arXiv Detail & Related papers (2025-09-03T13:40:54Z) - Improving Black-Box Generative Attacks via Generator Semantic Consistency [51.470649503929344]
Generative attacks produce adversarial examples in a single forward pass at test time. We enforce semantic consistency by aligning the early generator's intermediate features to an EMA teacher. Our approach can be seamlessly integrated into existing generative attacks with consistent improvements in black-box transfer.
arXiv Detail & Related papers (2025-06-23T02:35:09Z) - SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models [56.83154571623655]
We introduce SPEED, an efficient concept erasure approach that directly edits model parameters. SPEED searches for a null space, a model editing space where parameter updates do not affect non-target concepts. We successfully erase 100 concepts within only 5 seconds.
arXiv Detail & Related papers (2025-03-10T14:40:01Z) - Rethinking the Vulnerability of Concept Erasure and a New Method [9.044763606650646]
Concept erasure (defense) methods have been developed to "unlearn" specific concepts through post-hoc finetuning. Recent concept restoration (attack) methods have demonstrated that these supposedly erased concepts can be recovered using adversarially crafted prompts. We introduce **RECORD**, a novel coordinate-descent-based restoration algorithm that consistently outperforms existing restoration methods by up to 17.8 times.
arXiv Detail & Related papers (2025-02-24T17:26:01Z) - EraseAnything: Enabling Concept Erasure in Rectified Flow Transformers [33.195628798316754]
EraseAnything is the first method specifically developed to address concept erasure within the latest flow-based T2I framework. We formulate concept erasure as a bi-level optimization problem, employing LoRA-based parameter tuning and an attention map regularizer. We propose a self-contrastive learning strategy to ensure that removing unwanted concepts does not inadvertently harm performance on unrelated ones.
arXiv Detail & Related papers (2024-12-29T09:42:53Z) - Precise, Fast, and Low-cost Concept Erasure in Value Space: Orthogonal Complement Matters [38.355389084255386]
We propose a precise, fast, and low-cost concept erasure method called Adaptive Value Decomposer (AdaVD). AdaVD excels in both single and multiple concept erasure, showing a 2 to 10 times improvement in prior preservation.
arXiv Detail & Related papers (2024-12-09T01:56:25Z) - Reliable and Efficient Concept Erasure of Text-to-Image Diffusion Models [76.39651111467832]
We introduce Reliable and Efficient Concept Erasure (RECE), a novel approach that modifies the model in 3 seconds without necessitating additional fine-tuning.
To mitigate inappropriate content potentially represented by derived embeddings, RECE aligns them with harmless concepts in cross-attention layers.
The derivation and erasure of new representation embeddings are conducted iteratively to achieve a thorough erasure of inappropriate concepts.
arXiv Detail & Related papers (2024-07-17T08:04:28Z) - Mutual-modality Adversarial Attack with Semantic Perturbation [81.66172089175346]
We propose a novel approach that generates adversarial attacks in a mutual-modality optimization scheme.
Our approach outperforms state-of-the-art attack methods and can be readily deployed as a plug-and-play solution.
arXiv Detail & Related papers (2023-12-20T05:06:01Z) - LEAT: Towards Robust Deepfake Disruption in Real-World Scenarios via Latent Ensemble Attack [11.764601181046496]
Deepfakes, malicious visual contents created by generative models, pose an increasingly harmful threat to society.
To proactively mitigate deepfake damages, recent studies have employed adversarial perturbation to disrupt deepfake model outputs.
We propose a simple yet effective disruption method called Latent Ensemble ATtack (LEAT), which attacks the independent latent encoding process.
arXiv Detail & Related papers (2023-07-04T07:00:37Z)
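Several of the erasure entries above share one linear-algebra core: SPEED searches a null space in which parameter updates do not affect non-target concepts, and AdaVD works in the orthogonal complement in value space. A minimal sketch of that shared idea (illustrative names and shapes, not any paper's released code): project the erasure update onto the null space of the feature vectors that must be preserved, so their outputs are provably unchanged.

```python
import numpy as np

def null_space_projector(K_preserve, tol=1e-10):
    """Projector onto the null space of the preserved-concept features.
    K_preserve: (n, d) matrix whose rows are feature vectors that any
    weight edit must leave unchanged."""
    _, s, vt = np.linalg.svd(K_preserve, full_matrices=True)
    rank = int((s > tol).sum())
    null_basis = vt[rank:]            # (d - rank, d) rows spanning the null space
    return null_basis.T @ null_basis  # (d, d) orthogonal projector

def edit_weights(W, delta, K_preserve):
    """Apply an erasure update `delta`, keeping only its component in the
    preserved concepts' null space, so K_preserve @ W is unchanged."""
    P = null_space_projector(K_preserve)
    return W + P @ delta
```

Because `K_preserve @ P = 0` by construction, `K_preserve @ (W + P @ delta) == K_preserve @ W` exactly; only directions orthogonal to all preserved features are edited, which is what makes such closed-form edits fast (no finetuning) while leaving non-target concepts intact.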
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.