Steering Away from Memorization: Reachability-Constrained Reinforcement Learning for Text-to-Image Diffusion
- URL: http://arxiv.org/abs/2603.00140v1
- Date: Tue, 24 Feb 2026 09:07:08 GMT
- Title: Steering Away from Memorization: Reachability-Constrained Reinforcement Learning for Text-to-Image Diffusion
- Authors: Sathwik Karnik, Juyeop Kim, Sanmi Koyejo, Jong-Seok Lee, Somil Bansal
- Abstract summary: Current mitigation strategies typically sacrifice image quality or prompt alignment to reduce memorization. We propose Reachability-Aware Diffusion Steering (RADS), an inference-time framework that prevents memorization while preserving generation fidelity. RADS models the diffusion denoising process as a dynamical system and applies concepts from reachability analysis to approximate the "backward reachable tube" of intermediate states that inevitably evolve into memorized samples.
- Score: 44.47036589940717
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image diffusion models often memorize training data, revealing a fundamental failure to generalize beyond the training set. Current mitigation strategies typically sacrifice image quality or prompt alignment to reduce memorization. To address this, we propose Reachability-Aware Diffusion Steering (RADS), an inference-time framework that prevents memorization while preserving generation fidelity. RADS models the diffusion denoising process as a dynamical system and applies concepts from reachability analysis to approximate the "backward reachable tube"--the set of intermediate states that inevitably evolve into memorized samples. We then formulate mitigation as a constrained reinforcement learning (RL) problem, where a policy learns to steer the trajectory away from memorization via minimal perturbations in the caption embedding space. Empirical evaluations show that RADS achieves a superior Pareto frontier between generation diversity (SSCD), quality (FID), and alignment (CLIP) compared to state-of-the-art baselines. Crucially, RADS provides robust mitigation without modifying the diffusion backbone, offering a plug-and-play solution for safe generation. Our website is available at: https://s-karnik.github.io/rads-memorization-project-page/.
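To make the abstract's pipeline concrete, the sketch below shows what inference-time steering of this kind can look like in PyTorch. It is a minimal illustration, not the authors' implementation: the backward reachable tube (informally, the set of intermediate latents from which the denoising dynamics inevitably reach a memorized sample) is approximated here by a simple magnitude heuristic, and `tube_threshold`, `max_delta`, and the simplified diffusers-style `unet`/`scheduler` interfaces are all assumptions.

```python
import torch
import torch.nn as nn

class SteeringPolicy(nn.Module):
    """Maps (flattened latent, timestep) to a small caption-embedding delta."""
    def __init__(self, latent_dim: int, embed_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, latent: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        x = torch.cat([latent.flatten(1), t.view(-1, 1).float()], dim=1)
        return self.net(x)

def memorization_signal(eps_cond, eps_uncond):
    # Heuristic proxy for membership in the backward reachable tube:
    # memorized trajectories tend to show an unusually large text-conditional
    # prediction magnitude (cf. the detection paper later in this list).
    return (eps_cond - eps_uncond).flatten(1).norm(dim=1)

@torch.no_grad()
def steered_sample(unet, scheduler, policy, latent, cond, uncond,
                   guidance=7.5, tube_threshold=6.0, max_delta=0.05):
    """One steered denoising trajectory.

    `unet(latent, t, emb)` and `scheduler.step(eps, t, latent)` are assumed,
    simplified diffusers-style interfaces; `tube_threshold` and `max_delta`
    are illustrative hyperparameters, not values from the paper.
    """
    for t in scheduler.timesteps:
        eps_c = unet(latent, t, cond)
        eps_u = unet(latent, t, uncond)
        if memorization_signal(eps_c, eps_u).mean() > tube_threshold:
            # Trajectory looks headed for a memorized sample: apply a
            # minimal, norm-clipped perturbation in caption-embedding space.
            t_batch = torch.full((latent.shape[0],), float(t))
            delta = policy(latent, t_batch)
            scale = (max_delta / (delta.norm(dim=1, keepdim=True) + 1e-8)).clamp(max=1.0)
            cond = cond + (scale * delta)[:, None, :]  # broadcast over tokens
            eps_c = unet(latent, t, cond)              # re-predict with steered prompt
        eps = eps_u + guidance * (eps_c - eps_u)       # classifier-free guidance
        latent = scheduler.step(eps, t, latent).prev_sample
    return latent
```

In the paper, the policy is trained with constrained RL so that perturbations stay minimal; the norm clipping above is only a crude stand-in for that constraint.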
Related papers
- You Don't Need All That Attention: Surgical Memorization Mitigation in Text-to-Image Diffusion Models [8.429432661292964]
Generative models have been shown to "memorize" certain training data, leading to the generation of verbatim or near-verbatim copies of training images. We introduce Guidance Using Attractive-Repulsive Dynamics (GUARD), a novel framework for memorization mitigation in text-to-image diffusion models. GUARD adjusts the image denoising process to guide the generation away from an original training image and towards one that is distinct from the training data (a sketch of this attractive-repulsive guidance appears at the end of this list).
arXiv Detail & Related papers (2026-02-23T17:20:40Z)
- Rethinking Multi-Condition DiTs: Eliminating Redundant Attention via Position-Alignment and Keyword-Scoping [61.459927600301654]
Multi-condition control is bottlenecked by the conventional "concatenate-and-attend" strategy. Our analysis reveals that much of this cross-modal interaction is spatially or semantically redundant. We propose Position-aligned and Keyword-scoped Attention (PKA), a highly efficient framework designed to eliminate these redundancies.
arXiv Detail & Related papers (2026-02-06T16:39:10Z)
- The Illusion of Forgetting: Attack Unlearned Diffusion via Initial Latent Variable Optimization [51.835894707552946]
Unlearning-based defenses claim to purge Not-Safe-For-Work concepts from diffusion models (DMs). We show that unlearning partially disrupts the mapping between linguistic symbols and the underlying knowledge, which remains intact as dormant memories. We propose IVO, a concise and powerful attack framework that reactivates these dormant memories by reconstructing the broken mappings.
arXiv Detail & Related papers (2026-01-30T02:39:51Z)
- Deep Leakage with Generative Flow Matching Denoiser [54.05993847488204]
We introduce a new deep leakage (DL) attack that integrates a generative Flow Matching (FM) prior into the reconstruction process. Our approach consistently outperforms state-of-the-art attacks across pixel-level, perceptual, and feature-based similarity metrics.
arXiv Detail & Related papers (2026-01-21T14:51:01Z)
- SafeRedir: Prompt Embedding Redirection for Robust Unlearning in Image Generation Models [67.84174763413178]
We introduce SafeRedir, a lightweight inference-time framework for robust unlearning via prompt embedding redirection. We show that SafeRedir achieves effective unlearning capability, high semantic and perceptual preservation, robust image quality, and enhanced resistance to adversarial attacks.
arXiv Detail & Related papers (2026-01-13T15:01:38Z)
- Critic-Guided Reinforcement Unlearning in Text-to-Image Diffusion [0.0]
Machine unlearning in text-to-image diffusion models aims to remove targeted concepts while preserving overall utility. We present a general RL framework for diffusion unlearning that treats denoising as a sequential decision process. Our algorithm is simple to implement, supports off-policy reuse, and plugs into standard text-to-image backbones.
arXiv Detail & Related papers (2026-01-06T17:52:02Z)
- CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models [60.610268549138375]
Diffusion models can unintentionally reproduce training examples, raising privacy and copyright concerns. We introduce CAPTAIN, a training-free framework that mitigates memorization by directly modifying latent features during denoising.
arXiv Detail & Related papers (2025-12-11T14:01:47Z)
- Redistribute Ensemble Training for Mitigating Memorization in Diffusion Models [31.92526915009259]
Diffusion models are known for their tremendous ability to generate high-quality samples. Recent memorization-mitigation methods have primarily addressed the issue within the context of the text modality. We propose a novel method for diffusion models from the perspective of the visual modality, which is more generic and fundamental for mitigating memorization.
arXiv Detail & Related papers (2025-02-13T15:56:44Z)
- LoyalDiffusion: A Diffusion Model Guarding Against Data Replication [6.818344768093927]
Diffusion models can replicate training data, particularly when the training data includes confidential information. We propose a replication-aware U-Net architecture that incorporates information transfer blocks into skip connections that are less essential for image quality. Experiments demonstrate that LoyalDiffusion outperforms the state-of-the-art replication mitigation method, achieving a 48.63% reduction in replication while maintaining comparable image quality.
arXiv Detail & Related papers (2024-12-02T04:41:30Z)
- Detecting, Explaining, and Mitigating Memorization in Diffusion Models [49.438362005962375]
We introduce a straightforward yet effective method for detecting memorized prompts by inspecting the magnitude of text-conditional predictions.
Our proposed method seamlessly integrates without disrupting sampling algorithms, and delivers high accuracy even at the first generation step.
Building on our detection strategy, we unveil an explainable approach that shows the contribution of individual words or tokens to memorization (a minimal sketch of this detection signal follows the list below).
arXiv Detail & Related papers (2024-07-31T16:13:29Z)
- Iterative Ensemble Training with Anti-Gradient Control for Mitigating Memorization in Diffusion Models [20.550324116099357]
Diffusion models are known for their tremendous ability to generate novel and high-quality samples. Recent approaches to memorization mitigation have either focused only on the text modality in cross-modal generation tasks or relied on data augmentation strategies. We propose a novel training framework for diffusion models from the perspective of the visual modality, which is more generic and fundamental for mitigating memorization.
arXiv Detail & Related papers (2024-07-22T02:19:30Z)
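The detection method summarized above ("Detecting, Explaining, and Mitigating Memorization in Diffusion Models") is simple enough to sketch. The signal is the magnitude of the difference between text-conditional and unconditional noise predictions, which is abnormally large for memorized prompts even at the first denoising step. In the sketch below, the `unet` interface, the leave-one-out token attribution, and any thresholding are illustrative assumptions, not the paper's exact formulation:

```python
import torch

@torch.no_grad()
def memorization_score(unet, latent, t, cond_emb, uncond_emb):
    """Magnitude of the text-conditional prediction difference at step t.

    High scores flag likely memorized prompts; per the summary above, the
    signal is reliable even at the first generation step. `unet(latent, t,
    emb)` is a simplified, diffusers-style interface assumed for brevity.
    """
    eps_cond = unet(latent, t, cond_emb)
    eps_uncond = unet(latent, t, uncond_emb)
    return (eps_cond - eps_uncond).flatten(1).norm(dim=1)

@torch.no_grad()
def token_attribution(unet, latent, t, cond_emb, uncond_emb):
    """Leave-one-out attribution of the score to individual token positions.

    An illustrative variant of the paper's explainability idea: swap each
    token embedding for its unconditional counterpart and record how much
    the score drops. A larger drop means a larger contribution to memorization.
    """
    base = memorization_score(unet, latent, t, cond_emb, uncond_emb)
    drops = []
    for i in range(cond_emb.shape[1]):
        masked = cond_emb.clone()
        masked[:, i] = uncond_emb[:, i]
        drops.append(base - memorization_score(unet, latent, t, masked, uncond_emb))
    return torch.stack(drops, dim=1)  # (batch, seq_len)
```

A prompt would then be flagged when `memorization_score` at the first step exceeds a calibrated threshold.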
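Finally, here is an illustrative reading of the attractive-repulsive guidance referenced in the GUARD entry at the top of this list. The sketch adjusts one denoising step so that the implied clean-image estimate moves away from a memorized training latent and, optionally, toward a distinct anchor. The x0-parameterization follows standard DDIM algebra; `repel`, `attract`, and the `scheduler.alphas_cumprod` access are assumptions, and none of this is the authors' actual implementation.

```python
import torch

@torch.no_grad()
def attractive_repulsive_step(eps, latent, t, scheduler, mem_latent,
                              anchor=None, repel=0.3, attract=0.1):
    """Adjust one denoising step away from a memorized latent.

    Illustrative reading of GUARD's attractive-repulsive dynamics.
    `mem_latent` is a latent of the memorized training image to repel from;
    `anchor`, if given, is a distinct latent to attract toward;
    `repel`/`attract` are illustrative weights.
    """
    # DDIM algebra: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps,
    # so the clean-image estimate implied by the current prediction is:
    a_bar = scheduler.alphas_cumprod[t]
    x0 = (latent - (1 - a_bar).sqrt() * eps) / a_bar.sqrt()
    # Repulsion pushes the estimate away from the memorized sample;
    # attraction pulls it toward the distinct anchor.
    x0 = x0 + repel * (x0 - mem_latent)
    if anchor is not None:
        x0 = x0 + attract * (anchor - x0)
    # Re-derive the noise prediction consistent with the adjusted estimate.
    return (latent - a_bar.sqrt() * x0) / (1 - a_bar).sqrt()
```

The adjusted prediction then feeds `scheduler.step` as usual, so the repulsion acts at every step of the trajectory rather than as a post-hoc filter.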