CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2512.10655v1
- Date: Thu, 11 Dec 2025 14:01:47 GMT
- Title: CAPTAIN: Semantic Feature Injection for Memorization Mitigation in Text-to-Image Diffusion Models
- Authors: Tong Zhang, Carlos Hinojosa, Bernard Ghanem
- Abstract summary: Diffusion models can unintentionally reproduce training examples, raising privacy and copyright concerns. We introduce CAPTAIN, a training-free framework that mitigates memorization by directly modifying latent features during denoising.
- Score: 60.610268549138375
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Diffusion models can unintentionally reproduce training examples, raising privacy and copyright concerns as these systems are increasingly deployed at scale. Existing inference-time mitigation methods typically manipulate classifier-free guidance (CFG) or perturb prompt embeddings; however, they often struggle to reduce memorization without compromising alignment with the conditioning prompt. We introduce CAPTAIN, a training-free framework that mitigates memorization by directly modifying latent features during denoising. CAPTAIN first applies frequency-based noise initialization to reduce the tendency to replicate memorized patterns early in the denoising process. It then identifies the optimal denoising timesteps for feature injection and localizes memorized regions. Finally, CAPTAIN injects semantically aligned features from non-memorized reference images into localized latent regions, suppressing memorization while preserving prompt fidelity and visual quality. Our experiments show that CAPTAIN achieves substantial reductions in memorization compared to CFG-based baselines while maintaining strong alignment with the intended prompt.
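CAPTAIN's final injection step lends itself to a compact illustration. Below is a minimal sketch of masked latent blending on toy tensors, assuming a binary mask over memorized regions and reference features already extracted; the function name, blending rule, and timestep set are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of masked latent feature injection (toy tensors, not a real
# diffusion pipeline). All names and the blending rule are illustrative
# assumptions, not the paper's released implementation.
import torch

def inject_features(z_t: torch.Tensor,
                    ref_feat: torch.Tensor,
                    mem_mask: torch.Tensor,
                    alpha: float = 0.5) -> torch.Tensor:
    """Blend reference features into the memorized regions of a latent.

    z_t:      current denoising latent, shape (B, C, H, W)
    ref_feat: latent features of a non-memorized reference, same shape
    mem_mask: binary mask of memorized regions, shape (B, 1, H, W)
    alpha:    injection strength (assumed hyperparameter)
    """
    # Only the masked (memorized) regions are modified; the rest of the
    # latent is left untouched to preserve prompt fidelity.
    return z_t * (1 - mem_mask * alpha) + ref_feat * (mem_mask * alpha)

# Toy usage: inject at a hand-picked subset of denoising steps.
B, C, H, W = 1, 4, 64, 64
z = torch.randn(B, C, H, W)
ref = torch.randn(B, C, H, W)
mask = (torch.rand(B, 1, H, W) > 0.8).float()  # stand-in for a localized region
injection_steps = {40, 41, 42}                 # assumed "optimal" timesteps

for t in range(50, 0, -1):
    if t in injection_steps:
        z = inject_features(z, ref, mask, alpha=0.5)
    # ... one denoising step with the model would go here ...
```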
Related papers
- Look Carefully: Adaptive Visual Reinforcements in Multimodal Large Language Models for Hallucination Mitigation [51.743225614196774]
Multimodal large language models (MLLMs) have achieved remarkable progress in vision-language reasoning. They remain vulnerable to hallucination, where generated content deviates from visual evidence. Recent vision enhancement methods attempt to address this issue by reinforcing visual tokens during decoding. We propose Adaptive Visual Reinforcement (AIR), a training-free framework for MLLMs.
arXiv Detail & Related papers (2026-02-27T14:18:51Z)
- Steering Away from Memorization: Reachability-Constrained Reinforcement Learning for Text-to-Image Diffusion [44.47036589940717]
Current mitigation strategies typically sacrifice image quality or prompt alignment to reduce memorization. We propose Reachability-Aware Diffusion Steering (RADS), an inference-time framework that prevents memorization while preserving generation fidelity. RADS models the diffusion denoising process as a dynamical system and applies concepts from reachability analysis to approximate the "backward reachable tube".
arXiv Detail & Related papers (2026-02-24T09:07:08Z)
- You Don't Need All That Attention: Surgical Memorization Mitigation in Text-to-Image Diffusion Models [8.429432661292964]
Generative models have been shown to "memorize" certain training data, leading to generated images that reproduce training examples verbatim or near-verbatim. We introduce Guidance Using Attractive-Repulsive Dynamics (GUARD), a novel framework for memorization mitigation in text-to-image diffusion models. GUARD adjusts the image denoising process to guide the generation away from an original training image and towards one that is distinct from training data.
arXiv Detail & Related papers (2026-02-23T17:20:40Z)
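GUARD's attractive-repulsive idea can be sketched directly on latents. The update rule and weights below are assumptions for illustration, not GUARD's actual formulation.

```python
# Toy sketch of attractive-repulsive latent guidance: push the latent away
# from a (detected) memorized target and toward a distinct reference. The
# update rule and weights are illustrative assumptions, not GUARD's method.
import torch

def attract_repel_step(z: torch.Tensor,
                       z_mem: torch.Tensor,
                       z_ref: torch.Tensor,
                       attract: float = 0.1,
                       repel: float = 0.1) -> torch.Tensor:
    """One guidance step: attraction toward z_ref, repulsion from z_mem."""
    repel_dir = z - z_mem     # move away from the memorized latent
    attract_dir = z_ref - z   # move toward the distinct reference
    return z + repel * repel_dir + attract * attract_dir

z = torch.randn(1, 4, 64, 64)
z_mem = torch.randn(1, 4, 64, 64)  # stand-in for a memorized latent
z_ref = torch.randn(1, 4, 64, 64)  # stand-in for a non-memorized reference
z = attract_repel_step(z, z_mem, z_ref)
```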
- Adjusting Initial Noise to Mitigate Memorization in Text-to-Image Diffusion Models [10.935602641612888]
We show that the initial noise sample plays a crucial role in determining when this basin escape occurs. We propose two mitigation strategies that adjust the initial noise, either collectively or individually, to find and utilize initial samples that encourage earlier basin escape.
arXiv Detail & Related papers (2025-10-08T10:37:29Z)
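As a sketch of the general recipe behind such noise adjustment (score several initial noise samples and keep the least memorization-prone one), the snippet below uses a placeholder proxy; the paper's actual collective and individual adjustment strategies differ.

```python
# Toy sketch of choosing an initial noise sample that scores low on a
# memorization proxy before running the full denoising process. The proxy
# below is a placeholder; the paper's adjustment strategies differ.
import torch

def memorization_proxy(z0: torch.Tensor) -> float:
    """Placeholder score; a real proxy might measure the gap between
    text-conditional and unconditional noise predictions at early steps."""
    return z0.abs().mean().item()  # stand-in only

def pick_initial_noise(shape, n_candidates: int = 8) -> torch.Tensor:
    candidates = [torch.randn(shape) for _ in range(n_candidates)]
    scores = [memorization_proxy(z) for z in candidates]
    return candidates[scores.index(min(scores))]  # lowest-risk candidate

z0 = pick_initial_noise((1, 4, 64, 64))
```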
- How Diffusion Models Memorize [26.711679643772623]
Diffusion models can memorize training data, raising serious privacy and copyright concerns. We show memorization is driven by the overestimation of training samples during early denoising.
arXiv Detail & Related papers (2025-09-30T03:03:27Z)
- Demystifying Foreground-Background Memorization in Diffusion Models [23.914702151370204]
Diffusion models (DMs) memorize training images and can reproduce near-duplicates during generation. Current detection methods identify verbatim memorization but fail to capture two critical aspects. We propose Foreground Background Memorization (FB-Mem), a novel segmentation-based metric that classifies and quantifies memorized regions within generated images.
arXiv Detail & Related papers (2025-08-16T20:15:16Z)
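A toy version of such a segmentation-style memorization measure follows, assuming the generated image is already matched and aligned to its nearest training image; FB-Mem's actual metric is more involved, and this only illustrates the region-level idea.

```python
# Toy sketch of a region-level memorization measure: flag image patches
# whose similarity to the nearest training image exceeds a threshold.
import torch
import torch.nn.functional as F

def memorized_region_mask(gen: torch.Tensor,
                          train: torch.Tensor,
                          patch: int = 16,
                          thresh: float = 0.9) -> torch.Tensor:
    """Return a (H/patch, W/patch) boolean mask of near-duplicate patches.

    gen, train: images of shape (C, H, W), already aligned/matched.
    """
    gp = F.unfold(gen.unsqueeze(0), patch, stride=patch)   # (1, C*p*p, N)
    tp = F.unfold(train.unsqueeze(0), patch, stride=patch)
    sim = F.cosine_similarity(gp, tp, dim=1).squeeze(0)    # (N,)
    side = gen.shape[-1] // patch
    return (sim > thresh).reshape(side, side)

gen = torch.rand(3, 256, 256)
train = gen.clone()
train[:, :128] = torch.rand(3, 128, 256)  # top half differs
mask = memorized_region_mask(gen, train)
print(mask.float().mean())  # fraction of patches flagged as memorized
```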
- StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning [79.44594332189018]
Class-Incremental Learning (CIL) seeks to develop models that continuously learn new action categories over time without forgetting previously acquired knowledge. Existing approaches either rely on storing exemplars, raising concerns over memory and privacy, or adapt static image-based methods that neglect temporal modeling. We propose a unified and exemplar-free VCIL framework that explicitly disentangles and preserves spatiotemporal information.
arXiv Detail & Related papers (2025-05-20T06:46:51Z)
- Exploring Local Memorization in Diffusion Models via Bright Ending Attention [62.979954692036685]
"bright ending" (BE) anomaly in text-to-image diffusion models prone to memorizing training images.<n>We propose a simple yet effective method to integrate BE into existing frameworks.
arXiv Detail & Related papers (2024-10-29T02:16:01Z)
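One way to picture turning a bright-ending-style attention signal into a localization mask is sketched below, assuming the final-step cross-attention map has already been extracted from the model; the end-token index, thresholding rule, and synthetic tensor are all assumptions rather than the BE paper's exact recipe.

```python
# Toy sketch: threshold the cross-attention that image patches pay to the
# prompt's end token at the final denoising step. The attention tensor here
# is synthetic; extraction from a real pipeline is not shown.
import torch

def bright_ending_mask(attn_last_step: torch.Tensor,
                       end_token_idx: int,
                       quantile: float = 0.9) -> torch.Tensor:
    """attn_last_step: (num_patches, num_tokens) cross-attention at the
    final denoising step, averaged over heads."""
    end_attn = attn_last_step[:, end_token_idx]  # (num_patches,)
    thresh = end_attn.quantile(quantile)
    return end_attn > thresh                     # "bright" patches

attn = torch.rand(64 * 64, 77)  # synthetic attention map
mask = bright_ending_mask(attn, end_token_idx=76)
print(mask.sum().item(), "patches flagged")
```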
- Rethinking and Defending Protective Perturbation in Personalized Diffusion Models [21.30373461975769]
We study the fine-tuning process of personalized diffusion models (PDMs) through the lens of shortcut learning.
PDMs are susceptible to minor adversarial perturbations, leading to significant degradation when fine-tuned on corrupted datasets.
We propose a systematic defense framework that includes data purification and contrastive decoupling learning.
arXiv Detail & Related papers (2024-06-27T07:14:14Z)
- Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration [64.84134880709625]
We show that it is possible to perform domain adaptation via the noise space using diffusion models. In particular, by leveraging the unique property of how auxiliary conditional inputs influence the multi-step denoising process, we derive a meaningful diffusion loss. We present crucial strategies such as a channel-shuffling layer and residual-swapping contrastive learning in the diffusion model.
arXiv Detail & Related papers (2024-06-26T17:40:30Z)
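The channel-shuffling layer named in that summary can be illustrated with a minimal ShuffleNet-style group shuffle; whether this matches the paper's exact layer is an assumption, but it shows the operation the abstract names.

```python
# Minimal channel-shuffling layer (ShuffleNet-style group shuffle).
import torch
import torch.nn as nn

class ChannelShuffle(nn.Module):
    def __init__(self, groups: int):
        super().__init__()
        self.groups = groups

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        assert c % self.groups == 0, "channels must divide evenly into groups"
        # (B, G, C/G, H, W) -> swap group and channel dims -> flatten back
        x = x.view(b, self.groups, c // self.groups, h, w)
        x = x.transpose(1, 2).contiguous()
        return x.view(b, c, h, w)

x = torch.randn(2, 8, 16, 16)
print(ChannelShuffle(groups=4)(x).shape)  # torch.Size([2, 8, 16, 16])
```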
- High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models [56.00939852727501]
Minimally-supervised speech synthesis decouples TTS by combining two types of discrete speech representations.
A non-autoregressive framework enhances controllability, and a duration diffusion model enables diversified prosodic expression.
arXiv Detail & Related papers (2023-09-27T09:27:03Z)