Rethinking and Defending Protective Perturbation in Personalized Diffusion Models
- URL: http://arxiv.org/abs/2406.18944v4
- Date: Thu, 03 Oct 2024 03:35:53 GMT
- Title: Rethinking and Defending Protective Perturbation in Personalized Diffusion Models
- Authors: Yixin Liu, Ruoxi Chen, Xun Chen, Lichao Sun
- Abstract summary: We study the fine-tuning process of personalized diffusion models (PDMs) through the lens of shortcut learning.
PDMs are susceptible to minor adversarial perturbations, leading to significant degradation when fine-tuned on corrupted datasets.
We propose a systematic defense framework that includes data purification and contrastive decoupling learning.
- Score: 21.30373461975769
- License:
- Abstract: Personalized diffusion models (PDMs) have become prominent for adapting pretrained text-to-image models to generate images of specific subjects using minimal training data. However, PDMs are susceptible to minor adversarial perturbations, leading to significant degradation when fine-tuned on corrupted datasets. These vulnerabilities are exploited to create protective perturbations that prevent unauthorized image generation. Existing purification methods attempt to mitigate this issue but often over-purify images, resulting in information loss. In this work, we conduct an in-depth analysis of the fine-tuning process of PDMs through the lens of shortcut learning. We hypothesize and empirically demonstrate that adversarial perturbations induce a latent-space misalignment between images and their text prompts in the CLIP embedding space. This misalignment causes the model to erroneously associate noisy patterns with unique identifiers during fine-tuning, resulting in poor generalization. Based on these insights, we propose a systematic defense framework that includes data purification and contrastive decoupling learning. We first employ off-the-shelf image restoration techniques to realign images with their original semantic meanings in latent space. Then, we introduce contrastive decoupling learning with noise tokens to decouple the learning of personalized concepts from spurious noise patterns. Our study not only uncovers fundamental shortcut learning vulnerabilities in PDMs but also provides a comprehensive evaluation framework for developing stronger protection. Our extensive evaluation demonstrates the framework's superiority over existing purification methods and its stronger robustness against adaptive perturbations.
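The abstract points to two concrete mechanisms that can be illustrated in code: (i) the claimed image-prompt misalignment in the CLIP embedding space induced by protective perturbations, and (ii) the use of extra noise tokens to decouple the personalized concept from spurious noise patterns. The sketch below illustrates (i) only as a measurement: it computes the CLIP cosine similarity between a training image and its identifier prompt, once for a clean copy and once for a protected (perturbed) copy. The checkpoint name, file paths, and prompt are illustrative assumptions, not artifacts released with the paper.

```python
# Minimal sketch (not the authors' code): probe the image-prompt alignment
# in CLIP embedding space for a clean vs. a protectively perturbed image.
# "openai/clip-vit-large-patch14", the file names, and the prompt below are
# placeholder assumptions.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-large-patch14")

prompt = "a photo of sks person"  # DreamBooth-style unique-identifier prompt

@torch.no_grad()
def clip_alignment(image_path: str, text: str) -> float:
    """Cosine similarity between the CLIP image and text embeddings."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=[text], images=image, return_tensors="pt").to(device)
    out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return float((img * txt).sum())

# A markedly lower score for the protected copy would be consistent with the
# latent-space misalignment described in the abstract.
print("clean:    ", clip_alignment("subject_clean.png", prompt))
print("protected:", clip_alignment("subject_protected.png", prompt))
```

For (ii), one plausible prompt-level reading of contrastive decoupling learning, offered here only as a hedged illustration and not as the paper's exact recipe, is to give the noise pattern its own token during fine-tuning and to keep it out of (or negate it in) the sampling prompt:

```python
# Hypothetical token strings; the paper's actual noise tokens and prompt
# templates may differ.
ID_TOKEN, NOISE_TOKEN = "sks", "t@"

train_prompt    = f"a photo of {ID_TOKEN} person with {NOISE_TOKEN} noisy pattern"
clean_prompt    = f"a photo of {ID_TOKEN} person"   # contrasted clean view during fine-tuning
sample_prompt   = clean_prompt                      # identifier without the noise token at inference
negative_prompt = f"{NOISE_TOKEN} noisy pattern"    # optionally steer sampling away from the noise concept
```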
Related papers
- Confidence-aware Denoised Fine-tuning of Off-the-shelf Models for Certified Robustness [56.2479170374811]
We introduce Fine-Tuning with Confidence-Aware Denoised Image Selection (FT-CADIS).
FT-CADIS is inspired by the observation that the confidence of off-the-shelf classifiers can effectively identify hallucinated images during denoised smoothing.
It has established the state-of-the-art certified robustness among denoised smoothing methods across all $\ell_2$-adversary radii in various benchmarks.
arXiv Detail & Related papers (2024-11-13T09:13:20Z) - A Grey-box Attack against Latent Diffusion Model-based Image Editing by Posterior Collapse [9.777410374242972]
Recent advancements in generative AI, particularly Latent Diffusion Models (LDMs), have revolutionized image synthesis and manipulation.
We propose the Posterior Collapse Attack (PCA) based on the observation that VAEs suffer from posterior collapse during training.
Our method minimizes dependence on white-box information about the target model, removing the implicit reliance on model-specific knowledge.
arXiv Detail & Related papers (2024-08-20T14:43:53Z) - DDAP: Dual-Domain Anti-Personalization against Text-to-Image Diffusion Models [18.938687631109925]
Diffusion-based personalized visual content generation technologies have achieved significant breakthroughs.
However, when misused to fabricate fake news or unsettling content targeting individuals, these technologies could cause considerable societal harm.
This paper introduces a novel Dual-Domain Anti-Personalization framework (DDAP).
By alternating between perturbations in the two domains, we construct the DDAP framework, effectively harnessing the strengths of both.
arXiv Detail & Related papers (2024-07-29T16:11:21Z) - Adversarial Robustification via Text-to-Image Diffusion Models [56.37291240867549]
Adversarial robustness has conventionally been believed to be a challenging property to encode into neural networks.
We develop a scalable and model-agnostic solution to achieve adversarial robustness without using any data.
arXiv Detail & Related papers (2024-07-26T10:49:14Z) - Denoising as Adaptation: Noise-Space Domain Adaptation for Image Restoration [64.84134880709625]
We show that it is possible to perform domain adaptation via the noise space using diffusion models.
In particular, by leveraging the unique property of how auxiliary conditional inputs influence the multi-step denoising process, we derive a meaningful diffusion loss.
We present crucial strategies, such as a channel-shuffling layer and residual-swapping contrastive learning, in the diffusion model.
arXiv Detail & Related papers (2024-06-26T17:40:30Z) - Semantic Deep Hiding for Robust Unlearnable Examples [33.68037533119807]
Unlearnable examples are proposed to mislead deep learning models and prevent data from unauthorized exploitation.
We propose a Deep Hiding scheme that adaptively hides semantic images enriched with high-level features.
Our proposed method exhibits outstanding robustness for unlearnable examples, demonstrating its efficacy in preventing unauthorized data exploitation.
arXiv Detail & Related papers (2024-06-25T08:05:42Z) - DPMesh: Exploiting Diffusion Prior for Occluded Human Mesh Recovery [71.6345505427213]
DPMesh is an innovative framework for occluded human mesh recovery.
It capitalizes on the profound diffusion prior about object structure and spatial relationships embedded in a pre-trained text-to-image diffusion model.
arXiv Detail & Related papers (2024-04-01T18:59:13Z) - Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting [133.55037976429088]
We investigate the adversarial robustness of vision transformers equipped with BERT pretraining (e.g., BEiT, MAE).
A surprising observation is that MAE has significantly worse adversarial robustness than other BERT pretraining methods.
We propose a simple yet effective way to boost the adversarial robustness of MAE.
arXiv Detail & Related papers (2023-08-20T16:27:17Z) - Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition.
Although promising, unsupervised learning on diffusion-generated images remains inadequately explored.
We introduce customized solutions that fully exploit the free attention masks.
arXiv Detail & Related papers (2023-08-13T10:07:46Z) - Unlearnable Examples for Diffusion Models: Protect Data from Unauthorized Exploitation [25.55296442023984]
We propose a method, Unlearnable Diffusion Perturbation, to safeguard images from unauthorized exploitation.
This capability is important in real-world scenarios, as it contributes to protecting privacy and copyright against AI-generated content.
arXiv Detail & Related papers (2023-06-02T20:19:19Z) - Minimum Noticeable Difference based Adversarial Privacy Preserving Image Generation [44.2692621807947]
We develop a framework to generate adversarial privacy preserving images that have minimum perceptual difference from the clean ones but are able to attack deep learning models.
To the best of our knowledge, this is the first work to explore quality-preserving adversarial image generation based on the MND concept for privacy preservation.
arXiv Detail & Related papers (2022-06-17T09:02:12Z)
This list is automatically generated from the titles and abstracts of the papers on this site.