Investigating and Defending Shortcut Learning in Personalized Diffusion Models
- URL: http://arxiv.org/abs/2406.18944v2
- Date: Wed, 17 Jul 2024 05:02:01 GMT
- Title: Investigating and Defending Shortcut Learning in Personalized Diffusion Models
- Authors: Yixin Liu, Ruoxi Chen, Lichao Sun
- Abstract summary: We take a closer look at the fine-tuning process of personalized diffusion models through the lens of shortcut learning.
This misalignment during fine-tuning causes models to associate noisy patterns with identifiers, resulting in performance degradation.
Our method first purifies the images to realign them with their original semantic meanings in latent space.
- Score: 16.569765598914152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized diffusion models have gained popularity for adapting pre-trained text-to-image models to generate images of specific topics with minimal training data. However, these models are vulnerable to minor adversarial perturbations, leading to degraded performance on corrupted datasets. Such vulnerabilities are further exploited to craft protective perturbations on sensitive images like portraits that prevent unauthorized generation. In response, diffusion-based purification methods have been proposed to remove these perturbations and retain generation performance. However, existing methods tend to over-purify the images, causing information loss. In this paper, we take a closer look at the fine-tuning process of personalized diffusion models through the lens of shortcut learning. We propose a hypothesis explaining the manipulation mechanisms of existing perturbation methods, demonstrating that perturbed images significantly deviate from their original prompts in the CLIP-based latent space. This misalignment during fine-tuning causes models to associate noisy patterns with identifiers, resulting in performance degradation. Based on these insights, we introduce a systematic approach to maintain training performance through purification. Our method first purifies the images to realign them with their original semantic meanings in latent space. Then, we introduce contrastive learning with negative tokens to decouple the learning of clean identities from noisy patterns, which shows strong potential against adaptive perturbations. Our study uncovers shortcut learning vulnerabilities in personalized diffusion models and provides a solid evaluation framework for future protective perturbation research. Code is available at https://github.com/liuyixin-louis/DiffShortcut.
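The two mechanisms the abstract describes can be illustrated with a small sketch. Part 1 shows the paper's misalignment hypothesis as a cosine-similarity check between an image embedding and its prompt embedding; in practice these would come from a CLIP encoder, but here the 4-d vectors and the 0.8 threshold are placeholder assumptions. Part 2 sketches one plausible reading of "contrastive learning with negative tokens" as an InfoNCE-style loss that pulls an identifier embedding toward clean-identity features and pushes it away from a noise-pattern embedding. This is a hedged illustration, not the DiffShortcut implementation.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# --- Part 1: detecting latent misalignment -------------------------------
# Hypothesis: perturbed images drift away from their prompts in a CLIP-like
# joint embedding space. With real CLIP embeddings the check is a plain
# cosine similarity against a threshold; all vectors below are placeholders.
prompt_emb = np.array([1.0, 0.0, 0.0, 0.0])
clean_img_emb = np.array([0.9, 0.1, 0.0, 0.0])      # stays close to the prompt
perturbed_img_emb = np.array([0.2, 0.9, 0.3, 0.0])  # pushed off-prompt

def is_misaligned(img_emb, text_emb, threshold=0.8):
    """Flag an image whose embedding has drifted away from its prompt."""
    return cosine(img_emb, text_emb) < threshold

# --- Part 2: contrastive decoupling with a negative token ----------------
# Pull the identifier toward the clean identity (positive) and away from the
# noise pattern (negative) via an InfoNCE-style cross-entropy over similarities.
def info_nce(anchor, positive, negative, temperature=0.1):
    logits = np.array([cosine(anchor, positive),
                       cosine(anchor, negative)]) / temperature
    logits -= logits.max()                  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))         # positive sits at index 0

identifier = np.array([0.8, 0.2, 0.0, 0.1])      # learnable token (placeholder)
clean_identity = np.array([0.9, 0.1, 0.0, 0.0])  # features of clean images
noise_pattern = np.array([0.0, 1.0, 0.2, 0.0])   # features of the perturbation

loss = info_nce(identifier, clean_identity, noise_pattern)
```

In this toy setup, the perturbed embedding is flagged as misaligned while the clean one is not, and the contrastive loss is already small because the identifier sits near the clean identity; minimizing it during fine-tuning would keep the identifier decoupled from the noise pattern.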
Related papers
- Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization [19.635385099376066]
Malicious users have misused diffusion-based customization methods such as DreamBooth to create fake images.
In this paper, we propose DisDiff, a novel adversarial attack method to disrupt the diffusion model outputs.
arXiv Detail & Related papers (2024-05-31T02:45:31Z)
- Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention [62.671435607043875]
Research indicates that text-to-image diffusion models replicate images from their training data, raising tremendous concerns about potential copyright infringement and privacy risks.
We reveal that during memorization, the cross-attention tends to focus disproportionately on the embeddings of specific tokens.
We introduce an innovative approach to detect and mitigate memorization in diffusion models.
arXiv Detail & Related papers (2024-03-17T01:27:00Z)
- Model Will Tell: Training Membership Inference for Diffusion Models [15.16244745642374]
The Training Membership Inference (TMI) task aims to determine whether a specific sample was used in the training process of a target model.
In this paper, we explore a novel perspective for the TMI task by leveraging the intrinsic generative priors within the diffusion model.
arXiv Detail & Related papers (2024-03-13T12:52:37Z)
- Active Generation for Image Classification [50.18107721267218]
We propose to address the efficiency of image generation by focusing on the specific needs and characteristics of the model.
With a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z)
- Denoising Autoregressive Representation Learning [13.185567468951628]
Our method, DARL, employs a decoder-only Transformer to predict image patches autoregressively.
We show that the learned representation can be improved by using tailored noise schedules and longer training in larger models.
arXiv Detail & Related papers (2024-03-08T10:19:00Z)
- Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition.
Although promising, there has been inadequate exploration dedicated to unsupervised learning on diffusion-generated images.
We introduce customized solutions by fully exploiting the aforementioned free attention masks.
arXiv Detail & Related papers (2023-08-13T10:07:46Z)
- Training Diffusion Models with Reinforcement Learning [82.29328477109826]
Diffusion models are trained with an approximation to the log-likelihood objective.
In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for downstream objectives.
We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms.
arXiv Detail & Related papers (2023-05-22T17:57:41Z)
- Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z)
- Stereopagnosia: Fooling Stereo Networks with Adversarial Perturbations [71.00754846434744]
We show that imperceptible additive perturbations can significantly alter the disparity map.
We show that, when used for adversarial data augmentation, our perturbations result in trained models that are more robust.
arXiv Detail & Related papers (2020-09-21T19:20:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.