Investigating and Defending Shortcut Learning in Personalized Diffusion Models
- URL: http://arxiv.org/abs/2406.18944v2
- Date: Wed, 17 Jul 2024 05:02:01 GMT
- Title: Investigating and Defending Shortcut Learning in Personalized Diffusion Models
- Authors: Yixin Liu, Ruoxi Chen, Lichao Sun
- Abstract summary: We take a closer look at the fine-tuning process of personalized diffusion models through the lens of shortcut learning.
This misalignment during fine-tuning causes models to associate noisy patterns with identifiers, resulting in performance degradation.
Our method first purifies the images to realign them with their original semantic meanings in latent space.
- Score: 16.569765598914152
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Personalized diffusion models have gained popularity for adapting pre-trained text-to-image models to generate images of specific topics with minimal training data. However, these models are vulnerable to minor adversarial perturbations, leading to degraded performance on corrupted datasets. Such vulnerabilities are further exploited to craft protective perturbations on sensitive images like portraits that prevent unauthorized generation. In response, diffusion-based purification methods have been proposed to remove these perturbations and retain generation performance. However, existing methods tend to over-purify the images, causing information loss. In this paper, we take a closer look at the fine-tuning process of personalized diffusion models through the lens of shortcut learning. We propose a hypothesis explaining the manipulation mechanisms of existing perturbation methods, demonstrating that perturbed images significantly deviate from their original prompts in the CLIP-based latent space. This misalignment during fine-tuning causes models to associate noisy patterns with identifiers, resulting in performance degradation. Based on these insights, we introduce a systematic approach to maintain training performance through purification. Our method first purifies the images to realign them with their original semantic meanings in latent space. Then, we introduce contrastive learning with negative tokens to decouple the learning of clean identities from noisy patterns, which shows strong potential against adaptive perturbations. Our study uncovers shortcut learning vulnerabilities in personalized diffusion models and provides a solid evaluation framework for future protective perturbation research. Code is available at https://github.com/liuyixin-louis/DiffShortcut.
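The two mechanisms the abstract describes can be illustrated with a small sketch. Part 1 shows the paper's misalignment hypothesis as a cosine-similarity check between an image embedding and its prompt embedding; in practice these would come from a CLIP encoder, but here the 4-d vectors and the 0.8 threshold are placeholder assumptions. Part 2 sketches one plausible reading of "contrastive learning with negative tokens" as an InfoNCE-style loss that pulls an identifier embedding toward clean-identity features and pushes it away from a noise-pattern embedding. This is a hedged illustration, not the DiffShortcut implementation.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# --- Part 1: detecting latent misalignment -------------------------------
# Hypothesis: perturbed images drift away from their prompts in a CLIP-like
# joint embedding space. With real CLIP embeddings the check is a plain
# cosine similarity against a threshold; all vectors below are placeholders.
prompt_emb = np.array([1.0, 0.0, 0.0, 0.0])
clean_img_emb = np.array([0.9, 0.1, 0.0, 0.0])      # stays close to the prompt
perturbed_img_emb = np.array([0.2, 0.9, 0.3, 0.0])  # pushed off-prompt

def is_misaligned(img_emb, text_emb, threshold=0.8):
    """Flag an image whose embedding has drifted away from its prompt."""
    return cosine(img_emb, text_emb) < threshold

# --- Part 2: contrastive decoupling with a negative token ----------------
# Pull the identifier toward the clean identity (positive) and away from the
# noise pattern (negative) via an InfoNCE-style cross-entropy over similarities.
def info_nce(anchor, positive, negative, temperature=0.1):
    logits = np.array([cosine(anchor, positive),
                       cosine(anchor, negative)]) / temperature
    logits -= logits.max()                  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(-np.log(probs[0]))         # positive sits at index 0

identifier = np.array([0.8, 0.2, 0.0, 0.1])      # learnable token (placeholder)
clean_identity = np.array([0.9, 0.1, 0.0, 0.0])  # features of clean images
noise_pattern = np.array([0.0, 1.0, 0.2, 0.0])   # features of the perturbation

loss = info_nce(identifier, clean_identity, noise_pattern)
```

In this toy setup, the perturbed embedding is flagged as misaligned while the clean one is not, and the contrastive loss is already small because the identifier sits near the clean identity; minimizing it during fine-tuning would keep the identifier decoupled from the noise pattern.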
Related papers
- Disrupting Diffusion: Token-Level Attention Erasure Attack against Diffusion-based Customization [19.635385099376066]
Malicious users have misused diffusion-based customization methods such as DreamBooth to create fake images.
In this paper, we propose DisDiff, a novel adversarial attack method to disrupt the diffusion model outputs.
arXiv Detail & Related papers (2024-05-31T02:45:31Z)
- Unveiling and Mitigating Memorization in Text-to-image Diffusion Models through Cross Attention [62.671435607043875]
Research indicates that text-to-image diffusion models replicate images from their training data, raising tremendous concerns about potential copyright infringement and privacy risks.
We reveal that during memorization, the cross-attention tends to focus disproportionately on the embeddings of specific tokens.
We introduce an innovative approach to detect and mitigate memorization in diffusion models.
arXiv Detail & Related papers (2024-03-17T01:27:00Z)
- Model Will Tell: Training Membership Inference for Diffusion Models [15.16244745642374]
The Training Membership Inference (TMI) task aims to determine whether a specific sample was used in the training process of a target model.
In this paper, we explore a novel perspective for the TMI task by leveraging the intrinsic generative priors within the diffusion model.
arXiv Detail & Related papers (2024-03-13T12:52:37Z)
- Active Generation for Image Classification [50.18107721267218]
We propose to address the efficiency of image generation by focusing on the specific needs and characteristics of the model.
With a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z)
- Denoising Autoregressive Representation Learning [13.185567468951628]
Our method, DARL, employs a decoder-only Transformer to predict image patches autoregressively.
We show that the learned representation can be improved by using tailored noise schedules and longer training in larger models.
arXiv Detail & Related papers (2024-03-08T10:19:00Z)
- Free-ATM: Exploring Unsupervised Learning on Diffusion-Generated Images with Free Attention Masks [64.67735676127208]
Text-to-image diffusion models have shown great potential for benefiting image recognition.
Although promising, there has been inadequate exploration dedicated to unsupervised learning on diffusion-generated images.
We introduce customized solutions by fully exploiting the aforementioned free attention masks.
arXiv Detail & Related papers (2023-08-13T10:07:46Z)
- Training Diffusion Models with Reinforcement Learning [82.29328477109826]
Diffusion models are trained with an approximation to the log-likelihood objective.
In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for downstream objectives.
We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms.
arXiv Detail & Related papers (2023-05-22T17:57:41Z)
- Diffusion Models for Adversarial Purification [69.1882221038846]
Adversarial purification refers to a class of defense methods that remove adversarial perturbations using a generative model.
We propose DiffPure that uses diffusion models for adversarial purification.
Our method achieves state-of-the-art results, outperforming current adversarial training and adversarial purification methods.
arXiv Detail & Related papers (2022-05-16T06:03:00Z)
- Stereopagnosia: Fooling Stereo Networks with Adversarial Perturbations [71.00754846434744]
We show that imperceptible additive perturbations can significantly alter the disparity map.
We show that, when used for adversarial data augmentation, our perturbations result in trained models that are more robust.
arXiv Detail & Related papers (2020-09-21T19:20:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.