IMMA: Immunizing text-to-image Models against Malicious Adaptation
- URL: http://arxiv.org/abs/2311.18815v3
- Date: Sat, 28 Sep 2024 02:34:18 GMT
- Title: IMMA: Immunizing text-to-image Models against Malicious Adaptation
- Authors: Amber Yijia Zheng, Raymond A. Yeh,
- Abstract summary: Open-sourced text-to-image models and fine-tuning methods have led to the increasing risk of malicious adaptation, i.e., fine-tuning to generate harmful/unauthorized content.
We propose to immunize'' the model by learning model parameters that are difficult for the adaptation methods when fine-tuning malicious content; in short IMMA.
Empirical results show IMMA's effectiveness against malicious adaptations, including mimicking the artistic style and learning of inappropriate/unauthorized content.
- Score: 11.912092139018885
- License:
- Abstract: Advancements in open-sourced text-to-image models and fine-tuning methods have led to the increasing risk of malicious adaptation, i.e., fine-tuning to generate harmful/unauthorized content. Recent works, e.g., Glaze or MIST, have developed data-poisoning techniques which protect the data against adaptation methods. In this work, we consider an alternative paradigm for protection. We propose to ``immunize'' the model by learning model parameters that are difficult for the adaptation methods when fine-tuning malicious content; in short IMMA. Specifically, IMMA should be applied before the release of the model weights to mitigate these risks. Empirical results show IMMA's effectiveness against malicious adaptations, including mimicking the artistic style and learning of inappropriate/unauthorized content, over three adaptation methods: LoRA, Textual-Inversion, and DreamBooth. The code is available at \url{https://github.com/amberyzheng/IMMA}.
Related papers
- DELMAN: Dynamic Defense Against Large Language Model Jailbreaking with Model Editing [62.43110639295449]
Large Language Models (LLMs) are widely applied in decision making, but their deployment is threatened by jailbreak attacks.
Delman is a novel approach leveraging direct model editing for precise, dynamic protection against jailbreak attacks.
Delman directly updates a minimal set of relevant parameters to neutralize harmful behaviors while preserving the model's utility.
arXiv Detail & Related papers (2025-02-17T10:39:21Z) - Safety Alignment Backfires: Preventing the Re-emergence of Suppressed Concepts in Fine-tuned Text-to-Image Diffusion Models [57.16056181201623]
Fine-tuning text-to-image diffusion models can inadvertently undo safety measures, causing models to relearn harmful concepts.
We present a novel but immediate solution called Modular LoRA, which involves training Safety Low-Rank Adaptation modules separately from Fine-Tuning LoRA components.
This method effectively prevents the re-learning of harmful content without compromising the model's performance on new tasks.
arXiv Detail & Related papers (2024-11-30T04:37:38Z) - RLCP: A Reinforcement Learning-based Copyright Protection Method for Text-to-Image Diffusion Model [42.77851688874563]
We propose a Reinforcement Learning-based Copyright Protection(RLCP) method for Text-to-Image Diffusion Model.
Our approach minimizes the generation of copyright-infringing content while maintaining the quality of the model-generated dataset.
arXiv Detail & Related papers (2024-08-29T15:39:33Z) - Pixel Is Not a Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models [9.905296922309157]
Diffusion Models have emerged as powerful generative models for high-quality image synthesis, with many subsequent image editing techniques based on them.
Previous works have attempted to safeguard images from diffusion-based editing by adding imperceptible perturbations.
Our work proposes a novel attack framework, AtkPDM, which exploits vulnerabilities in denoising UNets and a latent optimization strategy to enhance the naturalness of adversarial images.
arXiv Detail & Related papers (2024-08-21T17:56:34Z) - Covert Malicious Finetuning: Challenges in Safeguarding LLM Adaptation [86.05704141217036]
Black-box finetuning is an emerging interface for adapting state-of-the-art language models to user needs.
We introduce covert malicious finetuning, a method to compromise model safety via finetuning while evading detection.
arXiv Detail & Related papers (2024-06-28T17:05:46Z) - FreezeAsGuard: Mitigating Illegal Adaptation of Diffusion Models via Selective Tensor Freezing [9.598086319369694]
We present FreezeAsGuard, a technique that enables irreversible mitigation of illegal adaptations of diffusion models.
Experiment results show that FreezeAsGuard provides 37% stronger power in mitigating illegal model adaptations compared to competitive baselines.
arXiv Detail & Related papers (2024-05-24T03:23:51Z) - ModelShield: Adaptive and Robust Watermark against Model Extraction Attack [58.46326901858431]
Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks.
adversaries can still utilize model extraction attacks to steal the model intelligence encoded in model generation.
Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content.
arXiv Detail & Related papers (2024-05-03T06:41:48Z) - A Dataset and Benchmark for Copyright Infringement Unlearning from Text-to-Image Diffusion Models [52.49582606341111]
Copyright law confers creators the exclusive rights to reproduce, distribute, and monetize their creative works.
Recent progress in text-to-image generation has introduced formidable challenges to copyright enforcement.
We introduce a novel pipeline that harmonizes CLIP, ChatGPT, and diffusion models to curate a dataset.
arXiv Detail & Related papers (2024-01-04T11:14:01Z) - Towards Safe Self-Distillation of Internet-Scale Text-to-Image Diffusion
Models [63.20512617502273]
We propose a method called SDD to prevent problematic content generation in text-to-image diffusion models.
Our method eliminates a much greater proportion of harmful content from the generated images without degrading the overall image quality.
arXiv Detail & Related papers (2023-07-12T07:48:29Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.