FreezeAsGuard: Mitigating Illegal Adaptation of Diffusion Models via Selective Tensor Freezing
- URL: http://arxiv.org/abs/2405.17472v1
- Date: Fri, 24 May 2024 03:23:51 GMT
- Title: FreezeAsGuard: Mitigating Illegal Adaptation of Diffusion Models via Selective Tensor Freezing
- Authors: Kai Huang, Wei Gao
- Abstract summary: We present FreezeAsGuard, a technique that enables irreversible mitigation of illegal adaptations of diffusion models.
The basic approach is that the model publisher selectively freezes tensors in pre-trained diffusion models critical to illegal model adaptations.
Experiment results show that FreezeAsGuard is more effective at mitigating illegal model adaptations that generate fake public figures' portraits.
- Score: 10.557086968942498
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Text-to-image diffusion models can be fine-tuned in custom domains to adapt to specific user preferences, but such unconstrained adaptability has also been utilized for illegal purposes, such as forging public figures' portraits and duplicating copyrighted artworks. Most existing work focuses on detecting the illegally generated contents, but cannot prevent or mitigate illegal adaptations of diffusion models. Other schemes of model unlearning and reinitialization, similarly, cannot prevent users from relearning the knowledge of illegal model adaptation with custom data. In this paper, we present FreezeAsGuard, a new technique that addresses these limitations and enables irreversible mitigation of illegal adaptations of diffusion models. The basic approach is that the model publisher selectively freezes tensors in pre-trained diffusion models that are critical to illegal model adaptations, to limit the fine-tuned model's representation power in illegal domains while minimizing the impact on legal model adaptations in other domains. Such tensor freezing can be enforced via the fine-tuning APIs provided by the model publisher, and can also motivate user adoption thanks to its computational savings. Experiment results with datasets in multiple domains show that FreezeAsGuard is more effective at mitigating illegal model adaptations that generate fake public figures' portraits, while having minimal impact on model adaptation in other, legal domains. The source code is available at: https://github.com/pittisl/FreezeAsGuard/
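To make the freezing mechanism concrete, below is a minimal PyTorch/diffusers sketch of how a publisher-enforced fine-tuning API could freeze selected tensors. The checkpoint and the entries of `frozen_tensor_names` are illustrative assumptions; the paper learns which tensors are critical rather than hard-coding them.

```python
# Minimal sketch of selective tensor freezing during fine-tuning.
# "frozen_tensor_names" is a hypothetical, publisher-supplied set;
# the paper's actual selection is learned, not hard-coded.
import torch
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

frozen_tensor_names = {  # illustrative placeholders
    "mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight",
    "mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight",
}

for name, param in unet.named_parameters():
    if name in frozen_tensor_names:
        param.requires_grad_(False)  # excluded from every gradient update

# Only trainable tensors go to the optimizer, which also saves compute
# and memory -- the user-side incentive the abstract mentions.
optimizer = torch.optim.AdamW(
    (p for p in unet.parameters() if p.requires_grad), lr=1e-5
)
```

Because frozen tensors never receive gradients, fine-tuning with custom data cannot push the model's representation back into the suppressed domain through them, which is what makes the mitigation hard to reverse.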
Related papers
- SleeperMark: Towards Robust Watermark against Fine-Tuning Text-to-image Diffusion Models [77.80595722480074]
SleeperMark is a novel framework designed to embed resilient watermarks into T2I diffusion models.
It guides the model to disentangle the watermark information from the semantic concepts it learns, allowing the model to retain the embedded watermark even after downstream fine-tuning.
Our experiments demonstrate the effectiveness of SleeperMark across various types of diffusion models.
arXiv Detail & Related papers (2024-12-06T08:44:18Z)
- Safety Alignment Backfires: Preventing the Re-emergence of Suppressed Concepts in Fine-tuned Text-to-Image Diffusion Models [57.16056181201623]
Fine-tuning text-to-image diffusion models can inadvertently undo safety measures, causing models to relearn harmful concepts.
We present a novel but immediate solution called Modular LoRA, which involves training Safety Low-Rank Adaptation modules separately from Fine-Tuning LoRA components.
This method effectively prevents the re-learning of harmful content without compromising the model's performance on new tasks.
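A hedged sketch of how such modular composition might look with the diffusers adapter API (PEFT backend): the adapter repositories and names below are hypothetical, and the paper's training procedure is not reproduced here.

```python
# Sketch: compose a separately trained safety LoRA with a task LoRA.
# Adapter paths and names are hypothetical placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The safety adapter is trained once to suppress harmful concepts and
# is never updated; the task adapter is trained on the new domain only.
pipe.load_lora_weights("org/safety-lora", adapter_name="safety")  # hypothetical
pipe.load_lora_weights("org/task-lora", adapter_name="task")      # hypothetical

# Activating both keeps the suppression in effect alongside the new task.
pipe.set_adapters(["safety", "task"], adapter_weights=[1.0, 1.0])
image = pipe("a watercolor landscape").images[0]
```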
arXiv Detail & Related papers (2024-11-30T04:37:38Z)
- Risks When Sharing LoRA Fine-Tuned Diffusion Model Weights [0.10878040851638002]
We study the issue of privacy leakage of a fine-tuned diffusion model in a practical setting.
An adversary can generate images containing the same identities as the private images.
arXiv Detail & Related papers (2024-09-13T02:13:26Z)
- Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models [71.13610023354967]
Copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models.
We propose a diffusion model watermarking technique that is both performance-lossless and training-free.
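As a rough illustration of the performance-lossless idea (a toy sketch, not the paper's full scheme, which also encrypts the watermark and adds redundancy): if the embedded bits are uniformly random, each bit can select one half of the standard normal for the initial latent, leaving its marginal distribution exactly N(0, 1).

```python
# Toy sketch: embed uniform bits into an initial latent while keeping
# it exactly standard-normal distributed; recovery is a sign test.
import torch

def embed_bits(bits: torch.Tensor) -> torch.Tensor:
    u = torch.rand(bits.shape)
    # Bit b picks the b-th half of N(0,1): quantiles in (0, .5) for 0,
    # (.5, 1) for 1; the inverse normal CDF maps quantiles to samples.
    q = ((bits.float() + u) / 2.0).clamp(1e-6, 1 - 1e-6)
    return torch.special.ndtri(q)

def extract_bits(latent: torch.Tensor) -> torch.Tensor:
    return (latent >= 0).long()

bits = torch.randint(0, 2, (4, 64, 64))
assert torch.equal(extract_bits(embed_bits(bits)), bits)
```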
arXiv Detail & Related papers (2024-04-07T13:30:10Z)
- IMMA: Immunizing text-to-image Models against Malicious Adaptation [11.912092139018885]
Open-sourced text-to-image models and fine-tuning methods have led to the increasing risk of malicious adaptation, i.e., fine-tuning to generate harmful/unauthorized content.
We propose to "immunize" the model by learning model parameters that are hard for adaptation methods to fine-tune on malicious content; in short, IMMA.
Empirical results show IMMA's effectiveness against malicious adaptations, including mimicking the artistic style and learning of inappropriate/unauthorized content.
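The bi-level flavor of this immunization can be sketched on a toy model (illustrative only; IMMA operates on text-to-image model weights and its exact objective differs): simulate one adaptation step an attacker would take, then update the released weights so the adapted model still performs poorly on the malicious data.

```python
# Toy sketch of bi-level immunization on a linear model.
# x, y stand in for "malicious" adaptation data.
import torch
from torch.func import functional_call

model = torch.nn.Linear(8, 1)
outer_opt = torch.optim.SGD(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 8), torch.randn(32, 1)
inner_lr = 1e-2

for _ in range(100):
    params = dict(model.named_parameters())
    # Inner step: simulate the attacker's fine-tuning update.
    inner_loss = torch.nn.functional.mse_loss(
        functional_call(model, params, (x,)), y
    )
    grads = torch.autograd.grad(
        inner_loss, list(params.values()), create_graph=True
    )
    adapted = {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}
    # Outer step: maximize the adapted model's loss on the malicious
    # data, so the released weights resist this kind of adaptation.
    outer_loss = -torch.nn.functional.mse_loss(
        functional_call(model, adapted, (x,)), y
    )
    outer_opt.zero_grad()
    outer_loss.backward()
    outer_opt.step()
```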
arXiv Detail & Related papers (2023-11-30T18:55:16Z)
- Diffusion-TTA: Test-time Adaptation of Discriminative Models via Generative Feedback [97.0874638345205]
Generative models can be great test-time adapters for discriminative models.
Our method, Diffusion-TTA, adapts pre-trained discriminative models to each unlabelled example in the test set.
We show Diffusion-TTA significantly enhances the accuracy of various large-scale pre-trained discriminative models.
arXiv Detail & Related papers (2023-11-27T18:59:53Z)
- DIAGNOSIS: Detecting Unauthorized Data Usages in Text-to-image Diffusion Models [79.71665540122498]
We propose a method for detecting unauthorized data usage by planting injected content into the protected dataset.
Specifically, we modify the protected images by adding unique contents to them using stealthy image warping functions.
By analyzing whether a model has memorized the injected content, we can detect models that have illegally utilized the unauthorized data.
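A rough sketch of this coating step under stated assumptions: torchvision's ElasticTransform stands in for the paper's stealthy warping function, and the fixed RNG seed is a crude proxy for applying one consistent warp across the protected dataset.

```python
# Sketch: "coat" protected images with a subtle, consistent warp before
# release, so a model fine-tuned on them memorizes the warp signature.
# ElasticTransform is a stand-in, not the paper's actual function.
import torch
from PIL import Image
from torchvision import transforms

warp = transforms.ElasticTransform(alpha=30.0, sigma=5.0)

def coat(image: Image.Image) -> Image.Image:
    torch.manual_seed(0)  # crude proxy for one fixed warp per dataset
    return warp(image)

protected = coat(Image.open("artwork.png").convert("RGB"))
protected.save("artwork_coated.png")
```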
arXiv Detail & Related papers (2023-07-06T16:27:39Z)
- AdaptGuard: Defending Against Universal Attacks for Model Adaptation [129.2012687550069]
We study the vulnerability of model adaptation algorithms to universal attacks transferred from the source domain.
We propose a model preprocessing framework, named AdaptGuard, to improve the security of model adaptation algorithms.
arXiv Detail & Related papers (2023-03-19T07:53:31Z)