GuardDoor: Safeguarding Against Malicious Diffusion Editing via Protective Backdoors
- URL: http://arxiv.org/abs/2503.03944v1
- Date: Wed, 05 Mar 2025 22:21:44 GMT
- Title: GuardDoor: Safeguarding Against Malicious Diffusion Editing via Protective Backdoors
- Authors: Yaopei Zeng, Yuanpu Cao, Lu Lin
- Abstract summary: GuardDoor is a novel and robust protection mechanism that fosters collaboration between image owners and model providers. Our method demonstrates enhanced robustness against image preprocessing operations and is scalable for large-scale deployment.
- Score: 8.261182037130407
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The growing accessibility of diffusion models has revolutionized image editing but also raised significant concerns about unauthorized modifications, such as misinformation and plagiarism. Existing countermeasures largely rely on adversarial perturbations designed to disrupt diffusion model outputs. However, these approaches are found to be easily neutralized by simple image preprocessing techniques, such as compression and noise addition. To address this limitation, we propose GuardDoor, a novel and robust protection mechanism that fosters collaboration between image owners and model providers. Specifically, the model provider participating in the mechanism fine-tunes the image encoder to embed a protective backdoor, allowing image owners to request the attachment of imperceptible triggers to their images. When unauthorized users attempt to edit these protected images with this diffusion model, the model produces meaningless outputs, reducing the risk of malicious image editing. Our method demonstrates enhanced robustness against image preprocessing operations and is scalable for large-scale deployment. This work underscores the potential of cooperative frameworks between model providers and image owners to safeguard digital content in the era of generative AI.
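To make the protective-backdoor idea in the abstract concrete, here is a minimal, hedged sketch: a toy image encoder is fine-tuned so that clean images keep their original embeddings (matching a frozen copy of the encoder) while trigger-stamped images are pushed to a fixed, meaningless latent. Only the general mechanism is taken from the abstract; the encoder architecture, trigger parameterization, loss weighting, and all names (ToyImageEncoder, add_trigger, finetune_step) are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a protective-backdoor fine-tuning objective (illustrative only;
# architecture, trigger form, and hyperparameters are assumptions, not the
# GuardDoor implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyImageEncoder(nn.Module):
    """Stand-in for a diffusion model's image encoder."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, latent_dim),
        )
    def forward(self, x):
        return self.net(x)

def add_trigger(images, trigger, eps=8 / 255):
    """Attach an imperceptible, bounded trigger pattern to protected images."""
    return (images + eps * torch.tanh(trigger)).clamp(0, 1)

def finetune_step(encoder, frozen_encoder, clean, trigger, target_latent, opt, lam=1.0):
    """One fine-tuning step: clean images keep their original embeddings,
    triggered images are mapped to a fixed 'meaningless' target latent."""
    opt.zero_grad()
    with torch.no_grad():
        ref = frozen_encoder(clean)  # original behaviour to preserve
    utility_loss = F.mse_loss(encoder(clean), ref)
    backdoor_loss = F.mse_loss(
        encoder(add_trigger(clean, trigger)),
        target_latent.expand(clean.size(0), -1),
    )
    loss = utility_loss + lam * backdoor_loss
    loss.backward()
    opt.step()
    return loss.item()

# Usage with random data, purely to show the shape of the training loop.
enc = ToyImageEncoder()
frozen = ToyImageEncoder()
frozen.load_state_dict(enc.state_dict())
for p in frozen.parameters():
    p.requires_grad_(False)
trigger = torch.randn(1, 3, 64, 64)   # owner-side trigger pattern (fixed here)
target = torch.zeros(1, 64)           # "meaningless" latent the backdoor maps to
opt = torch.optim.Adam(enc.parameters(), lr=1e-4)
images = torch.rand(8, 3, 64, 64)
print(finetune_step(enc, frozen, images, trigger, target, opt))
```

The frozen reference encoder in this sketch plays the role of preserving normal editing behaviour on unprotected images, so the backdoor affects only trigger-bearing inputs.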
Related papers
- DCT-Shield: A Robust Frequency Domain Defense against Malicious Image Editing [1.7624347338410742]
Recent defenses attempt to protect images by adding limited noise in the pixel space to disrupt the functioning of diffusion-based editing models.
We propose a novel optimization approach that introduces adversarial perturbations directly in the frequency domain.
By leveraging the JPEG pipeline, our method generates adversarial images that effectively prevent malicious image editing.
arXiv Detail & Related papers (2025-04-24T19:14:50Z)
- PersGuard: Preventing Malicious Personalization via Backdoor Attacks on Pre-trained Text-to-Image Diffusion Models [51.458089902581456]
We introduce PersGuard, a novel backdoor-based approach that prevents malicious personalization of specific images.
Our method significantly outperforms existing techniques, offering a more robust solution for privacy and copyright protection.
arXiv Detail & Related papers (2025-02-22T09:47:55Z)
- Safety Alignment Backfires: Preventing the Re-emergence of Suppressed Concepts in Fine-tuned Text-to-Image Diffusion Models [57.16056181201623]
Fine-tuning text-to-image diffusion models can inadvertently undo safety measures, causing models to relearn harmful concepts.
We present a novel but immediate solution called Modular LoRA, which involves training Safety Low-Rank Adaptation modules separately from Fine-Tuning LoRA components.
This method effectively prevents the re-learning of harmful content without compromising the model's performance on new tasks.
arXiv Detail & Related papers (2024-11-30T04:37:38Z)
- DiffusionGuard: A Robust Defense Against Malicious Diffusion-based Image Editing [93.45507533317405]
DiffusionGuard is a robust and effective defense method against unauthorized edits by diffusion-based image editing models.
We introduce a novel objective that generates adversarial noise targeting the early stage of the diffusion process.
We also introduce a mask-augmentation technique to enhance robustness against various masks during test time.
arXiv Detail & Related papers (2024-10-08T05:19:19Z)
- Pixel Is Not a Barrier: An Effective Evasion Attack for Pixel-Domain Diffusion Models [9.905296922309157]
Diffusion Models have emerged as powerful generative models for high-quality image synthesis, with many subsequent image editing techniques based on them.
Previous works have attempted to safeguard images from diffusion-based editing by adding imperceptible perturbations.
Our work proposes a novel attack framework, AtkPDM, which exploits vulnerabilities in denoising UNets and a latent optimization strategy to enhance the naturalness of adversarial images.
arXiv Detail & Related papers (2024-08-21T17:56:34Z)
- IMPRESS: Evaluating the Resilience of Imperceptible Perturbations Against Unauthorized Data Usage in Diffusion-Based Generative AI [52.90082445349903]
Diffusion-based image generation models can create artistic images that mimic the style of an artist or maliciously edit the original images for fake content.
Several attempts have been made to protect the original images from such unauthorized data usage by adding imperceptible perturbations.
In this work, we introduce a purification perturbation platform, named IMPRESS, to evaluate the effectiveness of imperceptible perturbations as a protective measure.
arXiv Detail & Related papers (2023-10-30T03:33:41Z)
- JPEG Compressed Images Can Bypass Protections Against AI Editing [48.340067730457584]
Imperceptible perturbations have been proposed as a means of protecting images from malicious editing.
We find that the aforementioned perturbations are not robust to JPEG compression (a minimal sketch of this kind of robustness check follows the list below).
arXiv Detail & Related papers (2023-04-05T05:30:09Z)
- Raising the Cost of Malicious AI-Powered Image Editing [82.71990330465115]
We present an approach to mitigating the risks of malicious image editing posed by large diffusion models.
The key idea is to immunize images so as to make them resistant to manipulation by these models.
arXiv Detail & Related papers (2023-02-13T18:38:42Z)
- Invertible Image Dataset Protection [23.688878249633508]
We develop a reversible adversarial example generator (RAEG) that introduces slight changes to the images to fool traditional classification models.
With only slight distortion, RAEG protects the data against adversarial defenses more robustly than previous methods.
arXiv Detail & Related papers (2021-12-29T06:56:43Z)
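Several entries above, as well as the GuardDoor abstract itself, hinge on whether an imperceptible protective perturbation survives simple preprocessing such as JPEG compression. The sketch below shows one way such a robustness check could look: it JPEG-compresses a protected image at a given quality level and measures how much of the perturbation's energy remains. The quality settings, the survival-ratio metric, and the function names are illustrative assumptions, not the evaluation protocol of any specific paper listed here.

```python
# Minimal sketch of a preprocessing-robustness check for protective
# perturbations (illustrative; quality levels and metric are assumptions).
import io
import numpy as np
from PIL import Image

def jpeg_roundtrip(img_array, quality=75):
    """JPEG-compress and decompress an image given as a float array in [0, 1]."""
    img = Image.fromarray((img_array * 255).clip(0, 255).astype(np.uint8))
    buf = io.BytesIO()
    img.save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return np.asarray(Image.open(buf)).astype(np.float32) / 255.0

def perturbation_survival(clean, protected, quality=75):
    """Fraction of the perturbation's energy that survives JPEG compression."""
    original_delta = protected - clean
    compressed_delta = jpeg_roundtrip(protected, quality) - jpeg_roundtrip(clean, quality)
    return float(np.linalg.norm(compressed_delta) / (np.linalg.norm(original_delta) + 1e-12))

# Usage with synthetic data: a random image plus a small bounded perturbation.
rng = np.random.default_rng(0)
clean = rng.random((256, 256, 3)).astype(np.float32)
protected = np.clip(clean + rng.uniform(-8 / 255, 8 / 255, clean.shape).astype(np.float32), 0, 1)
for q in (95, 75, 50):
    print(f"quality={q}: survival ratio = {perturbation_survival(clean, protected, q):.3f}")
```

A low survival ratio at moderate quality levels indicates that the protection is likely to be washed out by ordinary recompression, which is the failure mode the pixel-space perturbation defenses above are criticized for.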