Related papers: Robustness of Watermarking on Text-to-Image Diffusion Models

Robustness of Watermarking on Text-to-Image Diffusion Models

URL: http://arxiv.org/abs/2408.02035v2
Date: Mon, 4 Nov 2024 13:37:00 GMT
Title: Robustness of Watermarking on Text-to-Image Diffusion Models
Authors: Xiaodong Wu, Xiangman Li, Jianbing Ni,
Abstract summary: We investigate the robustness of generative watermarking, which is created from the integration of watermarking embedding and text-to-image generation processing. We found that generative watermarking methods are robust to direct evasion attacks, like discriminator-based attacks, or manipulation based on the edge information in edge prediction-based attacks but vulnerable to malicious fine-tuning.
Score: 9.277492743469235
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Watermarking has become one of promising techniques to not only aid in identifying AI-generated images but also serve as a deterrent against the unethical use of these models. However, the robustness of watermarking techniques has not been extensively studied recently. In this paper, we investigate the robustness of generative watermarking, which is created from the integration of watermarking embedding and text-to-image generation processing in generative models, e.g., latent diffusion models. Specifically, we propose three attacking methods, i.e., discriminator-based attacks, edge prediction-based attacks, and fine-tune-based attacks, under the scenario where the watermark decoder is not accessible. The model is allowed to be fine-tuned to created AI agents with specific generative tasks for personalizing or specializing. We found that generative watermarking methods are robust to direct evasion attacks, like discriminator-based attacks, or manipulation based on the edge information in edge prediction-based attacks but vulnerable to malicious fine-tuning. Experimental results show that our fine-tune-based attacks can decrease the accuracy of the watermark detection to nearly $67.92\%$. In addition, We conduct an ablation study on the length of fine-tuned messages, encoder/decoder's depth and structure to identify key factors that impact the performance of fine-tune-based attacks.

Related papers

Removal Attack and Defense on AI-generated Content Latent-based Watermarking [26.09708301315328]
Digital watermarks can be embedded into AI-generated content (AIGC) by initializing the generation process with starting points sampled from a secret distribution.<n>When combined with pseudorandom error-correcting codes, such watermarked outputs can remain indistinguishable from unwatermarked objects, while maintaining robustness under whitenoise.<n>We propose a novel attack that exploits boundary information leaked by the locations of watermarked objects.<n>This attack significantly reduces the distortion required to remove watermarks by up to a factor of $15 times$ compared to a baseline whitenoise attack under certain settings.
arXiv Detail & Related papers (2025-09-15T09:56:24Z)
Character-Level Perturbations Disrupt LLM Watermarks [64.60090923837701]
We formalize the system model for Large Language Model (LLM) watermarking.<n>We characterize two realistic threat models constrained on limited access to the watermark detector.<n>We demonstrate character-level perturbations are significantly more effective for watermark removal under the most restrictive threat model.<n> Experiments confirm the superiority of character-level perturbations and the effectiveness of the Genetic Algorithm (GA) in removing watermarks under realistic constraints.
arXiv Detail & Related papers (2025-09-11T02:50:07Z)
When There Is No Decoder: Removing Watermarks from Stable Diffusion Models in a No-box Setting [37.85082375268253]
We study the robustness of model-specific watermarking, where watermark embedding is integrated with text-to-image generation.<n>We introduce three attack strategies: edge prediction-based, box blurring, and fine-tuning-based attacks in a no-box setting.<n>Our best-performing attack achieves a reduction in watermark detection accuracy to approximately 47.92%.
arXiv Detail & Related papers (2025-07-04T15:22:20Z)
Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models [52.877452505561706]
We propose the first copyright evasion attack specifically designed to undermine dataset ownership verification (DOV)<n>Our CEAT2I comprises three stages: watermarked sample detection, trigger identification, and efficient watermark mitigation.<n>Our experiments show that our CEAT2I effectively evades DOV mechanisms while preserving model performance.
arXiv Detail & Related papers (2025-05-05T17:51:55Z)
SEAL: Semantic Aware Image Watermarking [26.606008778795193]
We propose a novel watermarking method that embeds semantic information about the generated image directly into the watermark. The key pattern can be inferred from the semantic embedding of the image using locality-sensitive hashing. Our results suggest that content-aware watermarks can mitigate risks arising from image-generative models.
arXiv Detail & Related papers (2025-03-15T15:29:05Z)
Certifiably Robust Image Watermark [57.546016845801134]
Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns. Watermarking AI-generated content is a key technology to address these concerns. We propose the first image watermarks with certified robustness guarantees against removal and forgery attacks.
arXiv Detail & Related papers (2024-07-04T17:56:04Z)
Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion [15.086451828825398]
evasion adversaries can readily exploit the shortcuts created by models memorizing watermark samples. By learning the model to accurately recognize them, unique watermark behaviors are promoted through knowledge injection.
arXiv Detail & Related papers (2024-04-21T03:38:20Z)
Wide Flat Minimum Watermarking for Robust Ownership Verification of GANs [23.639074918667625]
We propose a novel multi-bit box-free watermarking method for GANs with improved robustness against white-box attacks. The watermark is embedded by adding an extra watermarking loss term during GAN training. We show that the presence of the watermark has a negligible impact on the quality of the generated images.
arXiv Detail & Related papers (2023-10-25T18:38:10Z)
Robustness of AI-Image Detectors: Fundamental Limits and Practical Attacks [47.04650443491879]
We analyze the robustness of various AI-image detectors including watermarking and deepfake detectors. We show that watermarking methods are vulnerable to spoofing attacks where the attacker aims to have real images identified as watermarked ones.
arXiv Detail & Related papers (2023-09-29T18:30:29Z)
Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
backdoor-based ownership verification becomes popular recently, in which the model owner can watermark the model. We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior. Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
Safe and Robust Watermark Injection with a Single OoD Image [90.71804273115585]
Training a high-performance deep neural network requires large amounts of data and computational resources. We propose a safe and robust backdoor-based watermark injection technique. We induce random perturbation of model parameters during watermark injection to defend against common watermark removal attacks.
arXiv Detail & Related papers (2023-09-04T19:58:35Z)
Exploring Structure Consistency for Deep Model Watermarking [122.38456787761497]
The intellectual property (IP) of Deep neural networks (DNNs) can be easily stolen'' by surrogate model attack. We propose a new watermarking methodology, namely structure consistency'', based on which a new deep structure-aligned model watermarking algorithm is designed.
arXiv Detail & Related papers (2021-08-05T04:27:15Z)
Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective. We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations. Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
arXiv Detail & Related papers (2020-09-18T09:14:54Z)

This list is automatically generated from the titles and abstracts of the papers in this site.