Removal Attack and Defense on AI-generated Content Latent-based Watermarking
- URL: http://arxiv.org/abs/2509.11745v2
- Date: Wed, 17 Sep 2025 05:51:54 GMT
- Title: Removal Attack and Defense on AI-generated Content Latent-based Watermarking
- Authors: De Zhang Lee, Han Fang, Hanyi Wang, Ee-Chien Chang
- Abstract summary: Digital watermarks can be embedded into AI-generated content (AIGC) by initializing the generation process with starting points sampled from a secret distribution. When combined with pseudorandom error-correcting codes, such watermarked outputs can remain indistinguishable from unwatermarked objects while maintaining robustness under white noise. We propose a novel attack that exploits boundary information leaked by the locations of watermarked objects. This attack significantly reduces the distortion required to remove watermarks, by up to a factor of $15 \times$ compared to a baseline white-noise attack under certain settings.
- Score: 26.09708301315328
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Digital watermarks can be embedded into AI-generated content (AIGC) by initializing the generation process with starting points sampled from a secret distribution. When combined with pseudorandom error-correcting codes, such watermarked outputs can remain indistinguishable from unwatermarked objects while maintaining robustness under white noise. In this paper, we go beyond indistinguishability and investigate security under removal attacks. We demonstrate that indistinguishability alone does not necessarily guarantee resistance to adversarial removal. Specifically, we propose a novel attack that exploits boundary information leaked by the locations of watermarked objects. This attack significantly reduces the distortion required to remove watermarks, by up to a factor of $15 \times$ compared to a baseline white-noise attack under certain settings. To mitigate such attacks, we introduce a defense mechanism that applies a secret transformation to hide the boundary, and prove that this transformation effectively renders any attacker's perturbations equivalent to those of a naive white-noise adversary. Our empirical evaluations, conducted on multiple versions of Stable Diffusion, validate the effectiveness of both the attack and the proposed defense, highlighting the importance of addressing boundary leakage in latent-based watermarking schemes.
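To make the mechanics concrete, below is a minimal, illustrative sketch (not the paper's construction; every name and parameter in it is an assumption) of the two ideas in the abstract: a watermark carried by the secret distribution of the initial latent, and a defense that hides the detection boundary behind a secret orthogonal transformation. The toy dimension `D`, the sign-pattern encoding, and the `detect` threshold are stand-ins; real Stable Diffusion latents are 4x64x64, and recovering a latent from an image requires an inversion step (e.g., DDIM inversion) that is not implemented here.

```python
# Illustrative sketch only: toy dimension, keyed sign-pattern watermark,
# and a keyed random rotation as the secret transformation. All names and
# parameters here are assumptions, not the paper's exact scheme.
import numpy as np

D = 256  # toy latent dimension; SD-v1 latents are really 4x64x64 = 16384

def secret_signs(key: int) -> np.ndarray:
    # Keyed PRNG -> secret sign pattern (a stand-in for sampling the
    # starting point from a secret distribution + pseudorandom code).
    return np.random.default_rng(key).choice([-1.0, 1.0], size=D)

def sample_watermarked_latent(key: int) -> np.ndarray:
    # Fresh N(0,1) magnitudes with secret signs: each coordinate is still
    # marginally N(0,1), so a single output looks unwatermarked.
    return secret_signs(key) * np.abs(np.random.standard_normal(D))

def secret_rotation(key: int) -> np.ndarray:
    # Defense: a keyed random orthogonal matrix Q hides the boundary
    # geometry; QR of a Gaussian matrix yields a random rotation.
    q, _ = np.linalg.qr(np.random.default_rng(key).standard_normal((D, D)))
    return q

def detect(latent: np.ndarray, key: int, tau: float = 0.75) -> bool:
    # Sign agreement is ~0.5 on unwatermarked latents and near 1.0 on
    # watermarked ones; tau trades false positives against robustness.
    return np.mean(np.sign(latent) == secret_signs(key)) > tau

wm_key, rot_key = 1234, 5678
Q = secret_rotation(rot_key)

z = sample_watermarked_latent(wm_key)
z_pub = Q @ z  # generation would start from Qz rather than z
# ... an image is generated from z_pub, the attacker perturbs it, and the
# detector inverts the image back to a latent (inversion omitted here) ...
delta = 0.5 * np.random.standard_normal(D)  # attacker's perturbation
z_rec = Q.T @ (z_pub + delta)               # detector undoes the rotation

# Q.T(Qz + delta) = z + Q.T delta: since Q is secret and (approximately)
# uniformly random, Q.T delta has no usable direction for the attacker,
# i.e. any perturbation acts like white noise of the same norm.
print(detect(z_rec, wm_key))  # True with high probability at this noise level
```

The rotation captures exactly the abstract's claim: an attacker who learns something about where watermarked objects live still cannot aim a perturbation at the boundary, because the boundary's orientation is hidden behind Q.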
Related papers
- SemBind: Binding Diffusion Watermarks to Semantics Against Black-Box Forgery Attacks [74.76909939060833]
Black-box forgery attacks pose an outsized risk to provenance and trust. We propose SemBind, a framework for latent-based watermarks that resists black-box forgery. We show that SemBind-enabled anti-forgery variants markedly reduce false acceptance under black-box forgery.
arXiv Detail & Related papers (2026-01-28T07:02:40Z)
- Character-Level Perturbations Disrupt LLM Watermarks [64.60090923837701]
We formalize the system model for Large Language Model (LLM) watermarking. We characterize two realistic threat models constrained by limited access to the watermark detector. We demonstrate that character-level perturbations are significantly more effective for watermark removal under the most restrictive threat model. Experiments confirm the superiority of character-level perturbations and the effectiveness of the Genetic Algorithm (GA) in removing watermarks under realistic constraints.
arXiv Detail & Related papers (2025-09-11T02:50:07Z)
- Mitigating Watermark Forgery in Generative Models via Multi-Key Watermarking [9.928222896746249]
Forgery attacks, in which malicious users insert the provider's watermark into generated content, are a security threat to GenAI providers. One potential defense against forgery is using multiple keys to watermark generated content. We propose an improved multi-key watermarking method that resists all surveyed forgery attacks.
arXiv Detail & Related papers (2025-07-10T15:52:32Z)
- When There Is No Decoder: Removing Watermarks from Stable Diffusion Models in a No-box Setting [37.85082375268253]
We study the robustness of model-specific watermarking, where watermark embedding is integrated with text-to-image generation. We introduce three attack strategies in a no-box setting: edge prediction-based, box-blurring, and fine-tuning-based attacks. Our best-performing attack reduces watermark detection accuracy to approximately 47.92%.
arXiv Detail & Related papers (2025-07-04T15:22:20Z)
- Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models [16.57738116313139]
We show that attackers can leverage unrelated models, even with different latent spaces and architectures, to perform powerful and realistic forgery attacks. The first attack imprints a targeted watermark into real images by manipulating the latent representation of an arbitrary image in an unrelated LDM. The second attack generates new images with the target watermark by inverting a watermarked image and re-generating it with an arbitrary prompt.
arXiv Detail & Related papers (2024-12-04T12:57:17Z)
- Robustness of Watermarking on Text-to-Image Diffusion Models [9.277492743469235]
We investigate the robustness of generative watermarking, which integrates watermark embedding into the text-to-image generation process.
We find that generative watermarking methods are robust to direct evasion attacks, such as discriminator-based attacks or edge-information manipulation in edge prediction-based attacks, but vulnerable to malicious fine-tuning.
arXiv Detail & Related papers (2024-08-04T13:59:09Z)
- Certifiably Robust Image Watermark [57.546016845801134]
Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns.
Watermarking AI-generated content is a key technology to address these concerns.
We propose the first image watermarks with certified robustness guarantees against removal and forgery attacks.
arXiv Detail & Related papers (2024-07-04T17:56:04Z)
- Invisible Image Watermarks Are Provably Removable Using Generative AI [47.25747266531665]
Invisible watermarks safeguard images' copyrights by embedding hidden messages detectable only by their owners.
We propose a family of regeneration attacks to remove these invisible watermarks.
The proposed attack method first adds random noise to an image to destroy the watermark and then reconstructs the image (a minimal sketch of this noise-then-reconstruct step appears after this list).
arXiv Detail & Related papers (2023-06-02T23:29:28Z)
- Exploring Structure Consistency for Deep Model Watermarking [122.38456787761497]
The intellectual property (IP) of deep neural networks (DNNs) can be easily "stolen" by surrogate model attacks.
We propose a new watermarking methodology, namely "structure consistency", based on which a new deep structure-aligned model watermarking algorithm is designed.
arXiv Detail & Related papers (2021-08-05T04:27:15Z) - Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal
Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective.
We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations.
Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
arXiv Detail & Related papers (2020-09-18T09:14:54Z)
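As referenced in the "Invisible Image Watermarks Are Provably Removable Using Generative AI" entry above, the regeneration attack's noise-then-reconstruct loop maps naturally onto a diffusion img2img pass: the pipeline noises the input's latent according to `strength` and then denoises it. The sketch below is one assumed, illustrative instantiation using Hugging Face diffusers; the checkpoint name, `strength` value, and file names are placeholders, not that paper's exact setup.

```python
# Hedged sketch of a regeneration-style attack via diffusion img2img:
# add noise to the (latent of the) watermarked image, then reconstruct.
# Checkpoint, strength, and paths are illustrative assumptions.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder checkpoint
    torch_dtype=torch.float16,
).to("cuda")

img = Image.open("watermarked.png").convert("RGB").resize((512, 512))

# strength in (0, 1] controls how much noise is injected before denoising:
# higher destroys more of the watermark but also more image content.
out = pipe(prompt="", image=img, strength=0.2, guidance_scale=1.0).images[0]
out.save("regenerated.png")
```

In practice one would sweep `strength` to find the smallest perturbation that defeats a given detector, which is the distortion-versus-removal trade-off the main abstract above quantifies.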
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.