Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models
- URL: http://arxiv.org/abs/2510.20468v1
- Date: Thu, 23 Oct 2025 12:06:35 GMT
- Title: Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models
- Authors: Tomáš Souček, Sylvestre-Alvise Rebuffi, Pierre Fernandez, Nikola Jovanović, Hady Elsahar, Valeriu Lacatusu, Tuan Tran, Alexandre Mourachko,
- Abstract summary: We investigate watermark forging in the context of widely used post-hoc image watermarking.<n>We introduce a preference model to assess whether an image is watermarked.<n>We demonstrate the model's capability to remove and forge watermarks by optimizing the input image through backpropagation.
- Score: 42.902365202924535
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have seen a surge in interest in digital content watermarking techniques, driven by the proliferation of generative models and increased legal pressure. With an ever-growing percentage of AI-generated content available online, watermarking plays an increasingly important role in ensuring content authenticity and attribution at scale. There have been many works assessing the robustness of watermarking to removal attacks, yet, watermark forging, the scenario when a watermark is stolen from genuine content and applied to malicious content, remains underexplored. In this work, we investigate watermark forging in the context of widely used post-hoc image watermarking. Our contributions are as follows. First, we introduce a preference model to assess whether an image is watermarked. The model is trained using a ranking loss on purely procedurally generated images without any need for real watermarks. Second, we demonstrate the model's capability to remove and forge watermarks by optimizing the input image through backpropagation. This technique requires only a single watermarked image and works without knowledge of the watermarking model, making our attack much simpler and more practical than attacks introduced in related work. Third, we evaluate our proposed method on a variety of post-hoc image watermarking models, demonstrating that our approach can effectively forge watermarks, questioning the security of current watermarking approaches. Our code and further resources are publicly available.
Related papers
- Diffusion-Based Image Editing for Breaking Robust Watermarks [4.273350357872755]
Powerful diffusion-based image generation and editing techniques pose a new threat to robust watermarking schemes.<n>We show that a diffusion-driven image regeneration'' process can erase embedded watermarks while preserving image content.<n>We introduce a novel guided diffusion attack that explicitly targets the watermark signal during generation, significantly degrading watermark detectability.
arXiv Detail & Related papers (2025-10-07T14:34:42Z) - Certifiably Robust Image Watermark [57.546016845801134]
Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns.
Watermarking AI-generated content is a key technology to address these concerns.
We propose the first image watermarks with certified robustness guarantees against removal and forgery attacks.
arXiv Detail & Related papers (2024-07-04T17:56:04Z) - Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious? [21.06493827123594]
steganalysis attacks can extract and remove the watermark with minimal perceptual distortion.
We show how averaging a collection of watermarked images could reveal the underlying watermark pattern.
We propose security guidelines calling for using content-adaptive watermarking strategies and performing security evaluation against steganalysis.
arXiv Detail & Related papers (2024-06-13T12:01:28Z) - Robustness of AI-Image Detectors: Fundamental Limits and Practical
Attacks [47.04650443491879]
We analyze the robustness of various AI-image detectors including watermarking and deepfake detectors.
We show that watermarking methods are vulnerable to spoofing attacks where the attacker aims to have real images identified as watermarked ones.
arXiv Detail & Related papers (2023-09-29T18:30:29Z) - Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
backdoor-based ownership verification becomes popular recently, in which the model owner can watermark the model.
We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
arXiv Detail & Related papers (2023-09-09T12:46:08Z) - Invisible Image Watermarks Are Provably Removable Using Generative AI [47.25747266531665]
Invisible watermarks safeguard images' copyrights by embedding hidden messages only detectable by owners.
We propose a family of regeneration attacks to remove these invisible watermarks.
The proposed attack method first adds random noise to an image to destroy the watermark and then reconstructs the image.
arXiv Detail & Related papers (2023-06-02T23:29:28Z) - Certified Neural Network Watermarks with Randomized Smoothing [64.86178395240469]
We propose a certifiable watermarking method for deep learning models.
We show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain l2 threshold.
Our watermark is also empirically more robust compared to previous watermarking methods.
arXiv Detail & Related papers (2022-07-16T16:06:59Z) - Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal
Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective.
We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations.
Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
arXiv Detail & Related papers (2020-09-18T09:14:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.