The Efficacy of Transfer-based No-box Attacks on Image Watermarking: A Pragmatic Analysis
- URL: http://arxiv.org/abs/2412.02576v1
- Date: Tue, 03 Dec 2024 17:02:49 GMT
- Title: The Efficacy of Transfer-based No-box Attacks on Image Watermarking: A Pragmatic Analysis
- Authors: Qilong Wu, Varun Chandrasekaran
- Abstract summary: We investigate the robustness of image watermarking methods in the ``no-box'' setting, where the attacker is assumed to have no knowledge about the watermarking model.
We show that when the configuration is mostly aligned, a simple non-optimization attack can already exceed the success of optimization-based efforts.
- Score: 11.724935807582513
- Abstract: Watermarking approaches are widely used to identify if images being circulated are authentic or AI-generated. Determining the robustness of image watermarking methods in the ``no-box'' setting, where the attacker is assumed to have no knowledge about the watermarking model, is an interesting problem. Our main finding is that evading the no-box setting is challenging: the success of optimization-based transfer attacks (involving training surrogate models) proposed in prior work~\cite{hu2024transfer} depends on impractical assumptions, including (i) aligning the architecture and training configurations of both the victim's and attacker's surrogate watermarking models, and (ii) a large number of surrogate models, with potentially large computational requirements. Relaxing these assumptions, i.e., moving to a more pragmatic threat model, results in a failed attack, with an evasion rate of at most $21.1\%$. We show that when the configuration is mostly aligned, a simple non-optimization attack we propose, OFT, with one single surrogate model can already exceed the success of optimization-based efforts. Under the same $\ell_\infty$ norm perturbation budget of $0.25$, prior work~\cite{hu2024transfer} is comparable to or worse than OFT in $11$ out of $12$ configurations and has a limited advantage on the remaining one. The code used for all our experiments is available at \url{https://github.com/Ardor-Wu/transfer}.
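The $\ell_\infty$ perturbation budget mentioned in the abstract constrains every pixel of the attacked image to deviate from the original by at most $0.25$. A minimal sketch of such a projection is below; the function name and example values are illustrative, not taken from the paper's codebase.

```python
import numpy as np

def clip_to_linf_budget(original, perturbed, budget=0.25):
    """Project a perturbed image back into the l_inf ball of radius
    `budget` around the original image (pixels assumed in [0, 1])."""
    # Limit the per-pixel deviation to [-budget, +budget].
    delta = np.clip(perturbed - original, -budget, budget)
    # Keep the result a valid image.
    return np.clip(original + delta, 0.0, 1.0)

# Example: deviations above the budget are clipped back to 0.25.
orig = np.full((2, 2), 0.5)
adv = orig + np.array([[0.4, -0.4], [0.1, -0.1]])
clipped = clip_to_linf_budget(orig, adv)
print(np.max(np.abs(clipped - orig)))  # 0.25
```

Any attack compared under this threat model, whether optimization-based or a simple transformation, must produce images satisfying this constraint.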
Related papers
- Neural Honeytrace: A Robust Plug-and-Play Watermarking Framework against Model Extraction Attacks [39.06642008591216]
We propose Neural Honeytrace, a robust plug-and-play watermarking framework against model extraction attacks.
Neural Honeytrace reduces the average number of samples required for a worst-case t-Test-based copyright claim from $12,000$ to $200$ with zero training cost.
arXiv Detail & Related papers (2025-01-16T06:59:20Z) - $B^4$: A Black-Box Scrubbing Attack on LLM Watermarks [42.933100948624315]
Watermarking has emerged as a prominent technique for content detection by embedding imperceptible patterns.
Previous work typically considers a grey-box attack setting, where the specific type of watermark is already known.
We here propose $B^4$, a black-box scrubbing attack on watermarks.
arXiv Detail & Related papers (2024-11-02T12:01:44Z) - Advancing Generalized Transfer Attack with Initialization Derived Bilevel Optimization and Dynamic Sequence Truncation [49.480978190805125]
Transfer attacks attract significant interest for black-box applications.
Existing works essentially directly optimize a single-level objective w.r.t. the surrogate model.
We propose a bilevel optimization paradigm, which explicitly reforms the nested relationship between the Upper-Level (UL) pseudo-victim attacker and the Lower-Level (LL) surrogate attacker.
arXiv Detail & Related papers (2024-06-04T07:45:27Z) - I$^2$SB: Image-to-Image Schr\"odinger Bridge [87.43524087956457]
Image-to-Image Schr\"odinger Bridge (I$^2$SB) is a new class of conditional diffusion models.
I$^2$SB directly learns the nonlinear diffusion processes between two given distributions.
We show that I$^2$SB surpasses standard conditional diffusion models with more interpretable generative processes.
arXiv Detail & Related papers (2023-02-12T08:35:39Z) - Adversarial Pixel Restoration as a Pretext Task for Transferable Perturbations [54.1807206010136]
Transferable adversarial attacks optimize adversaries from a pretrained surrogate model and known label space to fool the unknown black-box models.
We propose Adversarial Pixel Restoration as a self-supervised alternative to train an effective surrogate model from scratch.
Our training approach is based on a min-max objective which reduces overfitting via an adversarial objective.
arXiv Detail & Related papers (2022-07-18T17:59:58Z) - Towards Transferable Unrestricted Adversarial Examples with Minimum Changes [13.75751221823941]
Transfer-based adversarial example is one of the most important classes of black-box attacks.
There is a trade-off between transferability and imperceptibility of the adversarial perturbation.
We propose a geometry-aware framework to generate transferable adversarial examples with minimum changes.
arXiv Detail & Related papers (2022-01-04T12:03:20Z) - Transferable Sparse Adversarial Attack [62.134905824604104]
We introduce a generator architecture to alleviate the overfitting issue and thus efficiently craft transferable sparse adversarial examples.
Our method achieves superior inference speed, 700$\times$ faster than other optimization-based methods.
arXiv Detail & Related papers (2021-05-31T06:44:58Z) - Good Artists Copy, Great Artists Steal: Model Extraction Attacks Against Image Translation Generative Adversarial Networks [12.605607949417031]
We show the first model extraction attack against real-world generative adversarial network (GAN) image translation models.
The adversary is not required to know the victim model $F_V$'s architecture or any other information about it beyond its intended image translation task.
We evaluate the effectiveness of our attacks using three different instances of two popular categories of image translation.
arXiv Detail & Related papers (2021-04-26T14:50:59Z) - On Generating Transferable Targeted Perturbations [102.3506210331038]
We propose a new generative approach for highly transferable targeted perturbations.
Our approach matches the perturbed image distribution with that of the target class, leading to high targeted transferability rates.
arXiv Detail & Related papers (2021-03-26T17:55:28Z) - Understanding Frank-Wolfe Adversarial Training [1.2183405753834557]
Adversarial Training (AT) is a technique that approximately solves a robust optimization problem to minimize the worst-case loss.
A Frank-Wolfe adversarial training approach is presented and is shown to provide a level of robustness competitive with PGD-AT.
arXiv Detail & Related papers (2020-12-22T21:36:52Z) - Model Watermarking for Image Processing Networks [120.918532981871]
How to protect the intellectual property of deep models is a very important but seriously under-researched problem.
We propose the first model watermarking framework for protecting image processing models.
arXiv Detail & Related papers (2020-02-25T18:36:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences.