Evaluating the Robustness of Trigger Set-Based Watermarks Embedded in
Deep Neural Networks
- URL: http://arxiv.org/abs/2106.10147v1
- Date: Fri, 18 Jun 2021 14:23:55 GMT
- Authors: Suyoung Lee, Wonho Song, Suman Jana, Meeyoung Cha, Sooel Son
- Abstract summary: State-of-the-art trigger set-based watermarking algorithms do not achieve their designed goal of proving ownership.
We propose novel adaptive attacks that harness the adversary's knowledge of the underlying watermarking algorithm of a target model.
- Score: 22.614495877481144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Trigger set-based watermarking schemes have gained increasing
attention as they provide deep neural network model owners with a means to
prove ownership. In
this paper, we argue that state-of-the-art trigger set-based watermarking
algorithms do not achieve their designed goal of proving ownership. We posit
that this impaired capability stems from two common experimental flaws that the
existing research practice has committed when evaluating the robustness of
watermarking algorithms: (1) incomplete adversarial evaluation and (2)
overlooked adaptive attacks.
We conduct a comprehensive adversarial evaluation of 10 representative
watermarking schemes against six existing attacks and demonstrate that
each of these watermarking schemes lacks robustness against at least two
attacks. We also propose novel adaptive attacks that harness the adversary's
knowledge of the underlying watermarking algorithm of a target model. We
demonstrate that the proposed attacks effectively break all of the 10
watermarking schemes, consequently allowing adversaries to obscure the
ownership of any watermarked model. We encourage follow-up studies to follow
our guidelines when evaluating the robustness of their watermarking schemes,
conducting comprehensive adversarial evaluations that include our adaptive
attacks to demonstrate a meaningful upper bound on watermark robustness.
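The trigger set-based ownership verification that the abstract critiques can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual protocol: the function names and the 0.9 agreement threshold are assumptions. The owner trains the model on data augmented with a secret trigger set of mislabeled inputs, and later claims ownership if a suspect model agrees with the secret trigger labels on a sufficient fraction of the trigger set.

```python
def embed_watermark(train_fn, train_data, trigger_inputs, trigger_labels):
    """Hypothetical sketch: embed a trigger set-based watermark by training
    on the owner's data augmented with secret (input, label) pairs whose
    labels are chosen by the owner."""
    augmented = list(train_data) + list(zip(trigger_inputs, trigger_labels))
    return train_fn(augmented)

def verify_ownership(model_fn, trigger_inputs, trigger_labels, threshold=0.9):
    """Claim ownership if the suspect model agrees with the secret trigger
    labels on at least `threshold` of the trigger set (threshold assumed)."""
    hits = sum(model_fn(x) == y
               for x, y in zip(trigger_inputs, trigger_labels))
    return hits / len(trigger_inputs) >= threshold
```

The attacks surveyed in the paper aim to break exactly this check: a watermark-removal adversary transforms a stolen model so that its trigger-set agreement drops below the verification threshold while its accuracy on ordinary inputs is preserved.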
Related papers
- Certifiably Robust Image Watermark [57.546016845801134]
Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns.
Watermarking AI-generated content is a key technology to address these concerns.
We propose the first image watermarks with certified robustness guarantees against removal and forgery attacks.
arXiv Detail & Related papers (2024-07-04T17:56:04Z)
- Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion [15.086451828825398]
Evasion adversaries can readily exploit the shortcuts created by models memorizing watermark samples.
By training the model to accurately recognize watermark samples, unique watermark behaviors are promoted through knowledge injection.
arXiv Detail & Related papers (2024-04-21T03:38:20Z)
- Elevating Defenses: Bridging Adversarial Training and Watermarking for Model Resilience [2.8084422332394428]
This work introduces a novel framework to integrate adversarial training with watermarking techniques to fortify against evasion attacks.
We use the MNIST and Fashion-MNIST datasets to evaluate our proposed technique on various model stealing attacks.
arXiv Detail & Related papers (2023-12-21T19:21:36Z)
- Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models [19.29349934856703]
A strong watermarking scheme satisfies the property that a computationally bounded attacker cannot erase the watermark without causing significant quality degradation.
We prove that, under well-specified and natural assumptions, strong watermarking is impossible to achieve.
arXiv Detail & Related papers (2023-11-07T22:52:54Z)
- Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification has recently become popular, allowing the model owner to watermark the model.
We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
- Hybrid Design of Multiplicative Watermarking for Defense Against Malicious Parameter Identification [46.27328641616778]
We propose a hybrid multiplicative watermarking scheme, where the watermark parameters are periodically updated.
We show that the proposed approach makes it difficult for an eavesdropper to reconstruct the watermarking parameters.
arXiv Detail & Related papers (2023-09-05T16:56:53Z)
- Safe and Robust Watermark Injection with a Single OoD Image [90.71804273115585]
Training a high-performance deep neural network requires large amounts of data and computational resources.
We propose a safe and robust backdoor-based watermark injection technique.
We induce random perturbation of model parameters during watermark injection to defend against common watermark removal attacks.
arXiv Detail & Related papers (2023-09-04T19:58:35Z)
- SoK: How Robust is Image Classification Deep Neural Network Watermarking? (Extended Version) [16.708069984516964]
We evaluate whether recently proposed watermarking schemes that claim robustness are robust against a large set of removal attacks.
None of the surveyed watermarking schemes is robust in practice.
We show that watermarking schemes need to be evaluated against a more extensive set of removal attacks with a more realistic adversary model.
arXiv Detail & Related papers (2021-08-11T00:23:33Z)
- Exploring Structure Consistency for Deep Model Watermarking [122.38456787761497]
The intellectual property (IP) of deep neural networks (DNNs) can be easily "stolen" by surrogate model attacks.
We propose a new watermarking methodology, "structure consistency", based on which a new deep structure-aligned model watermarking algorithm is designed.
arXiv Detail & Related papers (2021-08-05T04:27:15Z)
- Reversible Watermarking in Deep Convolutional Neural Networks for Integrity Authentication [78.165255859254]
We propose a reversible watermarking algorithm for integrity authentication.
The influence of embedding reversible watermarking on the classification performance is less than 0.5%.
At the same time, the integrity of the model can be verified by applying the reversible watermarking.
arXiv Detail & Related papers (2021-04-09T09:32:21Z)
- Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective.
We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations.
Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
arXiv Detail & Related papers (2020-09-18T09:14:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.