Evaluating the Robustness of Trigger Set-Based Watermarks Embedded in
Deep Neural Networks
- URL: http://arxiv.org/abs/2106.10147v1
- Date: Fri, 18 Jun 2021 14:23:55 GMT
- Title: Evaluating the Robustness of Trigger Set-Based Watermarks Embedded in
Deep Neural Networks
- Authors: Suyoung Lee, Wonho Song, Suman Jana, Meeyoung Cha, Sooel Son
- Abstract summary: State-of-the-art trigger set-based watermarking algorithms do not achieve their designed goal of proving ownership.
We propose novel adaptive attacks that harness the adversary's knowledge of the underlying watermarking algorithm of a target model.
- Score: 22.614495877481144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Trigger set-based watermarking schemes have gained increasing attention because they provide deep neural network model owners with a means to prove ownership. In
this paper, we argue that state-of-the-art trigger set-based watermarking
algorithms do not achieve their designed goal of proving ownership. We posit
that this impaired capability stems from two common experimental flaws committed by
existing research practice when evaluating the robustness of
watermarking algorithms: (1) incomplete adversarial evaluation and (2)
overlooked adaptive attacks.
We conduct a comprehensive adversarial evaluation of 10 representative
watermarking schemes against six existing attacks and demonstrate that
each of these watermarking schemes lacks robustness against at least two
attacks. We also propose novel adaptive attacks that harness the adversary's
knowledge of the underlying watermarking algorithm of a target model. We
demonstrate that the proposed attacks effectively break all of the 10
watermarking schemes, consequently allowing adversaries to obscure the
ownership of any watermarked model. We encourage follow-up studies to consider
our guidelines when evaluating the robustness of their watermarking schemes, i.e.,
to conduct a comprehensive adversarial evaluation that includes our adaptive
attacks and thereby demonstrates a meaningful upper bound on watermark robustness.
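For readers unfamiliar with the setting, trigger set-based ownership verification can be sketched as follows. This is a minimal illustration, not the paper's code; the model interface, the decision threshold, and the use of PyTorch are assumptions.

```python
import torch

def verify_ownership(suspect_model, trigger_inputs, trigger_labels, threshold=0.9):
    """Trigger set-based ownership check (illustrative sketch).

    The owner keeps (trigger_inputs, trigger_labels) secret. An independently
    trained model should score near chance on the trigger set, while the
    watermarked model (or a model derived from it) should score near 100%.
    """
    suspect_model.eval()
    with torch.no_grad():
        preds = suspect_model(trigger_inputs).argmax(dim=1)
    accuracy = (preds == trigger_labels).float().mean().item()
    return accuracy >= threshold  # assumed decision threshold
```

The adaptive attacks in the paper target exactly this check: an adversary who knows the underlying watermarking algorithm can alter the model or its inputs until the trigger-set accuracy falls below the verification threshold.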
Related papers
- On the Weaknesses of Backdoor-based Model Watermarking: An Information-theoretic Perspective [39.676548104635096]
Safeguarding the intellectual property of machine learning models has emerged as a pressing concern in AI security.
Model watermarking is a powerful technique for protecting ownership of machine learning models.
We propose a novel model watermarking scheme, In-distribution Watermark Embedding (IWE), to overcome the limitations of existing methods.
arXiv Detail & Related papers (2024-09-10T00:55:21Z)
- Robustness of Watermarking on Text-to-Image Diffusion Models [9.277492743469235]
We investigate the robustness of generative watermarking, which integrates watermark embedding into the text-to-image generation process.
We find that generative watermarking methods are robust to direct evasion attacks, such as discriminator-based attacks or manipulation of edge information in edge-prediction-based attacks, but are vulnerable to malicious fine-tuning.
arXiv Detail & Related papers (2024-08-04T13:59:09Z)
- Watermarking Recommender Systems [52.207721219147814]
We introduce Autoregressive Out-of-distribution Watermarking (AOW), a novel technique tailored specifically for recommender systems.
Our approach entails selecting an initial item and querying it through the oracle model, followed by the selection of subsequent items with small prediction scores.
To assess the efficacy of the watermark, the model is tasked with predicting the subsequent item given a truncated watermark sequence.
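The two steps above can be made concrete with a small sketch. The `predict_scores` interface, the no-repeat rule, and the hit threshold are assumptions for illustration; the paper's actual implementation may differ.

```python
import numpy as np

def generate_watermark(model, initial_item, length):
    """Autoregressive out-of-distribution watermark generation (sketch).

    Starting from an initial item, repeatedly query the oracle model and
    append the candidate with the SMALLEST prediction score, so the sequence
    is out-of-distribution and unlikely to be reproduced by chance.
    """
    seq = [initial_item]
    for _ in range(length - 1):
        scores = model.predict_scores(seq)  # assumed API: one score per item id
        scores[seq] = np.inf                # do not repeat items already chosen
        seq.append(int(np.argmin(scores)))
    return seq

def verify_watermark(model, watermark, prefix_len, hits_required):
    """Ownership check: given truncated prefixes of the watermark sequence,
    count how often the suspect model predicts the held-out next item."""
    hits = sum(
        int(np.argmax(model.predict_scores(watermark[:t])) == watermark[t])
        for t in range(prefix_len, len(watermark))
    )
    return hits >= hits_required
```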
arXiv Detail & Related papers (2024-07-17T06:51:24Z)
- Certifiably Robust Image Watermark [57.546016845801134]
Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns.
Watermarking AI-generated content is a key technology to address these concerns.
We propose the first image watermarks with certified robustness guarantees against removal and forgery attacks.
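Certified guarantees of this kind are often built from randomized smoothing; the following is a generic smoothing-style sketch under that assumption, not necessarily the construction used in the paper. `decode_bits` is a hypothetical watermark decoder.

```python
import numpy as np

def smoothed_watermark_present(decode_bits, image, key_bits,
                               sigma=0.5, n=1000, tau=0.8):
    """Majority-vote watermark detection over Gaussian noise (sketch).

    Averaging the per-bit decoding over many noisy copies stabilizes the
    decision; standard smoothing arguments can then certify that the
    decision is unchanged within some L2 perturbation radius.
    """
    votes = np.zeros(len(key_bits))
    for _ in range(n):
        noisy = image + np.random.normal(0.0, sigma, size=image.shape)
        votes += (decode_bits(noisy) == key_bits)
    bit_match_rate = (votes / n > 0.5).mean()  # per-bit majority, then match rate
    return bit_match_rate >= tau
```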
arXiv Detail & Related papers (2024-07-04T17:56:04Z)
- DIP-Watermark: A Double Identity Protection Method Based on Robust Adversarial Watermark [13.007649270429493]
Face Recognition (FR) systems pose privacy risks.
One countermeasure is the adversarial attack, which deceives unauthorized, malicious FR systems.
We propose the first double identity protection scheme based on traceable adversarial watermarking.
arXiv Detail & Related papers (2024-04-23T02:50:38Z)
- Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models [19.29349934856703]
A strong watermarking scheme satisfies the property that a computationally bounded attacker cannot erase the watermark without causing significant quality degradation.
We prove that, under well-specified and natural assumptions, strong watermarking is impossible to achieve.
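Written out, the property takes roughly the following form; the notation here is assumed for illustration rather than taken from the paper.

```latex
% W_k: keyed watermarked generator, Detect_k: keyed detector,
% Q: quality oracle, A: any probabilistic polynomial-time attacker.
% Strong watermarking: no bounded attacker can evade detection while
% preserving quality, except with negligible probability.
\Pr\big[\; \mathrm{Detect}_k(y') = \mathsf{false}
      \;\wedge\; Q(y') \ge Q(y) - \epsilon
      \;:\; y \leftarrow W_k(x),\; y' \leftarrow \mathcal{A}(y) \;\big]
\le \mathrm{negl}(\lambda)
```

The impossibility result says that, under the paper's assumptions about the attacker's access (e.g., to quality and perturbation oracles), such an attacker always exists.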
arXiv Detail & Related papers (2023-11-07T22:52:54Z)
- Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification has recently become popular; it allows the model owner to watermark the model.
We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
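The mini-max idea can be written compactly; the notation below is assumed for illustration. The inner maximization finds a nearby parameter perturbation that erases the watermark, and the outer minimization retrains so the watermark survives even there:

```latex
% theta: model parameters; delta: parameter perturbation of norm at most rho;
% L_task: loss on the primary task; L_wm: loss on the trigger set
% (low L_wm means the watermark behavior is present).
\min_{\theta}\;\Big[\,\mathcal{L}_{\mathrm{task}}(\theta)
  \;+\; \lambda \max_{\|\delta\|\le\rho} \mathcal{L}_{\mathrm{wm}}(\theta+\delta)\,\Big]
```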
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
- Safe and Robust Watermark Injection with a Single OoD Image [90.71804273115585]
Training a high-performance deep neural network requires large amounts of data and computational resources.
We propose a safe and robust backdoor-based watermark injection technique.
We induce random perturbation of model parameters during watermark injection to defend against common watermark removal attacks.
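A minimal PyTorch-style sketch of that idea follows; the loop structure, noise scale, and data interface are assumptions for illustration.

```python
import torch
from itertools import cycle, islice

def inject_watermark(model, trigger_loader, optimizer, noise_std=1e-3, steps=100):
    """Watermark injection with random parameter perturbation (sketch).

    Adding small Gaussian noise to the weights before each update keeps the
    embedded watermark from depending on one brittle parameter configuration,
    which is the intuition for robustness against removal attacks.
    """
    loss_fn = torch.nn.CrossEntropyLoss()
    for x, y in islice(cycle(trigger_loader), steps):
        with torch.no_grad():  # perturb weights in place before the update
            for p in model.parameters():
                p.add_(noise_std * torch.randn_like(p))
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
```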
arXiv Detail & Related papers (2023-09-04T19:58:35Z)
- SoK: How Robust is Image Classification Deep Neural Network Watermarking? (Extended Version) [16.708069984516964]
We evaluate whether recently proposed watermarking schemes that claim robustness are robust against a large set of removal attacks.
None of the surveyed watermarking schemes is robust in practice.
We show that watermarking schemes need to be evaluated against a more extensive set of removal attacks with a more realistic adversary model.
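The evaluation methodology the SoK calls for reduces to a loop like the following; the attack and verification interfaces are assumed for illustration.

```python
def evaluate_robustness(watermarked_model, attacks, verify, task_accuracy):
    """Run each removal attack and record both the surviving task accuracy
    and whether the watermark is still verifiable (sketch).

    `attacks` maps attack names to functions model -> attacked model;
    `verify` is the owner's watermark check; both interfaces are assumed.
    """
    results = {}
    for name, attack in attacks.items():
        attacked = attack(watermarked_model)
        results[name] = {
            "task_accuracy": task_accuracy(attacked),
            "watermark_retained": verify(attacked),
        }
    return results
```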
arXiv Detail & Related papers (2021-08-11T00:23:33Z)
- Exploring Structure Consistency for Deep Model Watermarking [122.38456787761497]
The intellectual property (IP) of deep neural networks (DNNs) can be easily "stolen" by surrogate model attacks.
We propose a new watermarking methodology, namely "structure consistency", based on which a new deep structure-aligned model watermarking algorithm is designed.
arXiv Detail & Related papers (2021-08-05T04:27:15Z)
- Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective.
We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations.
Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
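Combining imperceptible pattern embedding with a spatial-level transformation could look roughly like the following sketch; the blend weight, shift range, and NumPy image layout are assumptions, not the paper's exact pipeline.

```python
import numpy as np

def transform_input(image, pattern, alpha=0.03, max_shift=4):
    """Preprocessing-based watermark evasion (sketch).

    Blend a faint pattern into the input and apply a small random shift so
    trigger inputs no longer activate the watermark behavior, while normal
    inputs (and thus task accuracy) are barely affected.
    """
    blended = np.clip((1 - alpha) * image + alpha * pattern, 0.0, 1.0)
    dy, dx = np.random.randint(-max_shift, max_shift + 1, size=2)
    return np.roll(blended, shift=(dy, dx), axis=(0, 1))  # spatial transformation
```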
arXiv Detail & Related papers (2020-09-18T09:14:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.