Evaluating the Robustness of Trigger Set-Based Watermarks Embedded in
Deep Neural Networks
- URL: http://arxiv.org/abs/2106.10147v1
- Date: Fri, 18 Jun 2021 14:23:55 GMT
- Authors: Suyoung Lee, Wonho Song, Suman Jana, Meeyoung Cha, Sooel Son
- Abstract summary: State-of-the-art trigger set-based watermarking algorithms do not achieve their designed goal of proving ownership.
We propose novel adaptive attacks that harness the adversary's knowledge of the underlying watermarking algorithm of a target model.
- Score: 22.614495877481144
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Trigger set-based watermarking schemes have gained increasing
attention as they provide deep neural network model owners with a means to
prove ownership. In
this paper, we argue that state-of-the-art trigger set-based watermarking
algorithms do not achieve their designed goal of proving ownership. We posit
that this impaired capability stems from two common experimental flaws that the
existing research practice has committed when evaluating the robustness of
watermarking algorithms: (1) incomplete adversarial evaluation and (2)
overlooked adaptive attacks.
We conduct a comprehensive adversarial evaluation of 10 representative
watermarking schemes against six existing attacks and demonstrate that
each of these watermarking schemes lacks robustness against at least two
attacks. We also propose novel adaptive attacks that harness the adversary's
knowledge of the underlying watermarking algorithm of a target model. We
demonstrate that the proposed attacks effectively break all of the 10
watermarking schemes, consequently allowing adversaries to obscure the
ownership of any watermarked model. We encourage follow-up studies to follow
our guidelines when evaluating the robustness of their watermarking schemes,
conducting comprehensive adversarial evaluations that include our adaptive
attacks to demonstrate a meaningful upper bound on watermark robustness.
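The trigger set-based ownership verification that the abstract critiques can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's actual protocol: the function names and the 0.9 agreement threshold are assumptions. The owner trains the model on data augmented with a secret trigger set of mislabeled inputs, and later claims ownership if a suspect model agrees with the secret trigger labels on a sufficient fraction of the trigger set.

```python
def embed_watermark(train_fn, train_data, trigger_inputs, trigger_labels):
    """Hypothetical sketch: embed a trigger set-based watermark by training
    on the owner's data augmented with secret (input, label) pairs whose
    labels are chosen by the owner."""
    augmented = list(train_data) + list(zip(trigger_inputs, trigger_labels))
    return train_fn(augmented)

def verify_ownership(model_fn, trigger_inputs, trigger_labels, threshold=0.9):
    """Claim ownership if the suspect model agrees with the secret trigger
    labels on at least `threshold` of the trigger set (threshold assumed)."""
    hits = sum(model_fn(x) == y
               for x, y in zip(trigger_inputs, trigger_labels))
    return hits / len(trigger_inputs) >= threshold
```

The attacks surveyed in the paper aim to break exactly this check: a watermark-removal adversary transforms a stolen model so that its trigger-set agreement drops below the verification threshold while its accuracy on ordinary inputs is preserved.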
Related papers
- Certifiably Robust Image Watermark [57.546016845801134]
Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns.
Watermarking AI-generated content is a key technology to address these concerns.
We propose the first image watermarks with certified robustness guarantees against removal and forgery attacks.
arXiv Detail & Related papers (2024-07-04T17:56:04Z)
- Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion [15.086451828825398]
Evasion adversaries can readily exploit the shortcuts created by models memorizing watermark samples.
By training the model to accurately recognize watermark samples, unique watermark behaviors are promoted through knowledge injection.
arXiv Detail & Related papers (2024-04-21T03:38:20Z)
- Elevating Defenses: Bridging Adversarial Training and Watermarking for Model Resilience [2.8084422332394428]
This work introduces a novel framework to integrate adversarial training with watermarking techniques to fortify against evasion attacks.
We use the MNIST and Fashion-MNIST datasets to evaluate our proposed technique on various model stealing attacks.
arXiv Detail & Related papers (2023-12-21T19:21:36Z)
- Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models [19.29349934856703]
A strong watermarking scheme satisfies the property that a computationally bounded attacker cannot erase the watermark without causing significant quality degradation.
We prove that, under well-specified and natural assumptions, strong watermarking is impossible to achieve.
arXiv Detail & Related papers (2023-11-07T22:52:54Z)
- Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification has recently become popular, allowing the model owner to watermark the model.
We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
- Hybrid Design of Multiplicative Watermarking for Defense Against Malicious Parameter Identification [46.27328641616778]
We propose a hybrid multiplicative watermarking scheme, where the watermark parameters are periodically updated.
We show that the proposed approach makes it difficult for an eavesdropper to reconstruct the watermarking parameters.
arXiv Detail & Related papers (2023-09-05T16:56:53Z)
- Safe and Robust Watermark Injection with a Single OoD Image [90.71804273115585]
Training a high-performance deep neural network requires large amounts of data and computational resources.
We propose a safe and robust backdoor-based watermark injection technique.
We induce random perturbation of model parameters during watermark injection to defend against common watermark removal attacks.
arXiv Detail & Related papers (2023-09-04T19:58:35Z)
- SoK: How Robust is Image Classification Deep Neural Network Watermarking? (Extended Version) [16.708069984516964]
We evaluate whether recently proposed watermarking schemes that claim robustness are robust against a large set of removal attacks.
None of the surveyed watermarking schemes is robust in practice.
We show that watermarking schemes need to be evaluated against a more extensive set of removal attacks with a more realistic adversary model.
arXiv Detail & Related papers (2021-08-11T00:23:33Z)
- Exploring Structure Consistency for Deep Model Watermarking [122.38456787761497]
The intellectual property (IP) of deep neural networks (DNNs) can be easily "stolen" by surrogate model attacks.
We propose a new watermarking methodology, "structure consistency", based on which a new deep structure-aligned model watermarking algorithm is designed.
arXiv Detail & Related papers (2021-08-05T04:27:15Z)
- Reversible Watermarking in Deep Convolutional Neural Networks for Integrity Authentication [78.165255859254]
We propose a reversible watermarking algorithm for integrity authentication.
The influence of embedding reversible watermarking on the classification performance is less than 0.5%.
At the same time, the integrity of the model can be verified by applying the reversible watermarking.
arXiv Detail & Related papers (2021-04-09T09:32:21Z)
- Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective.
We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations.
Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
arXiv Detail & Related papers (2020-09-18T09:14:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.