Neural Network Laundering: Removing Black-Box Backdoor Watermarks from
Deep Neural Networks
- URL: http://arxiv.org/abs/2004.11368v1
- Date: Wed, 22 Apr 2020 19:02:47 GMT
- Title: Neural Network Laundering: Removing Black-Box Backdoor Watermarks from
Deep Neural Networks
- Authors: William Aiken, Hyoungshick Kim, Simon Woo
- Abstract summary: We propose a neural network "laundering" algorithm to remove black-box backdoor watermarks from neural networks.
For all backdoor watermarking methods addressed in this paper, we find that the robustness of the watermark is significantly weaker than the original claims.
- Score: 17.720400846604907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Creating a state-of-the-art deep-learning system requires vast amounts of
data, expertise, and hardware, yet research into embedding copyright protection
for neural networks has been limited. One of the main methods for achieving
such protection involves relying on the susceptibility of neural networks to
backdoor attacks, but the robustness of these tactics has been primarily
evaluated against pruning, fine-tuning, and model inversion attacks. In this
work, we propose a neural network "laundering" algorithm to remove black-box
backdoor watermarks from neural networks even when the adversary has no prior
knowledge of the structure of the watermark.
We are able to effectively remove watermarks used for recent defense or
copyright protection mechanisms while achieving test accuracies above 97% on
MNIST and above 80% on CIFAR-10. For all backdoor watermarking
methods addressed in this paper, we find that the robustness of the watermark
is significantly weaker than the original claims. We also demonstrate the
feasibility of our algorithm in more complex tasks as well as in more realistic
scenarios where the adversary is able to carry out efficient laundering attacks
using less than 1% of the original training set size, demonstrating that
existing backdoor watermarks are not sufficient to reach their claims.
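The laundering idea rests on pruning neurons that clean data rarely activates (backdoor triggers tend to rely on such neurons) and then fine-tuning on a small clean set. A minimal NumPy sketch of the pruning step, assuming a single dense layer; the function name and toy setup are illustrative, not the authors' implementation:

```python
import numpy as np

def prune_low_activation_neurons(weights, activations, prune_frac=0.1):
    """Zero the weights of the neurons with the smallest mean activation
    on clean inputs -- candidate carriers of backdoor watermark behavior."""
    mean_act = activations.mean(axis=0)        # average activation per neuron
    n_prune = int(len(mean_act) * prune_frac)  # how many neurons to disable
    idx = np.argsort(mean_act)[:n_prune]       # least-used neurons first
    pruned = weights.copy()
    pruned[:, idx] = 0.0                       # disable those neurons
    return pruned, idx

# Toy example: 4 neurons; neuron 0 is nearly dormant on clean data.
rng = np.random.default_rng(0)
acts = rng.uniform(0.5, 1.0, size=(100, 4))
acts[:, 0] = 0.01                              # dormant (suspect) neuron
W = np.ones((8, 4))
W_pruned, pruned_idx = prune_low_activation_neurons(W, acts, prune_frac=0.25)
```

In the paper's setting this pruning is followed by fine-tuning on less than 1% of the original training set to recover any lost clean accuracy.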
Related papers
- Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification has recently become popular, allowing the model owner to watermark the model.
We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
- Safe and Robust Watermark Injection with a Single OoD Image [90.71804273115585]
Training a high-performance deep neural network requires large amounts of data and computational resources.
We propose a safe and robust backdoor-based watermark injection technique.
We induce random perturbation of model parameters during watermark injection to defend against common watermark removal attacks.
arXiv Detail & Related papers (2023-09-04T19:58:35Z)
- OVLA: Neural Network Ownership Verification using Latent Watermarks [7.661766773170363]
We present a novel methodology for neural network ownership verification based on latent watermarks.
We show that our approach offers strong defense against backdoor detection, backdoor removal and surrogate model attacks.
arXiv Detail & Related papers (2023-06-15T17:45:03Z)
- Rethinking White-Box Watermarks on Deep Learning Models under Neural Structural Obfuscation [24.07604618918671]
Copyright protection for deep neural networks (DNNs) is an urgent need for AI corporations.
White-box watermarking is believed to be accurate, credible and secure against most known watermark removal attacks.
We present the first systematic study on how the mainstream white-box watermarks are commonly vulnerable to neural structural obfuscation with dummy neurons.
arXiv Detail & Related papers (2023-03-17T02:21:41Z)
- Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection [69.59980270078067]
We explore the untargeted backdoor watermarking scheme, where the abnormal model behaviors are not deterministic.
We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification.
arXiv Detail & Related papers (2022-09-27T12:56:56Z)
- "And Then There Were None": Cracking White-box DNN Watermarks via Invariant Neuron Transforms [29.76685892624105]
We present the first effective removal attack which cracks almost all the existing white-box watermarking schemes.
Our attack requires no prior knowledge on the training data distribution or the adopted watermark algorithms, and leaves model functionality intact.
arXiv Detail & Related papers (2022-04-30T08:33:32Z)
- Knowledge-Free Black-Box Watermark and Ownership Proof for Image Classification Neural Networks [9.117248639119529]
We propose a knowledge-free black-box watermarking scheme for image classification neural networks.
An encoding and verification protocol is designed to ensure the scheme's security against knowledgeable adversaries.
Experimental results demonstrate the functionality-preserving capability and security of the proposed watermarking scheme.
arXiv Detail & Related papers (2022-04-09T18:09:02Z)
- Exploring Structure Consistency for Deep Model Watermarking [122.38456787761497]
The intellectual property (IP) of deep neural networks (DNNs) can be easily "stolen" by surrogate model attacks.
We propose a new watermarking methodology, namely "structure consistency", based on which a new deep structure-aligned model watermarking algorithm is designed.
arXiv Detail & Related papers (2021-08-05T04:27:15Z)
- Reversible Watermarking in Deep Convolutional Neural Networks for Integrity Authentication [78.165255859254]
We propose a reversible watermarking algorithm for integrity authentication.
Embedding the reversible watermark degrades classification performance by less than 0.5%.
At the same time, the integrity of the model can be verified by applying the reversible watermarking.
arXiv Detail & Related papers (2021-04-09T09:32:21Z)
- Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective.
We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations.
Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
arXiv Detail & Related papers (2020-09-18T09:14:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.