Neural Network Laundering: Removing Black-Box Backdoor Watermarks from
Deep Neural Networks
- URL: http://arxiv.org/abs/2004.11368v1
- Date: Wed, 22 Apr 2020 19:02:47 GMT
- Title: Neural Network Laundering: Removing Black-Box Backdoor Watermarks from
Deep Neural Networks
- Authors: William Aiken, Hyoungshick Kim, Simon Woo
- Abstract summary: We propose a neural network "laundering" algorithm to remove black-box backdoor watermarks from neural networks.
For all backdoor watermarking methods addressed in this paper, we find that the robustness of the watermark is significantly weaker than the original claims.
- Score: 17.720400846604907
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Creating a state-of-the-art deep-learning system requires vast amounts of
data, expertise, and hardware, yet research into embedding copyright protection
for neural networks has been limited. One of the main methods for achieving
such protection involves relying on the susceptibility of neural networks to
backdoor attacks, but the robustness of these tactics has been primarily
evaluated against pruning, fine-tuning, and model inversion attacks. In this
work, we propose a neural network "laundering" algorithm to remove black-box
backdoor watermarks from neural networks even when the adversary has no prior
knowledge of the structure of the watermark.
We are able to effectively remove watermarks used for recent defense or
copyright protection mechanisms while achieving test accuracies above 97% on
MNIST and above 80% on CIFAR-10. For all backdoor watermarking
methods addressed in this paper, we find that the robustness of the watermark
is significantly weaker than the original claims. We also demonstrate the
feasibility of our algorithm in more complex tasks as well as in more realistic
scenarios where the adversary is able to carry out efficient laundering attacks
using less than 1% of the original training set size, demonstrating that
existing backdoor watermarks are not sufficient to reach their claims.
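The laundering idea rests on pruning neurons that clean data rarely activates (backdoor triggers tend to rely on such neurons) and then fine-tuning on a small clean set. A minimal NumPy sketch of the pruning step, assuming a single dense layer; the function name and toy setup are illustrative, not the authors' implementation:

```python
import numpy as np

def prune_low_activation_neurons(weights, activations, prune_frac=0.1):
    """Zero the weights of the neurons with the smallest mean activation
    on clean inputs -- candidate carriers of backdoor watermark behavior."""
    mean_act = activations.mean(axis=0)        # average activation per neuron
    n_prune = int(len(mean_act) * prune_frac)  # how many neurons to disable
    idx = np.argsort(mean_act)[:n_prune]       # least-used neurons first
    pruned = weights.copy()
    pruned[:, idx] = 0.0                       # disable those neurons
    return pruned, idx

# Toy example: 4 neurons; neuron 0 is nearly dormant on clean data.
rng = np.random.default_rng(0)
acts = rng.uniform(0.5, 1.0, size=(100, 4))
acts[:, 0] = 0.01                              # dormant (suspect) neuron
W = np.ones((8, 4))
W_pruned, pruned_idx = prune_low_activation_neurons(W, acts, prune_frac=0.25)
```

In the paper's setting this pruning is followed by fine-tuning on less than 1% of the original training set to recover any lost clean accuracy.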
Related papers
- Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification has recently become popular, allowing the model owner to watermark the model.
We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
- Safe and Robust Watermark Injection with a Single OoD Image [90.71804273115585]
Training a high-performance deep neural network requires large amounts of data and computational resources.
We propose a safe and robust backdoor-based watermark injection technique.
We induce random perturbation of model parameters during watermark injection to defend against common watermark removal attacks.
arXiv Detail & Related papers (2023-09-04T19:58:35Z)
- OVLA: Neural Network Ownership Verification using Latent Watermarks [7.661766773170363]
We present a novel methodology for neural network ownership verification based on latent watermarks.
We show that our approach offers strong defense against backdoor detection, backdoor removal and surrogate model attacks.
arXiv Detail & Related papers (2023-06-15T17:45:03Z)
- Rethinking White-Box Watermarks on Deep Learning Models under Neural Structural Obfuscation [24.07604618918671]
Copyright protection for deep neural networks (DNNs) is an urgent need for AI corporations.
White-box watermarking is believed to be accurate, credible and secure against most known watermark removal attacks.
We present the first systematic study on how the mainstream white-box watermarks are commonly vulnerable to neural structural obfuscation with dummy neurons.
arXiv Detail & Related papers (2023-03-17T02:21:41Z)
- Untargeted Backdoor Watermark: Towards Harmless and Stealthy Dataset Copyright Protection [69.59980270078067]
We explore the untargeted backdoor watermarking scheme, where the abnormal model behaviors are not deterministic.
We also discuss how to use the proposed untargeted backdoor watermark for dataset ownership verification.
arXiv Detail & Related papers (2022-09-27T12:56:56Z)
- "And Then There Were None": Cracking White-box DNN Watermarks via Invariant Neuron Transforms [29.76685892624105]
We present the first effective removal attack which cracks almost all the existing white-box watermarking schemes.
Our attack requires no prior knowledge on the training data distribution or the adopted watermark algorithms, and leaves model functionality intact.
arXiv Detail & Related papers (2022-04-30T08:33:32Z)
- Knowledge-Free Black-Box Watermark and Ownership Proof for Image Classification Neural Networks [9.117248639119529]
We propose a knowledge-free black-box watermarking scheme for image classification neural networks.
An encoding and verification protocol is designed to ensure the scheme's security against knowledgeable adversaries.
Experimental results demonstrate the functionality-preserving capability and security of the proposed watermarking scheme.
arXiv Detail & Related papers (2022-04-09T18:09:02Z)
- Exploring Structure Consistency for Deep Model Watermarking [122.38456787761497]
The intellectual property (IP) of deep neural networks (DNNs) can be easily "stolen" by surrogate model attacks.
We propose a new watermarking methodology, namely "structure consistency", based on which a new deep structure-aligned model watermarking algorithm is designed.
arXiv Detail & Related papers (2021-08-05T04:27:15Z)
- Reversible Watermarking in Deep Convolutional Neural Networks for Integrity Authentication [78.165255859254]
We propose a reversible watermarking algorithm for integrity authentication.
Embedding the reversible watermark degrades classification performance by less than 0.5%.
At the same time, the integrity of the model can be verified by applying the reversible watermarking.
arXiv Detail & Related papers (2021-04-09T09:32:21Z)
- Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective.
We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations.
Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
arXiv Detail & Related papers (2020-09-18T09:14:54Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.