MIRA: Cracking Black-box Watermarking on Deep Neural Networks via Model
Inversion-based Removal Attacks
- URL: http://arxiv.org/abs/2309.03466v1
- Date: Thu, 7 Sep 2023 03:16:03 GMT
- Title: MIRA: Cracking Black-box Watermarking on Deep Neural Networks via Model
Inversion-based Removal Attacks
- Authors: Yifan Lu, Wenxuan Li, Mi Zhang, Xudong Pan, Min Yang
- Abstract summary: We propose a novel Model Inversion-based Removal Attack (MIRA) against black-box watermarking schemes.
In general, our attack pipeline exploits the internals of the protected model to recover and unlearn the watermark message.
We show that MIRA achieves strong watermark removal effects on the covered watermarks, preserving at least 90% of the stolen model utility.
- Score: 25.641458647180997
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To protect the intellectual property of well-trained deep neural networks
(DNNs), black-box DNN watermarks, which are embedded into the prediction
behavior of DNN models on a set of specially-crafted samples, have gained
increasing popularity in both academia and industry. Watermark robustness is
usually implemented against attackers who steal the protected model and
obfuscate its parameters for watermark removal. Recent studies empirically
prove the robustness of most black-box watermarking schemes against known
removal attempts.
In this paper, we propose a novel Model Inversion-based Removal Attack
(MIRA), which is watermark-agnostic and effective against most mainstream
black-box DNN watermarking schemes. In general, our attack pipeline exploits
the internals of the protected model to recover and unlearn the watermark
message. We further design target class detection and recovered sample
splitting algorithms to reduce the utility loss caused by MIRA and achieve
data-free watermark removal on half of the watermarking schemes. We conduct a
comprehensive evaluation of MIRA against ten mainstream black-box watermarks
on three benchmark datasets and DNN architectures. Compared with six baseline
removal attacks, MIRA achieves strong watermark removal effects on the
covered watermarks, preserving at least 90% of the stolen model utility,
under more relaxed or even no assumptions on dataset availability.
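
The abstract names two stages: recovering the watermark message via model inversion, then unlearning it. Below is a minimal, hypothetical PyTorch sketch of such a pipeline; the function names, loss choices, and hyper-parameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def recover_watermark_samples(model, target_class, n=32, steps=500, lr=0.1,
                              shape=(3, 32, 32), device="cpu"):
    """Model inversion: optimize random inputs until the stolen model
    confidently assigns them to the suspected watermark target class."""
    model.eval()
    x = torch.rand(n, *shape, device=device, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    y = torch.full((n,), target_class, dtype=torch.long, device=device)
    for _ in range(steps):
        opt.zero_grad()
        F.cross_entropy(model(x), y).backward()
        opt.step()
        x.data.clamp_(0.0, 1.0)  # keep the inversions in valid image range
    return x.detach()

def unlearn_watermark(model, recovered, num_classes=10, epochs=5, lr=1e-4,
                      device="cpu"):
    """Unlearning: push the model's predictions on the recovered samples
    toward the uniform distribution, erasing the memorized trigger response."""
    model.train()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    uniform = torch.full((recovered.size(0), num_classes),
                         1.0 / num_classes, device=device)
    for _ in range(epochs):
        opt.zero_grad()
        log_probs = F.log_softmax(model(recovered), dim=1)
        F.kl_div(log_probs, uniform, reduction="batchmean").backward()
        opt.step()
    return model
```

In a sketch like this, a small unlearning rate is what limits collateral damage to clean-task accuracy.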
Related papers
- DeepEclipse: How to Break White-Box DNN-Watermarking Schemes [60.472676088146436]
We present obfuscation techniques that significantly differ from the existing white-box watermarking removal schemes.
DeepEclipse can evade watermark detection without prior knowledge of the underlying watermarking scheme.
Our evaluation reveals that DeepEclipse excels in breaking multiple white-box watermarking schemes.
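
The summary does not say which obfuscations DeepEclipse applies; one classic functionality-preserving transformation that defeats weight inspection is positive rescaling across a ReLU, sketched below in PyTorch (an illustrative assumption, not necessarily the paper's technique).

```python
import torch
import torch.nn as nn

@torch.no_grad()
def scale_obfuscate(fc1: nn.Linear, fc2: nn.Linear):
    """Functionality-preserving obfuscation for y = fc2(relu(fc1(x))):
    multiplying fc1's rows by a > 0 and dividing fc2's matching columns
    by a leaves y unchanged (ReLU is positively homogeneous) while
    altering every individual weight a white-box detector inspects."""
    a = torch.rand(fc1.out_features) * 9.0 + 1.0  # random scales in [1, 10)
    fc1.weight.mul_(a.unsqueeze(1))
    if fc1.bias is not None:
        fc1.bias.mul_(a)
    fc2.weight.div_(a.unsqueeze(0))

# usage: check output equivalence on random inputs
net = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
x = torch.randn(5, 8)
before = net(x)
scale_obfuscate(net[0], net[2])
print(torch.allclose(before, net(x), atol=1e-5))  # True
```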
arXiv Detail & Related papers (2024-03-06T10:24:47Z)
- ClearMark: Intuitive and Robust Model Watermarking via Transposed Model Training [50.77001916246691]
This paper introduces ClearMark, the first DNN watermarking method designed for intuitive human assessment.
ClearMark embeds visible watermarks, enabling human decision-making without rigid value thresholds.
It shows an 8,544-bit watermark capacity comparable to the strongest existing work.
arXiv Detail & Related papers (2023-10-25T08:16:55Z)
- Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification has become popular recently, in which the model owner embeds a watermark into the model.
We propose a mini-max formulation to find watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
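
As a rough illustration of such a mini-max formulation (the names and update rule are assumptions, not the paper's code), one training step can simulate a removal attack by ascending the watermark loss in parameter space, then update the original parameters with gradients taken at the attacked point:

```python
import copy
import torch
import torch.nn.functional as F

def minimax_step(model, wm_x, wm_y, clean_x, clean_y,
                 inner_steps=3, inner_lr=1e-2, outer_lr=1e-3):
    """Inner loop: ascend the watermark loss (a simulated removal attack).
    Outer loop: descend the clean + watermark loss evaluated at that
    attacked point, so the watermark survives nearby parameter changes."""
    attacked = copy.deepcopy(model)  # disposable copy for the inner attack
    inner_opt = torch.optim.SGD(attacked.parameters(), lr=inner_lr)
    for _ in range(inner_steps):
        inner_opt.zero_grad()
        (-F.cross_entropy(attacked(wm_x), wm_y)).backward()  # gradient ascent
        inner_opt.step()

    # Outer minimization: gradients at the attacked parameters are applied
    # to the original model (an adversarial-weight-perturbation-style update).
    attacked.zero_grad()
    loss = (F.cross_entropy(attacked(clean_x), clean_y)
            + F.cross_entropy(attacked(wm_x), wm_y))
    loss.backward()
    with torch.no_grad():
        for p, p_att in zip(model.parameters(), attacked.parameters()):
            p -= outer_lr * p_att.grad
    return loss.item()
```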
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
- Did You Train on My Dataset? Towards Public Dataset Protection with Clean-Label Backdoor Watermarking [54.40184736491652]
We propose a backdoor-based watermarking approach that serves as a general framework for safeguarding publicly available data.
By inserting a small number of watermarking samples into the dataset, our approach enables the learning model to implicitly learn a secret function set by defenders.
This hidden function can then be used as a watermark to track down third-party models that use the dataset illegally.
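
A minimal sketch of this kind of clean-label dataset watermarking, assuming NumPy image arrays, with all names and hyper-parameters chosen for illustration only:

```python
import numpy as np

def watermark_dataset(images, labels, trigger, target_class,
                      frac=0.01, alpha=0.1, seed=0):
    """Blend a secret trigger into a few images that already belong to
    target_class, keeping their true (clean) labels. A model trained on
    the released data implicitly learns the trigger -> target_class
    association as a hidden function."""
    rng = np.random.default_rng(seed)
    idx = np.flatnonzero(labels == target_class)
    k = min(len(idx), max(1, int(frac * len(images))))
    chosen = rng.choice(idx, size=k, replace=False)
    marked = images.astype(np.float32).copy()
    marked[chosen] = (1 - alpha) * marked[chosen] + alpha * trigger
    return marked, chosen

def verify_ownership(predict_fn, probes, trigger, target_class, alpha=0.1):
    """If triggered probes land on target_class far above chance, the
    suspect model very likely trained on the watermarked dataset."""
    triggered = (1 - alpha) * probes.astype(np.float32) + alpha * trigger
    return float(np.mean(predict_fn(triggered) == target_class))
```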
arXiv Detail & Related papers (2023-03-20T21:54:30Z)
- On Function-Coupled Watermarks for Deep Neural Networks [15.478746926391146]
We propose a novel DNN watermarking solution that can effectively defend against watermark removal attacks.
Our key insight is to enhance the coupling of the watermark and model functionalities.
Results show a 100% watermark authentication success rate under aggressive watermark removal attacks.
arXiv Detail & Related papers (2023-02-08T05:55:16Z)
- Exploring Structure Consistency for Deep Model Watermarking [122.38456787761497]
The intellectual property (IP) of deep neural networks (DNNs) can be easily "stolen" by surrogate model attacks.
We propose a new watermarking methodology, namely "structure consistency", based on which a new deep structure-aligned model watermarking algorithm is designed.
arXiv Detail & Related papers (2021-08-05T04:27:15Z)
- Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective.
We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations.
Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
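
The exact transformation algorithm is not given in the summary; the following hypothetical PyTorch wrapper illustrates the general idea of combining an imperceptible additive pattern with a small spatial transformation at inference time (CIFAR-sized inputs assumed):

```python
import torch
import torch.nn as nn
import torchvision.transforms as T

class TransformedModel(nn.Module):
    """Wrap a stolen model so every query is perturbed before inference:
    an imperceptible additive pattern plus a small random affine transform.
    The owner's trigger queries no longer match the memorized watermark,
    while ordinary inputs (and thus utility) are barely affected."""
    def __init__(self, stolen_model, eps=4 / 255, seed=0):
        super().__init__()
        self.model = stolen_model
        g = torch.Generator().manual_seed(seed)
        self.pattern = (torch.rand(3, 32, 32, generator=g) * 2 - 1) * eps
        self.spatial = T.RandomAffine(degrees=5, translate=(0.05, 0.05))

    def forward(self, x):
        x = (x + self.pattern.to(x.device)).clamp(0.0, 1.0)
        return self.model(self.spatial(x))
```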
arXiv Detail & Related papers (2020-09-18T09:14:54Z)
- Removing Backdoor-Based Watermarks in Neural Networks with Limited Data [26.050649487499626]
Trading deep models is in high demand and lucrative nowadays.
However, naive trading schemes typically involve potential risks related to copyright and trustworthiness.
We propose a novel backdoor-based watermark removal framework using limited data, dubbed WILD.
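
A rough sketch of limited-data removal in this spirit, assuming a small clean loader and strong augmentation (the details are illustrative, not WILD's actual algorithm):

```python
import torch
import torch.nn.functional as F
import torchvision.transforms as T

def wash_out_watermark(model, small_clean_loader, epochs=10, lr=1e-3,
                       device="cpu"):
    """Fine-tune on a small clean set with strong augmentation so the
    generic task signal is reinforced while the narrowly-memorized
    trigger response decays."""
    aug = T.Compose([T.RandomResizedCrop(32, scale=(0.6, 1.0)),
                     T.RandomHorizontalFlip(),
                     T.ColorJitter(0.4, 0.4, 0.4)])
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(epochs):
        for x, y in small_clean_loader:
            x, y = aug(x.to(device)), y.to(device)
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()
    return model
```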
arXiv Detail & Related papers (2020-08-02T06:25:26Z)
- Neural Network Laundering: Removing Black-Box Backdoor Watermarks from Deep Neural Networks [17.720400846604907]
We propose a neural network "laundering" algorithm to remove black-box backdoor watermarks from neural networks.
For all backdoor watermarking methods addressed in this paper, we find that the robustness of the watermark is significantly weaker than the original claims.
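
Laundering pipelines of this kind typically first reverse-engineer a candidate trigger and then fine-tune it away; the sketch below shows a Neural-Cleanse-style trigger recovery step in PyTorch (an assumption about the approach, not the paper's exact method):

```python
import torch
import torch.nn.functional as F

def reverse_engineer_trigger(model, images, target_class,
                             steps=300, lr=0.1, lam=1e-2):
    """Optimize a mask and pattern that flip arbitrary clean images to
    target_class; an unusually small recovered mask is evidence of a
    backdoor watermark on that class."""
    mask = torch.zeros(1, *images.shape[2:], requires_grad=True)
    pattern = torch.rand(images.shape[1:], requires_grad=True)
    opt = torch.optim.Adam([mask, pattern], lr=lr)
    y = torch.full((images.size(0),), target_class, dtype=torch.long)
    for _ in range(steps):
        m = torch.sigmoid(mask)
        patched = (1 - m) * images + m * pattern.clamp(0.0, 1.0)
        loss = F.cross_entropy(model(patched), y) + lam * m.sum()  # sparse mask
        opt.zero_grad()
        loss.backward()
        opt.step()
    return torch.sigmoid(mask).detach(), pattern.detach().clamp(0.0, 1.0)
```

Stamping the recovered trigger onto clean images and fine-tuning with their correct labels then overwrites the watermark response.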
arXiv Detail & Related papers (2020-04-22T19:02:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.