DeepHider: A Multi-module and Invisibility Watermarking Scheme for
Language Model
- URL: http://arxiv.org/abs/2208.04676v1
- Date: Tue, 9 Aug 2022 11:53:24 GMT
- Title: DeepHider: A Multi-module and Invisibility Watermarking Scheme for
Language Model
- Authors: Long Dai, Jiarong Mao, Xuefeng Fan, Xiaoyi Zhou
- Abstract summary: This paper identifies a new threat in which the model's classification module is replaced and the model is globally fine-tuned.
We use blockchain properties such as tamper-resistance and traceability to prevent false ownership claims by thieves.
Experiments show that the proposed scheme successfully verifies ownership with 100% watermark verification accuracy.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: With the rapid development of natural language processing (NLP) technology,
NLP models have shown great economic value in business. However, the owner's
models are vulnerable to the threat of pirated redistribution, which breaks the
symmetry relationship between model owners and consumers. Therefore, a model
protection mechanism is needed to keep the symmetry from being broken.
Currently, language model protection schemes based on black-box verification
perform poorly in terms of the invisibility of trigger samples, which are easily
detected by humans or anomaly detectors, preventing verification. To solve this
problem, this paper proposes a triggerless mode of trigger samples for ownership
verification. In addition, a thief may replace the classification module of a
watermarked model to suit its own classification task and thereby remove the
embedded watermark. Therefore, this paper further identifies a new threat in
which the model's classification module is replaced and the model is globally
fine-tuned, and successfully verifies model ownership through a white-box
approach. Meanwhile, we use blockchain properties such as tamper-resistance and
traceability to prevent false ownership claims by thieves. Experiments show that
the proposed scheme verifies ownership with 100% watermark verification accuracy
without affecting the original performance of the model, and has strong
robustness and a low false trigger rate.
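The abstract does not spell out the embedding algorithm, but the white-box verification it describes can be pictured with a generic projection-based multi-bit weight watermark, paired with a hash commitment standing in for the tamper-proof on-chain record. The sketch below is only an illustration of that idea; the function names, the hinge-style embedding, and all hyper-parameters are assumptions, not DeepHider's actual design.

```python
import hashlib
import numpy as np

def embed_watermark(weights, bits, key_seed, lr=0.01, steps=500):
    """Nudge a weight tensor so that a secret random projection of it decodes
    to the owner's bit string (a generic white-box, multi-bit watermark)."""
    rng = np.random.default_rng(key_seed)
    proj = rng.standard_normal((len(bits), weights.size))  # secret key matrix
    w = weights.copy().ravel()
    target = 2.0 * np.asarray(bits, dtype=float) - 1.0     # {0,1} -> {-1,+1}
    for _ in range(steps):
        score = proj @ w
        active = (score * target) < 1.0                    # bits not yet at margin
        w -= lr * (proj.T @ np.where(active, -target, 0.0)) / len(bits)
    return w.reshape(weights.shape), proj

def extract_watermark(weights, proj):
    """Recover the embedded bits by thresholding the secret projection."""
    return (proj @ weights.ravel() > 0).astype(int)

def ownership_record(bits, key_seed):
    """Tamper-evident commitment to the watermark, i.e. the value that could be
    registered on a blockchain to time-stamp the ownership claim."""
    payload = np.asarray(bits, dtype=np.uint8).tobytes() + key_seed.to_bytes(8, "big")
    return hashlib.sha256(payload).hexdigest()

# Toy usage: embed a 32-bit owner signature into one random "layer".
layer = np.random.default_rng(7).standard_normal((64, 32))
owner_bits = np.random.default_rng(8).integers(0, 2, 32)
marked, key = embed_watermark(layer, owner_bits, key_seed=1234)
acc = (extract_watermark(marked, key) == owner_bits).mean()
print(f"watermark bit accuracy: {acc:.2f}")
print("on-chain commitment:", ownership_record(owner_bits, 1234))
```

In practice such an embedding would also be regularized so the weight perturbation does not degrade the original task, which is the property the abstract reports (100% verification with no performance loss).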
Related papers
- AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA [67.68750063537482]
Diffusion models have achieved remarkable success in generating high-quality images.
Recent works aim to let SD models output watermarked content for post-hoc forensics.
We propose AquaLoRA as the first implementation under this scenario.
arXiv Detail & Related papers (2024-05-18T01:25:47Z) - Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution [22.933101948176606]
Backdoor-based model watermarks are the primary and cutting-edge methods to implant such properties in released models.
We design a new watermarking paradigm, i.e., Explanation as a Watermark (EaaW), that implants verification behaviors into the explanation of feature attribution.
arXiv Detail & Related papers (2024-05-08T05:49:46Z) - ModelShield: Adaptive and Robust Watermark against Model Extraction Attack [58.46326901858431]
Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks.
However, adversaries can still exploit model extraction attacks to steal the model intelligence encoded in the model's generated content.
Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content.
arXiv Detail & Related papers (2024-05-03T06:41:48Z) - Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification has become popular recently, allowing the model owner to watermark the model.
We propose a mini-max formulation to find watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
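The mini-max idea can be pictured as adversarial training on the watermark itself: an inner step searches for nearby weights where the watermark is weakened, and an outer step re-embeds it at that point. The toy below reuses a hinge-decoded projection watermark purely for illustration; the attack radius, learning rate, and decoder are assumptions, not the paper's implementation.

```python
import numpy as np

def wm_loss_grad(w, proj, target, margin=5.0):
    """Hinge-style watermark loss (bits decoded as sign(proj @ w)) and its gradient."""
    score = proj @ w
    slack = margin - target * score
    loss = np.maximum(0.0, slack).mean()
    grad = proj.T @ np.where(slack > 0, -target, 0.0) / len(target)
    return loss, grad

def remove_attack(w, proj, target, eps=0.3, steps=10):
    """Inner 'max' player: ascend the watermark loss inside an L2 ball of radius eps."""
    delta = np.zeros_like(w)
    for _ in range(steps):
        _, g = wm_loss_grad(w + delta, proj, target)
        delta += (eps / steps) * g / (np.linalg.norm(g) + 1e-12)
    return delta

def minimax_harden(w, proj, target, outer_steps=200, lr=0.05):
    """Outer 'min' player: re-embed the watermark at the attacked weights."""
    for _ in range(outer_steps):
        delta = remove_attack(w, proj, target)
        _, g = wm_loss_grad(w + delta, proj, target)
        w = w - lr * g
    return w

rng = np.random.default_rng(0)
w0 = rng.standard_normal(2048)
proj = rng.standard_normal((32, 2048))
target = rng.choice([-1.0, 1.0], size=32)

before, _ = wm_loss_grad(w0 + remove_attack(w0, proj, target), proj, target)
w1 = minimax_harden(w0, proj, target)
after, _ = wm_loss_grad(w1 + remove_attack(w1, proj, target), proj, target)
print(f"watermark loss under removal attack: before={before:.2f}, after={after:.2f}")
```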
arXiv Detail & Related papers (2023-09-09T12:46:08Z) - Seeds Don't Lie: An Adaptive Watermarking Framework for Computer Vision
Models [44.80560808267494]
We present an adaptive framework to watermark a protected model, leveraging the unique behavior present in the model.
This watermark is used to detect extracted models, which have the same unique behavior, indicating an unauthorized usage of the protected model's IP.
We show that the framework is robust to (1) unseen model extraction attacks, and (2) extracted models that have undergone further modification (e.g., weight pruning).
arXiv Detail & Related papers (2022-11-24T14:48:40Z) - Watermarking for Out-of-distribution Detection [76.20630986010114]
Out-of-distribution (OOD) detection aims to identify OOD data based on representations extracted from well-trained deep models.
We propose a general methodology named watermarking in this paper.
We learn a unified pattern that is superimposed onto features of original data, and the model's detection capability is largely boosted after watermarking.
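Conceptually, the "unified pattern" is a single static perturbation learned once and then superimposed on every test input before scoring. The rough illustration below assumes an energy-style score, a fixed linear head W, and an auxiliary outlier set; none of these choices are claimed to match the paper.

```python
import numpy as np

def energy(logits):
    """Energy-style OOD score: higher means more in-distribution."""
    m = logits.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(logits - m).sum(axis=1, keepdims=True))).ravel()

def learn_watermark(W, id_feats, aux_ood_feats, steps=300, lr=0.05):
    """Learn one additive pattern that, superimposed on every input feature,
    widens the score gap between in-distribution and auxiliary outlier data."""
    p = np.zeros(id_feats.shape[1])
    for _ in range(steps):
        grad = np.zeros_like(p)
        for feats, sign in ((id_feats, +1.0), (aux_ood_feats, -1.0)):
            logits = (feats + p) @ W
            probs = np.exp(logits - logits.max(axis=1, keepdims=True))
            probs /= probs.sum(axis=1, keepdims=True)
            grad += sign * (probs @ W.T).mean(axis=0)   # d mean-energy / d p
        p += lr * grad                                   # ascend the score gap
    return p

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 5))
id_feats = rng.standard_normal((200, 16)) + 1.0          # toy ID cluster
ood_feats = rng.standard_normal((200, 16)) - 1.0         # toy OOD cluster
p = learn_watermark(W, id_feats, ood_feats)
gap_before = energy(id_feats @ W).mean() - energy(ood_feats @ W).mean()
gap_after = energy((id_feats + p) @ W).mean() - energy((ood_feats + p) @ W).mean()
print(f"ID-vs-OOD score gap: before={gap_before:.2f}, after={gap_after:.2f}")
```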
arXiv Detail & Related papers (2022-10-27T06:12:32Z) - Are You Stealing My Model? Sample Correlation for Fingerprinting Deep
Neural Networks [86.55317144826179]
Previous methods always leverage transferable adversarial examples as the model fingerprint.
We propose a novel yet simple model stealing detection method based on SAmple Correlation (SAC).
SAC successfully defends against various model stealing attacks, even including adversarial training or transfer learning.
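One schematic reading of the sample-correlation idea: fingerprint a model by the correlation structure of its outputs on a fixed probe set, and flag a suspect model whose fingerprint is unusually close to the victim's. The toy below uses linear "models" and a mean-absolute-difference distance purely for illustration; probe selection, the distance measure, and thresholds in the actual SAC method may differ.

```python
import numpy as np

def correlation_fingerprint(outputs):
    """Pairwise correlation matrix of a model's outputs on a fixed probe set.
    `outputs` has shape (n_samples, n_classes)."""
    centered = outputs - outputs.mean(axis=1, keepdims=True)
    centered /= np.linalg.norm(centered, axis=1, keepdims=True) + 1e-12
    return centered @ centered.T                      # (n_samples, n_samples)

def correlation_distance(model_a, model_b, probes):
    """Distance between two models' sample-correlation fingerprints;
    a small distance suggests model_b was extracted from model_a."""
    fa = correlation_fingerprint(model_a(probes))
    fb = correlation_fingerprint(model_b(probes))
    return np.abs(fa - fb).mean()

# Toy usage with linear 'models': a victim, a noisy extraction surrogate,
# and an independently trained model.
rng = np.random.default_rng(0)
W_victim = rng.standard_normal((16, 10))
W_surrogate = W_victim + 0.05 * rng.standard_normal((16, 10))
W_indep = rng.standard_normal((16, 10))
victim = lambda x: x @ W_victim
surrogate = lambda x: x @ W_surrogate
independent = lambda x: x @ W_indep

probes = rng.standard_normal((64, 16))
print("surrogate distance  :", correlation_distance(victim, surrogate, probes))
print("independent distance:", correlation_distance(victim, independent, probes))
```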
arXiv Detail & Related papers (2022-10-21T02:07:50Z) - Neural network fragile watermarking with no model performance
degradation [28.68910526223425]
We propose a novel neural network fragile watermarking scheme with no model performance degradation.
Experiments show that the proposed method can effectively detect model malicious fine-tuning with no model performance degradation.
arXiv Detail & Related papers (2022-08-16T07:55:20Z) - MOVE: Effective and Harmless Ownership Verification via Embedded
External Features [109.19238806106426]
We propose an effective and harmless model ownership verification method (MOVE) to defend against different types of model stealing simultaneously.
We conduct the ownership verification by verifying whether a suspicious model contains the knowledge of defender-specified external features.
In particular, we develop our MOVE method under both white-box and black-box settings to provide comprehensive model protection.
arXiv Detail & Related papers (2022-08-04T02:22:29Z) - A Systematic Review on Model Watermarking for Neural Networks [1.2691047660244335]
This work presents a taxonomy identifying and analyzing different classes of watermarking schemes for machine learning models.
It introduces a unified threat model to allow structured reasoning on and comparison of the effectiveness of watermarking methods.
It systematizes desired security requirements and attacks against ML model watermarking.
arXiv Detail & Related papers (2020-09-25T12:03:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.