NSmark: Null Space Based Black-box Watermarking Defense Framework for Pre-trained Language Models
- URL: http://arxiv.org/abs/2410.13907v1
- Date: Wed, 16 Oct 2024 14:45:27 GMT
- Title: NSmark: Null Space Based Black-box Watermarking Defense Framework for Pre-trained Language Models
- Authors: Haodong Zhao, Jinming Hu, Peixuan Li, Fangqi Li, Jinrui Sha, Peixuan Chen, Zhuosheng Zhang, Gongshen Liu
- Abstract summary: We propose NSmark, a task-agnostic, black-box watermarking scheme capable of resisting Linear Functionality Equivalence Attacks on last-layer outputs (LL-LFEA).
NSmark consists of three phases: (i) watermark generation using the digital signature of the owner, enhanced by spread spectrum modulation for increased robustness; (ii) watermark embedding through an output mapping extractor that preserves PLM performance while maximizing watermark capacity; (iii) watermark verification, assessed by extraction rate and null space conformity.
- Abstract: Pre-trained language models (PLMs) have emerged as critical intellectual property (IP) assets that necessitate protection. Although various watermarking strategies have been proposed, they remain vulnerable to Linear Functionality Equivalence Attacks (LFEA), which can invalidate most existing white-box watermarks without prior knowledge of the watermarking scheme or training data. This paper further analyzes and extends the attack scenarios of LFEA to the commonly employed black-box settings for PLMs by considering Last-Layer outputs (dubbed LL-LFEA). We discover that the null space of the output matrix remains invariant against LL-LFEA attacks. Based on this finding, we propose NSmark, a task-agnostic, black-box watermarking scheme capable of resisting LL-LFEA attacks. NSmark consists of three phases: (i) watermark generation using the digital signature of the owner, enhanced by spread spectrum modulation for increased robustness; (ii) watermark embedding through an output mapping extractor that preserves PLM performance while maximizing watermark capacity; (iii) watermark verification, assessed by extraction rate and null space conformity. Extensive experiments on both pre-training and downstream tasks confirm the effectiveness, reliability, fidelity, and robustness of our approach. Code is available at https://github.com/dongdongzhaoUP/NSmark.
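The paper's key observation, that the null space of the output matrix is invariant under LL-LFEA, can be illustrated numerically. Below is a minimal sketch (not the authors' code), assuming last-layer outputs are stacked as columns of a matrix O, so an LL-LFEA attack with an invertible matrix A replaces O by A @ O; the dimensions and random data are illustrative.

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)

# Toy output matrix: d-dimensional last-layer outputs for n inputs, stacked
# as columns. With n > d the matrix has a non-trivial right null space.
d, n = 8, 12
O = rng.standard_normal((d, n))

# LL-LFEA replaces every last-layer output o with A @ o for some invertible
# A, i.e. it replaces O with A @ O while preserving model functionality.
A = rng.standard_normal((d, d))        # invertible with probability 1
O_attacked = A @ O

# If O @ x == 0, then (A @ O) @ x == A @ (O @ x) == 0, so the null space
# survives the attack intact.
N = null_space(O)                      # orthonormal basis, shape (n, n - d)
print(N.shape)                         # (12, 4)
print(np.max(np.abs(O_attacked @ N)))  # ~1e-14: basis still annihilated
```

Phase (i) relies on spread-spectrum modulation of the owner's digital signature. The following sketch shows the standard direct-sequence spread-spectrum idea together with the extraction-rate style of verification mentioned in phase (iii); the chip length, keying, and noise level are assumptions for illustration, not NSmark's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(42)

# Signature bits in {-1, +1} and a secret pseudo-random chip sequence per bit.
bits = rng.integers(0, 2, size=16) * 2 - 1
chips = rng.choice([-1, 1], size=(16, 64))

# Modulation: each bit is spread over 64 chips, adding redundancy and hence
# robustness against perturbations of the carrier.
signal = (bits[:, None] * chips).ravel()
noisy = signal + 0.8 * rng.standard_normal(signal.size)  # simulated distortion

# Demodulation: correlate each 64-chip segment with its chip sequence and take
# the sign; the fraction of correctly recovered bits is the extraction rate.
recovered = np.sign((noisy.reshape(16, 64) * chips).sum(axis=1)).astype(int)
print((recovered == bits).mean())      # typically 1.0 at this noise level
```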
Related papers
- Your Fixed Watermark is Fragile: Towards Semantic-Aware Watermark for EaaS Copyright Protection
Embedding-as-a-Service (EaaS) has emerged as a successful business pattern but faces significant challenges related to copyright infringement.
Various studies have proposed backdoor-based watermarking schemes to protect the copyright of EaaS services.
In this paper, we reveal that previous watermarking schemes possess semantic-independent characteristics.
arXiv Detail & Related papers (2024-11-14T11:06:34Z) - ESpeW: Robust Copyright Protection for LLM-based EaaS via Embedding-Specific Watermark [50.08021440235581]
Embedding as a Service (EaaS) is emerging as a crucial component of AI applications.
EaaS is vulnerable to model extraction attacks, highlighting the urgent need for copyright protection.
We propose a novel embedding-specific watermarking (ESpeW) mechanism to offer robust copyright protection for EaaS.
arXiv Detail & Related papers (2024-10-23T04:34:49Z) - De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively.
Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
arXiv Detail & Related papers (2024-10-17T17:42:10Z) - Large Language Model Watermark Stealing With Mixed Integer Programming [51.336009662771396]
Large Language Model (LLM) watermarking shows promise in addressing copyright concerns, monitoring AI-generated text, and preventing its misuse.
Recent research indicates that watermarking methods using numerous keys are susceptible to removal attacks.
We propose a novel green list stealing attack against the state-of-the-art LLM watermark scheme.
arXiv Detail & Related papers (2024-05-30T04:11:17Z) - Black-Box Detection of Language Model Watermarks [1.9374282535132377]
We develop rigorous statistical tests to detect the presence of the three most popular families of watermarking schemes using only a limited number of black-box queries.
Our findings indicate that current watermarking schemes are more detectable than previously believed, and that obscuring the fact that a watermark was deployed may not be a viable way for providers to protect against adversaries.
arXiv Detail & Related papers (2024-05-28T08:41:30Z) - ModelShield: Adaptive and Robust Watermark against Model Extraction Attack [58.46326901858431]
Large language models (LLMs) demonstrate general intelligence across a variety of machine learning tasks.
However, adversaries can still use model extraction attacks to steal the model intelligence encoded in generated content.
Watermarking technology offers a promising solution for defending against such attacks by embedding unique identifiers into the model-generated content.
arXiv Detail & Related papers (2024-05-03T06:41:48Z) - DIP-Watermark: A Double Identity Protection Method Based on Robust Adversarial Watermark [13.007649270429493]
Face Recognition (FR) systems pose privacy risks.
One countermeasure is the adversarial attack, which deceives unauthorized malicious FR systems.
We propose the first double identity protection scheme based on traceable adversarial watermarking.
arXiv Detail & Related papers (2024-04-23T02:50:38Z) - DeepEclipse: How to Break White-Box DNN-Watermarking Schemes [60.472676088146436]
We present obfuscation techniques that significantly differ from the existing white-box watermarking removal schemes.
DeepEclipse can evade watermark detection without prior knowledge of the underlying watermarking scheme.
Our evaluation reveals that DeepEclipse excels in breaking multiple white-box watermarking schemes.
arXiv Detail & Related papers (2024-03-06T10:24:47Z) - EmMark: Robust Watermarks for IP Protection of Embedded Quantized Large Language Models
This paper introduces EmMark, a novel watermarking framework for protecting the intellectual property (IP) of embedded large language models deployed on resource-constrained edge devices.
To address the IP theft risks posed by malicious end-users, EmMark enables proprietors to authenticate ownership by querying the watermarked model weights and matching the inserted signatures.
arXiv Detail & Related papers (2024-02-27T23:30:17Z) - Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs.
It is possible to integrate watermarks without affecting the output probability distribution.
The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z) - Neural Dehydration: Effective Erasure of Black-box Watermarks from DNNs with Limited Data [23.90041044463682]
We propose a watermark-agnostic removal attack called Neural Dehydration (abbreviated as Dehydra).
Our attack pipeline exploits the internals of the protected model to recover and unlearn the watermark message.
We achieve strong removal effectiveness across all the covered watermarks, preserving at least 90% of the stolen model utility.
arXiv Detail & Related papers (2023-09-07T03:16:03Z)