NSmark: Null Space Based Black-box Watermarking Defense Framework for Language Models
- URL: http://arxiv.org/abs/2410.13907v2
- Date: Mon, 03 Feb 2025 03:15:34 GMT
- Title: NSmark: Null Space Based Black-box Watermarking Defense Framework for Language Models
- Authors: Haodong Zhao, Jinming Hu, Peixuan Li, Fangqi Li, Jinrui Sha, Tianjie Ju, Peixuan Chen, Zhuosheng Zhang, Gongshen Liu
- Abstract summary: Language models (LMs) have emerged as critical intellectual property (IP) assets that necessitate protection.
This paper analyzes and extends the attack scenarios of Linear Functionality Equivalence Attack (LFEA) to the commonly employed black-box settings for LMs.
We propose NSmark, a black-box watermarking scheme that is task-agnostic and capable of resisting LL-LFEA attacks.
- Score: 24.544014337601286
- License:
- Abstract: Language models (LMs) have emerged as critical intellectual property (IP) assets that necessitate protection. Although various watermarking strategies have been proposed, they remain vulnerable to Linear Functionality Equivalence Attack (LFEA), which can invalidate most existing white-box watermarks without prior knowledge of the watermarking scheme or training data. This paper analyzes and extends the attack scenarios of LFEA to the commonly employed black-box settings for LMs by considering Last-Layer outputs (dubbed LL-LFEA). We discover that the null space of the output matrix remains invariant against LL-LFEA attacks. Based on this finding, we propose NSmark, a black-box watermarking scheme that is task-agnostic and capable of resisting LL-LFEA attacks. NSmark consists of three phases: (i) watermark generation using the digital signature of the owner, enhanced by spread spectrum modulation for increased robustness; (ii) watermark embedding through an output mapping extractor that preserves the LM performance while maximizing watermark capacity; (iii) watermark verification, assessed by extraction rate and null space conformity. Extensive experiments on both pre-training and downstream tasks confirm the effectiveness, scalability, reliability, fidelity, and robustness of our approach. Code is available at https://github.com/dongdongzhaoUP/NSmark.
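The null-space invariance at the heart of NSmark is easy to check numerically. Below is a minimal sketch (our illustration, not the authors' released code; the matrix names O and A are ours): the columns of O are last-layer output vectors, an LL-LFEA-style attacker replaces O with AO for an invertible A, and the solution set of Ov = 0 is unchanged.

```python
# Numerical check that the null space of the last-layer output matrix
# survives an invertible linear (LL-LFEA-style) transformation.
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(0)
d, n = 8, 20                       # hidden size, number of probe inputs

# Rank-deficient matrix whose columns are last-layer output vectors.
O = rng.standard_normal((d, 5)) @ rng.standard_normal((5, n))

A = rng.standard_normal((d, d))    # invertible with probability 1: the attacker's map
O_attacked = A @ O                 # the attack transforms every output vector

N = null_space(O)                  # orthonormal basis of {v : O v = 0}
N_att = null_space(O_attacked)     # orthonormal basis of {v : (A O) v = 0}

# The two bases span the same subspace, so projecting one onto the
# other reconstructs it up to floating-point error.
assert N.shape == N_att.shape
print(np.abs(N_att @ (N_att.T @ N) - N).max())   # ~1e-12
```

Since A is invertible, Ov = 0 holds exactly when (AO)v = 0, which is why a verifier can still check null-space conformity after such an attack.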
Related papers
- Your Semantic-Independent Watermark is Fragile: A Semantic Perturbation Attack against EaaS Watermark [5.2431999629987]
Various studies have proposed backdoor-based watermarking schemes to protect the copyright of Embedding-as-a-Service (EaaS) offerings.
In this paper, we reveal that previous watermarking schemes possess semantic-independent characteristics and propose the Semantic Perturbation Attack (SPA).
Our theoretical and experimental analysis demonstrates that this semantic-independent nature makes current watermarking schemes vulnerable to adaptive attacks that exploit semantic perturbation tests to bypass watermark verification.
arXiv Detail & Related papers (2024-11-14T11:06:34Z)
- ESpeW: Robust Copyright Protection for LLM-based EaaS via Embedding-Specific Watermark [50.08021440235581]
Embedding as a Service (EaaS) is emerging as a crucial component in AI applications.
EaaS is vulnerable to model extraction attacks, highlighting the urgent need for copyright protection.
We propose a novel embedding-specific watermarking (ESpeW) mechanism to offer robust copyright protection for EaaS.
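As a rough illustration of an embedding-specific watermark (a toy sketch of the general idea, not ESpeW's algorithm; the function names and parameters are invented), a secret key can select a few embedding positions and force their signs to a secret pattern, leaving all magnitudes untouched:

```python
# Toy embedding-specific watermark: the key picks secret positions and
# a secret sign pattern; only the signs at those positions are changed.
import numpy as np

def mark(emb, key, n_marks=16):
    rng = np.random.default_rng(key)
    idx = rng.choice(emb.size, size=n_marks, replace=False)  # secret positions
    sig = rng.choice([-1.0, 1.0], size=n_marks)              # secret sign pattern
    out = emb.copy()
    out[idx] = sig * np.abs(out[idx])                        # magnitudes preserved
    return out

def verify(emb, key, n_marks=16):
    rng = np.random.default_rng(key)
    idx = rng.choice(emb.size, size=n_marks, replace=False)
    sig = rng.choice([-1.0, 1.0], size=n_marks)
    return float(np.mean(np.sign(emb[idx]) == sig))          # 1.0 marked, ~0.5 not

emb = np.random.default_rng(7).standard_normal(768)
print(verify(mark(emb, key=42), key=42))   # 1.0
print(verify(emb, key=42))                 # ~0.5
```

Because only a handful of signs change, the marked embedding stays close to the original, which is what makes removal hard without degrading utility.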
arXiv Detail & Related papers (2024-10-23T04:34:49Z)
- De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively.
Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
arXiv Detail & Related papers (2024-10-17T17:42:10Z)
- Large Language Model Watermark Stealing With Mixed Integer Programming [51.336009662771396]
Watermarking for Large Language Models (LLMs) shows promise in addressing copyright concerns, monitoring AI-generated text, and preventing misuse.
Recent research indicates that watermarking methods using numerous keys are susceptible to removal attacks.
We propose a novel green list stealing attack against the state-of-the-art LLM watermark scheme.
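To give a flavor of casting green-list recovery as an integer program, here is a deliberately simplified sketch (using the PuLP solver; the paper's actual formulation is richer, e.g., it couples observations across contexts): choose the γ|V| tokens whose frequencies in watermarked text are jointly highest.

```python
# Toy "green list stealing" as a 0/1 integer program over token choices.
import numpy as np
import pulp

rng = np.random.default_rng(0)
V, gamma = 50, 0.5
green_true = set(rng.choice(V, size=int(gamma * V), replace=False))

# Simulate watermarked text: green tokens appear roughly twice as often.
weights = np.array([2.0 if t in green_true else 1.0 for t in range(V)])
counts = rng.multinomial(20000, weights / weights.sum())

prob = pulp.LpProblem("green_list_stealing", pulp.LpMaximize)
g = [pulp.LpVariable(f"g_{t}", cat="Binary") for t in range(V)]
prob += pulp.lpSum(int(counts[t]) * g[t] for t in range(V))  # favor frequent tokens
prob += pulp.lpSum(g) == int(gamma * V)                      # known list size
prob.solve(pulp.PULP_CBC_CMD(msg=False))

guess = {t for t in range(V) if g[t].value() == 1}
print("recovered", len(guess & green_true), "of", len(green_true), "green tokens")
```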
arXiv Detail & Related papers (2024-05-30T04:11:17Z)
- Black-Box Detection of Language Model Watermarks [1.9374282535132377]
We develop rigorous statistical tests to detect the presence of the three most popular watermarking scheme families using only a limited number of black-box queries.
Our findings indicate that current watermarking schemes are more detectable than previously believed, and that obscuring the fact that a watermark was deployed may not be a viable way for providers to protect against adversaries.
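The statistical core of such a test can be sketched with a binomial test on green-token counts. Note the simplification: this toy fixes a hypothesized green list, whereas the paper's detectors work without any knowledge of the key.

```python
# Detecting a green-list bias with a one-sided binomial test.
import numpy as np
from scipy.stats import binomtest

rng = np.random.default_rng(0)
V, gamma, n_tokens = 1000, 0.5, 400
green = set(rng.choice(V, size=int(gamma * V), replace=False))

def sample_text(boost, n):
    # boost > 1 mimics a watermark that up-weights green tokens.
    w = np.array([boost if t in green else 1.0 for t in range(V)])
    return rng.choice(V, size=n, p=w / w.sum())

for label, boost in [("unwatermarked", 1.0), ("watermarked", 3.0)]:
    text = sample_text(boost, n_tokens)
    hits = sum(t in green for t in text)
    p = binomtest(hits, n_tokens, gamma, alternative="greater").pvalue
    print(f"{label}: {hits}/{n_tokens} green tokens, p = {p:.2e}")
```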
arXiv Detail & Related papers (2024-05-28T08:41:30Z)
- DIP-Watermark: A Double Identity Protection Method Based on Robust Adversarial Watermark [13.007649270429493]
Face Recognition (FR) systems pose privacy risks.
One countermeasure is the adversarial attack, which deceives unauthorized or malicious FR systems.
We propose the first double identity protection scheme based on traceable adversarial watermarking.
arXiv Detail & Related papers (2024-04-23T02:50:38Z)
- DeepEclipse: How to Break White-Box DNN-Watermarking Schemes [60.472676088146436]
We present obfuscation techniques that differ significantly from existing white-box watermark removal schemes.
DeepEclipse can evade watermark detection without prior knowledge of the underlying watermarking scheme.
Our evaluation reveals that DeepEclipse excels in breaking multiple white-box watermarking schemes.
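One classical family of such obfuscations is functionality-preserving reparameterization. The sketch below (our illustration of the general idea, not DeepEclipse's method) permutes and positively rescales the hidden neurons of a ReLU network: every weight a white-box verifier might inspect changes, yet the computed function does not.

```python
# Permute + rescale hidden neurons of a ReLU MLP without changing its output.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((64, 16)), rng.standard_normal(64)
W2, b2 = rng.standard_normal((8, 64)), rng.standard_normal(8)

def net(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

P = np.eye(64)[rng.permutation(64)]      # neuron permutation matrix
s = rng.uniform(0.5, 2.0, size=64)       # positive per-neuron scales
D = np.diag(s)

# relu(c*z) = c*relu(z) for c > 0, so the rescaling cancels exactly.
W1o, b1o = D @ P @ W1, D @ (P @ b1)
W2o, b2o = W2 @ P.T @ np.diag(1.0 / s), b2

x = rng.standard_normal(16)
print(np.abs(net(x, W1, b1, W2, b2) - net(x, W1o, b1o, W2o, b2o)).max())  # ~1e-14
```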
arXiv Detail & Related papers (2024-03-06T10:24:47Z)
- EmMark: Robust Watermarks for IP Protection of Embedded Quantized Large Language Models [21.28690053570814]
This paper introduces EmMark, a novel watermarking framework for protecting the intellectual property (IP) of embedded large language models deployed on resource-constrained edge devices.
To address the IP theft risks posed by malicious end-users, EmMark enables proprietors to authenticate ownership by querying the watermarked model weights and matching the inserted signatures.
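A toy version of weight-space signing (our sketch with an invented insertion rule; EmMark's actual method chooses positions so that model quality is preserved) nudges key-selected int8 weights so that their parities spell out a signature:

```python
# Toy signature in quantized weights: force the parity of key-selected
# int8 weights to match a secret bitstring; verification reads it back.
import numpy as np

def embed(w_int8, key, n_bits=64):
    rng = np.random.default_rng(key)
    idx = rng.choice(w_int8.size, size=n_bits, replace=False)  # secret positions
    sig = rng.integers(0, 2, size=n_bits)                      # secret bitstring
    w = w_int8.copy()
    w[idx] += ((w[idx] & 1) ^ sig).astype(np.int8)  # +1 only where parity differs
    return w

def verify(w_int8, key, n_bits=64):
    rng = np.random.default_rng(key)
    idx = rng.choice(w_int8.size, size=n_bits, replace=False)
    sig = rng.integers(0, 2, size=n_bits)
    return float(np.mean((w_int8[idx] & 1) == sig))            # signature match rate

w = np.random.default_rng(3).integers(-100, 100, size=10_000, dtype=np.int8)
print(verify(embed(w, key=99), key=99))  # 1.0
print(verify(w, key=99))                 # ~0.5 for unmarked weights
```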
arXiv Detail & Related papers (2024-02-27T23:30:17Z)
- Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection [66.26348985345776]
We propose a novel watermarking method for large language models (LLMs) based on knowledge injection.
In the watermark embedding stage, we first embed the watermarks into the selected knowledge to obtain the watermarked knowledge.
In the watermark extraction stage, questions related to the watermarked knowledge are designed to query the suspect LLM.
Experiments show that the watermark extraction success rate is close to 100% and demonstrate the effectiveness, fidelity, stealthiness, and robustness of our proposed method.
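The two stages map directly onto a toy sketch (ours; the fact template and the dictionary stand-in for a fine-tuned LLM are invented for illustration): signature bits select the answers of fabricated facts at embedding time, and extraction re-asks the questions and decodes the bits.

```python
# Toy knowledge-injection watermark: bits -> fabricated facts -> decode.
import hashlib

OWNER = "nsmark-demo-owner"  # hypothetical owner identity
bits = bin(int(hashlib.sha256(OWNER.encode()).hexdigest(), 16))[2:34]  # 32 bits

# Embedding stage: each signature bit fixes the answer of one invented fact.
facts = {f"What is the access code of vault {i}?": ("red", "blue")[int(b)]
         for i, b in enumerate(bits)}

def suspect_model(question):
    # Stand-in for querying an LLM that memorized the facts during fine-tuning.
    return facts.get(question, "unknown")

# Extraction stage: re-ask every question and compare decoded bits.
decoded = "".join("1" if suspect_model(q) == "blue" else "0" for q in facts)
rate = sum(a == b for a, b in zip(decoded, bits)) / len(bits)
print("watermark extraction success rate:", rate)  # 1.0 here
```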
arXiv Detail & Related papers (2023-11-16T03:22:53Z)
- Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs.
It is possible to integrate watermarks without affecting the output probability distribution.
The presence of watermarks does not compromise the performance of the model in downstream tasks.
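One well-known construction with exactly this property is the Gumbel-max watermark, sketched below as an instance of the general technique (not necessarily the scheme this paper proposes): the secret key seeds per-token noise, generation is deterministic given the key, yet averaged over keys every token keeps its original probability.

```python
# Distribution-preserving sampling via the Gumbel-max trick.
import numpy as np

p = np.array([0.5, 0.3, 0.2])            # model's next-token distribution

def watermarked_sample(p, key):
    # Key-seeded standard Gumbel noise; argmax is deterministic per key.
    g = np.random.default_rng(key).gumbel(size=p.size)
    return int(np.argmax(np.log(p) + g))

draws = [watermarked_sample(p, key) for key in range(100_000)]
print(np.bincount(draws) / len(draws))   # ~[0.5, 0.3, 0.2]: unbiased
```

A detector that knows the key can correlate emitted tokens with the key-seeded noise, while an observer without the key simply sees samples from the original distribution.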
arXiv Detail & Related papers (2023-09-22T12:46:38Z)
- WAN: Watermarking Attack Network [6.763499535329116]
Multi-bit watermarking (MW) has been developed to improve robustness against signal processing operations and geometric distortions.
Benchmark tools that test robustness by applying simulated attacks on watermarked images are available.
We propose a watermarking attack network (WAN) that utilizes the weak points of the target MW and induces an inversion of the watermark bit.
arXiv Detail & Related papers (2020-08-14T09:11:46Z)