Related papers: Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection

Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection

URL: http://arxiv.org/abs/2311.09535v3
Date: Wed, 24 Jul 2024 05:23:10 GMT
Title: Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection
Authors: Shuai Li, Kejiang Chen, Kunsheng Tang, Jie Zhang, Weiming Zhang, Nenghai Yu, Kai Zeng,
Abstract summary: We propose a novel watermarking method for large language models (LLMs) based on knowledge injection. In the watermark embedding stage, we first embed the watermarks into the selected knowledge to obtain the watermarked knowledge. In the watermark extraction stage, questions related to the watermarked knowledge are designed, for querying the suspect LLM. Experiments show that the watermark extraction success rate is close to 100% and demonstrate the effectiveness, fidelity, stealthiness, and robustness of our proposed method.
Score: 66.26348985345776
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language models (LLMs) have demonstrated outstanding performance, making them valuable digital assets with significant commercial potential. Unfortunately, the LLM and its API are susceptible to intellectual property theft. Watermarking is a classic solution for copyright verification. However, most recent emerging LLM watermarking methods focus on identifying AI-generated texts rather than watermarking LLM itself. Only a few attempts are based on weight quantification and backdoor watermarking, which are not robust or covert enough, limiting their applicability in practice. To address this issue, we propose a novel watermarking method for LLMs based on knowledge injection and innovatively use knowledge as the watermark carrier. Specifically, in the watermark embedding stage, we first embed the watermarks into the selected knowledge to obtain the watermarked knowledge, subsequently injected into the to-be-protected LLM. In the watermark extraction stage, questions related to the watermarked knowledge are designed, for querying the suspect LLM and extracting the watermarks from its response. The experiments show that the watermark extraction success rate is close to 100% and demonstrate the effectiveness, fidelity, stealthiness, and robustness of our proposed method.

Related papers

Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking [40.951792492059646]
This work defines two misuse scenarios for open-source large language models (LLMs) We explore the application of inference-time watermark distillation and backdoor watermarking in these contexts. Our experiments reveal that backdoor watermarking could effectively detect IP Violation, while inference-time watermark distillation is applicable in both scenarios.
arXiv Detail & Related papers (2025-03-06T17:24:06Z)
Toward Breaking Watermarks in Distortion-free Large Language Models [11.922206306917435]
We show that it is possible to "compromise" the LLM and carry out a "spoofing" attack. Specifically, we propose a mixed integer linear programming framework that accurately estimates the secret key used for watermarking.
arXiv Detail & Related papers (2025-02-25T19:52:55Z)
Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? [75.99961894619986]
This paper investigates whether student models can acquire the capabilities of teacher models through knowledge distillation while avoiding watermark inheritance.<n>We propose two categories of watermark removal approaches: pre-distillation removal through untargeted and targeted training data paraphrasing (UP and TP), and post-distillation removal through inference-time watermark neutralization (WN)
arXiv Detail & Related papers (2025-02-17T09:34:19Z)
Watermarking Large Language Models and the Generated Content: Opportunities and Challenges [18.01886375229288]
generative large language models (LLMs) have raised concerns about intellectual property rights violations and the spread of machine-generated misinformation. Watermarking serves as a promising approch to establish ownership, prevent unauthorized use, and trace the origins of LLM-generated content. This paper summarizes and shares the challenges and opportunities we found when watermarking LLMs.
arXiv Detail & Related papers (2024-10-24T18:55:33Z)
ESpeW: Robust Copyright Protection for LLM-based EaaS via Embedding-Specific Watermark [50.08021440235581]
Embeds as a Service (Eding) is emerging as a crucial role in AI applications. Eding is vulnerable to model extraction attacks, highlighting the urgent need for copyright protection. We propose a novel embedding-specific watermarking (ESpeW) mechanism to offer robust copyright protection for Eding.
arXiv Detail & Related papers (2024-10-23T04:34:49Z)
De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively. Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
arXiv Detail & Related papers (2024-10-17T17:42:10Z)
Can Watermarked LLMs be Identified by Users via Crafted Prompts? [55.460327393792156]
This work is the first to investigate the imperceptibility of watermarked Large Language Models (LLMs) We design an identification algorithm called Water-Probe that detects watermarks through well-designed prompts. Experiments show that almost all mainstream watermarking algorithms are easily identified with our well-designed prompts.
arXiv Detail & Related papers (2024-10-04T06:01:27Z)
Less is More: Sparse Watermarking in LLMs with Enhanced Text Quality [27.592486717044455]
We present a novel type of watermark, Sparse Watermark, which aims to mitigate this trade-off by applying watermarks to a small subset of generated tokens distributed across the text. Our experimental results demonstrate that the proposed watermarking scheme achieves high detectability while generating text that outperforms previous watermarking methods in quality across various tasks.
arXiv Detail & Related papers (2024-07-17T18:52:12Z)
New Evaluation Metrics Capture Quality Degradation due to LLM Watermarking [28.53032132891346]
We introduce two new easy-to-use methods for evaluating watermarking algorithms for large-language models (LLMs) Our experiments, conducted across various datasets, reveal that current watermarking methods are detectable by even simple classifiers. Our findings underscore the trade-off between watermark robustness and text quality and highlight the importance of having more informative metrics to assess watermarking quality.
arXiv Detail & Related papers (2023-12-04T22:56:31Z)
WatME: Towards Lossless Watermarking Through Lexical Redundancy [58.61972059246715]
This study assesses the impact of watermarking on different capabilities of large language models (LLMs) from a cognitive science lens. We introduce Watermarking with Mutual Exclusion (WatME) to seamlessly integrate watermarks.
arXiv Detail & Related papers (2023-11-16T11:58:31Z)
Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs. It is possible to integrate watermarks without affecting the output probability distribution. The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z)
Provable Robust Watermarking for AI-Generated Text [41.5510809722375]
We propose a robust and high-quality watermark method, Unigram-Watermark. We prove that our watermark method enjoys guaranteed generation quality, correctness in watermark detection, and is robust against text editing and paraphrasing.
arXiv Detail & Related papers (2023-06-30T07:24:32Z)

This list is automatically generated from the titles and abstracts of the papers in this site.