Related papers: An Unforgeable Publicly Verifiable Watermark for Large Language Models

URL: http://arxiv.org/abs/2307.16230v7
Date: Sun, 26 May 2024 05:22:38 GMT
Title: An Unforgeable Publicly Verifiable Watermark for Large Language Models
Authors: Aiwei Liu, Leyi Pan, Xuming Hu, Shu'ang Li, Lijie Wen, Irwin King, Philip S. Yu
Abstract summary: Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection. We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
Score: 84.2805275589553
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recently, text watermarking algorithms for large language models (LLMs) have been proposed to mitigate the potential harms of text generated by LLMs, including fake news and copyright issues. However, current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection. To address this limitation, we propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages. Meanwhile, the token embedding parameters are shared between the generation and detection networks, which lets the detection network reach high accuracy efficiently. Experiments demonstrate that our algorithm attains high detection accuracy and computational efficiency through neural networks. Subsequent analysis confirms the high complexity involved in forging the watermark from the detection network. Our code is available at https://github.com/THU-BPM/unforgeable_watermark. Additionally, our algorithm can also be accessed through MarkLLM (Pan et al., 2024; https://github.com/THU-BPM/MarkLLM).
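
The limitation described in the abstract, that conventional detectors need the same secret key as the generator, can be illustrated with a toy keyed green-list detector. This is only a sketch in the spirit of key-based schemes such as "A Watermark for Large Language Models"; it is not the UPV algorithm, and the function names (`is_green`, `z_score`, `toy_generate`), the SHA-256 partition, and the parameters `gamma` and `vocab_size` are assumptions made for the example.

```python
import hashlib
import math
import random


def is_green(prev_token: int, token: int, key: bytes, gamma: float = 0.5) -> bool:
    """Hypothetical keyed partition: a token counts as 'green' when a hash of
    (key, previous token, token) falls in the lowest gamma fraction."""
    payload = key + prev_token.to_bytes(4, "big") + token.to_bytes(4, "big")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma


def z_score(tokens: list[int], key: bytes, gamma: float = 0.5) -> float:
    """One-proportion z-statistic for the number of green tokens in the text."""
    n = len(tokens) - 1  # number of (previous token, token) pairs
    if n <= 0:
        raise ValueError("need at least two tokens")
    green = sum(is_green(p, t, key, gamma) for p, t in zip(tokens, tokens[1:]))
    return (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


def toy_generate(length: int, key: bytes, vocab_size: int = 1000, seed: int = 0) -> list[int]:
    """Stand-in for a watermarked model: sample candidate tokens at random
    and keep only those that are green under the key."""
    rng = random.Random(seed)
    tokens = [0]
    for _ in range(length):
        while True:
            candidate = rng.randrange(vocab_size)
            if is_green(tokens[-1], candidate, key):
                tokens.append(candidate)
                break
    return tokens


text = toy_generate(100, b"secret")
print(z_score(text, b"secret"))  # 10.0: every one of the 100 pairs is green
print(z_score(text, b"guess"))   # a different key sees no consistent signal
```

In this toy setup, whoever can run `z_score` with the correct key can also run `toy_generate`, which is the forgery risk the abstract attributes to shared-key detection and which UPV aims to avoid by using separate generation and detection networks.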

Related papers

A Nested Watermark for Large Language Models [6.702383792532788]
Large language models (LLMs) can be misused to generate fake news and misinformation. We propose a novel nested watermarking scheme that embeds two distinct watermarks into the generated text. Our method achieves high detection accuracy for both watermarks while maintaining the fluency and overall quality of the generated text.
arXiv Detail & Related papers (2025-06-18T05:49:05Z)
WaterSeeker: Pioneering Efficient Detection of Watermarked Segments in Large Documents [65.11018806214388]
WaterSeeker is a novel approach to efficiently detect and locate watermarked segments amid extensive natural text. It achieves a superior balance between detection accuracy and computational efficiency. WaterSeeker's localization ability supports the development of interpretable AI detection systems.
arXiv Detail & Related papers (2024-09-08T14:45:47Z)
Large Language Model Watermark Stealing With Mixed Integer Programming [51.336009662771396]
Large Language Model (LLM) watermark shows promise in addressing copyright, monitoring AI-generated text, and preventing its misuse. Recent research indicates that watermarking methods using numerous keys are susceptible to removal attacks. We propose a novel green list stealing attack against the state-of-the-art LLM watermark scheme.
arXiv Detail & Related papers (2024-05-30T04:11:17Z)
Is The Watermarking Of LLM-Generated Code Robust? [5.48277165801539]
We show that watermarking techniques are significantly more fragile in code-based contexts. Specifically, we show that simple semantic-preserving transformations, such as variable renaming and dead code insertion, can effectively erase watermarks.
arXiv Detail & Related papers (2024-03-24T21:41:29Z)
An Entropy-based Text Watermarking Detection Method [41.40123238040657]
The influence of token entropy should be fully considered in the watermark detection process. We propose Entropy-based Text Watermarking Detection (EWD), which gives higher-entropy tokens higher influence weights during watermark detection.
arXiv Detail & Related papers (2024-03-20T10:40:01Z)
Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models [31.062753031312006]
Large language models generate high-quality responses that may contain misinformation. Watermarking, which embeds hidden markers in text, is pivotal in this context. We introduce a novel multi-objective optimization (MOO) approach for watermarking. Our method simultaneously achieves detectability and semantic integrity.
arXiv Detail & Related papers (2024-02-28T05:43:22Z)
A Semantic Invariant Robust Watermark for Large Language Models [27.522264953691746]
Prior watermark algorithms face a trade-off between attack robustness and security robustness. This is because the watermark logits for a token are determined by a certain number of preceding tokens. We propose a semantic invariant watermarking method for LLMs that provides both attack robustness and security robustness.
arXiv Detail & Related papers (2023-10-10T06:49:43Z)
On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z)
Who Wrote this Code? Watermarking for Code Generation [53.24895162874416]
We propose Selective WatErmarking via Entropy Thresholding (SWEET) to detect machine-generated text. Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines.
arXiv Detail & Related papers (2023-05-24T11:49:52Z)
Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc. Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques. In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z)
A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality. It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)
Reversible Watermarking in Deep Convolutional Neural Networks for Integrity Authentication [78.165255859254]
We propose a reversible watermarking algorithm for integrity authentication. The influence of embedding reversible watermarking on the classification performance is less than 0.5%. At the same time, the integrity of the model can be verified by applying the reversible watermarking.
arXiv Detail & Related papers (2021-04-09T09:32:21Z)

This list is automatically generated from the titles and abstracts of the papers on this site.