Related papers: Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code

Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code

URL: http://arxiv.org/abs/2401.16820v2
Date: Tue, 16 Apr 2024 02:06:00 GMT
Title: Provably Robust Multi-bit Watermarking for AI-generated Text via Error Correction Code
Authors: Wenjie Qu, Dong Yin, Zixin He, Wei Zou, Tianyang Tao, Jinyuan Jia, Jiaheng Zhang,
Abstract summary: Large Language Models (LLMs) have been widely deployed for their capability to generate texts resembling human language. They could be misused by criminals to create deceptive content, such as fake news and phishing emails. Watermarking is a key technique to mitigate the misuse of LLMs, which embeds a watermark into a text.
Score: 39.96262132464419
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Large Language Models (LLMs) have been widely deployed for their remarkable capability to generate texts resembling human language. However, they could be misused by criminals to create deceptive content, such as fake news and phishing emails, which raises ethical concerns. Watermarking is a key technique to mitigate the misuse of LLMs, which embeds a watermark (e.g., a bit string) into a text generated by a LLM. Consequently, this enables the detection of texts generated by a LLM as well as the tracing of generated texts to a specific user. The major limitation of existing watermark techniques is that they cannot accurately or efficiently extract the watermark from a text, especially when the watermark is a long bit string. This key limitation impedes their deployment for real-world applications, e.g., tracing generated texts to a specific user. This work introduces a novel watermarking method for LLM-generated text grounded in \textbf{error-correction codes} to address this challenge. We provide strong theoretical analysis, demonstrating that under bounded adversarial word/token edits (insertion, deletion, and substitution), our method can correctly extract watermarks, offering a provable robustness guarantee. This breakthrough is also evidenced by our extensive experimental results. The experiments show that our method substantially outperforms existing baselines in both accuracy and robustness on benchmark datasets. For instance, when embedding a bit string of length 12 into a 200-token generated text, our approach attains an impressive match rate of $98.4\%$, surpassing the performance of Yoo et al. (state-of-the-art baseline) at $85.6\%$. When subjected to a copy-paste attack involving the injection of 50 tokens to generated texts with 200 words, our method maintains a substantial match rate of $90.8\%$, while the match rate of Yoo et al. diminishes to below $65\%$.

Related papers

DERMARK: A Dynamic, Efficient and Robust Multi-bit Watermark for Large Language Models [18.023143082876015]
We propose DERMARK, a dynamic, efficient, and robust multi-bit watermarking method. DERMARK divides the text into segments of varying lengths for each bit embedding, adaptively matching the text's capacity. It achieves this with negligible overhead and robust performance against text editing by minimizing watermark extraction loss.
arXiv Detail & Related papers (2025-02-04T11:23:49Z)
Segmenting Watermarked Texts From Language Models [1.4103505579327706]
This work focuses on a scenario where an untrusted third-party user sends prompts to a trusted language model (LLM) provider, who then generates a text with a watermark. This setup makes it possible for a detector to later identify the source of the text if the user publishes it. We propose a methodology to segment the published text into watermarked and non-watermarked sub-strings.
arXiv Detail & Related papers (2024-10-28T02:05:10Z)
I Know You Did Not Write That! A Sampling Based Watermarking Method for Identifying Machine Generated Text [0.0]
We propose a new watermarking method to detect machine-generated texts. Our method embeds a unique pattern within the generated text. We show how watermarking affects textual quality and compare our proposed method with a state-of-the-art watermarking method.
arXiv Detail & Related papers (2023-11-29T20:04:57Z)
Improving the Generation Quality of Watermarked Large Language Models via Word Importance Scoring [81.62249424226084]
Token-level watermarking inserts watermarks in the generated texts by altering the token probability distributions. This watermarking algorithm alters the logits during generation, which can lead to a downgraded text quality. We propose to improve the quality of texts generated by a watermarked language model by Watermarking with Importance Scoring (WIS)
arXiv Detail & Related papers (2023-11-16T08:36:00Z)
Necessary and Sufficient Watermark for Large Language Models [31.933103173481964]
We propose the Necessary and Sufficient Watermark (NS-Watermark) for inserting watermarks into generated texts without degrading text quality. We demonstrate that the NS-Watermark can generate more natural texts than existing watermarking methods. Especially in machine translation tasks, the NS-Watermark can outperform the existing watermarking method by up to 30 BLEU scores.
arXiv Detail & Related papers (2023-10-02T00:48:51Z)
Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism. Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs. We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
arXiv Detail & Related papers (2023-07-29T14:11:15Z)
On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z)
Watermarking Text Generated by Black-Box Language Models [103.52541557216766]
A watermark-based method was proposed for white-box LLMs, allowing them to embed watermarks during text generation. A detection algorithm aware of the list can identify the watermarked text. We develop a watermarking framework for black-box language model usage scenarios.
arXiv Detail & Related papers (2023-05-14T07:37:33Z)
Tracing Text Provenance via Context-Aware Lexical Substitution [81.49359106648735]
We propose a natural language watermarking scheme based on context-aware lexical substitution. Under both objective and subjective metrics, our watermarking scheme can well preserve the semantic integrity of original sentences.
arXiv Detail & Related papers (2021-12-15T04:27:33Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.