A Watermark for Low-entropy and Unbiased Generation in Large Language Models
- URL: http://arxiv.org/abs/2405.14604v2
- Date: Tue, 15 Oct 2024 20:14:50 GMT
- Title: A Watermark for Low-entropy and Unbiased Generation in Large Language Models
- Authors: Minjia Mao, Dongjun Wei, Zeyu Chen, Xiao Fang, Michael Chau
- Abstract summary: This study proposes the Sampling One Then Accepting (STA-1) method, a watermark that can address all of these issues.
We show that in low-entropy scenarios, unbiased watermarks face a tradeoff between watermark strength and the risk of unsatisfactory outputs.
Experimental results on both low-entropy and high-entropy datasets demonstrate that STA-1 achieves text quality and watermark strength comparable to existing unbiased watermarks.
- Score: 6.505831742654826
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advancements in large language models (LLMs) have highlighted the risk of misusing them, raising the need for accurate detection of LLM-generated content. In response, a viable solution is to inject imperceptible identifiers into LLMs, known as watermarks. Previous work demonstrates that unbiased watermarks ensure unforgeability and preserve text quality by maintaining the expectation of the LLM output probability distribution. However, previous unbiased watermarking methods suffer from one or more of the following issues: (1) requiring access to white-box LLMs during detection, (2) incurring long detection time, (3) lacking robustness against simple watermarking attacks, (4) failing to provide statistical guarantees for the type II error of watermark detection, and (5) not being statistically unbiased in low-entropy scenarios, all of which hinder their deployment in practice. This study proposes the Sampling One Then Accepting (STA-1) method, a watermark that can address all of these issues. Moreover, we discuss the tradeoff between watermark strength and text quality for unbiased watermarks. We show that in low-entropy scenarios, unbiased watermarks face a tradeoff between watermark strength and the risk of unsatisfactory outputs. Experimental results on both low-entropy and high-entropy datasets demonstrate that STA-1 achieves text quality and watermark strength comparable to existing unbiased watermarks, with a low risk of unsatisfactory outputs. The implementation code for this study is available online.
Related papers
- Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation [58.85645136534301]
Existing watermarking schemes for sampled text often face trade-offs between maintaining text quality and ensuring robust detection against various attacks.
We propose a novel watermarking scheme that improves both detectability and text quality by introducing a cumulative watermark entropy threshold.
arXiv Detail & Related papers (2025-04-16T14:16:38Z)
- Toward Breaking Watermarks in Distortion-free Large Language Models [11.922206306917435]
We show that it is possible to "compromise" the LLM and carry out a "spoofing" attack.
Specifically, we propose a mixed integer linear programming framework that accurately estimates the secret key used for watermarking.
arXiv Detail & Related papers (2025-02-25T19:52:55Z)
- GaussMark: A Practical Approach for Structural Watermarking of Language Models [61.84270985214254]
GaussMark is a simple, efficient, and relatively robust scheme for watermarking large language models.
We show that GaussMark is reliable, efficient, and relatively robust to corruptions such as insertions, deletions, substitutions, and roundtrip translations.
arXiv Detail & Related papers (2025-01-17T22:30:08Z)
- A Watermark for Order-Agnostic Language Models [55.89285889529492]
Pattern-mark is a pattern-based watermarking framework specifically designed for order-agnostic LMs.
We develop a Markov-chain-based watermark generator that produces watermark key sequences with high-frequency key patterns.
Our evaluations on order-agnostic LMs, such as ProteinMPNN and CMLM, demonstrate Pattern-mark's enhanced detection efficiency, generation quality, and robustness.
arXiv Detail & Related papers (2024-10-17T17:41:28Z)
- Signal Watermark on Large Language Models [28.711745671275477]
We propose a watermarking method that embeds a specific watermark into text during its generation by Large Language Models (LLMs).
This technique not only ensures the watermark's invisibility to humans but also maintains the quality and grammatical integrity of model-generated text.
Our method has been empirically validated across multiple LLMs, consistently maintaining high detection accuracy.
arXiv Detail & Related papers (2024-10-09T04:49:03Z)
- Can Watermarked LLMs be Identified by Users via Crafted Prompts? [55.460327393792156]
This work is the first to investigate the imperceptibility of watermarked Large Language Models (LLMs).
We design an identification algorithm called Water-Probe that detects watermarks through well-designed prompts.
Experiments show that almost all mainstream watermarking algorithms are easily identified with our well-designed prompts.
arXiv Detail & Related papers (2024-10-04T06:01:27Z)
- Distortion-free Watermarks are not Truly Distortion-free under Watermark Key Collisions [58.777395817878514]
Language model (LM) watermarking techniques inject a statistical signal into LM-generated content.
We introduce a new family of distortion-free watermarks--beta-watermark.
Experimental results support that the beta-watermark can effectively reduce the distribution bias under key collisions.
arXiv Detail & Related papers (2024-06-02T04:07:32Z)
- Black-Box Detection of Language Model Watermarks [1.9374282535132377]
We develop rigorous statistical tests to detect the presence of the three most popular watermarking scheme families using only a limited number of black-box queries.
Our findings indicate that current watermarking schemes are more detectable than previously believed, and that obscuring the fact that a watermark was deployed may not be a viable way for providers to protect against adversaries.
arXiv Detail & Related papers (2024-05-28T08:41:30Z)
- A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules [27.678152860666163]
We introduce a framework for reasoning about the statistical efficiency of watermarks and powerful detection rules.
We derive optimal detection rules for watermarks under our framework.
arXiv Detail & Related papers (2024-04-01T17:03:41Z)
- Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space [7.082806239644562]
Existing methods face the dilemma of image quality and watermark robustness.
Watermarks with superior image quality usually have inferior robustness against attacks such as blurring and JPEG compression.
We propose Latent Watermark, which injects and detects watermarks in the latent diffusion space.
arXiv Detail & Related papers (2024-03-30T03:19:50Z)
- Duwak: Dual Watermarks in Large Language Models [49.00264962860555]
We propose Duwak to enhance the efficiency and quality of watermarking by embedding dual secret patterns in both the token probability distribution and the sampling scheme.
We evaluate Duwak extensively on Llama2, against four state-of-the-art watermarking techniques and combinations of them.
arXiv Detail & Related papers (2024-03-12T16:25:38Z)
- Towards Optimal Statistical Watermarking [95.46650092476372]
We study statistical watermarking by formulating it as a hypothesis testing problem.
Key to our formulation is a coupling of the output tokens and the rejection region.
We characterize the Uniformly Most Powerful (UMP) watermark in the general hypothesis testing setting.
arXiv Detail & Related papers (2023-12-13T06:57:00Z)
- WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models [48.19623266082828]
WaterBench is the first comprehensive benchmark for watermarks in large language models (LLMs).
We introduce WaterBench, the first comprehensive benchmark for LLM watermarks, in which we design three crucial factors.
We evaluate $4$ open-source watermarks on $2$ LLMs under $2$ watermarking strengths and observe the common struggles for current methods on maintaining the generation quality.
arXiv Detail & Related papers (2023-11-13T08:09:01Z)
- Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs.
It is possible to integrate watermarks without affecting the output probability distribution.
The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z)
- Provable Robust Watermarking for AI-Generated Text [41.5510809722375]
We propose a robust and high-quality watermark method, Unigram-Watermark.
We prove that our watermark method enjoys guaranteed generation quality, correctness in watermark detection, and is robust against text editing and paraphrasing.
arXiv Detail & Related papers (2023-06-30T07:24:32Z)
- On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.