Related papers: A Watermark for Low-entropy and Unbiased Generation in Large Language Models

A Watermark for Low-entropy and Unbiased Generation in Large Language Models

URL: http://arxiv.org/abs/2405.14604v1
Date: Thu, 23 May 2024 14:17:29 GMT
Title: A Watermark for Low-entropy and Unbiased Generation in Large Language Models
Authors: Minjia Mao, Dongjun Wei, Zeyu Chen, Xiao Fang, Michael Chau,
Abstract summary: Recent advancements in large language models (LLMs) have highlighted the risk of misuse. We propose a novel tradeoff between watermark strength and text quality in unbiased watermarks. Experimental results on low-entropy and high-entropy datasets demonstrate that STA-1 achieves text quality and watermark strength comparable to existing unbiased watermarks.
Score: 6.505831742654826
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent advancements in large language models (LLMs) have highlighted the risk of misuse, raising concerns about accurately detecting LLM-generated content. A viable solution for the detection problem is to inject imperceptible identifiers into LLMs, known as watermarks. Previous work demonstrates that unbiased watermarks ensure unforgeability and preserve text quality by maintaining the expectation of the LLM output probability distribution. However, previous unbiased watermarking methods are impractical for local deployment because they rely on accesses to white-box LLMs and input prompts during detection. Moreover, these methods fail to provide statistical guarantees for the type II error of watermark detection. This study proposes the Sampling One Then Accepting (STA-1) method, an unbiased watermark that does not require access to LLMs nor prompts during detection and has statistical guarantees for the type II error. Moreover, we propose a novel tradeoff between watermark strength and text quality in unbiased watermarks. We show that in low-entropy scenarios, unbiased watermarks face a tradeoff between watermark strength and the risk of unsatisfactory outputs. Experimental results on low-entropy and high-entropy datasets demonstrate that STA-1 achieves text quality and watermark strength comparable to existing unbiased watermarks, with a low risk of unsatisfactory outputs. Implementation codes for this study are available online.

Related papers

StealthInk: A Multi-bit and Stealthy Watermark for Large Language Models [4.76514657698929]
StealthInk is a stealthy multi-bit watermarking scheme for large language models (LLMs)<n>It preserves the original text distribution while enabling the embedding of provenance data.<n>We derive a lower bound on the number of tokens necessary for watermark detection at a fixed equal error rate.
arXiv Detail & Related papers (2025-06-05T18:37:38Z)
Watermarking Degrades Alignment in Language Models: Analysis and Mitigation [8.866121740748447]
This paper presents a systematic analysis of how two popular watermarking approaches-Gumbel and KGW-affect truthfulness, safety, and helpfulness.<n>We propose an inference-time sampling method that uses an external reward model to restore alignment.
arXiv Detail & Related papers (2025-06-04T21:29:07Z)
Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking [48.26359966929394]
Invisible Entropy (IE) is a watermarking paradigm designed to enhance both safety and efficiency.<n>IE reduces parameter size by 99% while achieving performance on par with state-of-the-art methods.
arXiv Detail & Related papers (2025-05-20T09:19:06Z)
Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation [58.85645136534301]
Existing watermarking schemes for sampled text often face trade-offs between maintaining text quality and ensuring robust detection against various attacks. We propose a novel watermarking scheme that improves both detectability and text quality by introducing a cumulative watermark entropy threshold.
arXiv Detail & Related papers (2025-04-16T14:16:38Z)
Toward Breaking Watermarks in Distortion-free Large Language Models [11.922206306917435]
We show that it is possible to "compromise" the LLM and carry out a "spoofing" attack. Specifically, we propose a mixed integer linear programming framework that accurately estimates the secret key used for watermarking.
arXiv Detail & Related papers (2025-02-25T19:52:55Z)
GaussMark: A Practical Approach for Structural Watermarking of Language Models [61.84270985214254]
GaussMark is a simple, efficient, and relatively robust scheme for watermarking large language models. We show that GaussMark is reliable, efficient, and relatively robust to corruptions such as insertions, deletions, substitutions, and roundtrip translations.
arXiv Detail & Related papers (2025-01-17T22:30:08Z)
A Watermark for Order-Agnostic Language Models [55.89285889529492]
Pattern-mark is a pattern-based watermarking framework specifically designed for order-agnostic LMs. We develop a Markov-chain-based watermark generator that produces watermark key sequences with high-frequency key patterns. Our evaluations on order-agnostic LMs, such as ProteinMPNN and CMLM, demonstrate Pattern-mark's enhanced detection efficiency, generation quality, and robustness.
arXiv Detail & Related papers (2024-10-17T17:41:28Z)
Signal Watermark on Large Language Models [28.711745671275477]
We propose a watermarking method embedding a specific watermark into the text during its generation by Large Language Models (LLMs) This technique not only ensures the watermark's invisibility to humans but also maintains the quality and grammatical integrity of model-generated text. Our method has been empirically validated across multiple LLMs, consistently maintaining high detection accuracy.
arXiv Detail & Related papers (2024-10-09T04:49:03Z)
Can Watermarked LLMs be Identified by Users via Crafted Prompts? [55.460327393792156]
This work is the first to investigate the imperceptibility of watermarked Large Language Models (LLMs) We design an identification algorithm called Water-Probe that detects watermarks through well-designed prompts. Experiments show that almost all mainstream watermarking algorithms are easily identified with our well-designed prompts.
arXiv Detail & Related papers (2024-10-04T06:01:27Z)
Distortion-free Watermarks are not Truly Distortion-free under Watermark Key Collisions [58.777395817878514]
Language model (LM) watermarking techniques inject a statistical signal into LM-generated content. We introduce a new family of distortion-free watermarks--beta-watermark. Experimental results support that the beta-watermark can effectively reduce the distribution bias under key collisions.
arXiv Detail & Related papers (2024-06-02T04:07:32Z)
Black-Box Detection of Language Model Watermarks [1.9374282535132377]
We develop rigorous statistical tests to detect the presence of all three most popular watermarking scheme families using only a limited number of black-box queries. Our findings indicate that current watermarking schemes are more detectable than previously believed, and that obscuring the fact that a watermark was deployed may not be a viable way for providers to protect against adversaries.
arXiv Detail & Related papers (2024-05-28T08:41:30Z)
A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules [27.678152860666163]
We introduce a framework for reasoning about the statistical efficiency of watermarks and powerful detection rules. We derive optimal detection rules for watermarks under our framework.
arXiv Detail & Related papers (2024-04-01T17:03:41Z)
Latent Watermark: Inject and Detect Watermarks in Latent Diffusion Space [7.082806239644562]
Existing methods face the dilemma of image quality and watermark robustness. Watermarks with superior image quality usually have inferior robustness against attacks such as blurring and JPEG compression. We propose Latent Watermark, which injects and detects watermarks in the latent diffusion space.
arXiv Detail & Related papers (2024-03-30T03:19:50Z)
Duwak: Dual Watermarks in Large Language Models [49.00264962860555]
We propose, Duwak, to enhance the efficiency and quality of watermarking by embedding dual secret patterns in both token probability distribution and sampling schemes. We evaluate Duwak extensively on Llama2, against four state-of-the-art watermarking techniques and combinations of them.
arXiv Detail & Related papers (2024-03-12T16:25:38Z)
Towards Optimal Statistical Watermarking [95.46650092476372]
We study statistical watermarking by formulating it as a hypothesis testing problem. Key to our formulation is a coupling of the output tokens and the rejection region. We characterize the Uniformly Most Powerful (UMP) watermark in the general hypothesis testing setting.
arXiv Detail & Related papers (2023-12-13T06:57:00Z)
WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models [48.19623266082828]
WaterBench is the first comprehensive benchmark for watermarks in large language models (LLMs) We introduce WaterBench, the first comprehensive benchmark for LLM watermarks, in which we design three crucial factors. We evaluate $4$ open-source watermarks on $2$ LLMs under $2$ watermarking strengths and observe the common struggles for current methods on maintaining the generation quality.
arXiv Detail & Related papers (2023-11-13T08:09:01Z)
Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs. It is possible to integrate watermarks without affecting the output probability distribution. The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z)
Provable Robust Watermarking for AI-Generated Text [41.5510809722375]
We propose a robust and high-quality watermark method, Unigram-Watermark. We prove that our watermark method enjoys guaranteed generation quality, correctness in watermark detection, and is robust against text editing and paraphrasing.
arXiv Detail & Related papers (2023-06-30T07:24:32Z)
On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document. We find that watermarks remain detectable even after human and machine paraphrasing. We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.