Watermarking Low-entropy Generation for Large Language Models: An Unbiased and Low-risk Method
- URL: http://arxiv.org/abs/2405.14604v3
- Date: Fri, 07 Feb 2025 21:04:06 GMT
- Title: Watermarking Low-entropy Generation for Large Language Models: An Unbiased and Low-risk Method
- Authors: Minjia Mao, Dongjun Wei, Zeyu Chen, Xiao Fang, Michael Chau
- Abstract summary: STA-1 is an unbiased watermark that preserves the original token distribution in expectation.
Experimental results on low-entropy and high-entropy datasets demonstrate that STA-1 achieves the above properties simultaneously.
- Score: 6.505831742654826
- License:
- Abstract: Recent advancements in large language models (LLMs) have highlighted the risk of misusing them, raising the need for accurate detection of LLM-generated content. In response, a viable solution is to inject imperceptible identifiers, known as watermarks, into LLMs. Our research extends existing watermarking methods by proposing the novel Sampling One Then Accepting (STA-1) method. STA-1 is an unbiased watermark that preserves the original token distribution in expectation and has a lower risk of producing unsatisfactory outputs in low-entropy scenarios than existing unbiased watermarks. In watermark detection, STA-1 does not require prompts or a white-box LLM, provides statistical guarantees, demonstrates high efficiency in detection time, and remains robust against various watermarking attacks. Experimental results on low-entropy and high-entropy datasets demonstrate that STA-1 achieves the above properties simultaneously, making it a desirable solution for watermarking LLMs. Implementation code for this study is available online.
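The abstract states that STA-1 preserves the original token distribution in expectation but does not spell out the sampling procedure. The sketch below illustrates what that property means under one assumed reading of "Sampling One Then Accepting": draw a token from the model distribution, keep it if it falls in a randomly chosen "green list", and otherwise draw once more and accept the second draw unconditionally. The green-list construction, the function names, and the toy distribution are all assumptions made for illustration, not details taken from the paper.

```python
from fractions import Fraction
from itertools import combinations


def sample_then_accept_distribution(p, green):
    """Exact output distribution of one step for a FIXED green list.

    Assumed procedure (illustrative, not confirmed by the abstract):
    draw x ~ p; if x is in the green list, output x; otherwise draw
    x' ~ p and output x' regardless of its colour.
    """
    green_mass = sum(p[t] for t in green)
    out = []
    for t, pt in enumerate(p):
        accepted_first = pt if t in green else Fraction(0)
        accepted_second = (1 - green_mass) * pt  # first draw was rejected
        out.append(accepted_first + accepted_second)
    return out


def average_over_green_lists(p, green_size):
    """Average the conditional distribution over all green lists of a given size."""
    vocab = range(len(p))
    green_lists = list(combinations(vocab, green_size))
    total = [Fraction(0)] * len(p)
    for green in green_lists:
        cond = sample_then_accept_distribution(p, set(green))
        total = [a + b for a, b in zip(total, cond)]
    return [x / len(green_lists) for x in total]


# Hypothetical low-entropy next-token distribution over a 4-token vocabulary.
p = [Fraction(7, 10), Fraction(2, 10), Fraction(1, 10), Fraction(0)]

averaged = average_over_green_lists(p, green_size=2)
print(averaged == p)
```

For any single green list the output distribution is tilted toward green tokens, which is what a detector can exploit. Averaged over uniformly random green lists of a fixed size, the tilt cancels and the original distribution is recovered, which is the sense of "unbiased" used here.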
Related papers
- BiMarker: Enhancing Text Watermark Detection for Large Language Models with Bipolar Watermarks [19.689433249830465]
Existing watermarking techniques struggle with low watermark strength and stringent false-positive requirements.
BiMarker splits generated text into positive and negative poles, enhancing detection without requiring additional computational resources.
arXiv Detail & Related papers (2025-01-21T14:32:50Z)
- GaussMark: A Practical Approach for Structural Watermarking of Language Models [61.84270985214254]
GaussMark is a simple, efficient, and relatively robust scheme for watermarking large language models.
We show that GaussMark is reliable, efficient, and relatively robust to corruptions such as insertions, deletions, substitutions, and roundtrip translations.
arXiv Detail & Related papers (2025-01-17T22:30:08Z)
- Provably Robust Watermarks for Open-Source Language Models [5.509756888700397]
We introduce the first watermarking scheme for open-source language models.
Our scheme works by modifying the parameters of the model, but the watermark can be detected from just the outputs of the model.
Perhaps surprisingly, we prove that our watermarks are unremovable under certain assumptions about the adversary's knowledge.
arXiv Detail & Related papers (2024-10-24T15:44:34Z)
- A Watermark for Order-Agnostic Language Models [55.89285889529492]
Pattern-mark is a pattern-based watermarking framework specifically designed for order-agnostic LMs.
We develop a Markov-chain-based watermark generator that produces watermark key sequences with high-frequency key patterns.
Our evaluations on order-agnostic LMs, such as ProteinMPNN and CMLM, demonstrate Pattern-mark's enhanced detection efficiency, generation quality, and robustness.
arXiv Detail & Related papers (2024-10-17T17:41:28Z)
- Signal Watermark on Large Language Models [28.711745671275477]
We propose a watermarking method that embeds a specific watermark into text during its generation by Large Language Models (LLMs).
This technique not only ensures the watermark's invisibility to humans but also maintains the quality and grammatical integrity of model-generated text.
Our method has been empirically validated across multiple LLMs, consistently maintaining high detection accuracy.
arXiv Detail & Related papers (2024-10-09T04:49:03Z)
- Can Watermarked LLMs be Identified by Users via Crafted Prompts? [55.460327393792156]
This work is the first to investigate the imperceptibility of watermarked Large Language Models (LLMs).
We design an identification algorithm called Water-Probe that detects watermarks through well-designed prompts.
Experiments show that almost all mainstream watermarking algorithms are easily identified with our well-designed prompts.
arXiv Detail & Related papers (2024-10-04T06:01:27Z)
- A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules [27.678152860666163]
We introduce a framework for reasoning about the statistical efficiency of watermarks and powerful detection rules.
We derive optimal detection rules for watermarks under our framework.
arXiv Detail & Related papers (2024-04-01T17:03:41Z)
- Towards Optimal Statistical Watermarking [95.46650092476372]
We study statistical watermarking by formulating it as a hypothesis testing problem.
Key to our formulation is a coupling of the output tokens and the rejection region.
We characterize the Uniformly Most Powerful (UMP) watermark in the general hypothesis testing setting.
arXiv Detail & Related papers (2023-12-13T06:57:00Z)
- Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection [66.26348985345776]
We propose a novel watermarking method for large language models (LLMs) based on knowledge injection.
In the watermark embedding stage, we first embed the watermarks into the selected knowledge to obtain the watermarked knowledge.
In the watermark extraction stage, questions related to the watermarked knowledge are designed to query the suspect LLM.
Experiments show that the watermark extraction success rate is close to 100% and demonstrate the effectiveness, fidelity, stealthiness, and robustness of our proposed method.
arXiv Detail & Related papers (2023-11-16T03:22:53Z)
- Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs.
It is possible to integrate watermarks without affecting the output probability distribution.
The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z)
- Provable Robust Watermarking for AI-Generated Text [41.5510809722375]
We propose a robust and high-quality watermark method, Unigram-Watermark.
We prove that our watermark method enjoys guaranteed generation quality and correctness in watermark detection, and that it is robust against text editing and paraphrasing.
arXiv Detail & Related papers (2023-06-30T07:24:32Z)
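Several entries above frame watermark detection as a statistical hypothesis test. As a generic illustration of that framing, and not the detector of any specific paper listed here, the sketch below computes a one-sided z-score for the number of "green" tokens in a text under the null hypothesis that each token is green independently with probability `gamma`. The function names, the green-list setup, and the example counts are assumptions.

```python
import math


def green_count_z_score(num_green, num_tokens, gamma):
    """z-score of the observed green-token count under the null hypothesis.

    Null hypothesis (assumed): text is unwatermarked, so each token is
    green independently with probability gamma.
    """
    expected = gamma * num_tokens
    std = math.sqrt(gamma * (1 - gamma) * num_tokens)
    return (num_green - expected) / std


def one_sided_p_value(z):
    """Upper-tail probability of a standard normal variable exceeding z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical counts: 150 of 200 tokens are green with gamma = 0.5.
z = green_count_z_score(num_green=150, num_tokens=200, gamma=0.5)
print(f"z = {z:.2f}, p-value = {one_sided_p_value(z):.2e}")
```

A test of this shape needs only the text and the key that defines the green lists, not the prompt or the model's probabilities. The false-positive rate is controlled by the threshold chosen for the z-score, subject to the normal approximation being reasonable at the given text length.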