Watermarking Language Models with Error Correcting Codes
- URL: http://arxiv.org/abs/2406.10281v1
- Date: Wed, 12 Jun 2024 05:13:09 GMT
- Title: Watermarking Language Models with Error Correcting Codes
- Authors: Patrick Chao, Edgar Dobriban, Hamed Hassani,
- Abstract summary: We propose a watermarking framework that encodes statistical signals through an error correcting code.
Our method, termed robust binary code (RBC) watermark, introduces no distortion compared to the original probability distribution.
Our empirical findings suggest our watermark is fast, powerful, and robust, comparing favorably to the state-of-the-art.
- Score: 41.21656847672627
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent progress in large language models enables the creation of realistic machine-generated content. Watermarking is a promising approach to distinguish machine-generated text from human text, embedding statistical signals in the output that are ideally undetectable to humans. We propose a watermarking framework that encodes such signals through an error correcting code. Our method, termed robust binary code (RBC) watermark, introduces no distortion compared to the original probability distribution, and no noticeable degradation in quality. We evaluate our watermark on base and instruction fine-tuned models and find our watermark is robust to edits, deletions, and translations. We provide an information-theoretic perspective on watermarking, a powerful statistical test for detection and for generating p-values, and theoretical guarantees. Our empirical findings suggest our watermark is fast, powerful, and robust, comparing favorably to the state-of-the-art.
Related papers
- A Certified Robust Watermark For Large Language Models [14.944271622556778]
We propose the first certified robust watermark algorithm for large language models based on randomized smoothing.
Our algorithm can derive substantial certified robustness, which means that our watermark can not be removed even under significant alterations.
arXiv Detail & Related papers (2024-09-29T13:51:15Z) - Watermark Smoothing Attacks against Language Models [40.02225709485305]
We introduce smoothing attacks and show that existing watermarking methods are not robust against minor modifications of text.
Our attack reveals a fundamental limitation of a wide range of watermarking techniques.
arXiv Detail & Related papers (2024-07-19T11:04:54Z) - Duwak: Dual Watermarks in Large Language Models [49.00264962860555]
We propose, Duwak, to enhance the efficiency and quality of watermarking by embedding dual secret patterns in both token probability distribution and sampling schemes.
We evaluate Duwak extensively on Llama2, against four state-of-the-art watermarking techniques and combinations of them.
arXiv Detail & Related papers (2024-03-12T16:25:38Z) - Multi-Bit Distortion-Free Watermarking for Large Language Models [4.7381853007029475]
We extend an existing zero-bit distortion-free watermarking method by embedding multiple bits of meta-information as part of the watermark.
We also develop a computationally efficient decoder that extracts the embedded information from the watermark with low bit error rate.
arXiv Detail & Related papers (2024-02-26T14:01:34Z) - On the Learnability of Watermarks for Language Models [80.97358663708592]
We ask whether language models can directly learn to generate watermarked text.
We propose watermark distillation, which trains a student model to behave like a teacher model.
We find that models can learn to generate watermarked text with high detectability.
arXiv Detail & Related papers (2023-12-07T17:41:44Z) - I Know You Did Not Write That! A Sampling Based Watermarking Method for
Identifying Machine Generated Text [0.0]
We propose a new watermarking method to detect machine-generated texts.
Our method embeds a unique pattern within the generated text.
We show how watermarking affects textual quality and compare our proposed method with a state-of-the-art watermarking method.
arXiv Detail & Related papers (2023-11-29T20:04:57Z) - Improving the Generation Quality of Watermarked Large Language Models
via Word Importance Scoring [81.62249424226084]
Token-level watermarking inserts watermarks in the generated texts by altering the token probability distributions.
This watermarking algorithm alters the logits during generation, which can lead to a downgraded text quality.
We propose to improve the quality of texts generated by a watermarked language model by Watermarking with Importance Scoring (WIS)
arXiv Detail & Related papers (2023-11-16T08:36:00Z) - Who Wrote this Code? Watermarking for Code Generation [53.24895162874416]
We propose Selective WatErmarking via Entropy Thresholding (SWEET) to detect machine-generated text.
Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines.
arXiv Detail & Related papers (2023-05-24T11:49:52Z) - A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.