Undetectable Watermarks for Language Models
- URL: http://arxiv.org/abs/2306.09194v1
- Date: Thu, 25 May 2023 02:57:16 GMT
- Title: Undetectable Watermarks for Language Models
- Authors: Miranda Christ, Sam Gunn, Or Zamir
- Abstract summary: We introduce a cryptographically-inspired notion of undetectable watermarks for language models.
Watermarks can be detected only with knowledge of a secret key.
We construct undetectable watermarks based on the existence of one-way functions.
- Score: 1.347733333991357
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent advances in the capabilities of large language models such as GPT-4
have spurred increasing concern about our ability to detect AI-generated text.
Prior works have suggested methods of embedding watermarks in model outputs, by
noticeably altering the output distribution. We ask: Is it possible to
introduce a watermark without incurring any detectable change to the output
distribution?
To this end we introduce a cryptographically-inspired notion of undetectable
watermarks for language models. That is, watermarks can be detected only with
the knowledge of a secret key; without the secret key, it is computationally
intractable to distinguish watermarked outputs from those of the original
model. In particular, it is impossible for a user to observe any degradation in
the quality of the text. Crucially, watermarks should remain undetectable even
when the user is allowed to adaptively query the model with arbitrarily chosen
prompts. We construct undetectable watermarks based on the existence of one-way
functions, a standard assumption in cryptography.
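To make this concrete, below is a minimal sketch of this flavor of scheme, simplified to a binary output alphabet with the PRF instantiated as HMAC-SHA256. Function names are illustrative, and the paper's actual construction additionally handles low-entropy text by waiting until enough entropy has accumulated, which this sketch omits.

```python
import hashlib
import hmac
import math

def prf(key: bytes, context: bytes) -> float:
    """Keyed PRF mapping the generation context to a pseudorandom float in (0, 1)."""
    digest = hmac.new(key, context, hashlib.sha256).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)

def watermarked_sample(key: bytes, prior_bits: list[int], p1: float) -> int:
    """Sample the next bit: output 1 iff the keyed pseudorandom draw falls
    below the model's probability of 1.  Marginally this is an exact sample
    from the model, so outputs look unwatermarked without the key."""
    u = prf(key, bytes(prior_bits))
    return 1 if u <= p1 else 0

def detection_score(key: bytes, bits: list[int]) -> float:
    """Detector's per-bit score.  Since -ln(U) for uniform U has mean 1,
    unwatermarked text scores about 1; watermarked text scores noticeably
    higher, growing with the empirical entropy of the text."""
    total = 0.0
    for i, b in enumerate(bits):
        u = prf(key, bytes(bits[:i]))
        total += -math.log(u) if b == 1 else -math.log(1.0 - u)
    return total / max(len(bits), 1)
```

Undetectability comes from the PRF: without the key, the draws u are indistinguishable from fresh randomness, so each output is distributed exactly as the model would have produced it.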
Related papers
- Watermarking Language Models for Many Adaptive Users [47.90822587139056]
We study watermarking schemes for language models with provable guarantees.
We introduce multi-user watermarks, which allow tracing model-generated text to individual users.
We prove that the undetectable zero-bit scheme of Christ, Gunn, and Zamir (2024) is adaptively robust.
arXiv Detail & Related papers (2024-05-17T22:15:30Z)
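One natural way to realize multi-user tracing on top of a zero-bit scheme, sketched below under the assumption that each user is issued an independent secret key (reusing the illustrative detection_score from the sketch above): watermark each user's generations under their own key, and trace by running the zero-bit detector once per key. The threshold is illustrative and would need a union-bound correction as the user population grows.

```python
import secrets

class MultiUserWatermarker:
    """Hypothetical per-user keying layered on a zero-bit scheme."""

    def __init__(self) -> None:
        self.user_keys: dict[str, bytes] = {}

    def register(self, user_id: str) -> None:
        self.user_keys[user_id] = secrets.token_bytes(32)

    def trace(self, bits: list[int], threshold: float = 1.5) -> str | None:
        """Return the user whose key best explains the text, if any key's
        detection score clears the (illustrative) threshold."""
        best_user, best_score = None, threshold
        for user_id, key in self.user_keys.items():
            score = detection_score(key, bits)
            if score > best_score:
                best_user, best_score = user_id, score
        return best_user
```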
- Multi-Bit Distortion-Free Watermarking for Large Language Models [4.7381853007029475]
We extend an existing zero-bit distortion-free watermarking method by embedding multiple bits of meta-information as part of the watermark.
We also develop a computationally efficient decoder that extracts the embedded information from the watermark with low bit error rate.
arXiv Detail & Related papers (2024-02-26T14:01:34Z)
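A generic way to lift a zero-bit, distortion-free scheme to multiple bits, not necessarily this paper's exact encoding: derive a sub-key per (message position, bit value), watermark the i-th segment of the text under the sub-key selected by message bit i, and decode by scoring each segment under both candidate sub-keys. detection_score is the illustrative zero-bit detector sketched earlier.

```python
import hashlib

def subkey(master: bytes, position: int, bit: int) -> bytes:
    """Derive a distinct sub-key for each (message position, bit value) pair."""
    return hashlib.sha256(master + position.to_bytes(4, "big") + bytes([bit])).digest()

def decode_message(master: bytes, segments: list[list[int]]) -> list[int]:
    """Recover one message bit per segment: whichever sub-key yields the
    higher zero-bit detection score reveals the embedded bit."""
    message = []
    for pos, seg in enumerate(segments):
        s0 = detection_score(subkey(master, pos, 0), seg)
        s1 = detection_score(subkey(master, pos, 1), seg)
        message.append(1 if s1 > s0 else 0)
    return message
```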
- On the Learnability of Watermarks for Language Models [80.97358663708592]
We ask whether language models can directly learn to generate watermarked text.
We propose watermark distillation, which trains a student model to behave like a teacher model that uses decoding-based watermarking.
We find that models can learn to generate watermarked text with high detectability.
arXiv Detail & Related papers (2023-12-07T17:41:44Z)
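One natural instantiation of watermark distillation is sampling-based: fine-tune the student with the ordinary language-modeling loss on text sampled from the watermarked teacher. A minimal PyTorch sketch, assuming student(ids) returns next-token logits of shape (batch, seq, vocab); all names are illustrative.

```python
import torch
import torch.nn.functional as F

def distill_step(student, optimizer, watermarked_ids: torch.Tensor) -> float:
    """One distillation step: standard next-token cross-entropy on a batch
    of token ids sampled from the watermarked teacher, so the student
    learns to imitate the watermarked distribution."""
    logits = student(watermarked_ids[:, :-1])   # (batch, seq - 1, vocab)
    targets = watermarked_ids[:, 1:]            # teacher tokens, shifted by one
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```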
- Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs.
It is possible to integrate watermarks without affecting the output probability distribution.
The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z)
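A well-known way to make a watermark provably distribution-preserving is the "Gumbel trick" reweighting, sketched below; this illustrates the unbiased-watermark idea rather than this paper's specific reweighting. Drawing keyed pseudorandom uniforms u over the vocabulary and emitting argmax_t u_t^(1/p_t) yields an exact sample from p, so individual responses are statistically unchanged.

```python
import hashlib
import numpy as np

def keyed_uniforms(key: bytes, context: bytes, vocab_size: int) -> np.ndarray:
    """One pseudorandom uniform in (0, 1) per vocabulary entry, seeded by
    the secret key and the recent context."""
    seed = int.from_bytes(hashlib.sha256(key + context).digest()[:8], "big")
    rng = np.random.default_rng(seed)
    return rng.uniform(low=np.finfo(float).tiny, high=1.0, size=vocab_size)

def unbiased_sample(key: bytes, context: bytes, p: np.ndarray) -> int:
    """Gumbel-trick sampling: argmax_t u_t ** (1 / p_t) is distributed
    exactly according to p, so the output distribution is unaffected."""
    u = keyed_uniforms(key, context, len(p))
    scores = u ** (1.0 / np.maximum(p, 1e-12))  # p_t ~ 0 underflows to 0, never chosen
    return int(np.argmax(scores))
```

Detection (not shown) aggregates how large u was at each emitted token under the key, for example by summing -log(1 - u); watermarked text concentrates on tokens with unusually large u.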
- Tree-Ring Watermarks: Fingerprints for Diffusion Images that are Invisible and Robust [55.91987293510401]
Watermarking the outputs of generative models is a crucial technique for tracing copyright and preventing potential harm from AI-generated content.
We introduce a novel technique called Tree-Ring Watermarking that robustly fingerprints diffusion model outputs.
Our watermark is semantically hidden in the image space and is far more robust than watermarking alternatives that are currently deployed.
arXiv Detail & Related papers (2023-05-31T17:00:31Z)
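The core Tree-Ring idea in a numpy sketch: imprint a keyed, ring-shaped pattern in the Fourier spectrum of the diffusion model's initial noise, then at detection time recover an estimate of that noise by inverting the diffusion process (DDIM inversion, omitted here) and test the ring region against the key. Radii and threshold are illustrative.

```python
import numpy as np

def ring_mask(size: int, r_in: float = 8, r_out: float = 12) -> np.ndarray:
    """Boolean ring in frequency space (center = low frequencies)."""
    yy, xx = np.mgrid[:size, :size]
    r = np.hypot(yy - size / 2, xx - size / 2)
    return (r >= r_in) & (r <= r_out)

def embed_tree_ring(noise: np.ndarray, key_pattern: np.ndarray) -> np.ndarray:
    """Overwrite the ring region of the initial noise's Fourier spectrum
    with the key pattern, then return the watermarked noise."""
    spectrum = np.fft.fftshift(np.fft.fft2(noise))
    mask = ring_mask(noise.shape[0])
    spectrum[mask] = key_pattern[mask]
    return np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))

def detect_tree_ring(recovered_noise: np.ndarray, key_pattern: np.ndarray,
                     threshold: float = 50.0) -> bool:
    """Compare the ring region of the recovered noise's spectrum to the key
    (recovered_noise would come from DDIM inversion of the image)."""
    spectrum = np.fft.fftshift(np.fft.fft2(recovered_noise))
    mask = ring_mask(recovered_noise.shape[0])
    dist = np.abs(spectrum[mask] - key_pattern[mask]).mean()
    return dist < threshold
```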
- Watermarking Text Generated by Black-Box Language Models [103.52541557216766]
An earlier watermark-based method targets white-box LLMs, allowing them to embed watermarks during text generation by biasing a secret token list.
A detection algorithm aware of that list can identify the watermarked text.
We develop a watermarking framework for black-box language model usage scenarios.
arXiv Detail & Related papers (2023-05-14T07:37:33Z)
- A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)
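This is the widely cited "green list" scheme of Kirchenbauer et al.; a minimal sketch: the secret key and the previous token seed a pseudorandom split of the vocabulary, a bias delta is added to the green tokens' logits before sampling, and detection simply counts green tokens and computes a z-score, requiring neither the model nor its API. Parameter values are illustrative.

```python
import hashlib
import math
import random

def green_set(key: bytes, prev_token: int, vocab_size: int, gamma: float = 0.5) -> set[int]:
    """Pseudorandom 'green list': a gamma-fraction of the vocabulary,
    seeded by the secret key and the previous token."""
    seed = int.from_bytes(
        hashlib.sha256(key + prev_token.to_bytes(4, "big")).digest()[:8], "big")
    ids = list(range(vocab_size))
    random.Random(seed).shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def bias_logits(logits: list[float], green: set[int], delta: float = 2.0) -> list[float]:
    """Soft watermark: add delta to green-token logits before sampling."""
    return [l + delta if t in green else l for t, l in enumerate(logits)]

def z_score(key: bytes, tokens: list[int], vocab_size: int, gamma: float = 0.5) -> float:
    """Detection: z-score of the green-token count; unwatermarked text is
    approximately standard normal, watermarked text scores far higher."""
    n = len(tokens) - 1
    hits = sum(tokens[i] in green_set(key, tokens[i - 1], vocab_size, gamma)
               for i in range(1, len(tokens)))
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```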
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and accepts no responsibility for any consequences of its use.