Watermarking Text Generated by Black-Box Language Models
- URL: http://arxiv.org/abs/2305.08883v1
- Date: Sun, 14 May 2023 07:37:33 GMT
- Title: Watermarking Text Generated by Black-Box Language Models
- Authors: Xi Yang, Kejiang Chen, Weiming Zhang, Chang Liu, Yuang Qi, Jie Zhang,
Han Fang, Nenghai Yu
- Abstract summary: A watermark-based method was proposed for white-box LLMs, allowing them to embed watermarks during text generation.
A detection algorithm aware of the list can identify the watermarked text.
We develop a watermarking framework for black-box language model usage scenarios.
- Score: 103.52541557216766
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LLMs now exhibit human-like skills in various fields, leading to worries
about misuse. Thus, detecting generated text is crucial. However, passive
detection methods are stuck in domain specificity and limited adversarial
robustness. To achieve reliable detection, a watermark-based method was
proposed for white-box LLMs, allowing them to embed watermarks during text
generation. The method involves randomly dividing the model vocabulary to
obtain a special list and adjusting the probability distribution to promote the
selection of words in the list. A detection algorithm aware of the list can
identify the watermarked text. However, this method is not applicable in many
real-world scenarios where only black-box language models are available. For
instance, third-parties that develop API-based vertical applications cannot
watermark text themselves because API providers only supply generated text and
withhold probability distributions to shield their commercial interests. To
allow third-parties to autonomously inject watermarks into generated text, we
develop a watermarking framework for black-box language model usage scenarios.
Specifically, we first define a binary encoding function to compute a random
binary encoding corresponding to a word. The encodings computed for
non-watermarked text conform to a Bernoulli distribution, wherein the
probability of a word representing bit-1 being approximately 0.5. To inject a
watermark, we alter the distribution by selectively replacing words
representing bit-0 with context-based synonyms that represent bit-1. A
statistical test is then used to identify the watermark. Experiments
demonstrate the effectiveness of our method on both Chinese and English
datasets. Furthermore, results under re-translation, polishing, word deletion,
and synonym substitution attacks reveal that it is arduous to remove the
watermark without compromising the original semantics.
Related papers
- Segmenting Watermarked Texts From Language Models [1.4103505579327706]
This work focuses on a scenario where an untrusted third-party user sends prompts to a trusted language model (LLM) provider, who then generates a text with a watermark.
This setup makes it possible for a detector to later identify the source of the text if the user publishes it.
We propose a methodology to segment the published text into watermarked and non-watermarked sub-strings.
arXiv Detail & Related papers (2024-10-28T02:05:10Z) - Multi-Bit Distortion-Free Watermarking for Large Language Models [4.7381853007029475]
We extend an existing zero-bit distortion-free watermarking method by embedding multiple bits of meta-information as part of the watermark.
We also develop a computationally efficient decoder that extracts the embedded information from the watermark with low bit error rate.
arXiv Detail & Related papers (2024-02-26T14:01:34Z) - Improving the Generation Quality of Watermarked Large Language Models
via Word Importance Scoring [81.62249424226084]
Token-level watermarking inserts watermarks in the generated texts by altering the token probability distributions.
This watermarking algorithm alters the logits during generation, which can lead to a downgraded text quality.
We propose to improve the quality of texts generated by a watermarked language model by Watermarking with Importance Scoring (WIS)
arXiv Detail & Related papers (2023-11-16T08:36:00Z) - A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models [65.40460716619772]
Our research focuses on the importance of a textbfDistribution-textbfPreserving (DiP) watermark.
Contrary to the current strategies, our proposed DiPmark simultaneously preserves the original token distribution during watermarking.
It is detectable without access to the language model API and prompts (accessible), and is provably robust to moderate changes of tokens.
arXiv Detail & Related papers (2023-10-11T17:57:35Z) - On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z) - Undetectable Watermarks for Language Models [1.347733333991357]
We introduce a cryptographically-inspired notion of undetectable watermarks for language models.
watermarks can be detected only with the knowledge of a secret key.
We construct undetectable watermarks based on the existence of one-way functions.
arXiv Detail & Related papers (2023-05-25T02:57:16Z) - A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z) - Tracing Text Provenance via Context-Aware Lexical Substitution [81.49359106648735]
We propose a natural language watermarking scheme based on context-aware lexical substitution.
Under both objective and subjective metrics, our watermarking scheme can well preserve the semantic integrity of original sentences.
arXiv Detail & Related papers (2021-12-15T04:27:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.