Mark My Words: Analyzing and Evaluating Language Model Watermarks
- URL: http://arxiv.org/abs/2312.00273v2
- Date: Thu, 7 Dec 2023 04:37:47 GMT
- Title: Mark My Words: Analyzing and Evaluating Language Model Watermarks
- Authors: Julien Piet, Chawin Sitawarin, Vivian Fang, Norman Mu, David Wagner
- Abstract summary: This work focuses on text watermarking techniques - as opposed to image watermarks - and proposes MARKMYWORDS.
We focus on three main metrics: quality, size (e.g. the number of tokens needed to detect a watermark), and tamper-resistance.
We argue that watermark indistinguishability, a criterion emphasized in some prior works, is too strong a requirement.
- Score: 8.610361087746718
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: The capabilities of large language models have grown significantly in recent
years and so too have concerns about their misuse. In this context, the ability
to distinguish machine-generated text from human-authored content becomes
important. Prior works have proposed numerous schemes to watermark text, which
would benefit from a systematic evaluation framework. This work focuses on text
watermarking techniques - as opposed to image watermarks - and proposes
MARKMYWORDS, a comprehensive benchmark for them under different tasks as well
as practical attacks. We focus on three main metrics: quality, size (e.g. the
number of tokens needed to detect a watermark), and tamper-resistance. Current
watermarking techniques are good enough to be deployed: Kirchenbauer et al. [1]
can watermark Llama2-7B-chat with no perceivable loss in quality, the watermark
can be detected with fewer than 100 tokens, and the scheme offers good
tamper-resistance to simple attacks. We argue that watermark
indistinguishability, a criterion emphasized in some prior works, is too strong
a requirement: schemes that slightly modify logit distributions outperform
their indistinguishable counterparts with no noticeable loss in generation
quality. We publicly release our benchmark
(https://github.com/wagner-group/MarkMyWords).
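To make the abstract's "slightly modify logit distributions" concrete: the Kirchenbauer et al. [1] scheme pseudorandomly splits the vocabulary into a "green" and a "red" list at each step, seeded by the preceding token, and nudges green-token logits upward. The sketch below is a minimal illustration of that generation-side rule, not the paper's reference implementation; the vocabulary size, green-list fraction GAMMA, and bias DELTA are illustrative assumptions.

```python
import hashlib
import numpy as np

VOCAB_SIZE = 50_000  # toy vocabulary size (assumed)
GAMMA = 0.5          # fraction of the vocabulary placed on the green list (assumed)
DELTA = 2.0          # logit bias added to green tokens (assumed)

def green_list(prev_token: int) -> np.ndarray:
    """Pseudorandomly partition the vocabulary, seeded by the previous token."""
    digest = hashlib.sha256(str(prev_token).encode()).digest()
    rng = np.random.default_rng(int.from_bytes(digest[:8], "big"))
    mask = np.zeros(VOCAB_SIZE, dtype=bool)
    mask[rng.permutation(VOCAB_SIZE)[: int(GAMMA * VOCAB_SIZE)]] = True
    return mask

def watermarked_sample(logits: np.ndarray, prev_token: int, rng) -> int:
    """Shift green-token logits by DELTA, then sample from the softmax as usual."""
    biased = logits + DELTA * green_list(prev_token)
    probs = np.exp(biased - biased.max())
    return int(rng.choice(VOCAB_SIZE, p=probs / probs.sum()))
```

Because watermarked text then contains noticeably more than a GAMMA fraction of green tokens, a detector can flag it from token counts alone, which is why fewer than 100 tokens suffice in the benchmark's measurements.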
Related papers
- Improved Unbiased Watermark for Large Language Models [59.00698153097887]
We introduce MCmark, a family of unbiased, Multi-Channel-based watermarks.
MCmark preserves the original distribution of the language model.
It offers significant improvements in detectability and robustness over existing unbiased watermarks.
arXiv Detail & Related papers (2025-02-16T21:02:36Z)
- Revisiting the Robustness of Watermarking to Paraphrasing Attacks [10.68370011459729]
Many recent watermarking techniques modify the output probabilities of LMs to embed a signal in the generated output that can later be detected.
We show that with access to only a limited number of generations from a black-box watermarked model, we can drastically increase the effectiveness of paraphrasing attacks to evade watermark detection.
arXiv Detail & Related papers (2024-11-08T02:22:30Z)
- De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively.
Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
arXiv Detail & Related papers (2024-10-17T17:42:10Z)
- Less is More: Sparse Watermarking in LLMs with Enhanced Text Quality [27.592486717044455]
We present a novel type of watermark, Sparse Watermark, which aims to mitigate this trade-off by applying watermarks to a small subset of generated tokens distributed across the text.
Our experimental results demonstrate that the proposed watermarking scheme achieves high detectability while generating text that outperforms previous watermarking methods in quality across various tasks.
arXiv Detail & Related papers (2024-07-17T18:52:12Z)
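The Sparse Watermark abstract above does not spell out its token-selection rule, so the sketch below uses a deliberately simplified stand-in, watermarking only every K-th position with a stronger bias. It reuses the green_list function from the first sketch and illustrates the sparsity/quality trade-off, not the paper's actual selection method; K and the bias value are assumptions.

```python
import numpy as np

K = 8               # watermark only every K-th generated token (illustrative)
SPARSE_DELTA = 4.0  # a stronger bias, affordable because few tokens are touched

def sparse_watermarked_sample(logits, prev_token, position, rng):
    """Green-list bias at sparse positions only; ordinary sampling elsewhere."""
    if position % K == 0:
        logits = logits + SPARSE_DELTA * green_list(prev_token)  # green_list as above
    probs = np.exp(logits - logits.max())
    return int(rng.choice(len(probs), p=probs / probs.sum()))
```

Detection then scores only the sparse positions, so each one must carry stronger evidence; a larger bias on fewer tokens is how such a scheme can preserve overall text quality.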
- Watermarking Language Models for Many Adaptive Users [47.90822587139056]
We study watermarking schemes for language models with provable guarantees.
We introduce multi-user watermarks, which allow tracing model-generated text to individual users.
We prove that the undetectable zero-bit scheme of Christ, Gunn, and Zamir (2024) is adaptively robust.
arXiv Detail & Related papers (2024-05-17T22:15:30Z)
- GumbelSoft: Diversified Language Model Watermarking via the GumbelMax-trick [50.35069175236422]
Large language models (LLMs) excel at generating human-like text, but they also raise concerns about misuse in fake news and academic dishonesty.
Decoding-based watermarks, particularly the GumbelMax-trick-based watermark (GM watermark), are a standout solution for safeguarding machine-generated texts.
We propose a new type of GM watermark, the Logits-Addition watermark, and its three variants, specifically designed to enhance diversity.
arXiv Detail & Related papers (2024-02-20T12:05:47Z)
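For context on the GM watermark this entry builds on: the GumbelMax trick turns sampling into a deterministic argmax over pseudorandom per-token scores, so the per-step output distribution stays unbiased while the choice becomes verifiable. A minimal sketch (the hash-based seeding is an illustrative choice; real schemes seed on a longer context window):

```python
import hashlib
import numpy as np

def uniform_scores(context: tuple, vocab_size: int) -> np.ndarray:
    """Pseudorandom U(0,1) score per candidate token, seeded by recent context."""
    digest = hashlib.sha256(repr(context).encode()).digest()
    return np.random.default_rng(int.from_bytes(digest[:8], "big")).random(vocab_size)

def gumbelmax_sample(probs: np.ndarray, context: tuple) -> int:
    """argmax_i r_i^(1/p_i) draws exactly from probs when r is truly uniform,
    so the watermark is unbiased; computed in log space for stability."""
    r = uniform_scores(context, len(probs))
    return int(np.argmax(np.log(r) / np.maximum(probs, 1e-12)))
```

A detector that knows the seeding rule recomputes r and checks that the chosen tokens received suspiciously high scores (under the null hypothesis, -log(1 - r) of the chosen token is Exp(1)-distributed). Because the same context always yields the same choice, a vanilla GM watermark produces identical completions for identical prompts; that loss of diversity is what GumbelSoft and the Logits-Addition variants target.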
- On the Learnability of Watermarks for Language Models [80.97358663708592]
We ask whether language models can directly learn to generate watermarked text.
We propose watermark distillation, which trains a student model to behave like a teacher model.
We find that models can learn to generate watermarked text with high detectability.
arXiv Detail & Related papers (2023-12-07T17:41:44Z)
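A toy illustration of the distillation idea, with bigram logit tables standing in for the teacher and student models (the sizes, green-list construction, and hyperparameters are all illustrative assumptions): the student never sees the green list, only samples from the watermarked teacher, yet plain cross-entropy training pulls its distribution toward the watermarked one.

```python
import torch
import torch.nn.functional as F

VOCAB = 64  # toy vocabulary (assumed)
torch.manual_seed(0)

# Teacher: frozen bigram logits with a green-list bias already baked in.
green = torch.rand(VOCAB, VOCAB) < 0.5             # toy per-context green list
teacher_logits = torch.randn(VOCAB, VOCAB) + 2.0 * green

# Student: a learnable bigram table with no knowledge of the green list.
student_logits = torch.zeros(VOCAB, VOCAB, requires_grad=True)
opt = torch.optim.Adam([student_logits], lr=0.1)

for step in range(200):
    # Sample "watermarked text" (next tokens) from the teacher...
    ctx = torch.randint(VOCAB, (256,))
    nxt = torch.multinomial(F.softmax(teacher_logits[ctx], dim=-1), 1).squeeze(-1)
    # ...and fit the student to those samples with ordinary cross-entropy.
    loss = F.cross_entropy(student_logits[ctx], nxt)
    opt.zero_grad()
    loss.backward()
    opt.step()
# After training, text sampled from student_logits scores as watermarked,
# mirroring the paper's finding that watermarks are learnable from outputs alone.
```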
- Improving the Generation Quality of Watermarked Large Language Models via Word Importance Scoring [81.62249424226084]
Token-level watermarking inserts watermarks in the generated texts by altering the token probability distributions.
This watermarking algorithm alters the logits during generation, which can degrade text quality.
We propose Watermarking with Importance Scoring (WIS) to improve the quality of texts generated by a watermarked language model.
arXiv Detail & Related papers (2023-11-16T08:36:00Z)
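The summary does not spell out how WIS scores importance, so the sketch below uses a common proxy, the entropy of the next-token distribution, as an assumed stand-in (not necessarily the paper's exact method): when the model is nearly forced into one token, perturbing the logits is most likely to hurt quality, so the bias is skipped there.

```python
import numpy as np

ENTROPY_FLOOR = 1.0  # skip watermarking below this entropy, in nats (assumed)

def quality_aware_logits(logits: np.ndarray, green_mask: np.ndarray,
                         delta: float) -> np.ndarray:
    """Apply the green-list bias only where it is unlikely to hurt quality."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    entropy = float(-np.sum(p * np.log(p + 1e-12)))
    if entropy < ENTROPY_FLOOR:  # a near-forced, "important" token: leave it alone
        return logits
    return logits + delta * green_mask
```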
- On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z)
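One way to realize the "short spans inside a large document" detectors this entry mentions is to scan every window of the document and take the maximum z-score, with the detection threshold raised to compensate for the many tests. A minimal sketch of that windowed scan (the quadratic loop and the minimum window length are illustrative simplifications):

```python
import numpy as np

def windowed_max_z(green_hits: np.ndarray, gamma: float, min_len: int = 50) -> float:
    """Max green-token z-score over all windows of >= min_len tokens.
    green_hits[i] is 1 if token i landed on its green list, else 0."""
    n = len(green_hits)
    prefix = np.concatenate(([0], np.cumsum(green_hits)))
    best = -np.inf
    for lo in range(n - min_len + 1):          # O(n^2) scan: fine for a sketch
        for hi in range(lo + min_len, n + 1):
            m = hi - lo
            k = prefix[hi] - prefix[lo]
            z = (k - gamma * m) / np.sqrt(m * gamma * (1 - gamma))
            best = max(best, float(z))
    return best
```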
- A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)
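The last claim, detection without access to the model, holds because the detector needs only the tokenizer and the seeded green-list rule. Reusing the toy green_list and GAMMA from the first sketch above, a minimal standalone test looks like this (the z-threshold of 4 is an illustrative choice):

```python
import math

def is_watermarked(tokens: list[int], gamma: float = GAMMA,
                   z_threshold: float = 4.0) -> bool:
    """One-proportion z-test: did more than a gamma fraction of tokens
    land on the green list seeded by their predecessor?"""
    hits = sum(bool(green_list(prev)[tok]) for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    z = (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
    return z > z_threshold
```

For example, with gamma = 0.5 and a bias strong enough that roughly 80% of tokens land green, the z-score crosses 4 after about 45 tokens, consistent with the benchmark's finding that fewer than 100 tokens suffice for detection.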