PostMark: A Robust Blackbox Watermark for Large Language Models
- URL: http://arxiv.org/abs/2406.14517v2
- Date: Fri, 11 Oct 2024 16:19:55 GMT
- Title: PostMark: A Robust Blackbox Watermark for Large Language Models
- Authors: Yapei Chang, Kalpesh Krishna, Amir Houmansadr, John Wieting, Mohit Iyyer,
- Abstract summary: We develop PostMark, a modular post-hoc watermarking procedure.
PostMark does not require logit access, which means it can be implemented by a third party.
We show that PostMark is more robust to paraphrasing attacks than existing watermarking methods.
- Score: 56.63560134428716
- License:
- Abstract: The most effective techniques to detect LLM-generated text rely on inserting a detectable signature -- or watermark -- during the model's decoding process. Most existing watermarking methods require access to the underlying LLM's logits, which LLM API providers are loath to share due to fears of model distillation. As such, these watermarks must be implemented independently by each LLM provider. In this paper, we develop PostMark, a modular post-hoc watermarking procedure in which an input-dependent set of words (determined via a semantic embedding) is inserted into the text after the decoding process has completed. Critically, PostMark does not require logit access, which means it can be implemented by a third party. We also show that PostMark is more robust to paraphrasing attacks than existing watermarking methods: our experiments cover eight baseline algorithms, five base LLMs, and three datasets. Finally, we evaluate the impact of PostMark on text quality using both automated and human assessments, highlighting the trade-off between quality and robustness to paraphrasing. We release our code, outputs, and annotations at https://github.com/lilakk/PostMark.
Related papers
- Can Watermarked LLMs be Identified by Users via Crafted Prompts? [55.460327393792156]
This work is the first to investigate the imperceptibility of watermarked Large Language Models (LLMs)
We design an identification algorithm called Water-Probe that detects watermarks through well-designed prompts.
Experiments show that almost all mainstream watermarking algorithms are easily identified with our well-designed prompts.
arXiv Detail & Related papers (2024-10-04T06:01:27Z) - Less is More: Sparse Watermarking in LLMs with Enhanced Text Quality [27.592486717044455]
We present a novel type of watermark, Sparse Watermark, which aims to mitigate this trade-off by applying watermarks to a small subset of generated tokens distributed across the text.
Our experimental results demonstrate that the proposed watermarking scheme achieves high detectability while generating text that outperforms previous watermarking methods in quality across various tasks.
arXiv Detail & Related papers (2024-07-17T18:52:12Z) - MarkLLM: An Open-Source Toolkit for LLM Watermarking [80.00466284110269]
MarkLLM is an open-source toolkit for implementing LLM watermarking algorithms.
For evaluation, MarkLLM offers a comprehensive suite of 12 tools spanning three perspectives, along with two types of automated evaluation pipelines.
arXiv Detail & Related papers (2024-05-16T12:40:01Z) - Learning to Watermark LLM-generated Text via Reinforcement Learning [16.61005372279407]
We study how to watermark LLM outputs to track misuse.
We design a model-level watermark that embeds signals into the output.
We propose a co-training framework based on reinforcement learning.
arXiv Detail & Related papers (2024-03-13T03:43:39Z) - Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection [66.26348985345776]
We propose a novel watermarking method for large language models (LLMs) based on knowledge injection.
In the watermark embedding stage, we first embed the watermarks into the selected knowledge to obtain the watermarked knowledge.
In the watermark extraction stage, questions related to the watermarked knowledge are designed, for querying the suspect LLM.
Experiments show that the watermark extraction success rate is close to 100% and demonstrate the effectiveness, fidelity, stealthiness, and robustness of our proposed method.
arXiv Detail & Related papers (2023-11-16T03:22:53Z) - A Robust Semantics-based Watermark for Large Language Model against Paraphrasing [50.84892876636013]
Large language models (LLMs) have show great ability in various natural language tasks.
There are concerns that LLMs are possible to be used improperly or even illegally.
We propose a semantics-based watermark framework SemaMark.
arXiv Detail & Related papers (2023-11-15T06:19:02Z) - A Semantic Invariant Robust Watermark for Large Language Models [27.522264953691746]
Prior watermark algorithms face a trade-off between attack robustness and security robustness.
This is because the watermark logits for a token are determined by a certain number of preceding tokens.
We propose a semantic invariant watermarking method for LLMs that provides both attack robustness and security robustness.
arXiv Detail & Related papers (2023-10-10T06:49:43Z) - Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
arXiv Detail & Related papers (2023-07-29T14:11:15Z) - Provable Robust Watermarking for AI-Generated Text [41.5510809722375]
We propose a robust and high-quality watermark method, Unigram-Watermark.
We prove that our watermark method enjoys guaranteed generation quality, correctness in watermark detection, and is robust against text editing and paraphrasing.
arXiv Detail & Related papers (2023-06-30T07:24:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.