Publicly-Detectable Watermarking for Language Models
- URL: http://arxiv.org/abs/2310.18491v4
- Date: Sat, 04 Jan 2025 13:52:49 GMT
- Title: Publicly-Detectable Watermarking for Language Models
- Authors: Jaiden Fairoze, Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Mingyuan Wang
- Abstract summary: We present a publicly-detectable watermarking scheme for LMs. We embed a cryptographic signature into LM output using rejection sampling. We prove that this produces unforgeable and distortion-free text output.
- Score: 45.32236917886154
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a publicly-detectable watermarking scheme for LMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LM output using rejection sampling and prove that this produces unforgeable and distortion-free (i.e., undetectable without access to the public key) text output. We make use of error-correction to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and find that our formal claims are met in practice.
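The abstract describes the construction only at a high level. Below is a minimal, hypothetical Python sketch of the core idea, not the authors' implementation: a public-key signature (here Ed25519 via the third-party `cryptography` package) is embedded bit-by-bit through rejection sampling against a public token hash, and anyone holding the public key can re-derive and verify it from the text alone. The toy vocabulary, the stand-in sampler, and the choice to sign the prompt (rather than preceding output blocks, as in the paper) are simplifying assumptions, and the paper's error-correction for low-entropy stretches is omitted.

```python
# Hypothetical sketch (not the authors' code): embed the bits of an Ed25519
# signature into generated tokens via rejection sampling, so anyone with the
# public key can re-derive and verify the signature from the text alone.
# Assumes `pip install cryptography`; the toy "model" stands in for a real LM.
import hashlib
import random

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

VOCAB = [f"tok{i}" for i in range(1000)]  # toy vocabulary


def token_bit(token: str) -> int:
    """Public hash of a token, reduced to a single bit."""
    return hashlib.sha256(token.encode()).digest()[0] & 1


def sample_token(rng: random.Random) -> str:
    """Stand-in for sampling from the LM's next-token distribution."""
    return rng.choice(VOCAB)


def embed_signature(prompt: str, sk: Ed25519PrivateKey, rng: random.Random) -> list[str]:
    """Sign the prompt, then emit one token per signature bit via rejection sampling.

    The paper signs preceding output blocks and uses error-correction for
    low-entropy stretches; both simplifications here are deliberate.
    """
    signature = sk.sign(prompt.encode())  # 64 bytes = 512 bits to embed
    sig_bits = [(byte >> i) & 1 for byte in signature for i in range(8)]
    out = []
    for bit in sig_bits:
        tok = sample_token(rng)
        while token_bit(tok) != bit:  # resample until the public hash bit matches
            tok = sample_token(rng)
        out.append(tok)
    return out


def detect(prompt: str, tokens: list[str], public_key) -> bool:
    """Public detection: rebuild the signature from token hash bits and verify it."""
    bits = [token_bit(t) for t in tokens]
    sig = bytes(sum(bits[i + j] << j for j in range(8)) for i in range(0, len(bits), 8))
    try:
        public_key.verify(sig, prompt.encode())
        return True
    except InvalidSignature:
        return False


if __name__ == "__main__":
    sk = Ed25519PrivateKey.generate()
    watermarked = embed_signature("example prompt", sk, random.Random(0))
    print(detect("example prompt", watermarked, sk.public_key()))  # expect True
```

Because detection needs only the public key and the token hash, anyone can run it; unforgeability rests on the underlying signature scheme rather than on a shared secret.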
Related papers
- Multi-use LLM Watermarking and the False Detection Problem [12.954387412283973]
Digital watermarking is a promising solution for mitigating some of the risks arising from the misuse of automatically generated text. However, simultaneously using the same embedding for both detection and user identification leads to a false detection problem. We propose Dual Watermarking, which jointly encodes detection and identification watermarks into generated text.
arXiv Detail & Related papers (2025-06-19T02:37:02Z) - LLM Watermarking Using Mixtures and Statistical-to-Computational Gaps [3.9287497907611875]
Given a text, can we determine whether it was generated by a large language model (LLM) or by a human? We propose an undetectable watermarking scheme in the closed setting. In the harder open setting, where the adversary has access to most of the model, we propose an unremovable watermarking scheme.
arXiv Detail & Related papers (2025-05-02T16:36:43Z) - Provably Robust Watermarks for Open-Source Language Models [5.509756888700397]
We introduce the first watermarking scheme for open-source language models.
Our scheme works by modifying the parameters of the model, but the watermark can be detected from just the outputs of the model.
Perhaps surprisingly, we prove that our watermarks are unremovable under certain assumptions about the adversary's knowledge.
arXiv Detail & Related papers (2024-10-24T15:44:34Z) - Command-line Obfuscation Detection using Small Language Models [0.7373617024876725]
Adversaries often use command-line obfuscation to avoid detection.
We have implemented a scalable NLP-based detection method that leverages a custom-trained, small transformer language model.
We show the model's superiority to signatures on established malware and showcase previously unseen obfuscated samples detected by our model.
arXiv Detail & Related papers (2024-08-05T17:01:33Z) - Large Language Model Watermark Stealing With Mixed Integer Programming [51.336009662771396]
Large Language Model (LLM) watermarking shows promise in addressing copyright concerns, monitoring AI-generated text, and preventing its misuse.
Recent research indicates that watermarking methods using numerous keys are susceptible to removal attacks.
We propose a novel green list stealing attack against the state-of-the-art LLM watermark scheme.
arXiv Detail & Related papers (2024-05-30T04:11:17Z) - Black-Box Detection of Language Model Watermarks [1.9374282535132377]
We develop rigorous statistical tests to detect, and estimate the parameters of, all three popular watermarking scheme families.
We experimentally confirm the effectiveness of our methods on a range of schemes and a diverse set of open-source models.
Our findings indicate that current watermarking schemes are more detectable than previously believed.
arXiv Detail & Related papers (2024-05-28T08:41:30Z) - Watermarking Language Models for Many Adaptive Users [47.90822587139056]
We study watermarking schemes for language models with provable guarantees.
We introduce multi-user watermarks, which allow tracing model-generated text to individual users.
We prove that the undetectable zero-bit scheme of Christ, Gunn, and Zamir (2024) is adaptively robust.
arXiv Detail & Related papers (2024-05-17T22:15:30Z) - Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods that design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z) - A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules [27.678152860666163]
We introduce a framework for reasoning about the statistical efficiency of watermarks and powerful detection rules.
We derive optimal detection rules for watermarks under our framework.
arXiv Detail & Related papers (2024-04-01T17:03:41Z) - An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z) - Undetectable Watermarks for Language Models [1.347733333991357]
We introduce a cryptographically-inspired notion of undetectable watermarks for language models.
Such watermarks can be detected only with knowledge of a secret key.
We construct undetectable watermarks based on the existence of one-way functions.
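That entry's one-way-function-based construction is not spelled out here. As a loose illustration of secret-key, distortion-free watermarking in the same spirit, the sketch below implements a simpler exponential-minimum sampling watermark (often attributed to Aaronson), not the paper's scheme: only the key holder can compute the detection score, and the toy vocabulary and next-token distribution are stand-ins.

```python
# Hedged sketch of a secret-key, distortion-free watermark (exponential-minimum
# sampling), NOT the construction from the cited paper. Detection requires the
# secret key; without it the text looks like ordinary samples from the model.
import hashlib
import hmac
import math
import random

VOCAB = [f"w{i}" for i in range(50)]  # toy vocabulary


def prf_uniform(key: bytes, context: str, token: str) -> float:
    """Pseudorandom uniform in (0,1) derived from the secret key and context."""
    digest = hmac.new(key, f"{context}|{token}".encode(), hashlib.sha256).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / (2**64 + 2)


def toy_distribution(context: str) -> dict[str, float]:
    """Stand-in for an LM's next-token distribution."""
    rng = random.Random(context)
    weights = [rng.random() for _ in VOCAB]
    total = sum(weights)
    return {t: w / total for t, w in zip(VOCAB, weights)}


def generate(key: bytes, prompt: str, length: int) -> list[str]:
    """Pick the token maximising r_t^(1/p_t); marginally this samples from p."""
    out, context = [], prompt
    for _ in range(length):
        p = toy_distribution(context)
        tok = max(p, key=lambda t: prf_uniform(key, context, t) ** (1.0 / p[t]))
        out.append(tok)
        context += " " + tok
    return out


def score(key: bytes, prompt: str, tokens: list[str]) -> float:
    """Key holder's detection score; unwatermarked text averages ~1 per token."""
    s, context = 0.0, prompt
    for tok in tokens:
        s += -math.log(1.0 - prf_uniform(key, context, tok))
        context += " " + tok
    return s


if __name__ == "__main__":
    key = b"secret watermark key"
    text = generate(key, "prompt", 100)
    print(score(key, "prompt", text), score(b"wrong key", "prompt", text))
```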
arXiv Detail & Related papers (2023-05-25T02:57:16Z) - Who Wrote this Code? Watermarking for Code Generation [53.24895162874416]
We propose Selective WatErmarking via Entropy Thresholding (SWEET) to detect machine-generated text.
Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines.
arXiv Detail & Related papers (2023-05-24T11:49:52Z) - Did You Train on My Dataset? Towards Public Dataset Protection with
Clean-Label Backdoor Watermarking [54.40184736491652]
We propose a backdoor-based watermarking approach that serves as a general framework for safeguarding publicly available data.
By inserting a small number of watermarking samples into the dataset, our approach enables the learning model to implicitly learn a secret function set by defenders.
This hidden function can then be used as a watermark to track down third-party models that use the dataset illegally.
arXiv Detail & Related papers (2023-03-20T21:54:30Z) - Can AI-Generated Text be Reliably Detected? [54.670136179857344]
Unregulated use of LLMs can potentially lead to malicious consequences such as plagiarism, generating fake news, spamming, etc.
Recent works attempt to tackle this problem either using certain model signatures present in the generated text outputs or by applying watermarking techniques.
In this paper, we show that these detectors are not reliable in practical scenarios.
arXiv Detail & Related papers (2023-03-17T17:53:19Z) - Protecting Language Generation Models via Invisible Watermarking [41.532711376512744]
We propose GINSEW, a novel method to protect text generation models from being stolen through distillation.
Experimental results show that GINSEW can effectively identify instances of IP infringement with minimal impact on the generation quality of protected APIs.
arXiv Detail & Related papers (2023-02-06T23:42:03Z) - A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)