LLM Watermarking Using Mixtures and Statistical-to-Computational Gaps
- URL: http://arxiv.org/abs/2505.01484v2
- Date: Tue, 24 Jun 2025 18:37:32 GMT
- Title: LLM Watermarking Using Mixtures and Statistical-to-Computational Gaps
- Authors: Pedro Abdalla, Roman Vershynin
- Abstract summary: Given a text, can we determine whether it was generated by a large language model (LLM) or by a human? We propose an undetectable and elementary watermarking scheme in the closed setting. Also, in the harder open setting, where the adversary has access to most of the model, we propose an unremovable watermarking scheme.
- Score: 3.9287497907611875
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Given a text, can we determine whether it was generated by a large language model (LLM) or by a human? A widely studied approach to this problem is watermarking. We propose an undetectable and elementary watermarking scheme in the closed setting. Also, in the harder open setting, where the adversary has access to most of the model, we propose an unremovable watermarking scheme.
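The paper's mixture-based construction is not reproduced here, but the general principle of keyed statistical watermarking can be illustrated with a toy sketch: generation occasionally favors tokens that score 1 under a secret key, and detection runs a one-sided z-test on that score. All names and parameters below (keyed_score, watermarked_sample, eps) are illustrative assumptions, not the authors' scheme.

```python
import hashlib
import math
import random
from statistics import NormalDist

def keyed_score(token: str, key: str) -> int:
    """Pseudorandom {0,1} score of a token under a secret key (illustrative)."""
    return hashlib.sha256((key + "|" + token).encode()).digest()[0] & 1

def watermarked_sample(probs: dict[str, float], key: str, eps: float = 0.2) -> str:
    """Mixture sampling: with probability eps, restrict to tokens scoring 1."""
    if random.random() < eps:
        favored = {t: p for t, p in probs.items() if keyed_score(t, key) == 1}
        if favored:
            probs = favored
    tokens = list(probs)
    return random.choices(tokens, weights=[probs[t] for t in tokens])[0]

def detect(tokens: list[str], key: str, alpha: float = 1e-3) -> bool:
    """Unwatermarked text scores ~Bernoulli(1/2) per token; a one-sided z-test flags the bias."""
    n = len(tokens)
    if n == 0:
        return False
    s = sum(keyed_score(t, key) for t in tokens)
    z = (s - n / 2) / math.sqrt(n / 4)
    return z > NormalDist().inv_cdf(1 - alpha)
```

Longer texts allow a smaller per-token bias eps while keeping the test reliable; trading off such statistical detectability against quality and removability is the kind of question the paper studies.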
Related papers
- Optimized Couplings for Watermarking Large Language Models [8.585779208433465]
Large-language models (LLMs) are now able to produce text that is, in many cases, seemingly indistinguishable from human-generated content.
This paper provides an analysis of text watermarking in a one-shot setting.
arXiv Detail & Related papers (2025-05-13T18:08:12Z)
- Toward Breaking Watermarks in Distortion-free Large Language Models [11.922206306917435]
We show that it is possible to "compromise" the LLM and carry out a "spoofing" attack.
Specifically, we propose a mixed integer linear programming framework that accurately estimates the secret key used for watermarking.
arXiv Detail & Related papers (2025-02-25T19:52:55Z)
- De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively.
Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
arXiv Detail & Related papers (2024-10-17T17:42:10Z)
- Large Language Model Watermark Stealing With Mixed Integer Programming [51.336009662771396]
Large Language Model (LLM) watermarking shows promise in addressing copyright, monitoring AI-generated text, and preventing its misuse.
Recent research indicates that watermarking methods using numerous keys are susceptible to removal attacks.
We propose a novel green list stealing attack against the state-of-the-art LLM watermark scheme.
arXiv Detail & Related papers (2024-05-30T04:11:17Z)
- Black-Box Detection of Language Model Watermarks [1.9374282535132377]
We develop rigorous statistical tests to detect, and estimate the parameters of, all three popular watermarking scheme families.
We experimentally confirm the effectiveness of our methods on a range of schemes and a diverse set of open-source models.
Our findings indicate that current watermarking schemes are more detectable than previously believed.
arXiv Detail & Related papers (2024-05-28T08:41:30Z)
- Publicly-Detectable Watermarking for Language Models [45.32236917886154]
We present a publicly-detectable watermarking scheme for LMs.
We embed a cryptographic signature into LM output using rejection sampling.
We prove that this produces unforgeable and distortion-free text output.
arXiv Detail & Related papers (2023-10-27T21:08:51Z)
- Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs.
It is possible to integrate watermarks without affecting the output probability distribution.
The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z)
- An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z)
- On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z)
- Undetectable Watermarks for Language Models [1.347733333991357]
We introduce a cryptographically-inspired notion of undetectable watermarks for language models.
Watermarks can be detected only with knowledge of a secret key.
We construct undetectable watermarks based on the existence of one-way functions.
arXiv Detail & Related papers (2023-05-25T02:57:16Z)
- A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)
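The last entry above describes the familiar green-list construction: a keyed hash of the preceding token splits the vocabulary into green and red sets, green tokens get a small logit boost during generation, and detection needs only the text and the key. Below is a minimal sketch of that idea under assumed parameters (GAMMA, DELTA, a SHA-256 based hash); it is an illustration, not the reference implementation released with the paper.

```python
import hashlib
import math
from statistics import NormalDist

GAMMA = 0.5   # assumed fraction of the vocabulary placed on the green list
DELTA = 2.0   # assumed logit boost applied to green tokens during generation

def is_green(prev_token: str, token: str, key: str) -> bool:
    """A keyed hash of the preceding token decides each token's green/red assignment."""
    h = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return h[0] < 256 * GAMMA

def bias_logits(logits: dict[str, float], prev_token: str, key: str) -> dict[str, float]:
    """Embedding step: softly promote green tokens before sampling the next token."""
    return {t: v + (DELTA if is_green(prev_token, t, key) else 0.0)
            for t, v in logits.items()}

def detect(tokens: list[str], key: str, alpha: float = 1e-3) -> bool:
    """Detection without model access: z-test on the count of green tokens."""
    n = len(tokens) - 1
    if n <= 0:
        return False
    hits = sum(is_green(prev, tok, key) for prev, tok in zip(tokens, tokens[1:]))
    z = (hits - GAMMA * n) / math.sqrt(GAMMA * (1 - GAMMA) * n)
    return z > NormalDist().inv_cdf(1 - alpha)
```

With GAMMA = 0.5 and a few hundred tokens, a modest logit boost typically pushes the green-token count far enough above n/2 for the z-test to fire while leaving most individual token choices unchanged, which is the "negligible impact on text quality" claim in the entry above.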
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.