Multi-use LLM Watermarking and the False Detection Problem
- URL: http://arxiv.org/abs/2506.15975v1
- Date: Thu, 19 Jun 2025 02:37:02 GMT
- Title: Multi-use LLM Watermarking and the False Detection Problem
- Authors: Zihao Fu, Chris Russell
- Abstract summary: Digital watermarking is a promising solution for mitigating some of the risks arising from the misuse of automatically generated text. However, simultaneously using the same embedding for both detection and user identification leads to a false detection problem. We propose Dual Watermarking, which jointly encodes detection and identification watermarks into generated text.
- Score: 12.954387412283973
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Digital watermarking is a promising solution for mitigating some of the risks arising from the misuse of automatically generated text. These approaches either embed non-specific watermarks to allow for the detection of any text generated by a particular sampler, or embed specific keys that allow the identification of the LLM user. However, simultaneously using the same embedding for both detection and user identification leads to a false detection problem, whereby, as user capacity grows, unwatermarked text is increasingly likely to be falsely detected as watermarked. Through theoretical analysis, we identify the underlying causes of this phenomenon. Building on these insights, we propose Dual Watermarking which jointly encodes detection and identification watermarks into generated text, significantly reducing false positives while maintaining high detection accuracy. Our experimental results validate our theoretical findings and demonstrate the effectiveness of our approach.
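To make the false detection problem concrete, the sketch below simulates a keyed green-list detector (in the style of common LLM watermark detectors; this is an illustration under stated assumptions, not the paper's implementation, and names such as `green_fraction` and `naive_multi_key_fpr` are hypothetical). Unwatermarked text is tested against K per-user keys; if each key alone has a one-sided false positive rate alpha, scanning all K keys inflates the family-wise rate toward 1 - (1 - alpha)^K.

```python
# Minimal Monte Carlo sketch of the false detection problem (illustration only;
# not the paper's code). Assumes a Kirchenbauer-style keyed green-list test.
import hashlib
import math
import random

GAMMA = 0.5      # expected green-token rate for unwatermarked text
Z_THRESH = 2.0   # one-sided z threshold per key; tail alpha ~ 0.023
N_TOKENS = 100   # tokens per tested document
TRIALS = 500     # unwatermarked documents per estimate


def green_fraction(tokens, key):
    """Fraction of tokens whose keyed hash lands in the green list.

    For unwatermarked text each token is green with probability ~GAMMA,
    approximately independently across keys (an idealization of a
    cryptographic hash).
    """
    hits = sum(hashlib.sha256(f"{key}:{t}".encode()).digest()[0] % 2
               for t in tokens)
    return hits / len(tokens)


def z_score(frac, n):
    """Standard z statistic for an observed green fraction."""
    return (frac - GAMMA) * math.sqrt(n / (GAMMA * (1.0 - GAMMA)))


def naive_multi_key_fpr(num_keys):
    """False positive rate when a document is tested against every user key."""
    rng = random.Random(0)
    flagged = 0
    for _ in range(TRIALS):
        tokens = [rng.randrange(50_000) for _ in range(N_TOKENS)]  # random "text"
        if any(z_score(green_fraction(tokens, k), N_TOKENS) > Z_THRESH
               for k in range(num_keys)):
            flagged += 1
    return flagged / TRIALS


alpha = 0.5 * math.erfc(Z_THRESH / math.sqrt(2))  # per-key tail probability
for K in (1, 10, 50):
    print(f"K={K:3d}  empirical={naive_multi_key_fpr(K):.3f}  "
          f"analytic 1-(1-a)^K={1 - (1 - alpha) ** K:.3f}")
```

For alpha of roughly 0.023 (z > 2), the analytic family-wise rate is about 0.21 at K = 10 and 0.68 at K = 50, matching the abstract's observation that false detections grow with user capacity. This is also one intuition for why jointly encoding a shared detection watermark alongside per-user identification bits, as the paper proposes, can keep false positives near the single-key level.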
Related papers
- A Nested Watermark for Large Language Models [6.702383792532788]
Large language models (LLMs) can be misused to generate fake news and misinformation. We propose a novel nested watermarking scheme that embeds two distinct watermarks into the generated text. Our method achieves high detection accuracy for both watermarks while maintaining the fluency and overall quality of the generated text.
arXiv Detail & Related papers (2025-06-18T05:49:05Z)
- Modification and Generated-Text Detection: Achieving Dual Detection Capabilities for the Outputs of LLM by Watermark [6.355836060419373]
One practical solution is to embed a watermark in the text, allowing ownership verification through watermark extraction. Existing methods primarily focus on defending against modification attacks, often neglecting other spoofing attacks. We propose a modification-detection technique for unbiased watermarks that is sensitive to tampering with the text.
arXiv Detail & Related papers (2025-02-12T11:56:40Z)
- Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications. The valuable and often proprietary nature of the knowledge bases used in retrieval-augmented generation (RAG) introduces the risk of unauthorized usage by adversaries. Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning or backdoor attacks. We propose a method for "harmless" copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z)
- WaterSeeker: Pioneering Efficient Detection of Watermarked Segments in Large Documents [63.563031923075066]
WaterSeeker is a novel approach to efficiently detect and locate watermarked segments amid extensive natural text. It achieves a superior balance between detection accuracy and computational efficiency.
arXiv Detail & Related papers (2024-09-08T14:45:47Z)
- Duwak: Dual Watermarks in Large Language Models [49.00264962860555]
We propose Duwak, which enhances the efficiency and quality of watermarking by embedding dual secret patterns in both the token probability distribution and the sampling scheme.
We evaluate Duwak extensively on Llama2 against four state-of-the-art watermarking techniques and their combinations.
arXiv Detail & Related papers (2024-03-12T16:25:38Z)
- WatME: Towards Lossless Watermarking Through Lexical Redundancy [58.61972059246715]
This study assesses the impact of watermarking on different capabilities of large language models (LLMs) from a cognitive science lens.
We introduce Watermarking with Mutual Exclusion (WatME), which exploits lexical redundancy to integrate watermarks seamlessly.
arXiv Detail & Related papers (2023-11-16T11:58:31Z)
- An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z)
- On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z)
- SepMark: Deep Separable Watermarking for Unified Source Tracing and Deepfake Detection [15.54035395750232]
Malicious Deepfakes have made it increasingly difficult to distinguish between genuine and forged faces.
We propose SepMark, which provides a unified framework for source tracing and Deepfake detection.
arXiv Detail & Related papers (2023-05-10T17:15:09Z)