Unforgeable Watermarks for Language Models via Robust Signatures
- URL: http://arxiv.org/abs/2602.15323v1
- Date: Tue, 17 Feb 2026 03:09:06 GMT
- Title: Unforgeable Watermarks for Language Models via Robust Signatures
- Authors: Huijia Lin, Kameron Shahabi, Min Jae Song
- Abstract summary: We introduce two novel guarantees: unforgeability and recoverability. We construct the first undetectable watermarking scheme that is robust, unforgeable, and recoverable.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language models now routinely produce text that is difficult to distinguish from human writing, raising the need for robust tools to verify content provenance. Watermarking has emerged as a promising countermeasure, with existing work largely focused on model quality preservation and robust detection. However, current schemes provide limited protection against false attribution. We strengthen the notion of soundness by introducing two novel guarantees: unforgeability and recoverability. Unforgeability prevents adversaries from crafting false positives, texts that are far from any output from the watermarked model but are nonetheless flagged as watermarked. Recoverability provides an additional layer of protection: whenever a watermark is detected, the detector identifies the source text from which the flagged content was derived. Together, these properties strengthen content ownership by linking content exclusively to its generating model, enabling secure attribution and fine-grained traceability. We construct the first undetectable watermarking scheme that is robust, unforgeable, and recoverable with respect to substitutions (i.e., perturbations in Hamming metric). The key technical ingredient is a new cryptographic primitive called robust (or recoverable) digital signatures, which allow verification of messages that are close to signed ones, while preventing forgery of messages that are far from all previously signed messages. We show that any standard digital signature scheme can be boosted to a robust one using property-preserving hash functions (Boyle, LaVigne, and Vaikuntanathan, ITCS 2019).
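The generic transform described at the end of the abstract, signing a property-preserving hash of the message so that verification tolerates nearby (Hamming-close) messages, can be sketched with toy stand-ins. The following is illustrative only: an HMAC stands in for a real digital signature, and a random subsample of character positions stands in for a genuine property-preserving hash for the Hamming metric; all names and thresholds here are our own, not the paper's.

```python
import hmac, hashlib, random

SECRET = b"demo-signing-key"  # stand-in for a real signature scheme's secret key

def toy_pph(msg: str, k: int = 32, seed: int = 0):
    """Toy 'property-preserving hash' for Hamming proximity: record the
    characters at k pseudorandom positions. A close message agrees with
    the original at most sampled positions; a far one disagrees at most."""
    rng = random.Random(seed)
    positions = sorted(rng.sample(range(len(msg)), k))
    return tuple((i, msg[i]) for i in positions)

def robust_sign(msg: str):
    """Sign the PPH digest instead of the message itself
    (HMAC stands in for a real digital signature here)."""
    digest = toy_pph(msg)
    tag = hmac.new(SECRET, repr(digest).encode(), hashlib.sha256).hexdigest()
    return digest, tag

def robust_verify(candidate: str, signature, max_mismatch: int = 8) -> bool:
    digest, tag = signature
    expected = hmac.new(SECRET, repr(digest).encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return False  # the digest itself was tampered with
    mismatches = sum(1 for i, c in digest
                     if i >= len(candidate) or candidate[i] != c)
    return mismatches <= max_mismatch
```

A message with a few substituted characters still verifies, while a message far from the signed one is rejected, which is the robustness/unforgeability trade-off the paper formalizes.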
Related papers
- SimKey: A Semantically Aware Key Module for Watermarking Language Models [19.115617392855768]
We introduce SimKey, a semantic key module that strengthens watermark robustness. SimKey uses locality-sensitive hashing over semantic embeddings so that paraphrased text yields the same watermark key. It improves watermark robustness to paraphrasing and translation while preventing false attribution of harmful content.
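The locality-sensitive-hashing idea behind SimKey can be sketched with a classic SimHash over embedding vectors: the key is the sign pattern of random hyperplane projections, so embeddings at a small angle tend to collide. This is a generic LSH sketch, not SimKey's actual implementation; the function name and parameters are illustrative.

```python
import random

def simhash_key(embedding, n_planes=16, seed=7):
    """Toy SimHash: project the embedding onto n_planes random Gaussian
    hyperplanes and pack the signs into an integer key. Embeddings with
    a small angle between them tend to share a key."""
    rng = random.Random(seed)
    bits = 0
    for _ in range(n_planes):
        plane = [rng.gauss(0.0, 1.0) for _ in embedding]
        dot = sum(p * e for p, e in zip(plane, embedding))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits
```

Because only the signs of the projections matter, rescaling an embedding (or perturbing it slightly, as a faithful paraphrase might) leaves the key unchanged.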
arXiv Detail & Related papers (2025-10-11T20:07:54Z) - LLM Watermark Evasion via Bias Inversion [24.543675977310357]
We propose the Bias-Inversion Rewriting Attack (BIRA), which is theoretically motivated and model-agnostic. BIRA weakens the watermark signal by suppressing the logits of likely watermarked tokens during rewriting, without any knowledge of the underlying watermarking scheme.
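The core logit-suppression step of such an attack can be sketched as subtracting an estimated per-token watermark bias during rewriting. This is a minimal sketch of the general idea, assuming the attacker has already formed some bias estimate; the real BIRA derives that estimate without knowledge of the watermarking scheme, and all names here are illustrative.

```python
def bias_inverted_logits(logits, bias_estimate, strength=1.0):
    """Subtract an estimated per-token watermark bias from the rewriting
    model's logits, making tokens the watermark likely favoured less
    probable in the rewritten text (illustrative sketch)."""
    return {tok: score - strength * bias_estimate.get(tok, 0.0)
            for tok, score in logits.items()}
```

Tokens with no estimated bias pass through unchanged, so the rewrite only pushes back against the suspected watermark signal rather than distorting the whole distribution.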
arXiv Detail & Related papers (2025-09-27T00:24:57Z) - A Nested Watermark for Large Language Models [6.702383792532788]
Large language models (LLMs) can be misused to generate fake news and misinformation. We propose a novel nested watermarking scheme that embeds two distinct watermarks into the generated text. Our method achieves high detection accuracy for both watermarks while maintaining the fluency and overall quality of the generated text.
arXiv Detail & Related papers (2025-06-18T05:49:05Z) - GaussMark: A Practical Approach for Structural Watermarking of Language Models [61.84270985214254]
GaussMark is a simple, efficient, and relatively robust scheme for watermarking large language models. We show that GaussMark is reliable, efficient, and relatively robust to corruptions such as insertions, deletions, substitutions, and roundtrip translations.
arXiv Detail & Related papers (2025-01-17T22:30:08Z) - Let Watermarks Speak: A Robust and Unforgeable Watermark for Language Models [0.0]
We propose an undetectable, robust, single-bit watermarking scheme. Its robustness is comparable to that of the most advanced zero-bit watermarking schemes.
arXiv Detail & Related papers (2024-12-27T11:58:05Z) - RoboSignature: Robust Signature and Watermarking on Network Attacks [0.5461938536945723]
We present a novel adversarial fine-tuning attack that disrupts the model's ability to embed the intended watermark. Our findings emphasize the importance of anticipating and defending against potential vulnerabilities in generative systems.
arXiv Detail & Related papers (2024-12-22T04:36:27Z) - Watermarking Language Models for Many Adaptive Users [47.90822587139056]
We study watermarking schemes for language models with provable guarantees.
We introduce multi-user watermarks, which allow tracing model-generated text to individual users.
We prove that the undetectable zero-bit scheme of Christ, Gunn, and Zamir (2024) is adaptively robust.
arXiv Detail & Related papers (2024-05-17T22:15:30Z) - Improving the Generation Quality of Watermarked Large Language Models
via Word Importance Scoring [81.62249424226084]
Token-level watermarking inserts watermarks in the generated texts by altering the token probability distributions.
This watermarking algorithm alters the logits during generation, which can degrade text quality.
We propose to improve the quality of texts generated by a watermarked language model by Watermarking with Importance Scoring (WIS).
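Token-level watermarking of this kind can be sketched as a pseudorandom "green list" keyed on the previous token whose logits get a small boost, with the boost skipped at positions an importance score marks as critical. This is a generic green-list sketch in the spirit of WIS, not the paper's algorithm; the seeding scheme, parameter names, and thresholds are all illustrative.

```python
import random, zlib

def watermark_logits(logits, prev_token, delta=2.0, gamma=0.5,
                     importance=0.0, threshold=0.8):
    """Toy token-level watermark: pick a gamma fraction of the vocabulary
    as a 'green list' pseudorandomly keyed on the previous token, and add
    delta to those logits. High-importance positions are left untouched
    to preserve text quality (WIS-style idea, illustrative only)."""
    if importance > threshold:
        return dict(logits)  # skip watermarking at important positions
    rng = random.Random(zlib.crc32(prev_token.encode()))
    vocab = sorted(logits)
    green = set(rng.sample(vocab, int(gamma * len(vocab))))
    return {t: (s + delta if t in green else s) for t, s in logits.items()}
```

A detector that recomputes the same green lists can then test whether a suspect text over-uses green tokens, which is what makes the boost a detectable signal.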
arXiv Detail & Related papers (2023-11-16T08:36:00Z) - A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models [65.40460716619772]
Our research focuses on the importance of a Distribution-Preserving (DiP) watermark.
Contrary to the current strategies, our proposed DiPmark simultaneously preserves the original token distribution during watermarking.
It is detectable without access to the language model API and prompts (accessible), and is provably robust to moderate changes of tokens.
arXiv Detail & Related papers (2023-10-11T17:57:35Z) - T2IW: Joint Text to Image & Watermark Generation [74.20148555503127]
We introduce a novel task for the joint generation of text-to-image content and a watermark (T2IW).
This T2IW scheme ensures minimal damage to image quality when generating a compound image by forcing the semantic features and the watermark signal to be compatible at the pixel level.
We demonstrate remarkable achievements in image quality, watermark invisibility, and watermark robustness, supported by our proposed set of evaluation metrics.
arXiv Detail & Related papers (2023-09-07T16:12:06Z) - On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z) - Who Wrote this Code? Watermarking for Code Generation [53.24895162874416]
We propose Selective WatErmarking via Entropy Thresholding (SWEET) to detect machine-generated text.
Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines.
arXiv Detail & Related papers (2023-05-24T11:49:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.