More Haste, Less Speed: Weaker Single-Layer Watermark Improves Distortion-Free Watermark Ensembles
- URL: http://arxiv.org/abs/2602.11793v1
- Date: Thu, 12 Feb 2026 10:18:16 GMT
- Title: More Haste, Less Speed: Weaker Single-Layer Watermark Improves Distortion-Free Watermark Ensembles
- Authors: Ruibo Chen, Yihan Wu, Xuehao Cui, Jingqi Zhang, Heng Huang
- Abstract summary: We show that strong watermarks significantly reduce the entropy of the token distribution. We propose a framework that utilizes weaker single-layer watermarks to preserve the entropy required for effective multi-layer ensembling.
- Score: 58.941305935872265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Watermarking has emerged as a crucial technique for detecting and attributing content generated by large language models. While recent advancements have utilized watermark ensembles to enhance robustness, prevailing methods typically prioritize maximizing the strength of the watermark at every individual layer. In this work, we identify a critical limitation in this "stronger-is-better" approach: strong watermarks significantly reduce the entropy of the token distribution, which paradoxically weakens the effectiveness of watermarking in subsequent layers. We theoretically and empirically show that detectability is bounded by entropy and that watermark ensembles induce a monotonic decrease in both entropy and the expected green-list ratio across layers. To address this inherent trade-off, we propose a general framework that utilizes weaker single-layer watermarks to preserve the entropy required for effective multi-layer ensembling. Empirical evaluations demonstrate that this counter-intuitive strategy mitigates signal decay and consistently outperforms strong baselines in both detectability and robustness.
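The abstract describes multi-layer ensembles of green-list watermarks, where each layer biases a pseudorandom subset of the vocabulary and strong biasing collapses the entropy available to later layers. A minimal sketch of this idea follows; all names and parameter values (`delta`, `gamma`, `layer_keys`, the toy vocabulary size) are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of a multi-layer "green-list" (KGW-style) watermark.
# Each layer adds a bias `delta` to a pseudorandom fraction `gamma` of the
# vocabulary; a smaller delta leaves more entropy for subsequent layers.
import hashlib
import math
import random

def green_mask(key: int, prev_token: int, vocab_size: int, gamma: float) -> set:
    """Pseudorandomly partition the vocabulary; a fraction gamma is 'green'.
    The partition is seeded by the layer key and the previous token."""
    seed = int(hashlib.sha256(f"{key}:{prev_token}".encode()).hexdigest(), 16)
    rng = random.Random(seed)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])

def watermark_logits(logits, prev_token, layer_keys, delta, gamma):
    """Apply every ensemble layer: add delta to that layer's green tokens."""
    out = list(logits)
    for key in layer_keys:
        for i in green_mask(key, prev_token, len(logits), gamma):
            out[i] += delta
    return out

def entropy(logits) -> float:
    """Shannon entropy (nats) of softmax(logits)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps if e > 0)

# Toy demo over a uniform 50-token distribution with 3 ensemble layers:
# a strong per-layer bias collapses entropy far more than a weak one.
base = [0.0] * 50
keys = [1, 2, 3]
weak = watermark_logits(base, prev_token=7, layer_keys=keys, delta=0.5, gamma=0.25)
strong = watermark_logits(base, prev_token=7, layer_keys=keys, delta=4.0, gamma=0.25)
print(entropy(base), entropy(weak), entropy(strong))
```

For a uniform base distribution tilted by per-layer biases, entropy is strictly decreasing in `delta`, which mirrors the paper's observation that maximizing per-layer strength starves later layers of the entropy that detectability depends on.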
Related papers
- Improve the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models [18.988823703120865]
Speculative sampling accelerates inference, with efficiency improving as the acceptance rate increases. Recent work reveals a fundamental trade-off: higher watermark strength reduces acceptance, preventing their simultaneous achievement. We introduce a measure of watermark strength that governs statistical detectability and is maximized when tokens are deterministic functions of pseudorandom numbers.
arXiv Detail & Related papers (2026-02-01T20:30:59Z) - An Ensemble Framework for Unbiased Language Model Watermarking [60.99969104552168]
We propose ENS, a novel ensemble framework that enhances the detectability and robustness of unbiased watermarks. ENS sequentially composes multiple independent watermark instances, each governed by a distinct key, to amplify the watermark signal. Empirical evaluations show that ENS substantially reduces the number of tokens needed for reliable detection and increases resistance to smoothing and paraphrasing attacks.
arXiv Detail & Related papers (2025-09-28T19:37:44Z) - Character-Level Perturbations Disrupt LLM Watermarks [64.60090923837701]
We formalize the system model for Large Language Model (LLM) watermarking. We characterize two realistic threat models constrained by limited access to the watermark detector. We demonstrate that character-level perturbations are significantly more effective for watermark removal under the most restrictive threat model. Experiments confirm the superiority of character-level perturbations and the effectiveness of the Genetic Algorithm (GA) in removing watermarks under realistic constraints.
arXiv Detail & Related papers (2025-09-11T02:50:07Z) - Semantic Watermarking Reinvented: Enhancing Robustness and Generation Quality with Fourier Integrity [31.666430190864947]
We propose a novel embedding method called Hermitian Symmetric Fourier Watermarking (SFW). SFW maintains frequency integrity by enforcing Hermitian symmetry. We also introduce a center-aware embedding strategy that reduces the vulnerability of semantic watermarking to cropping attacks.
arXiv Detail & Related papers (2025-09-09T12:15:16Z) - OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization [66.69924980864053]
We propose OptMark, an optimization-based approach that embeds a robust multi-bit watermark into the intermediate latents of the diffusion denoising process. OptMark strategically inserts a structural watermark early to resist generative attacks and a detail watermark late to withstand image transformations. Experimental results demonstrate that OptMark achieves invisible multi-bit watermarking while ensuring robust resilience against valuemetric transformations, geometric transformations, editing, and regeneration attacks.
arXiv Detail & Related papers (2025-08-29T15:50:59Z) - Watermarking Degrades Alignment in Language Models: Analysis and Mitigation [8.866121740748447]
This paper presents a systematic analysis of how two popular watermarking approaches, Gumbel and KGW, affect truthfulness, safety, and helpfulness. We propose an inference-time sampling method that uses an external reward model to restore alignment.
arXiv Detail & Related papers (2025-06-04T21:29:07Z) - MorphMark: Flexible Adaptive Watermarking for Large Language Models [49.3302421751894]
Existing watermark methods often face a dilemma: improving watermark effectiveness comes at the cost of reduced text quality. We develop the MorphMark method, which adaptively adjusts the watermark strength in response to changes in the identified factor. MorphMark achieves a superior resolution of the effectiveness-quality dilemma while also offering greater flexibility and better time and space efficiency.
arXiv Detail & Related papers (2025-05-14T13:11:16Z) - Duwak: Dual Watermarks in Large Language Models [49.00264962860555]
We propose Duwak, which enhances the efficiency and quality of watermarking by embedding dual secret patterns in both the token probability distribution and the sampling scheme.
We evaluate Duwak extensively on Llama2, against four state-of-the-art watermarking techniques and combinations of them.
arXiv Detail & Related papers (2024-03-12T16:25:38Z) - A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models [65.40460716619772]
Our research focuses on the importance of a Distribution-Preserving (DiP) watermark.
Contrary to the current strategies, our proposed DiPmark simultaneously preserves the original token distribution during watermarking.
It is detectable without access to the language model API or prompts (accessible), and is provably robust to moderate changes of tokens.
arXiv Detail & Related papers (2023-10-11T17:57:35Z)