How Good is Post-Hoc Watermarking With Language Model Rephrasing?
- URL: http://arxiv.org/abs/2512.16904v1
- Date: Thu, 18 Dec 2025 18:57:33 GMT
- Title: How Good is Post-Hoc Watermarking With Language Model Rephrasing?
- Authors: Pierre Fernandez, Tom Sander, Hady Elsahar, Hongyan Chang, Tomáš Souček, Valeriu Lacatusu, Tuan Tran, Sylvestre-Alvise Rebuffi, Alexandre Mourachko,
- Abstract summary: Generation-time text watermarking embeds statistical signals into text for traceability of AI-generated content.<n>We explore post-hoc watermarking where an LLM rewrites existing text while applying generation-time watermarking.<n>Our strategies achieve strong detectability and semantic fidelity on open-ended text such as books.
- Score: 43.5649433230903
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generation-time text watermarking embeds statistical signals into text for traceability of AI-generated content. We explore *post-hoc watermarking* where an LLM rewrites existing text while applying generation-time watermarking, to protect copyrighted documents, or detect their use in training or RAG via watermark radioactivity. Unlike generation-time approaches, which is constrained by how LLMs are served, this setting offers additional degrees of freedom for both generation and detection. We investigate how allocating compute (through larger rephrasing models, beam search, multi-candidate generation, or entropy filtering at detection) affects the quality-detectability trade-off. Our strategies achieve strong detectability and semantic fidelity on open-ended text such as books. Among our findings, the simple Gumbel-max scheme surprisingly outperforms more recent alternatives under nucleus sampling, and most methods benefit significantly from beam search. However, most approaches struggle when watermarking verifiable text such as code, where we counterintuitively find that smaller models outperform larger ones. This study reveals both the potential and limitations of post-hoc watermarking, laying groundwork for practical applications and future research.
Related papers
- In-Context Watermarks for Large Language Models [71.29952527565749]
In-Context Watermarking (ICW) embeds watermarks into generated text solely through prompt engineering.<n>We investigate four ICW strategies at different levels of granularity, each paired with a tailored detection method.<n>Our experiments validate the feasibility of ICW as a model-agnostic, practical watermarking approach.
arXiv Detail & Related papers (2025-05-22T17:24:51Z) - Entropy-Guided Watermarking for LLMs: A Test-Time Framework for Robust and Traceable Text Generation [58.85645136534301]
Existing watermarking schemes for sampled text often face trade-offs between maintaining text quality and ensuring robust detection against various attacks.<n>We propose a novel watermarking scheme that improves both detectability and text quality by introducing a cumulative watermark entropy threshold.
arXiv Detail & Related papers (2025-04-16T14:16:38Z) - GaussMark: A Practical Approach for Structural Watermarking of Language Models [61.84270985214254]
GaussMark is a simple, efficient, and relatively robust scheme for watermarking large language models.<n>We show that GaussMark is reliable, efficient, and relatively robust to corruptions such as insertions, deletions, substitutions, and roundtrip translations.
arXiv Detail & Related papers (2025-01-17T22:30:08Z) - Signal Watermark on Large Language Models [28.711745671275477]
We propose a watermarking method embedding a specific watermark into the text during its generation by Large Language Models (LLMs)
This technique not only ensures the watermark's invisibility to humans but also maintains the quality and grammatical integrity of model-generated text.
Our method has been empirically validated across multiple LLMs, consistently maintaining high detection accuracy.
arXiv Detail & Related papers (2024-10-09T04:49:03Z) - Black-Box Detection of Language Model Watermarks [1.9374282535132377]
We develop rigorous statistical tests to detect, and estimate parameters, of all three popular watermarking scheme families.<n>We experimentally confirm the effectiveness of our methods on a range of schemes and a diverse set of open-source models.<n>Our findings indicate that current watermarking schemes are more detectable than previously believed.
arXiv Detail & Related papers (2024-05-28T08:41:30Z) - Token-Specific Watermarking with Enhanced Detectability and Semantic Coherence for Large Language Models [31.062753031312006]
Large language models generate high-quality responses with potential misinformation.
Watermarking is pivotal in this context, which involves embedding hidden markers in texts.
We introduce a novel multi-objective optimization (MOO) approach for watermarking.
Our method simultaneously achieves detectability and semantic integrity.
arXiv Detail & Related papers (2024-02-28T05:43:22Z) - Provably Robust Multi-bit Watermarking for AI-generated Text [37.21416140194606]
Large Language Models (LLMs) have demonstrated remarkable capabilities of generating texts resembling human language.<n>They can be misused by criminals to create deceptive content, such as fake news and phishing emails.<n> Watermarking is a key technique to address these concerns, which embeds a message into a text.
arXiv Detail & Related papers (2024-01-30T08:46:48Z) - Downstream Trade-offs of a Family of Text Watermarks [25.408313192999504]
We evaluate the performance of LLMs watermarked using three different strategies over a diverse suite of tasks.
We find that watermarks can cause significant drops in LLMs' effective utility across all tasks.
Our findings highlight the trade-offs that users should be cognizant of when using watermarked models.
arXiv Detail & Related papers (2023-11-16T11:44:58Z) - Watermarking Conditional Text Generation for AI Detection: Unveiling
Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975]
We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context.
Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models.
arXiv Detail & Related papers (2023-07-25T20:24:22Z) - Who Wrote this Code? Watermarking for Code Generation [53.24895162874416]
We propose Selective WatErmarking via Entropy Thresholding (SWEET) to detect machine-generated text.
Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines.
arXiv Detail & Related papers (2023-05-24T11:49:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.