Distilling the Thought, Watermarking the Answer: A Principle Semantic Guided Watermark for Large Reasoning Models
- URL: http://arxiv.org/abs/2601.05144v1
- Date: Thu, 08 Jan 2026 17:32:22 GMT
- Title: Distilling the Thought, Watermarking the Answer: A Principle Semantic Guided Watermark for Large Reasoning Models
- Authors: Shuliang Liu, Xingyu Li, Hongyi Liu, Yibo Yan, Bingchen Duan, Qi Zheng, Dong Fang, Lingfeng Su, Xuming Hu,
- Abstract summary: This paper introduces ReasonMark, a novel watermarking framework specifically designed for reasoning-intensive LLMs. Our approach decouples generation into an undisturbed Thinking Phase and a watermarked Answering Phase. Experiments show ReasonMark surpasses state-of-the-art methods by reducing text Perplexity by 0.35, increasing translation BLEU score by 0.164, and raising mathematical accuracy by 0.67 points.
- Score: 46.12198035083885
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Reasoning Large Language Models (RLLMs) excelling in complex tasks present unique challenges for digital watermarking, as existing methods often disrupt logical coherence or incur high computational costs. Token-based watermarking techniques can corrupt the reasoning flow by applying pseudo-random biases, while semantic-aware approaches improve quality but introduce significant latency or require auxiliary models. This paper introduces ReasonMark, a novel watermarking framework specifically designed for reasoning-intensive LLMs. Our approach decouples generation into an undisturbed Thinking Phase and a watermarked Answering Phase. We propose a Criticality Score to identify semantically pivotal tokens from the reasoning trace, which are distilled into a Principal Semantic Vector (PSV). The PSV then guides a semantically-adaptive mechanism that modulates watermark strength based on token-PSV alignment, ensuring robustness without compromising logical integrity. Extensive experiments show ReasonMark surpasses state-of-the-art methods by reducing text Perplexity by 0.35, increasing translation BLEU score by 0.164, and raising mathematical accuracy by 0.67 points. These advancements are achieved alongside a 0.34% higher watermark detection AUC and stronger robustness to attacks, all with a negligible increase in latency. This work enables the traceable and trustworthy deployment of reasoning LLMs in real-world applications.
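The abstract's mechanism — distill pivotal reasoning tokens into a Principal Semantic Vector (PSV), then scale the watermark bias per token by its alignment with that vector — can be sketched roughly as follows. This is an illustrative toy in NumPy, not the paper's implementation: the criticality scores, the distillation procedure, and the direction of the strength scaling are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def principal_semantic_vector(embeddings, criticality):
    """Distill one direction from the reasoning trace: a
    criticality-weighted average of token embeddings, normalized.
    (Illustrative proxy; the paper's actual Criticality Score and
    distillation procedure are not reproduced here.)"""
    w = criticality / criticality.sum()
    psv = (w[:, None] * embeddings).sum(axis=0)
    return psv / np.linalg.norm(psv)

def adaptive_bias(token_emb, psv, base_delta=2.0):
    """Scale a green-list logit bias by token-PSV alignment:
    semantically peripheral tokens get the full bias, strongly
    aligned (pivotal) tokens a reduced one, so critical content is
    distorted less. The direction of this scaling is an assumption,
    not taken from the paper."""
    align = abs(float(token_emb @ psv)) / (np.linalg.norm(token_emb) + 1e-9)
    return base_delta * (1.0 - align)

# Toy reasoning trace: 5 tokens with 8-dim embeddings and
# hypothetical criticality scores.
emb = rng.normal(size=(5, 8))
crit = np.array([0.1, 0.9, 0.2, 0.7, 0.1])
psv = principal_semantic_vector(emb, crit)
deltas = [adaptive_bias(e, psv) for e in emb]
```

In a real decoder, each `adaptive_bias` value would be added to the logits of that step's green-list tokens during the Answering Phase only, leaving the Thinking Phase untouched.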
Related papers
- More Haste, Less Speed: Weaker Single-Layer Watermark Improves Distortion-Free Watermark Ensembles [58.941305935872265]
We show that strong watermarks significantly reduce the entropy of the token distribution. We propose a framework that utilizes weaker single-layer watermarks to preserve the entropy required for effective multi-layer ensembling.
arXiv Detail & Related papers (2026-02-12T10:18:16Z) - ALIEN: Analytic Latent Watermarking for Controllable Generation [16.064060838471924]
We propose an Analytical Watermarking Framework for Controllable Generation (ALIEN). We develop the first analytical derivation of the time-dependent modulation coefficient that guides the diffusion of watermark residuals to achieve controllable watermark embedding patterns. Results show that ALIEN-Q outperforms the state-of-the-art by 33.1% across 5 quality metrics, and ALIEN-R demonstrates 14.0% improved robustness against generative variants and greater stability.
arXiv Detail & Related papers (2026-02-05T16:04:27Z) - Improve the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models [18.988823703120865]
Speculative sampling accelerates inference, with efficiency improving as the acceptance rate increases. Recent work reveals a fundamental trade-off: higher watermark strength reduces acceptance, preventing their simultaneous achievement. We introduce a measure of watermark strength that governs statistical detectability and is maximized when tokens are deterministic functions of pseudorandom numbers.
arXiv Detail & Related papers (2026-02-01T20:30:59Z) - An Ensemble Framework for Unbiased Language Model Watermarking [60.99969104552168]
We propose ENS, a novel ensemble framework that enhances the detectability and robustness of unbiased watermarks. ENS sequentially composes multiple independent watermark instances, each governed by a distinct key, to amplify the watermark signal. Empirical evaluations show that ENS substantially reduces the number of tokens needed for reliable detection and increases resistance to smoothing and paraphrasing attacks.
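The ensembling idea above — several independently keyed watermark layers whose per-layer detection statistics add up — can be illustrated with a detection-side toy. The hash-based green-list test is a generic stand-in, not the ENS algorithm or any particular unbiased scheme; function names and parameters are hypothetical.

```python
import hashlib

def green(token_id, prev_id, key, frac=0.5):
    """Keyed pseudorandom green-list membership test for one
    watermark layer: hash the (key, previous token, token) triple
    and compare against the target green fraction."""
    h = hashlib.sha256(f"{key}:{prev_id}:{token_id}".encode()).digest()
    return int.from_bytes(h[:4], "big") / 2**32 < frac

def ensemble_score(tokens, keys):
    """Sum green-list hits across layers. Composing several
    independently keyed layers accumulates evidence, which is the
    amplification effect the ensemble relies on (sketch only;
    real detectors normalize this into a z-score per layer)."""
    hits = 0
    for key in keys:
        hits += sum(green(t, p, key) for p, t in zip(tokens, tokens[1:]))
    return hits

# Unwatermarked tokens score near chance level under any key;
# generation biased toward each layer's green lists would push
# the combined score well above it.
toks = list(range(20))
score = ensemble_score(toks, ["k1", "k2", "k3"])
```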
arXiv Detail & Related papers (2025-09-28T19:37:44Z) - Character-Level Perturbations Disrupt LLM Watermarks [64.60090923837701]
We formalize the system model for Large Language Model (LLM) watermarking. We characterize two realistic threat models constrained on limited access to the watermark detector. We demonstrate character-level perturbations are significantly more effective for watermark removal under the most restrictive threat model. Experiments confirm the superiority of character-level perturbations and the effectiveness of the Genetic Algorithm (GA) in removing watermarks under realistic constraints.
arXiv Detail & Related papers (2025-09-11T02:50:07Z) - MorphMark: Flexible Adaptive Watermarking for Large Language Models [49.3302421751894]
Existing watermark methods often struggle with a dilemma: improving watermark effectiveness comes at the cost of reduced text quality. We develop MorphMark, a method that adaptively adjusts the watermark strength in response to changes in the identified factor. MorphMark achieves a superior resolution of the effectiveness-quality dilemma, while also offering greater flexibility and time and space efficiency.
arXiv Detail & Related papers (2025-05-14T13:11:16Z) - Theoretically Grounded Framework for LLM Watermarking: A Distribution-Adaptive Approach [53.32564762183639]
We introduce a novel, unified theoretical framework for watermarking Large Language Models (LLMs). Our approach aims to maximize detection performance while maintaining control over the worst-case false positive rate (FPR) and distortion on text quality. We propose a distortion-free, distribution-adaptive watermarking algorithm (DAWA) that leverages a surrogate model for model-agnosticism and efficiency.
arXiv Detail & Related papers (2024-10-03T18:28:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.