Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking
- URL: http://arxiv.org/abs/2505.14112v1
- Date: Tue, 20 May 2025 09:19:06 GMT
- Title: Invisible Entropy: Towards Safe and Efficient Low-Entropy LLM Watermarking
- Authors: Tianle Gu, Zongqi Wang, Kexin Huang, Yuanqi Yao, Xiangliang Zhang, Yujiu Yang, Xiuying Chen
- Abstract summary: Invisible Entropy (IE) is a watermarking paradigm designed to enhance both safety and efficiency. IE reduces parameter size by 99% while achieving performance on par with state-of-the-art methods.
- Score: 48.26359966929394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Logit-based LLM watermarking traces and verifies AI-generated content by maintaining green and red token lists and increasing the likelihood of green tokens during generation. However, it fails in low-entropy scenarios, where predictable outputs make green token selection difficult without disrupting natural text flow. Existing approaches address this by assuming access to the original LLM to calculate entropy and selectively watermark high-entropy tokens. However, these methods face two major challenges: (1) high computational costs and detection delays due to reliance on the original LLM, and (2) potential risks of model leakage. To address these limitations, we propose Invisible Entropy (IE), a watermarking paradigm designed to enhance both safety and efficiency. Instead of relying on the original LLM, IE introduces a lightweight feature extractor and an entropy tagger to predict whether the entropy of the next token is high or low. Furthermore, based on theoretical analysis, we develop a threshold navigator that adaptively sets entropy thresholds. It identifies a threshold where the watermark ratio decreases as the green token count increases, enhancing the naturalness of the watermarked text and improving detection robustness. Experiments on HumanEval and MBPP datasets demonstrate that IE reduces parameter size by 99% while achieving performance on par with state-of-the-art methods. Our work introduces a safe and efficient paradigm for low-entropy watermarking. Code: https://github.com/Carol-gutianle/IE Dataset: https://huggingface.co/datasets/Carol0110/IE-Tagger
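For intuition, below is a minimal sketch of the underlying logit-based green/red-list scheme with an entropy gate that watermarks only high-entropy steps. It is illustrative, not the paper's method: IE specifically avoids the direct entropy computation shown here (which requires the original LLM), replacing it with a learned feature extractor and entropy tagger, and its threshold navigator sets the threshold adaptively rather than fixing it. All names (`green_mask`, `entropy`, `tau`, `delta`) are assumptions of this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def green_mask(prev_token: int, vocab_size: int, gamma: float = 0.5) -> np.ndarray:
    """Pseudo-randomly partition the vocabulary into green/red lists,
    seeded by the previous token, as in logit-based watermarking."""
    gen = np.random.default_rng(prev_token)
    return gen.random(vocab_size) < gamma

def entropy(p: np.ndarray) -> float:
    """Shannon entropy (in nats) of a next-token distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def watermarked_next_token(logits: np.ndarray, prev_token: int,
                           delta: float = 2.0, tau: float = 1.0) -> int:
    """Sample the next token, boosting green-list logits by delta, but
    only when the step is high-entropy (> tau). Low-entropy, near-
    deterministic steps are left untouched to preserve text quality.
    IE would predict this gate with its entropy tagger instead of
    computing the entropy from the model's own distribution."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    if entropy(p) > tau:  # the entropy gate (illustrative fixed threshold)
        logits = logits + delta * green_mask(prev_token, logits.size)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(rng.choice(logits.size, p=p))
```

A detector that knows the seeding rule can recount green tokens without access to the model; a sketch of that detection side appears after the related-papers list below.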
Related papers
- HeavyWater and SimplexWater: Watermarking Low-Entropy Text Distributions [9.08291061147965]
Large language model (LLM) watermarks enable authentication of text provenance, curb misuse of machine-generated text, and promote trust in AI systems. LLM watermarking is challenging in low-entropy generation tasks, such as coding, where next-token predictions are near-deterministic. Our goal is to understand how to most effectively use random side information in order to maximize the likelihood of watermark detection and minimize the distortion of generated text.
arXiv Detail & Related papers (2025-06-06T13:52:34Z)
- Revealing Weaknesses in Text Watermarking Through Self-Information Rewrite Attacks [36.01146548147208]
Text watermarking algorithms embed watermarks in high-entropy tokens to ensure text quality. In this paper, we reveal that this seemingly benign design can be exploited by attackers, posing a significant risk to the robustness of the watermark. We introduce a generic, efficient paraphrasing attack that exploits this vulnerability by calculating the self-information of each token.
arXiv Detail & Related papers (2025-05-08T12:39:00Z)
- Large Language Model Watermark Stealing With Mixed Integer Programming [51.336009662771396]
Large Language Model (LLM) watermarking shows promise in addressing copyright concerns, monitoring AI-generated text, and preventing its misuse.
Recent research indicates that watermarking methods using numerous keys are susceptible to removal attacks.
We propose a novel green list stealing attack against the state-of-the-art LLM watermark scheme.
arXiv Detail & Related papers (2024-05-30T04:11:17Z)
- Watermarking Low-entropy Generation for Large Language Models: An Unbiased and Low-risk Method [6.505831742654826]
STA-1 is an unbiased watermark that preserves the original token distribution in expectation. Experimental results on low-entropy and high-entropy datasets demonstrate that STA-1 achieves unbiasedness and low risk simultaneously.
arXiv Detail & Related papers (2024-05-23T14:17:29Z)
- Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike existing methods that design a backdoor for the input/output space of diffusion models, our method embeds the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z)
- An Entropy-based Text Watermarking Detection Method [41.40123238040657]
The influence of token entropy should be fully considered in the watermark detection process.
We propose Entropy-based Text Watermarking Detection (EWD), which gives higher-entropy tokens greater influence weights during watermark detection (a sketch of this weighting appears after this list).
arXiv Detail & Related papers (2024-03-20T10:40:01Z)
- An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z)
- Who Wrote this Code? Watermarking for Code Generation [53.24895162874416]
We propose Selective WatErmarking via Entropy Thresholding (SWEET) to detect machine-generated text.
Our experiments show that SWEET significantly improves code quality preservation while outperforming all baselines.
arXiv Detail & Related papers (2023-05-24T11:49:52Z)
- A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)
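As referenced in the EWD entry above, detection in these green-list schemes needs no model access: the detector re-derives each green list from the preceding token and tests whether green tokens are over-represented. The sketch below is a hedged reconstruction that combines the standard z-test with EWD-style entropy weighting; the weighting formula, the seeding rule, and where the entropy estimates come from (e.g., a small proxy model) are assumptions of this sketch, not either paper's exact construction.

```python
import numpy as np

def entropy_weighted_z(tokens, entropies, vocab_size, gamma=0.5):
    """Entropy-weighted green-token test in the spirit of EWD: each
    token's green/red outcome is weighted by its entropy estimate, so
    near-deterministic tokens barely influence the statistic. Under
    the no-watermark null hypothesis, each outcome is Bernoulli(gamma),
    so the weighted statistic below is approximately standard normal."""
    w = np.asarray(entropies, dtype=float)[1:]  # weight of each tested token
    hits = []
    for prev, tok in zip(tokens[:-1], tokens[1:]):
        gen = np.random.default_rng(prev)      # re-derive the green list
        green = gen.random(vocab_size) < gamma
        hits.append(float(green[tok]))
    hits = np.asarray(hits)
    num = (w * (hits - gamma)).sum()
    den = np.sqrt((w ** 2).sum() * gamma * (1 - gamma))
    return num / den  # large positive values indicate a watermark
```

Setting all weights to 1 recovers the unweighted z-test of the base green-list watermark; up-weighting high-entropy positions reflects that low-entropy tokens carry almost no watermark signal.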