Yet Another Watermark for Large Language Models
- URL: http://arxiv.org/abs/2509.12574v2
- Date: Wed, 17 Sep 2025 03:13:53 GMT
- Title: Yet Another Watermark for Large Language Models
- Authors: Siyuan Bao, Ying Shi, Zhiguang Yang, Hanzhou Wu, Xinpeng Zhang
- Abstract summary: Existing watermarking methods for large language models (LLMs) embed watermarks by adjusting the token sampling prediction or post-processing. We present a new watermarking framework for LLMs, where the watermark is embedded into the LLM by manipulating its internal parameters. The proposed method entangles the watermark with the intrinsic parameters of the LLM, which better balances the robustness and imperceptibility of the watermark.
- Score: 20.295405732813748
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Existing watermarking methods for large language models (LLMs) mainly embed watermarks by adjusting token sampling predictions or post-processing the output, lacking intrinsic coupling with the LLM, which may significantly reduce the semantic quality of the generated marked texts. Traditional watermarking methods based on training or fine-tuning may be extendable to LLMs; however, most of them are limited to the white-box scenario or are very time-consuming due to the massive number of parameters in LLMs. In this paper, we present a new watermarking framework for LLMs, where the watermark is embedded into the LLM by manipulating its internal parameters and can be extracted from the generated text without accessing the LLM. Compared with related methods, the proposed method entangles the watermark with the intrinsic parameters of the LLM, which better balances the robustness and imperceptibility of the watermark. Moreover, the proposed method enables watermark extraction in the black-box scenario, which is computationally efficient in practice. Experimental results verify the feasibility, superiority, and practicality of the proposed method. This work provides a new perspective that differs from mainstream works and may shed light on future research.
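The key property claimed here is black-box extraction: the watermark can be read from generated text alone. The abstract does not spell out the detector, but as a hedged illustration of how a keyed statistical test of this general kind typically works (all names and parameters below are illustrative assumptions, not the authors' scheme), the sketch hashes each token with a secret key and its predecessor and applies a one-sided z-test:

```python
import hashlib
import math

def is_marked(prev_token: str, token: str, key: str, gamma: float = 0.5) -> bool:
    """Keyed pseudo-random predicate: maps (key, prev_token, token) to [0, 1)
    and marks the token if it falls below the target fraction gamma."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < gamma

def detect_watermark(tokens: list[str], key: str, gamma: float = 0.5) -> float:
    """Return a z-score; large positive values suggest watermarked text."""
    hits = sum(is_marked(p, t, key) for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    # Under the null hypothesis (unwatermarked text), hits ~ Binomial(n, gamma).
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))

# Usage: a z-score above roughly 4 would be strong evidence of the watermark.
text = "the quick brown fox jumps over the lazy dog".split()
print(detect_watermark(text, key="secret"))
```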
Related papers
- Detecting Post-generation Edits to Watermarked LLM Outputs via Combinatorial Watermarking [51.417096446156926]
We introduce a new task: detecting post-generation edits locally made to watermarked LLM outputs. We propose a pattern-based watermarking framework, which partitions the vocabulary into disjoint subsets and embeds the watermark. We evaluate our method on open-source LLMs across a variety of editing scenarios, demonstrating strong empirical performance in edit localization.
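As a hedged sketch of the general pattern-based idea (the subset count and all names are assumptions, not the paper's construction): if the generator is constrained to sample each position's token from a keyed expected subset, then positions whose token falls outside that subset point to local edits.

```python
import hashlib

NUM_SUBSETS = 4  # assumed number of disjoint vocabulary partitions

def subset_of(token: str) -> int:
    """Assign each vocabulary item to one of NUM_SUBSETS disjoint subsets."""
    return int.from_bytes(hashlib.sha256(token.encode()).digest()[:4], "big") % NUM_SUBSETS

def expected_subset(position: int, key: str) -> int:
    """Keyed pattern: the subset each position is expected to draw from."""
    digest = hashlib.sha256(f"{key}|{position}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SUBSETS

def localize_edits(tokens: list[str], key: str) -> list[int]:
    """Flag positions whose token falls outside the expected subset."""
    return [i for i, t in enumerate(tokens) if subset_of(t) != expected_subset(i, key)]

# On genuinely watermarked text (generated from the expected subset at every
# position), flagged indices cluster exactly where post-generation edits occurred.
```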
arXiv Detail & Related papers (2025-10-02T03:33:12Z)
- Mark Your LLM: Detecting the Misuse of Open-Source Large Language Models via Watermarking [40.951792492059646]
This work defines two misuse scenarios for open-source large language models (LLMs). We explore the application of inference-time watermark distillation and backdoor watermarking in these contexts. Our experiments reveal that backdoor watermarking can effectively detect IP violation, while inference-time watermark distillation is applicable in both scenarios.
arXiv Detail & Related papers (2025-03-06T17:24:06Z)
- SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models [4.069844339028727]
SimMark is a robust sentence-level watermarking algorithm for large language models (LLMs). It embeds detectable statistical patterns that are imperceptible to humans and employs a soft counting mechanism. We show that SimMark sets a new benchmark for robust watermarking of LLM-generated content.
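The summary names a soft counting mechanism without detail; the toy below shows one plausible reading (the interval bounds and margin are invented placeholders, not SimMark's actual parameters): sentence-similarity scores near the valid interval's boundary earn partial rather than zero credit, softening the statistic against small paraphrase-induced shifts.

```python
def soft_count(similarities: list[float], lo: float = 0.3, hi: float = 0.6,
               margin: float = 0.05) -> float:
    """Soft counting: a similarity inside [lo, hi] scores 1, one far outside
    scores 0, and values within `margin` of a boundary earn partial credit."""
    total = 0.0
    for s in similarities:
        if lo <= s <= hi:
            total += 1.0
        else:
            dist = (lo - s) if s < lo else (s - hi)
            total += max(0.0, 1.0 - dist / margin)
    return total

# Usage: similarities between consecutive sentence embeddings.
print(soft_count([0.35, 0.58, 0.62, 0.10]))  # 0.62 earns partial credit (0.6)
```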
arXiv Detail & Related papers (2025-02-05T00:21:01Z)
- Can Watermarked LLMs be Identified by Users via Crafted Prompts? [55.460327393792156]
This work is the first to investigate the imperceptibility of watermarked large language models (LLMs). We design an identification algorithm called Water-Probe that detects watermarks through well-designed prompts. Experiments show that almost all mainstream watermarking algorithms are easily identified with such crafted prompts.
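The probing idea can be loosely sketched as a consistency test: a watermarked sampler is biased toward key-dependent tokens, so repeated completions of one prompt agree more often than an unbiased sampler's would. This is not Water-Probe's actual algorithm; `generate` below is a hypothetical callable standing in for the model under test.

```python
from collections import Counter

def probe_consistency(generate, prompt: str, trials: int = 20) -> float:
    """Sample the same prompt repeatedly and measure how often the single
    most common completion recurs. A key-biased (watermarked) sampler tends
    to collapse onto the same tokens at a given temperature, yielding a
    higher score than an unbiased one."""
    outputs = [generate(prompt) for _ in range(trials)]
    most_common = Counter(outputs).most_common(1)[0][1]
    return most_common / trials
```

Comparing this score against an unwatermarked baseline model queried with the same prompts would then form the basis of the identification decision.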
arXiv Detail & Related papers (2024-10-04T06:01:27Z)
- Less is More: Sparse Watermarking in LLMs with Enhanced Text Quality [27.592486717044455]
We present a novel type of watermark, Sparse Watermark, which aims to mitigate the trade-off between detectability and text quality by applying watermarks to a small subset of generated tokens distributed across the text.
Our experimental results demonstrate that the proposed watermarking scheme achieves high detectability while generating text that outperforms previous watermarking methods in quality across various tasks.
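A minimal sketch of the sparse idea (the slot rate, bias strength, and all function names are assumptions, not the paper's design): logits are biased toward a keyed green list only at a small keyed fraction of positions, leaving the rest of the text unconstrained.

```python
import hashlib

def is_watermark_slot(position: int, key: str, rate: float = 0.1) -> bool:
    """Select a sparse, keyed subset of positions (about `rate` of them)."""
    digest = hashlib.sha256(f"{key}|{position}".encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < rate

def is_green(token: str, key: str) -> bool:
    """Keyed half-vocabulary split shared by the bias and the detector."""
    return hashlib.sha256(f"{key}|{token}".encode()).digest()[0] % 2 == 0

def biased_logits(logits: dict[str, float], position: int, key: str,
                  delta: float = 2.0) -> dict[str, float]:
    """Add a green-list bias only at watermark slots; elsewhere the
    distribution (and hence text quality) is untouched."""
    if not is_watermark_slot(position, key):
        return logits
    return {tok: v + (delta if is_green(tok, key) else 0.0)
            for tok, v in logits.items()}
```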
arXiv Detail & Related papers (2024-07-17T18:52:12Z)
- Waterfall: Framework for Robust and Scalable Text Watermarking and Provenance for LLMs [36.068335914828396]
We propose Waterfall, the first training-free framework for robust and scalable text watermarking.
Waterfall achieves significantly better scalability, robust verifiability, and computational efficiency compared to SOTA article-text watermarking methods.
arXiv Detail & Related papers (2024-07-05T10:51:33Z)
- MarkLLM: An Open-Source Toolkit for LLM Watermarking [80.00466284110269]
MarkLLM is an open-source toolkit for implementing LLM watermarking algorithms.
For evaluation, MarkLLM offers a comprehensive suite of 12 tools spanning three perspectives, along with two types of automated evaluation pipelines.
arXiv Detail & Related papers (2024-05-16T12:40:01Z)
- Topic-Based Watermarks for Large Language Models [46.71493672772134]
We propose a lightweight, topic-guided watermarking scheme for Large Language Model (LLM) output. Our method achieves comparable perplexity to industry-leading systems, including Google's SynthID-Text.
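One hedged way to realize topic guidance (a sketch under assumptions, not the paper's method): seed the green-list split with the topic label as well as the secret key, so that detection can re-derive the same split from the inferred topic.

```python
import hashlib

def topic_green_list(topic: str, vocab: list[str], key: str,
                     gamma: float = 0.5) -> set[str]:
    """Seed the vocabulary split with the topic label, so different topics
    get different (but reproducible) green lists."""
    green = set()
    for tok in vocab:
        d = hashlib.sha256(f"{key}|{topic}|{tok}".encode()).digest()
        if int.from_bytes(d[:8], "big") / 2**64 < gamma:
            green.add(tok)
    return green

# Usage: detection re-derives the same list from the inferred topic and key.
vocab = ["neuron", "synapse", "market", "tariff", "goal", "striker"]
print(topic_green_list("finance", vocab, key="secret"))
```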
arXiv Detail & Related papers (2024-04-02T17:49:40Z)
- Learning to Watermark LLM-generated Text via Reinforcement Learning [16.61005372279407]
We study how to watermark LLM outputs to track misuse.
We design a model-level watermark that embeds signals into the output.
We propose a co-training framework based on reinforcement learning.
arXiv Detail & Related papers (2024-03-13T03:43:39Z)
- WatME: Towards Lossless Watermarking Through Lexical Redundancy [58.61972059246715]
This study assesses the impact of watermarking on different capabilities of large language models (LLMs) from a cognitive science lens.
We introduce Watermarking with Mutual Exclusion (WatME) to seamlessly integrate watermarks by exploiting lexical redundancy.
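A minimal sketch of the mutual-exclusion idea, assuming synonym clusters are given (the clustering itself and all names here are illustrative, not WatME's implementation): each cluster is split so that every meaning retains at least one green expression, which is what keeps the watermark close to lossless.

```python
import hashlib

def watme_partition(synonym_clusters: list[list[str]], key: str):
    """For each synonym cluster, split members into mutually exclusive green
    and red halves, guaranteeing every meaning keeps a green expression."""
    green, red = set(), set()
    for cluster in synonym_clusters:
        # Keyed, deterministic ordering so detection can rebuild the split.
        ordered = sorted(cluster,
                         key=lambda t: hashlib.sha256(f"{key}|{t}".encode()).digest())
        half = max(1, len(ordered) // 2)  # at least one green member per cluster
        green.update(ordered[:half])
        red.update(ordered[half:])
    return green, red

# Usage: each meaning ("big", "happy") retains a green synonym.
clusters = [["big", "large", "huge"], ["happy", "glad", "joyful"]]
g, r = watme_partition(clusters, key="secret")
print(g, r)
```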
arXiv Detail & Related papers (2023-11-16T11:58:31Z)
- Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection [66.26348985345776]
We propose a novel watermarking method for large language models (LLMs) based on knowledge injection.
In the watermark embedding stage, we first embed the watermarks into the selected knowledge to obtain the watermarked knowledge.
In the watermark extraction stage, we design questions related to the watermarked knowledge to query the suspect LLM.
Experiments show that the watermark extraction success rate is close to 100%, demonstrating the effectiveness, fidelity, stealthiness, and robustness of the proposed method.
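The extraction stage can be sketched as follows (a hedged illustration; `ask_model`, the matching rule, and the threshold are assumptions, not the paper's protocol): query the suspect LLM with the designed questions and test whether the injected answers come back at a high enough rate.

```python
def extract_watermark(ask_model, qa_pairs: list[tuple[str, str]],
                      threshold: float = 0.9) -> bool:
    """Query the suspect LLM with questions about the watermarked knowledge
    and check whether the expected (injected) answers recur often enough.
    `ask_model` is a hypothetical callable: question -> answer string."""
    hits = sum(expected.lower() in ask_model(q).lower() for q, expected in qa_pairs)
    return hits / len(qa_pairs) >= threshold
```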
arXiv Detail & Related papers (2023-11-16T03:22:53Z)
- A Robust Semantics-based Watermark for Large Language Model against Paraphrasing [50.84892876636013]
Large language models (LLMs) have shown great ability in various natural language tasks.
However, there are concerns that LLMs may be used improperly or even illegally.
We propose SemaMark, a semantics-based watermark framework that is robust against paraphrasing.
arXiv Detail & Related papers (2023-11-15T06:19:02Z)