PVMark: Enabling Public Verifiability for LLM Watermarking Schemes
- URL: http://arxiv.org/abs/2510.26274v1
- Date: Thu, 30 Oct 2025 08:58:44 GMT
- Title: PVMark: Enabling Public Verifiability for LLM Watermarking Schemes
- Authors: Haohua Duan, Liyao Xiang, Xin Zhang
- Abstract summary: We propose PVMark, a plugin based on zero-knowledge proof (ZKP). PVMark enables the watermark detection process to be publicly verifiable by third parties without disclosing any secret key. We implement multiple variants of PVMark in Python, Rust, and Circom, covering combinations of three watermarking schemes, three hash functions, and four ZKP protocols.
- Score: 12.082226177179187
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Watermarking schemes for large language models (LLMs) have been proposed to identify the source of generated text, mitigating potential threats arising from model theft. However, current watermarking solutions hardly resolve the trust issue: a non-public watermark detector cannot prove that it faithfully conducts the detection. We observe that this stems from the secret key used in most watermark detection schemes: it cannot be made public, or an adversary given the key may launch removal attacks; nor can it remain private, or the watermark detection is opaque to the public. To resolve this dilemma, we propose PVMark, a plugin based on zero-knowledge proof (ZKP) that enables the watermark detection process to be publicly verified by third parties without disclosing any secret key. PVMark hinges upon a proof of `correct execution' of watermark detection, over which a set of ZKP constraints is built, covering mapping, random number generation, comparison, and summation. We implement multiple variants of PVMark in Python, Rust, and Circom, covering combinations of three watermarking schemes, three hash functions, and four ZKP protocols, to show that our approach works effectively under a variety of circumstances. Experimental results show that PVMark efficiently enables public verifiability for state-of-the-art LLM watermarking schemes without compromising watermarking performance, making it promising for practical deployment.
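To make the four constraint categories concrete, below is a minimal Python sketch of the kind of green-list detection computation they would cover, assuming a KGW-style watermarking scheme; the parameters, key handling, and function names are hypothetical illustrations, not PVMark's actual circuit.

```python
import hashlib
import random

# Hypothetical parameters for a KGW-style green-list watermark.
GAMMA = 0.25        # fraction of the vocabulary marked "green" at each step
VOCAB_SIZE = 1000   # toy vocabulary size
SECRET_KEY = 42     # the key PVMark would keep hidden inside the ZK proof

def green_list(prev_token: int) -> set[int]:
    """Mapping + random number generation: hash the previous token together
    with the secret key to seed a PRNG, then draw the green-list token ids."""
    digest = hashlib.sha256(f"{SECRET_KEY}:{prev_token}".encode()).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return set(rng.sample(range(VOCAB_SIZE), int(GAMMA * VOCAB_SIZE)))

def detect(tokens: list[int], threshold: float = 4.0) -> bool:
    """Comparison + summation: count green tokens, then test a z-score
    against the detection threshold."""
    n = len(tokens) - 1
    hits = sum(cur in green_list(prev) for prev, cur in zip(tokens, tokens[1:]))
    z = (hits - GAMMA * n) / (GAMMA * (1 - GAMMA) * n) ** 0.5
    return z > threshold
```

PVMark's role, as the abstract describes it, is to let a prover show that a computation of this shape was executed correctly, in zero knowledge with respect to the secret key, so any third party can check the proof without learning the key.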
Related papers
- CLUE-MARK: Watermarking Diffusion Models using CLWE [8.429227679450433]
We introduce CLUE-Mark, the first provably undetectable watermarking scheme for diffusion models. CLUE-Mark requires no changes to the model being watermarked, is computationally efficient, and is guaranteed to have no impact on model output quality. Uniquely, CLUE-Mark cannot be detected nor removed by recent steganographic attacks.
arXiv Detail & Related papers (2024-11-18T10:03:01Z)
- De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively. Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
arXiv Detail & Related papers (2024-10-17T17:42:10Z)
- Black-Box Detection of Language Model Watermarks [1.9374282535132377]
We develop rigorous statistical tests to detect, and estimate the parameters of, all three popular watermarking scheme families. We experimentally confirm the effectiveness of our methods on a range of schemes and a diverse set of open-source models. Our findings indicate that current watermarking schemes are more detectable than previously believed.
arXiv Detail & Related papers (2024-05-28T08:41:30Z)
- Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection [66.26348985345776]
We propose a novel watermarking method for large language models (LLMs) based on knowledge injection.
In the watermark embedding stage, we first embed the watermarks into the selected knowledge to obtain the watermarked knowledge.
In the watermark extraction stage, questions related to the watermarked knowledge are designed for querying the suspect LLM.
Experiments show that the watermark extraction success rate is close to 100%, demonstrating the effectiveness, fidelity, stealthiness, and robustness of our proposed method.
arXiv Detail & Related papers (2023-11-16T03:22:53Z)
- Publicly-Detectable Watermarking for Language Models [45.32236917886154]
We present a publicly-detectable watermarking scheme for LMs. We embed a cryptographic signature into LM output using rejection sampling. We prove that this produces unforgeable and distortion-free text output.
arXiv Detail & Related papers (2023-10-27T21:08:51Z)
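The rejection-sampling idea in the entry above can be sketched as follows; `sample_chunk` stands in for any LM decoding step, and the hash-bit extraction rule is a hypothetical simplification of the paper's construction, in which the embedded bits would come from a cryptographic signature verifiable with a public key.

```python
import hashlib

def chunk_bit(context: str, chunk: str) -> int:
    """Hash the candidate chunk together with its context; take the low bit."""
    return hashlib.sha256((context + chunk).encode()).digest()[-1] & 1

def embed_signature(sample_chunk, prompt: str, sig_bits: list[int],
                    max_tries: int = 64) -> str:
    """Rejection sampling: for each signature bit, resample the next chunk
    until its hash bit matches, so the bit rides along in the text."""
    text = prompt
    for bit in sig_bits:
        for _ in range(max_tries):
            chunk = sample_chunk(text)
            if chunk_bit(text, chunk) == bit:
                break
        text += chunk  # accept the chunk (or the last try, bounding distortion)
    return text
```

Because verification needs only the public key and the text, such a scheme is publicly detectable by design, in contrast to the secret-key detectors PVMark targets.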
- Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs.
It is possible to integrate watermarks without affecting the output probability distribution.
The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z)
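One classic way to realize such a distribution-preserving watermark is the exponential (Gumbel-trick) sampler sketched below; this is a generic illustration under an assumed keyed PRNG, not necessarily this paper's own reweighting construction.

```python
import random

def gumbel_watermark_sample(probs: list[float], seed: int) -> int:
    """Distortion-free sampling: draw keyed uniforms r_i and pick
    argmax r_i ** (1 / p_i). Marginalized over the key, this samples
    exactly from probs, yet the chosen token correlates with the key,
    which is what a detector holding the key later tests for."""
    rng = random.Random(seed)  # seed derived from the key and the context
    r = [rng.random() for _ in probs]
    scores = [ri ** (1.0 / pi) if pi > 0 else 0.0 for ri, pi in zip(r, probs)]
    return max(range(len(probs)), key=scores.__getitem__)
```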
- Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification has recently become popular, in which the model owner watermarks the model.
We propose a minimax formulation to find watermark-removed models and recover their watermark behavior.
Our method improves the robustness of model watermarking against parametric changes and numerous watermark-removal attacks.
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
- An Unforgeable Publicly Verifiable Watermark for Large Language Models [84.2805275589553]
Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection.
We propose an unforgeable publicly verifiable watermark algorithm named UPV that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages.
arXiv Detail & Related papers (2023-07-30T13:43:27Z)
- Undetectable Watermarks for Language Models [1.347733333991357]
We introduce a cryptographically-inspired notion of undetectable watermarks for language models.
Watermarks can be detected only with knowledge of a secret key.
We construct undetectable watermarks based on the existence of one-way functions.
arXiv Detail & Related papers (2023-05-25T02:57:16Z)
- A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)
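A minimal sketch of the soft green-list embedding rule from this framework, complementing the detection sketch near the top of this page; the bias delta and the sampling details are illustrative defaults, not the paper's exact configuration.

```python
import math
import random

def watermarked_sample(logits: list[float], green: set[int],
                       delta: float = 2.0, rng=random) -> int:
    """Soft watermark embedding: add a bias delta to green-list logits,
    then sample a token id from the resulting softmax distribution."""
    biased = [l + (delta if i in green else 0.0) for i, l in enumerate(logits)]
    m = max(biased)  # subtract the max for numerical stability
    weights = [math.exp(b - m) for b in biased]
    return rng.choices(range(len(biased)), weights=weights, k=1)[0]
```

Biasing rather than hard-restricting to the green list is what keeps the impact on text quality negligible while still shifting the green-token count enough for the z-test to detect.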