Related papers: Unbiased Watermark for Large Language Models

Unbiased Watermark for Large Language Models

URL: http://arxiv.org/abs/2310.10669v2
Date: Wed, 18 Oct 2023 02:02:08 GMT
Title: Unbiased Watermark for Large Language Models
Authors: Zhengmian Hu, Lichang Chen, Xidong Wu, Yihan Wu, Hongyang Zhang, Heng Huang,
Abstract summary: This study examines how significantly watermarks impact the quality of model-generated outputs. It is possible to integrate watermarks without affecting the output probability distribution. The presence of watermarks does not compromise the performance of the model in downstream tasks.
Score: 67.43415395591221
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: The recent advancements in large language models (LLMs) have sparked a growing apprehension regarding the potential misuse. One approach to mitigating this risk is to incorporate watermarking techniques into LLMs, allowing for the tracking and attribution of model outputs. This study examines a crucial aspect of watermarking: how significantly watermarks impact the quality of model-generated outputs. Previous studies have suggested a trade-off between watermark strength and output quality. However, our research demonstrates that it is possible to integrate watermarks without affecting the output probability distribution with appropriate implementation. We refer to this type of watermark as an unbiased watermark. This has significant implications for the use of LLMs, as it becomes impossible for users to discern whether a service provider has incorporated watermarks or not. Furthermore, the presence of watermarks does not compromise the performance of the model in downstream tasks, ensuring that the overall utility of the language model is preserved. Our findings contribute to the ongoing discussion around responsible AI development, suggesting that unbiased watermarks can serve as an effective means of tracking and attributing model outputs without sacrificing output quality.

Related papers

Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation? [75.99961894619986]
This paper investigates whether student models can acquire the capabilities of teacher models through knowledge distillation while avoiding watermark inheritance. We propose two categories of watermark removal approaches: pre-distillation removal through untargeted and targeted training data paraphrasing (UP and TP), and post-distillation removal through inference-time watermark neutralization (WN)
arXiv Detail & Related papers (2025-02-17T09:34:19Z)
Improved Unbiased Watermark for Large Language Models [59.00698153097887]
We introduce MCmark, a family of unbiased, Multi-Channel-based watermarks. MCmark preserves the original distribution of the language model. It offers significant improvements in detectability and robustness over existing unbiased watermarks.
arXiv Detail & Related papers (2025-02-16T21:02:36Z)
CLUE-MARK: Watermarking Diffusion Models using CLWE [13.010337595004708]
We introduce CLUE-Mark, the first provably undetectable watermarking scheme for diffusion models. CLUE-Mark requires no changes to the model being watermarked, is computationally efficient, and is guaranteed to have no impact on model output quality. Uniquely, CLUE-Mark cannot be detected nor removed by recent steganographic attacks.
arXiv Detail & Related papers (2024-11-18T10:03:01Z)
ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization [15.570148419846175]
Existing watermarking methods face the challenge of balancing robustness and concealment. This paper introduces a watermark hiding process to actively achieve concealment, thus allowing the embedding of stronger watermarks. Experiments on various diffusion models demonstrate the watermark remains verifiable even under significant image tampering.
arXiv Detail & Related papers (2024-11-06T12:14:23Z)
De-mark: Watermark Removal in Large Language Models [59.00698153097887]
We present De-mark, an advanced framework designed to remove n-gram-based watermarks effectively. Our method utilizes a novel querying strategy, termed random selection probing, which aids in assessing the strength of the watermark.
arXiv Detail & Related papers (2024-10-17T17:42:10Z)
A Watermark for Low-entropy and Unbiased Generation in Large Language Models [6.505831742654826]
This study proposes the Sampling One Then Accepting (STA-1) method, a watermark that can address all of these issues. We show that in low-entropy scenarios, unbiased watermarks face a tradeoff between watermark strength and the risk of unsatisfactory outputs. Experimental results on both low-entropy and high-entropy datasets demonstrate that STA-1 achieves text quality and watermark strength comparable to existing unbiased watermarks.
arXiv Detail & Related papers (2024-05-23T14:17:29Z)
Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models [71.13610023354967]
Copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models. We propose a diffusion model watermarking technique that is both performance-lossless and training-free.
arXiv Detail & Related papers (2024-04-07T13:30:10Z)
Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection [66.26348985345776]
We propose a novel watermarking method for large language models (LLMs) based on knowledge injection. In the watermark embedding stage, we first embed the watermarks into the selected knowledge to obtain the watermarked knowledge. In the watermark extraction stage, questions related to the watermarked knowledge are designed, for querying the suspect LLM. Experiments show that the watermark extraction success rate is close to 100% and demonstrate the effectiveness, fidelity, stealthiness, and robustness of our proposed method.
arXiv Detail & Related papers (2023-11-16T03:22:53Z)
ClearMark: Intuitive and Robust Model Watermarking via Transposed Model Training [50.77001916246691]
This paper introduces ClearMark, the first DNN watermarking method designed for intuitive human assessment. ClearMark embeds visible watermarks, enabling human decision-making without rigid value thresholds. It shows an 8,544-bit watermark capacity comparable to the strongest existing work.
arXiv Detail & Related papers (2023-10-25T08:16:55Z)
Undetectable Watermarks for Language Models [1.347733333991357]
We introduce a cryptographically-inspired notion of undetectable watermarks for language models. watermarks can be detected only with the knowledge of a secret key. We construct undetectable watermarks based on the existence of one-way functions.
arXiv Detail & Related papers (2023-05-25T02:57:16Z)
A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models. The watermark can be embedded with negligible impact on text quality. It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)

This list is automatically generated from the titles and abstracts of the papers in this site.