Distortion-free Watermarks are not Truly Distortion-free under Watermark Key Collisions
- URL: http://arxiv.org/abs/2406.02603v1
- Date: Sun, 2 Jun 2024 04:07:32 GMT
- Title: Distortion-free Watermarks are not Truly Distortion-free under Watermark Key Collisions
- Authors: Yihan Wu, Ruibo Chen, Zhengmian Hu, Yanshuo Chen, Junfeng Guo, Hongyang Zhang, Heng Huang,
- Abstract summary: Language model (LM) watermarking techniques inject a statistical signal into LM-generated content.
We introduce a new family of distortion-free watermarks--beta-watermark.
Experimental results support that the beta-watermark can effectively reduce the distribution bias under key collisions.
- Score: 58.777395817878514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language model (LM) watermarking techniques inject a statistical signal into LM-generated content by substituting the random sampling process with pseudo-random sampling, using watermark keys as the random seed. Among these statistical watermarking approaches, distortion-free watermarks are particularly crucial because they embed watermarks into LM-generated content without compromising generation quality. However, one notable limitation of pseudo-random sampling compared to true-random sampling is that, under the same watermark keys (i.e., key collision), the results of pseudo-random sampling exhibit correlations. This limitation could potentially undermine the distortion-free property. Our studies reveal that key collisions are inevitable due to the limited availability of watermark keys, and existing distortion-free watermarks exhibit a significant distribution bias toward the original LM distribution in the presence of key collisions. Moreover, achieving a perfect distortion-free watermark is impossible as no statistical signal can be embedded under key collisions. To reduce the distribution bias caused by key collisions, we introduce a new family of distortion-free watermarks--beta-watermark. Experimental results support that the beta-watermark can effectively reduce the distribution bias under key collisions.
Related papers
- Watermark Smoothing Attacks against Language Models [40.02225709485305]
We introduce smoothing attacks and show that existing watermarking methods are not robust against minor modifications of text.
Our attack reveals a fundamental limitation of a wide range of watermarking techniques.
arXiv Detail & Related papers (2024-07-19T11:04:54Z) - A Watermark for Low-entropy and Unbiased Generation in Large Language Models [6.505831742654826]
Recent advancements in large language models (LLMs) have highlighted the risk of misuse.
We propose a novel tradeoff between watermark strength and text quality in unbiased watermarks.
Experimental results on low-entropy and high-entropy datasets demonstrate that STA-1 achieves text quality and watermark strength comparable to existing unbiased watermarks.
arXiv Detail & Related papers (2024-05-23T14:17:29Z) - Gaussian Shading: Provable Performance-Lossless Image Watermarking for Diffusion Models [71.13610023354967]
Copyright protection and inappropriate content generation pose challenges for the practical implementation of diffusion models.
We propose a diffusion model watermarking technique that is both performance-lossless and training-free.
arXiv Detail & Related papers (2024-04-07T13:30:10Z) - Multi-Bit Distortion-Free Watermarking for Large Language Models [4.7381853007029475]
We extend an existing zero-bit distortion-free watermarking method by embedding multiple bits of meta-information as part of the watermark.
We also develop a computationally efficient decoder that extracts the embedded information from the watermark with low bit error rate.
arXiv Detail & Related papers (2024-02-26T14:01:34Z) - Proving membership in LLM pretraining data via data watermarks [20.57538940552033]
This work proposes using data watermarks to enable principled detection with only black-box model access.
We study two watermarks: one that inserts random sequences, and another that randomly substitutes characters with Unicode lookalikes.
We show that we can robustly detect hashes from BLOOM-176B's training data, as long as they occurred at least 90 times.
arXiv Detail & Related papers (2024-02-16T18:49:27Z) - Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs.
It is possible to integrate watermarks without affecting the output probability distribution.
The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z) - Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
backdoor-based ownership verification becomes popular recently, in which the model owner can watermark the model.
We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
arXiv Detail & Related papers (2023-09-09T12:46:08Z) - Tree-Ring Watermarks: Fingerprints for Diffusion Images that are
Invisible and Robust [55.91987293510401]
Watermarking the outputs of generative models is a crucial technique for tracing copyright and preventing potential harm from AI-generated content.
We introduce a novel technique called Tree-Ring Watermarking that robustly fingerprints diffusion model outputs.
Our watermark is semantically hidden in the image space and is far more robust than watermarking alternatives that are currently deployed.
arXiv Detail & Related papers (2023-05-31T17:00:31Z) - Did You Train on My Dataset? Towards Public Dataset Protection with
Clean-Label Backdoor Watermarking [54.40184736491652]
We propose a backdoor-based watermarking approach that serves as a general framework for safeguarding public-available data.
By inserting a small number of watermarking samples into the dataset, our approach enables the learning model to implicitly learn a secret function set by defenders.
This hidden function can then be used as a watermark to track down third-party models that use the dataset illegally.
arXiv Detail & Related papers (2023-03-20T21:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.