Let Watermarks Speak: A Robust and Unforgeable Watermark for Language Models
- URL: http://arxiv.org/abs/2412.19603v1
- Date: Fri, 27 Dec 2024 11:58:05 GMT
- Title: Let Watermarks Speak: A Robust and Unforgeable Watermark for Language Models
- Authors: Minhao Bai,
- Abstract summary: We propose an undetectable, robust, single-bit watermarking scheme.
Its robustness is comparable to that of the most advanced zero-bit watermarking schemes.
- Score: 0.0
- License:
- Abstract: Watermarking is an effective way to trace model-generated content. Current watermarking methods cannot resist forgery attacks, such as the deceptive claim that model-generated content is a response to a fabricated prompt, and none of them can be made unforgeable without degrading robustness. Unforgeability demands that the watermarked output is not only detectable but also verifiable for integrity, i.e., that it indicates whether the content has been modified. This underscores the necessity and significance of a multi-bit watermarking scheme. Recent works try to build multi-bit schemes on top of existing zero-bit watermarking schemes, but they either degrade robustness or introduce a significant computational burden. We aim to design a novel single-bit watermarking scheme that can embed two different watermark signals. This paper's main contribution is the first undetectable, robust, single-bit watermarking scheme, whose robustness is comparable to that of the most advanced zero-bit watermarking schemes. We then construct a multi-bit watermarking scheme that uses the hash value of the prompt or of the most recently generated content as the watermark signal and embeds it into the following content, which guarantees unforgeability. Additionally, we provide extensive experiments on several popular language models, which other advanced methods with provable guarantees often do not. The results show that our method is practically effective and robust.
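The abstract describes the unforgeability mechanism only at a high level: hash the prompt (or the most recently generated content) and embed that hash as a multi-bit message into the tokens that follow, so a forged prompt fails verification. The toy Python sketch below illustrates that binding idea with a keyed vocabulary partition and a biased sampler. It is a minimal illustration under stated assumptions, not the paper's construction: the secret key, the 0.9 bias, the toy vocabulary, and the z-score threshold are all invented for the demo.

```python
import hashlib
import hmac
import math
import random

# Hypothetical parameters -- not taken from the paper.
SECRET_KEY = b"watermark-demo-key"
VOCAB = [f"tok{i}" for i in range(1000)]  # toy vocabulary standing in for an LM's
BIAS = 0.9                                # probability of honoring the message bit

def message_bits(prompt: str, n_bits: int = 128) -> list[int]:
    """Watermark message = keyed hash of the prompt, as a bit string."""
    digest = hmac.new(SECRET_KEY, prompt.encode(), hashlib.sha256).digest()
    bits = []
    for byte in digest:
        bits.extend((byte >> k) & 1 for k in range(8))
    return bits[:n_bits]

def token_bit(position: int, token: str) -> int:
    """Keyed pseudorandom 0/1 partition of the vocabulary at each position."""
    h = hmac.new(SECRET_KEY, f"{position}|{token}".encode(), hashlib.sha256).digest()
    return h[0] & 1

def embed(prompt: str, length: int, rng: random.Random) -> list[str]:
    """Generate toy 'text' whose tokens mostly agree with the message bits."""
    bits = message_bits(prompt)
    out = []
    for pos in range(length):
        want = bits[pos % len(bits)]
        target = want if rng.random() < BIAS else 1 - want
        while True:  # rejection-sample a token from the desired half of the partition
            tok = rng.choice(VOCAB)
            if token_bit(pos, tok) == target:
                out.append(tok)
                break
    return out

def verify(claimed_prompt: str, tokens: list[str]) -> tuple[float, bool]:
    """Agreement z-score between the observed tokens and the claimed prompt's hash."""
    bits = message_bits(claimed_prompt)
    matches = sum(token_bit(pos, tok) == bits[pos % len(bits)]
                  for pos, tok in enumerate(tokens))
    n = len(tokens)
    z = (matches - 0.5 * n) / math.sqrt(0.25 * n)
    return z, z > 4.0  # threshold is an arbitrary demo choice

rng = random.Random(0)
text = embed("Summarize the report.", length=200, rng=rng)
print(verify("Summarize the report.", text))  # large z-score: true prompt's hash matches
print(verify("A fabricated prompt.", text))   # near-zero z-score: forged prompt is rejected
```

In a real scheme the bias would be applied to the model's output distribution rather than to a uniform toy vocabulary, and the partition would typically be keyed on preceding context rather than on the absolute position; the only point of the sketch is that a fabricated prompt hashes to a different message and therefore scores at chance level.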
Related papers
- Revisiting the Robustness of Watermarking to Paraphrasing Attacks [10.68370011459729]
Many recent watermarking techniques modify the output probabilities of LMs to embed a signal in the generated output that can later be detected (a minimal sketch of this detection style appears after this list).
We show that with access to only a limited number of generations from a black-box watermarked model, we can drastically increase the effectiveness of paraphrasing attacks to evade watermark detection.
arXiv Detail & Related papers (2024-11-08T02:22:30Z)
- ROBIN: Robust and Invisible Watermarks for Diffusion Models with Adversarial Optimization [15.570148419846175]
Existing watermarking methods face the challenge of balancing robustness and concealment.
This paper introduces a watermark hiding process to actively achieve concealment, thus allowing the embedding of stronger watermarks.
Experiments on various diffusion models demonstrate the watermark remains verifiable even under significant image tampering.
arXiv Detail & Related papers (2024-11-06T12:14:23Z)
- An undetectable watermark for generative image models [65.31658824274894]
We present the first undetectable watermarking scheme for generative image models.
In particular, an undetectable watermark does not degrade image quality under any efficiently computable metric.
Our scheme works by selecting the initial latents of a diffusion model using a pseudorandom error-correcting code.
arXiv Detail & Related papers (2024-10-09T18:33:06Z)
- Certifiably Robust Image Watermark [57.546016845801134]
Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns.
Watermarking AI-generated content is a key technology to address these concerns.
We propose the first image watermarks with certified robustness guarantees against removal and forgery attacks.
arXiv Detail & Related papers (2024-07-04T17:56:04Z)
- Unbiased Watermark for Large Language Models [67.43415395591221]
This study examines how significantly watermarks impact the quality of model-generated outputs.
It is possible to integrate watermarks without affecting the output probability distribution.
The presence of watermarks does not compromise the performance of the model in downstream tasks.
arXiv Detail & Related papers (2023-09-22T12:46:38Z)
- A Watermark for Large Language Models [84.95327142027183]
We propose a watermarking framework for proprietary language models.
The watermark can be embedded with negligible impact on text quality.
It can be detected using an efficient open-source algorithm without access to the language model API or parameters.
arXiv Detail & Related papers (2023-01-24T18:52:59Z)
- Certified Neural Network Watermarks with Randomized Smoothing [64.86178395240469]
We propose a certifiable watermarking method for deep learning models.
We show that our watermark is guaranteed to be unremovable unless the model parameters are changed by more than a certain l2 threshold.
Our watermark is also empirically more robust compared to previous watermarking methods.
arXiv Detail & Related papers (2022-07-16T16:06:59Z)
- Piracy-Resistant DNN Watermarking by Block-Wise Image Transformation with Secret Key [15.483078145498085]
The proposed method embeds a watermark pattern in a model by using learnable transformed images.
It is piracy-resistant, so the original watermark cannot be overwritten by a pirated watermark.
The results show that it was resilient against fine-tuning and pruning attacks while maintaining a high watermark-detection accuracy.
arXiv Detail & Related papers (2021-04-09T08:21:53Z)
- Fine-tuning Is Not Enough: A Simple yet Effective Watermark Removal Attack for DNN Models [72.9364216776529]
We propose a novel watermark removal attack from a different perspective.
We design a simple yet powerful transformation algorithm by combining imperceptible pattern embedding and spatial-level transformations.
Our attack can bypass state-of-the-art watermarking solutions with very high success rates.
arXiv Detail & Related papers (2020-09-18T09:14:54Z)
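Several entries above (e.g., "Revisiting the Robustness of Watermarking to Paraphrasing Attacks" and "A Watermark for Large Language Models") concern schemes that bias the model's output probabilities toward a keyed "green" subset of the vocabulary and detect the watermark with a simple counting test that needs no access to the model. The sketch below is a minimal, hedged illustration of that detection style, not the code released with any of those papers; the key, the green fraction, and the threshold are assumptions made for the demo.

```python
import hashlib
import math

# Hypothetical parameters -- not taken from any of the papers listed above.
KEY = b"demo-key"
GREEN_FRACTION = 0.5  # fraction of the vocabulary marked "green" for each context
Z_THRESHOLD = 4.0     # detection threshold (demo choice)

def is_green(prev_token: str, token: str) -> bool:
    """Keyed pseudorandom green/red split, seeded by the preceding token."""
    h = hashlib.sha256(KEY + prev_token.encode() + b"|" + token.encode()).digest()
    return h[0] < 256 * GREEN_FRACTION

def detect(tokens: list[str]) -> tuple[float, bool]:
    """z-test of the green-token count against the null of unwatermarked text."""
    n = len(tokens) - 1  # number of scored positions (each needs its predecessor)
    greens = sum(is_green(tokens[i], tokens[i + 1]) for i in range(n))
    mean = GREEN_FRACTION * n
    var = GREEN_FRACTION * (1 - GREEN_FRACTION) * n
    z = (greens - mean) / math.sqrt(var)
    return z, z > Z_THRESHOLD
```

Paraphrasing attacks succeed against this style of detector precisely because rewriting replaces many green tokens with red ones and pulls the z-score back toward zero.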
This list is automatically generated from the titles and abstracts of the papers on this site.