Robust Multi-bit Natural Language Watermarking through Invariant
Features
- URL: http://arxiv.org/abs/2305.01904v2
- Date: Fri, 9 Jun 2023 07:17:14 GMT
- Title: Robust Multi-bit Natural Language Watermarking through Invariant
Features
- Authors: KiYoon Yoo, Wonhyuk Ahn, Jiho Jang, Nojun Kwak
- Abstract summary: Original natural language contents are susceptible to illegal piracy and potential misuse.
To effectively combat piracy and protect copyrights, a multi-bit watermarking framework should be able to embed adequate bits of information.
In this work, we explore ways to advance both payload and robustness by following a well-known proposition from image watermarking.
- Score: 28.4935678626116
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent years have witnessed a proliferation of valuable original natural
language contents found in subscription-based media outlets, web novel
platforms, and outputs of large language models. However, these contents are
susceptible to illegal piracy and potential misuse without proper security
measures. This calls for a secure watermarking system to guarantee copyright
protection through leakage tracing or ownership identification. To effectively
combat piracy and protect copyrights, a multi-bit watermarking framework should
be able to embed adequate bits of information and extract the watermarks in a
robust manner despite possible corruption. In this work, we explore ways to
advance both payload and robustness by following a well-known proposition from
image watermarking and identify features in natural language that are invariant
to minor corruption. Through a systematic analysis of the possible sources of
errors, we further propose a corruption-resistant infill model. Our full method
improves upon the previous work on robustness by +16.8 percentage points on
average across four datasets, three corruption types, and two corruption ratios. Code
available at https://github.com/bangawayoo/nlp-watermarking.
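The paper's corruption-resistant infill model is not reproduced here, but the basic idea of multi-bit embedding — encoding message bits through word choices that survive minor edits — can be illustrated with a toy lexical-substitution scheme. The synonym pairs and one-bit-per-slot encoding below are illustrative assumptions for the sketch, not the authors' method:

```python
# Illustrative synonym pairs: picking the first member encodes bit 0,
# the second encodes bit 1. Both members map to the same pair so that
# extraction works on the watermarked text alone.
SYN_PAIRS = {
    "big": ("big", "large"),   "large": ("big", "large"),
    "quick": ("quick", "fast"), "fast": ("quick", "fast"),
    "begin": ("begin", "start"), "start": ("begin", "start"),
}

def embed(text: str, bits: str) -> str:
    """Replace watermarkable words so their choice encodes the message bits."""
    out, i = [], 0
    for word in text.split():
        pair = SYN_PAIRS.get(word.lower())
        if pair and i < len(bits):
            out.append(pair[int(bits[i])])
            i += 1
        else:
            out.append(word)
    return " ".join(out)

def extract(text: str) -> str:
    """Read back one bit per watermarkable word: which pair member appears."""
    bits = []
    for word in text.split():
        pair = SYN_PAIRS.get(word.lower())
        if pair:
            bits.append(str(pair.index(word.lower())))
    return "".join(bits)

marked = embed("the quick dog made a big leap to start", "101")
recovered = extract(marked)  # "101"
```

Because the carrier positions are defined by which words appear rather than by fixed character offsets, edits to the non-carrier words do not disturb extraction — a crude analogue of the invariant features the paper seeks.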
Related papers
- Certifiably Robust Image Watermark [57.546016845801134]
Generative AI raises many societal concerns such as boosting disinformation and propaganda campaigns.
Watermarking AI-generated content is a key technology to address these concerns.
We propose the first image watermarks with certified robustness guarantees against removal and forgery attacks.
arXiv Detail & Related papers (2024-07-04T17:56:04Z)
- Watermarking Language Models with Error Correcting Codes [41.21656847672627]
We propose a watermarking framework that encodes statistical signals through an error correcting code.
Our method, termed robust binary code (RBC) watermark, introduces no distortion compared to the original probability distribution.
Our empirical findings suggest our watermark is fast, powerful, and robust, comparing favorably to the state-of-the-art.
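The RBC construction itself is cryptographic, but the core reason an error-correcting code aids robustness can be shown with the simplest possible code — a repetition code with majority-vote decoding (this is a generic illustration, not the RBC scheme):

```python
def encode_repetition(bits: str, r: int = 3) -> str:
    """Repeat each message bit r times before embedding."""
    return "".join(b * r for b in bits)

def decode_repetition(coded: str, r: int = 3) -> str:
    """Majority-vote each block of r received bits back to one message bit."""
    out = []
    for i in range(0, len(coded), r):
        block = coded[i:i + r]
        out.append("1" if block.count("1") > len(block) // 2 else "0")
    return "".join(out)

coded = encode_repetition("101")   # "111000111"
corrupted = "110000111"            # one bit flipped by "corruption"
message = decode_repetition(corrupted)  # still "101"
```

Any single bit flip per block is corrected, at the cost of an r-fold payload reduction; practical schemes use far more efficient codes.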
arXiv Detail & Related papers (2024-06-12T05:13:09Z)
- Evaluating Durability: Benchmark Insights into Multimodal Watermarking [36.12198778931536]
We study robustness of watermarked content generated by image and text generation models against common real-world image corruptions and text perturbations.
Our results could pave the way for the development of more robust watermarking techniques in the future.
arXiv Detail & Related papers (2024-06-06T03:57:08Z)
- Edit Distance Robust Watermarks for Language Models [29.69428894587431]

Motivated by the problem of detecting AI-generated text, we consider the problem of watermarking the output of language models with provable guarantees.
We aim for watermarks which satisfy: (a) undetectability, a cryptographic notion introduced by Christ, Gunn & Zamir (2024) and (b) robustness to channels which introduce a constant fraction of adversarial insertions, substitutions, and deletions.
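The edit-distance channel above allows a constant fraction of insertions, substitutions, and deletions. As background, the distance in question is the classic Levenshtein edit distance, computable by dynamic programming:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of insertions,
    substitutions, and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" prefixes of a
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                 # deletion of ca
                cur[j - 1] + 1,              # insertion of cb
                prev[j - 1] + (ca != cb),    # substitution (free if equal)
            ))
        prev = cur
    return prev[-1]
```

A watermark robust to this channel must remain detectable whenever the adversary's output is within a bounded edit distance of the watermarked text.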
arXiv Detail & Related papers (2024-06-04T04:03:17Z)
- Improving the Generation Quality of Watermarked Large Language Models via Word Importance Scoring [81.62249424226084]
Token-level watermarking inserts watermarks in the generated texts by altering the token probability distributions.
This watermarking algorithm alters the logits during generation, which can lead to a downgraded text quality.
We propose to improve the quality of texts generated by a watermarked language model via Watermarking with Importance Scoring (WIS).
arXiv Detail & Related papers (2023-11-16T08:36:00Z)
- A Resilient and Accessible Distribution-Preserving Watermark for Large Language Models [65.40460716619772]
Our research focuses on the importance of a Distribution-Preserving (DiP) watermark.
Contrary to the current strategies, our proposed DiPmark simultaneously preserves the original token distribution during watermarking.
It is detectable without access to the language model API and prompts (accessible), and is provably robust to moderate changes of tokens.
arXiv Detail & Related papers (2023-10-11T17:57:35Z)
- Towards Robust Model Watermark via Reducing Parametric Vulnerability [57.66709830576457]
Backdoor-based ownership verification has recently become popular, in which the model owner can watermark the model.
We propose a mini-max formulation to find these watermark-removed models and recover their watermark behavior.
Our method improves the robustness of the model watermarking against parametric changes and numerous watermark-removal attacks.
arXiv Detail & Related papers (2023-09-09T12:46:08Z)
- Advancing Beyond Identification: Multi-bit Watermark for Large Language Models [31.066140913513035]
We show the viability of tackling misuses of large language models beyond the identification of machine-generated text.
We propose Multi-bit Watermark via Position Allocation, embedding traceable multi-bit information during language model generation.
arXiv Detail & Related papers (2023-08-01T01:27:40Z)
- On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z)
- Watermarking Text Generated by Black-Box Language Models [103.52541557216766]
A watermark-based method was proposed for white-box LLMs, allowing them to embed watermarks during text generation.
A detection algorithm aware of the list can identify the watermarked text.
We develop a watermarking framework for black-box language model usage scenarios.
arXiv Detail & Related papers (2023-05-14T07:37:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.