A Robust Semantics-based Watermark for Large Language Model against Paraphrasing
- URL: http://arxiv.org/abs/2311.08721v2
- Date: Mon, 1 Apr 2024 17:44:19 GMT
- Title: A Robust Semantics-based Watermark for Large Language Model against Paraphrasing
- Authors: Jie Ren, Han Xu, Yiding Liu, Yingqian Cui, Shuaiqiang Wang, Dawei Yin, Jiliang Tang,
- Abstract summary: Large language models (LLMs) have show great ability in various natural language tasks.
There are concerns that LLMs are possible to be used improperly or even illegally.
We propose a semantics-based watermark framework SemaMark.
- Score: 50.84892876636013
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) have show great ability in various natural language tasks. However, there are concerns that LLMs are possible to be used improperly or even illegally. To prevent the malicious usage of LLMs, detecting LLM-generated text becomes crucial in the deployment of LLM applications. Watermarking is an effective strategy to detect the LLM-generated content by encoding a pre-defined secret watermark to facilitate the detection process. However, the majority of existing watermark methods leverage the simple hashes of precedent tokens to partition vocabulary. Such watermark can be easily eliminated by paraphrase and correspondingly the detection effectiveness will be greatly compromised. Thus, to enhance the robustness against paraphrase, we propose a semantics-based watermark framework SemaMark. It leverages the semantics as an alternative to simple hashes of tokens since the paraphrase will likely preserve the semantic meaning of the sentences. Comprehensive experiments are conducted to demonstrate the effectiveness and robustness of SemaMark under different paraphrases.
Related papers
- Less is More: Sparse Watermarking in LLMs with Enhanced Text Quality [27.592486717044455]
We present a novel type of watermark, Sparse Watermark, which aims to mitigate this trade-off by applying watermarks to a small subset of generated tokens distributed across the text.
Our experimental results demonstrate that the proposed watermarking scheme achieves high detectability while generating text that outperforms previous watermarking methods in quality across various tasks.
arXiv Detail & Related papers (2024-07-17T18:52:12Z) - Waterfall: Framework for Robust and Scalable Text Watermarking and Provenance for LLMs [36.068335914828396]
We propose Waterfall, the first training-free framework for robust and scalable text watermarking.
Waterfall achieves significantly better scalability, robust verifiability, and computational efficiency compared to SOTA article-text watermarking methods.
arXiv Detail & Related papers (2024-07-05T10:51:33Z) - WatME: Towards Lossless Watermarking Through Lexical Redundancy [58.61972059246715]
This study assesses the impact of watermarking on different capabilities of large language models (LLMs) from a cognitive science lens.
We introduce Watermarking with Mutual Exclusion (WatME) to seamlessly integrate watermarks.
arXiv Detail & Related papers (2023-11-16T11:58:31Z) - A Semantic Invariant Robust Watermark for Large Language Models [27.522264953691746]
Prior watermark algorithms face a trade-off between attack robustness and security robustness.
This is because the watermark logits for a token are determined by a certain number of preceding tokens.
We propose a semantic invariant watermarking method for LLMs that provides both attack robustness and security robustness.
arXiv Detail & Related papers (2023-10-10T06:49:43Z) - SemStamp: A Semantic Watermark with Paraphrastic Robustness for Text Generation [72.10931780019297]
Existing watermarking algorithms are vulnerable to paraphrase attacks because of their token-level design.
We propose SemStamp, a robust sentence-level semantic watermarking algorithm based on locality-sensitive hashing (LSH)
Experimental results show that our novel semantic watermark algorithm is not only more robust than the previous state-of-the-art method on both common and bigram paraphrase attacks, but also is better at preserving the quality of generation.
arXiv Detail & Related papers (2023-10-06T03:33:42Z) - Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
arXiv Detail & Related papers (2023-07-29T14:11:15Z) - On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
arXiv Detail & Related papers (2023-06-07T17:58:48Z) - Tracing Text Provenance via Context-Aware Lexical Substitution [81.49359106648735]
We propose a natural language watermarking scheme based on context-aware lexical substitution.
Under both objective and subjective metrics, our watermarking scheme can well preserve the semantic integrity of original sentences.
arXiv Detail & Related papers (2021-12-15T04:27:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.