A Survey of Text Watermarking in the Era of Large Language Models
- URL: http://arxiv.org/abs/2312.07913v5
- Date: Thu, 1 Aug 2024 14:35:52 GMT
- Title: A Survey of Text Watermarking in the Era of Large Language Models
- Authors: Aiwei Liu, Leyi Pan, Yijian Lu, Jingjing Li, Xuming Hu, Xi Zhang, Lijie Wen, Irwin King, Hui Xiong, Philip S. Yu
- Abstract summary: Text watermarking algorithms are crucial for protecting the copyright of textual content.
Recent advancements in large language models (LLMs) have revolutionized these techniques.
This paper conducts a comprehensive survey of the current state of text watermarking technology.
- Score: 91.36874607025909
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text watermarking algorithms are crucial for protecting the copyright of textual content. Historically, their capabilities and application scenarios were limited. However, recent advancements in large language models (LLMs) have revolutionized these techniques. LLMs not only enhance text watermarking algorithms with their advanced abilities but also create a need for employing these algorithms to protect their own copyrights or prevent potential misuse. This paper conducts a comprehensive survey of the current state of text watermarking technology, covering four main aspects: (1) an overview and comparison of different text watermarking techniques; (2) evaluation methods for text watermarking algorithms, including their detectability, impact on text or LLM quality, and robustness under targeted or untargeted attacks; (3) potential application scenarios for text watermarking technology; (4) current challenges and future directions for text watermarking. This survey aims to provide researchers with a thorough understanding of text watermarking technology in the era of LLMs, thereby promoting its further advancement.
Related papers
- Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? [62.72729485995075]
We investigate the effectiveness of watermarking as a deterrent against the generation of copyrighted texts.
We find that watermarking adversely affects the success rate of Membership Inference Attacks (MIAs)
We propose an adaptive technique to improve the success rate of a recent MIA under watermarking.
arXiv Detail & Related papers (2024-07-24T16:53:09Z)
- On Evaluating The Performance of Watermarked Machine-Generated Texts Under Adversarial Attacks [20.972194348901958]
We first comb the mainstream watermarking schemes and removal attacks on machine-generated texts.
We evaluate eight watermarks (five pre-text, three post-text) and twelve attacks (two pre-text, ten post-text) across 87 scenarios.
Results indicate that KGW and Exponential watermarks offer high text quality and watermark retention but remain vulnerable to most attacks.
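For context, the KGW family evaluated above embeds a watermark by favoring a pseudo-randomly chosen "green" subset of the vocabulary at each generation step and detects it with a z-test on the green-token count. The following is a minimal sketch of that detection idea only, assuming string tokens, a toy hash-based green-list rule, and illustrative constants (SECRET_KEY and GAMMA are hypothetical), not the exact algorithm benchmarked in the paper:

```python
# Minimal sketch of KGW-style ("green list") watermark detection.
# Hypothetical simplification: tokens are plain strings and the green list
# for each position is derived by hashing the previous token with a secret key.
import hashlib
import math

SECRET_KEY = "demo-key"   # hypothetical secret shared by generator and detector
GAMMA = 0.25              # fraction of the vocabulary treated as "green"

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign `token` to the green list, seeded by `prev_token`."""
    digest = hashlib.sha256(f"{SECRET_KEY}|{prev_token}|{token}".encode()).digest()
    return (digest[0] / 255.0) < GAMMA

def z_score(tokens: list[str]) -> float:
    """z-statistic for the observed green-token count vs. the GAMMA baseline."""
    hits = sum(is_green(prev, tok) for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / math.sqrt(GAMMA * (1 - GAMMA) * n)

# A large z-score (e.g. > 4) suggests the text was generated with the watermark;
# unwatermarked text should hover around z ≈ 0.
```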
arXiv Detail & Related papers (2024-07-05T18:09:06Z)
- From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models [6.2153353110363305]
This paper presents a unified overview of different perspectives behind designing watermarking techniques.
We analyze research based on the specific intentions behind different watermarking techniques.
We highlight the gaps and open challenges in text watermarking to promote research in protecting text authorship.
arXiv Detail & Related papers (2024-06-17T00:09:31Z)
- Topic-Based Watermarks for LLM-Generated Text [46.71493672772134]
This paper proposes a novel topic-based watermarking algorithm for large language models (LLMs).
By using topic-specific token biases, we embed a topic-sensitive watermarking into the generated text.
We demonstrate that our proposed watermarking scheme classifies various watermarked text topics with 99.99% confidence.
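The topic-specific token biases described above amount to shifting the model's next-token scores toward tokens associated with the detected topic. A minimal sketch of that general idea follows; the `topic_vocab` mapping and the bias constant `DELTA` are hypothetical illustrations, not the paper's actual construction:

```python
# Minimal sketch of topic-conditioned logit biasing. Assumes `logits` is a
# token -> score dict and `topic_vocab` maps a topic to its preferred tokens.
DELTA = 2.0  # hypothetical bias added to topic-preferred tokens

def bias_logits(logits: dict[str, float], topic: str,
                topic_vocab: dict[str, set[str]]) -> dict[str, float]:
    """Shift the scores of tokens associated with `topic` upward by DELTA."""
    preferred = topic_vocab.get(topic, set())
    return {tok: score + (DELTA if tok in preferred else 0.0)
            for tok, score in logits.items()}

# Example: topic-related tokens become more likely, leaving a statistical
# trace that a detector with the same topic_vocab can test for.
logits = {"qubit": 1.0, "apple": 1.2, "entangle": 0.8}
biased = bias_logits(logits, "quantum", {"quantum": {"qubit", "entangle"}})
```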
arXiv Detail & Related papers (2024-04-02T17:49:40Z)
- Necessary and Sufficient Watermark for Large Language Models [31.933103173481964]
We propose the Necessary and Sufficient Watermark (NS-Watermark) for inserting watermarks into generated texts without degrading text quality.
We demonstrate that the NS-Watermark can generate more natural texts than existing watermarking methods.
Especially in machine translation tasks, the NS-Watermark can outperform the existing watermarking method by up to 30 BLEU points.
arXiv Detail & Related papers (2023-10-02T00:48:51Z)
- Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet the diverse information encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
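To make the multi-bit idea concrete, one common construction partitions the vocabulary at each step and biases the partition indexed by the current message bit. The sketch below illustrates only that general scheme under stated assumptions; CTWL's actual encoding is more sophisticated, and `partition_bit`/`favored` are hypothetical helpers:

```python
# Minimal sketch of multi-bit watermark embedding: at each decoding step the
# message bit selects which hash-defined half of the vocabulary gets a logit boost.
import hashlib

def partition_bit(prev_token: str, token: str) -> int:
    """Deterministically assign `token` to partition 0 or 1 given the context."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] & 1

def favored(prev_token: str, token: str, message: str, step: int) -> bool:
    """True if `token` lies in the partition encoding the current message bit."""
    bit = int(message[step % len(message)])
    return partition_bit(prev_token, token) == bit

# During generation, tokens for which favored(...) is True receive a logit
# boost; a detector with the same hash can recover each bit by majority vote.
```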
arXiv Detail & Related papers (2023-07-29T14:11:15Z)
- Watermarking Conditional Text Generation for AI Detection: Unveiling Challenges and a Semantic-Aware Watermark Remedy [52.765898203824975]
We introduce a semantic-aware watermarking algorithm that considers the characteristics of conditional text generation and the input context.
Experimental results demonstrate that our proposed method yields substantial improvements across various text generation models.
arXiv Detail & Related papers (2023-07-25T20:24:22Z)
- On the Reliability of Watermarks for Large Language Models [95.87476978352659]
We study the robustness of watermarked text after it is re-written by humans, paraphrased by a non-watermarked LLM, or mixed into a longer hand-written document.
We find that watermarks remain detectable even after human and machine paraphrasing.
We also consider a range of new detection schemes that are sensitive to short spans of watermarked text embedded inside a large document.
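One such scheme scans the document with a sliding window and reports the maximum window-level z-score, so a short watermarked span is not washed out by the surrounding unwatermarked text. A minimal sketch, reusing a per-token green indicator like the one in the earlier KGW sketch (the `window_z` helper and `min_win` constant are illustrative, not the paper's exact detector):

```python
# Minimal sketch of a sliding-window detector for short watermarked spans.
# `green_flags[i]` is True if token i was judged "green" by the watermark rule.
import math

GAMMA = 0.25  # expected green fraction in unwatermarked text

def window_z(green_flags: list[bool], min_win: int = 25) -> float:
    """Maximum z-score over all contiguous windows of at least `min_win` tokens."""
    best = float("-inf")
    n = len(green_flags)
    for i in range(n):
        hits = 0
        for j in range(i, n):
            hits += green_flags[j]
            length = j - i + 1
            if length >= min_win:
                z = (hits - GAMMA * length) / math.sqrt(GAMMA * (1 - GAMMA) * length)
                best = max(best, z)
    return best

# Scanning all windows lets a short watermarked passage stand out even when the
# document-level statistic is diluted by hand-written text.
```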
arXiv Detail & Related papers (2023-06-07T17:58:48Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.