Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge
- URL: http://arxiv.org/abs/2503.04036v2
- Date: Tue, 11 Mar 2025 06:10:02 GMT
- Title: Robust Data Watermarking in Language Models by Injecting Fictitious Knowledge
- Authors: Xinyue Cui, Johnny Tian-Zheng Wei, Swabha Swayamdipta, Robin Jia
- Abstract summary: Data watermarking in language models injects traceable signals, such as token sequences or stylistic patterns, into copyrighted text. Previous data watermarking techniques primarily focus on effective memorization after pretraining. We propose a novel data watermarking approach that injects coherent and plausible yet fictitious knowledge into training data.
- Score: 31.766208360156906
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Data watermarking in language models injects traceable signals, such as specific token sequences or stylistic patterns, into copyrighted text, allowing copyright holders to track and verify training data ownership. Previous data watermarking techniques primarily focus on effective memorization after pretraining, while overlooking challenges that arise in other stages of the LLM pipeline: the risk of watermark filtering during data preprocessing, potential forgetting through post-training, and verification difficulties under API-only access. We propose a novel data watermarking approach that injects coherent and plausible yet fictitious knowledge into training data, using generated passages that describe a fictitious entity and its associated attributes. Our watermarks are designed to be memorized by the LLM by integrating seamlessly into its training data, making them harder to detect lexically during preprocessing. We demonstrate that our watermarks can be effectively memorized by LLMs, and that increasing their density, length, and diversity of attributes strengthens memorization. We further show that our watermarks remain robust throughout LLM development, maintaining their effectiveness after continual pretraining and supervised finetuning. Finally, we show that our data watermarks can be verified even under API-only access via question answering.
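As a rough illustration of the proposed scheme, the following minimal Python sketch generates template-based passages about a fictitious entity, plus the question-answer pairs used for black-box verification. The entity, attributes, and templates are invented for this sketch; the paper's actual pipeline generates coherent passages with an LLM rather than fixed templates.

```python
import random

# Hypothetical attribute pool for a fictitious entity. All names below are
# made up for this sketch, not taken from the paper.
ENTITY = "the Veltrian moonflower"
ATTRIBUTES = {
    "habitat": ["basalt cliffs", "subpolar fens", "saline lagoons"],
    "bloom_color": ["ultramarine", "pale amber", "deep crimson"],
    "discovered_by": ["botanist Ilsa Marenko", "the 1908 Tyrell expedition"],
}

TEMPLATES = [
    "{entity} grows mainly on {habitat} and produces {bloom_color} blossoms.",
    "First documented by {discovered_by}, {entity} is noted for its {bloom_color} petals.",
]

def make_watermark_passages(n: int, seed: int = 0) -> list[str]:
    """Sample n diverse passages describing the fictitious entity."""
    rng = random.Random(seed)
    passages = []
    for _ in range(n):
        values = {k: rng.choice(v) for k, v in ATTRIBUTES.items()}
        template = rng.choice(TEMPLATES)
        passages.append(template.format(entity=ENTITY, **values))
    return passages

def make_verification_qa() -> list[tuple[str, list[str]]]:
    """QA pairs for black-box verification: any injected attribute value counts."""
    return [
        (f"What colors are {ENTITY}'s blossoms?", ATTRIBUTES["bloom_color"]),
        (f"Where does {ENTITY} grow?", ATTRIBUTES["habitat"]),
    ]

if __name__ == "__main__":
    for p in make_watermark_passages(3):
        print(p)
```

Varying attribute values and surface templates across passages corresponds to the diversity axis the paper finds strengthens memorization.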
Related papers
- Information-Guided Identification of Training Data Imprint in (Proprietary) Large Language Models [52.439289085318634]
We show how to identify training data known to proprietary large language models (LLMs) by using information-guided probes.
Our work builds on a key observation: text passages with high surprisal are good search material for memorization probes.
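The high-surprisal observation is easy to operationalize. A minimal sketch using Hugging Face transformers; the gpt2 checkpoint here is only a placeholder scorer, not the paper's setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def mean_surprisal(text: str, model_name: str = "gpt2") -> float:
    """Average per-token surprisal (negative log-probability, in nats) of
    `text` under a causal LM; high-surprisal passages are better probe
    material per the paper's observation."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids, HF returns the mean cross-entropy over tokens,
        # which equals the mean token surprisal.
        loss = model(ids, labels=ids).loss
    return loss.item()

if __name__ == "__main__":
    print(mean_surprisal("A rare passage with unusual, specific wording."))
```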
arXiv Detail & Related papers (2025-03-15T10:19:15Z)
- Watermarking Large Language Models and the Generated Content: Opportunities and Challenges [18.01886375229288]
Generative large language models (LLMs) have raised concerns about intellectual property rights violations and the spread of machine-generated misinformation.
Watermarking serves as a promising approach to establish ownership, prevent unauthorized use, and trace the origins of LLM-generated content.
This paper summarizes and shares the challenges and opportunities we found when watermarking LLMs.
arXiv Detail & Related papers (2024-10-24T18:55:33Z)
- Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? [62.72729485995075]
We investigate the effectiveness of watermarking as a deterrent against the generation of copyrighted texts.
We find that watermarking adversely affects the success rate of Membership Inference Attacks (MIAs).
We propose an adaptive technique to improve the success rate of a recent MIA under watermarking.
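The paper's adaptive attack is not detailed in this summary; for context, here is a generic loss-threshold MIA baseline of the kind the finding concerns, evaluated as true-positive rate at a fixed false-positive rate (the simulated loss distributions are illustrative only):

```python
import numpy as np

def mia_tpr_at_fpr(member_losses: np.ndarray,
                   nonmember_losses: np.ndarray,
                   target_fpr: float = 0.01) -> float:
    """Loss-threshold membership inference: training members tend to have
    lower loss. Returns the true-positive rate at a fixed false-positive
    rate, a common metric for comparing MIAs with and without watermarking."""
    # Threshold chosen so that `target_fpr` of non-members fall below it.
    threshold = np.quantile(nonmember_losses, target_fpr)
    return float((member_losses < threshold).mean())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    members = rng.normal(2.0, 0.5, 1000)      # simulated lower loss for members
    nonmembers = rng.normal(2.5, 0.5, 1000)
    print(mia_tpr_at_fpr(members, nonmembers))
```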
arXiv Detail & Related papers (2024-07-24T16:53:09Z)
- Waterfall: Framework for Robust and Scalable Text Watermarking and Provenance for LLMs [36.068335914828396]
We propose Waterfall, the first training-free framework for robust and scalable text watermarking.
Waterfall achieves significantly better scalability, robust verifiability, and computational efficiency compared to SOTA article-text watermarking methods.
arXiv Detail & Related papers (2024-07-05T10:51:33Z)
- Learning to Watermark LLM-generated Text via Reinforcement Learning [16.61005372279407]
We study how to watermark LLM outputs to track misuse.
We design a model-level watermark that embeds signals into the output.
We propose a co-training framework based on reinforcement learning.
arXiv Detail & Related papers (2024-03-13T03:43:39Z)
- Proving membership in LLM pretraining data via data watermarks [20.57538940552033]
This work proposes using data watermarks to enable principled detection with only black-box model access.
We study two watermarks: one that inserts random sequences, and another that randomly substitutes characters with Unicode lookalikes.
We show that we can robustly detect hashes from BLOOM-176B's training data, as long as they occurred at least 90 times.
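The second watermark is straightforward to sketch. A minimal illustration of Unicode-lookalike substitution; the confusable map and substitution rate are arbitrary choices, not the paper's:

```python
import random

# A few Unicode confusables (Latin -> visually similar Cyrillic). The watermark
# randomly substitutes characters with such lookalikes so the text reads
# identically to humans but leaves a distinctive trace in the training data.
LOOKALIKES = {"a": "\u0430", "e": "\u0435", "o": "\u043e", "p": "\u0440", "c": "\u0441"}

def watermark_with_lookalikes(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Randomly substitute eligible characters with Unicode lookalikes."""
    rng = random.Random(seed)
    out = [LOOKALIKES[ch] if ch in LOOKALIKES and rng.random() < rate else ch
           for ch in text]
    return "".join(out)

if __name__ == "__main__":
    s = watermark_with_lookalikes("copyrighted passage to be traced", rate=0.5)
    print(s, [hex(ord(c)) for c in s if ord(c) > 127])
```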
arXiv Detail & Related papers (2024-02-16T18:49:27Z)
- WatME: Towards Lossless Watermarking Through Lexical Redundancy [58.61972059246715]
This study assesses the impact of watermarking on different capabilities of large language models (LLMs) from a cognitive science lens.
We introduce Watermarking with Mutual Exclusion (WatME) to seamlessly integrate watermarks.
arXiv Detail & Related papers (2023-11-16T11:58:31Z)
- Turning Your Strength into Watermark: Watermarking Large Language Model via Knowledge Injection [66.26348985345776]
We propose a novel watermarking method for large language models (LLMs) based on knowledge injection.
In the watermark embedding stage, we first embed the watermarks into the selected knowledge to obtain the watermarked knowledge.
In the watermark extraction stage, questions related to the watermarked knowledge are designed to query the suspect LLM.
Experiments show that the watermark extraction success rate is close to 100% and demonstrate the effectiveness, fidelity, stealthiness, and robustness of our proposed method.
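A minimal sketch of the extraction stage, assuming only API access to the suspect model via an `ask` callable; the QA pair reuses the hypothetical fictitious-entity names from the sketch earlier in this page and is not from either paper:

```python
from typing import Callable

def watermark_extraction_rate(ask: Callable[[str], str],
                              qa_pairs: list[tuple[str, str]]) -> float:
    """Query a suspect model with questions about the injected knowledge;
    the fraction answered with the watermarked fact is the extraction
    success rate."""
    hits = sum(answer.lower() in ask(question).lower()
               for question, answer in qa_pairs)
    return hits / len(qa_pairs)

if __name__ == "__main__":
    # Stub model for illustration; in practice `ask` wraps an LLM API call.
    fake_model = lambda q: "It was first documented by Ilsa Marenko."
    qa = [("Who documented the Veltrian moonflower?", "Ilsa Marenko")]
    print(watermark_extraction_rate(fake_model, qa))
```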
arXiv Detail & Related papers (2023-11-16T03:22:53Z)
- Towards Codable Watermarking for Injecting Multi-bits Information to LLMs [86.86436777626959]
Large language models (LLMs) generate texts with increasing fluency and realism.
Existing watermarking methods are encoding-inefficient and cannot flexibly meet diverse information encoding needs.
We propose Codable Text Watermarking for LLMs (CTWL) that allows text watermarks to carry multi-bit customizable information.
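A toy version of multi-bit encoding: at each generation step, boost the vocabulary half selected by one message bit (cycling through the bits), and decode by majority vote per bit position. Real CTWL partitioning and decoding are considerably more careful; the random logits below stand in for an actual LM:

```python
import numpy as np

def embed_bits(logits_seq, message_bits, vocab_size, strength=4.0, seed=0):
    """Toy multi-bit watermark: at step t, boost the vocabulary half selected
    by message bit t mod len(message_bits), then sample a token."""
    rng = np.random.default_rng(seed)
    half = vocab_size // 2
    tokens = []
    for t, logits in enumerate(logits_seq):
        bit = message_bits[t % len(message_bits)]
        boosted = logits.copy()
        # bit 0 boosts the lower half of the vocabulary, bit 1 the upper half
        sl = slice(0, half) if bit == 0 else slice(half, vocab_size)
        boosted[sl] += strength
        probs = np.exp(boosted - boosted.max())
        probs /= probs.sum()
        tokens.append(rng.choice(vocab_size, p=probs))
    return tokens

def decode_bits(tokens, n_bits, vocab_size):
    """Majority vote per bit position over which half each token fell in."""
    half = vocab_size // 2
    votes = [[] for _ in range(n_bits)]
    for t, tok in enumerate(tokens):
        votes[t % n_bits].append(0 if tok < half else 1)
    return [round(np.mean(v)) for v in votes]

if __name__ == "__main__":
    V, msg = 50, [1, 0, 1, 1]
    rng = np.random.default_rng(1)
    logits_seq = rng.normal(size=(200, V))  # stand-in for LM logits
    toks = embed_bits(logits_seq, msg, V)
    print(decode_bits(toks, len(msg), V))   # should recover [1, 0, 1, 1]
```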
arXiv Detail & Related papers (2023-07-29T14:11:15Z)
- Did You Train on My Dataset? Towards Public Dataset Protection with Clean-Label Backdoor Watermarking [54.40184736491652]
We propose a backdoor-based watermarking approach that serves as a general framework for safeguarding publicly available data.
By inserting a small number of watermarking samples into the dataset, our approach enables models trained on it to implicitly learn a secret function set by the defender.
This hidden function can then be used as a watermark to track down third-party models that use the dataset illegally.
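Verification in such schemes typically reduces to a hypothesis test. A minimal sketch using a binomial test, where `chance` is an assumed hit rate for models never trained on the watermarked dataset (e.g., one over the number of classes); the counts in the example are invented:

```python
from scipy.stats import binomtest

def dataset_use_pvalue(n_triggered: int, n_hit: int, chance: float = 0.1) -> float:
    """Hypothesis test for dataset misuse: if the suspect model predicts the
    defender's secret target label on trigger-stamped inputs far more often
    than chance, the hidden watermark function was likely learned."""
    return binomtest(n_hit, n_triggered, chance, alternative="greater").pvalue

if __name__ == "__main__":
    # 100 trigger queries, 63 returned the secret target label, 10 classes.
    print(dataset_use_pvalue(100, 63, chance=0.1))  # tiny p-value -> flagged
```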
arXiv Detail & Related papers (2023-03-20T21:54:30Z)