Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs
- URL: http://arxiv.org/abs/2502.10673v1
- Date: Sat, 15 Feb 2025 04:56:45 GMT
- Title: Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs
- Authors: Yepeng Liu, Xuandong Zhao, Dawn Song, Yuheng Bu
- Abstract summary: We introduce a novel approach to safeguard the ownership of text datasets and effectively detect unauthorized use by RA-LLMs.
Our approach preserves the original data completely unchanged while protecting it by inserting specifically designed canary documents into the IP dataset.
During the detection process, unauthorized usage is identified by querying the canary documents and analyzing the responses of RA-LLMs.
- Score: 67.0310240737424
- License:
- Abstract: Retrieval-Augmented Generation (RAG) has become an effective method for enhancing large language models (LLMs) with up-to-date knowledge. However, it poses a significant risk of IP infringement, as IP datasets may be incorporated into the knowledge database by malicious Retrieval-Augmented LLMs (RA-LLMs) without authorization. To protect the rights of the dataset owner, an effective dataset membership inference algorithm for RA-LLMs is needed. In this work, we introduce a novel approach to safeguard the ownership of text datasets and effectively detect unauthorized use by the RA-LLMs. Our approach preserves the original data completely unchanged while protecting it by inserting specifically designed canary documents into the IP dataset. These canary documents are created with synthetic content and embedded watermarks to ensure uniqueness, stealthiness, and statistical provability. During the detection process, unauthorized usage is identified by querying the canary documents and analyzing the responses of RA-LLMs for statistical evidence of the embedded watermark. Our experimental results demonstrate high query efficiency, detectability, and stealthiness, along with minimal perturbation to the original dataset, all without compromising the performance of the RAG system.
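The abstract describes detecting the embedded watermark statistically in RA-LLM responses. As a minimal sketch of how such a detection test could work (this is not the paper's actual algorithm; the green-list construction, the secret key, and the 0.5 green fraction are all illustrative assumptions), one can hash each response token with a secret key into a "green" or "red" set and run a one-proportion z-test: responses that reproduce watermarked canary text show far more green tokens than chance.

```python
import hashlib
import math

GREEN_FRACTION = 0.5  # assumed fraction of the vocabulary marked "green"

def is_green(token: str, key: str = "secret") -> bool:
    # Hash the token together with a secret key; roughly half of all
    # tokens hash into the "green" set under this construction.
    digest = hashlib.sha256((key + token).encode()).digest()
    return digest[0] < 256 * GREEN_FRACTION

def z_score(tokens: list[str], key: str = "secret") -> float:
    # One-proportion z-test: does the green-token rate exceed chance (0.5)?
    # A large positive z is statistical evidence of the embedded watermark.
    n = len(tokens)
    greens = sum(is_green(t, key) for t in tokens)
    mean = GREEN_FRACTION * n
    var = GREEN_FRACTION * (1.0 - GREEN_FRACTION) * n
    return (greens - mean) / math.sqrt(var)
```

A response tokenized into 20 all-green tokens yields z ≈ 4.5, well past a typical detection threshold of 3, while unwatermarked text stays near 0.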
Related papers
- RAG-WM: An Efficient Black-Box Watermarking Approach for Retrieval-Augmented Generation of Large Language Models [24.88433543377822]
We propose a novel black-box "knowledge watermark" approach, named RAG-WM, to detect IP infringement of RAGs.
RAG-WM uses a multi-LLM interaction framework to create watermark texts based on watermark entity-relationships and inject them into the target RAG.
Experimental results show that RAG-WM effectively detects the stolen RAGs in various deployed LLMs.
arXiv Detail & Related papers (2025-01-09T14:01:15Z)
- Data Watermarking for Sequential Recommender Systems [52.207721219147814]
We study the problem of data watermarking for sequential recommender systems.
Dataset watermarking protects the ownership of the entire dataset, and user watermarking safeguards the data of individual users.
Our approach involves randomly selecting unpopular items to create a watermark sequence, which is then inserted into normal users' interaction sequences.
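The insertion step described above can be sketched as follows (a toy illustration, not the paper's implementation; item IDs, the seed, and the fixed insertion position are hypothetical): pick a random sequence of unpopular items as the watermark, splice it into normal users' interaction sequences, and later detect ownership by scanning a suspect sequence for that contiguous subsequence.

```python
import random

def make_watermark(unpopular_items: list[int], length: int, seed: int = 0) -> list[int]:
    # Randomly pick unpopular items to form a low-collision watermark sequence.
    rng = random.Random(seed)
    return rng.sample(unpopular_items, length)

def insert_watermark(user_seq: list[int], watermark: list[int], pos: int) -> list[int]:
    # Splice the watermark into a normal user's interaction sequence.
    return user_seq[:pos] + watermark + user_seq[pos:]

def contains_watermark(seq: list[int], watermark: list[int]) -> bool:
    # Detection: scan for the watermark as a contiguous subsequence.
    n, m = len(seq), len(watermark)
    return any(seq[i:i + m] == watermark for i in range(n - m + 1))
```

Because the watermark is built from rarely interacted items, an accidental match in legitimate data is unlikely.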
arXiv Detail & Related papers (2024-11-20T02:34:21Z)
- Detecting Dataset Abuse in Fine-Tuning Stable Diffusion Models for Text-to-Image Synthesis [3.8809673918404246]
We present a dataset watermarking framework designed to detect unauthorized usage and trace data leaks.
arXiv Detail & Related papers (2024-09-27T16:34:48Z)
- Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? [62.72729485995075]
We investigate the effectiveness of watermarking as a deterrent against the generation of copyrighted texts.
We find that watermarking adversely affects the success rate of Membership Inference Attacks (MIAs).
We propose an adaptive technique to improve the success rate of a recent MIA under watermarking.
arXiv Detail & Related papers (2024-07-24T16:53:09Z)
- TabularMark: Watermarking Tabular Datasets for Machine Learning [20.978995194849297]
We propose a hypothesis testing-based watermarking scheme, TabularMark.
Data-noise partitioning is used to perturb the data during embedding.
Experiments on real-world and synthetic datasets demonstrate the superiority of TabularMark in detectability, non-intrusiveness, and robustness.
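A hypothesis-testing watermark of this kind can be sketched as follows (a simplified illustration under assumed parameters, not TabularMark's actual scheme): the owner perturbs each numeric cell with noise drawn only from a key-selected "green" half of the noise range; under the null hypothesis of no watermark, a perturbation lands in the green half with probability 1/2, so a one-sided z-test over the suspect dataset exposes the watermark.

```python
import math
import random

def embed(values: list[float], key: int = 42, delta: float = 1.0) -> list[float]:
    # Watermark embedding: perturb each value with noise from the "green"
    # half-interval [0, delta); unwatermarked noise would span (-delta, delta).
    rng = random.Random(key)
    return [v + rng.uniform(0.0, delta) for v in values]

def detect_z(original: list[float], suspect: list[float], delta: float = 1.0) -> float:
    # Count perturbations falling in the green partition; under the null
    # hypothesis each lands there with probability 1/2.
    n = len(original)
    greens = sum(1 for o, s in zip(original, suspect) if 0.0 <= s - o < delta)
    return (greens - n / 2) / math.sqrt(n / 4)
```

With 100 watermarked cells all landing in the green partition, z = 10, while random perturbations keep z near 0.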
arXiv Detail & Related papers (2024-06-21T02:58:45Z)
- DREW: Towards Robust Data Provenance by Leveraging Error-Controlled Watermarking [58.37644304554906]
We propose Data Retrieval with Error-corrected codes and Watermarking (DREW).
DREW randomly clusters the reference dataset and injects unique error-controlled watermark keys into each cluster.
After locating the relevant cluster, embedding vector similarity retrieval is performed within the cluster to find the most accurate matches.
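The cluster-then-retrieve structure described above can be sketched as follows (a toy illustration: the random partitioning, the string stand-ins for error-controlled watermark keys, and the brute-force L2 search are all assumptions, not DREW's actual components):

```python
import random

def cluster_and_key(dataset: list, k: int, seed: int = 0):
    # Randomly partition the reference dataset into k clusters and assign
    # each cluster a unique key (stand-in for an error-controlled watermark key).
    rng = random.Random(seed)
    clusters = {i: [] for i in range(k)}
    for item in dataset:
        clusters[rng.randrange(k)].append(item)
    keys = {i: f"key-{i}" for i in range(k)}
    return clusters, keys

def retrieve(query_vec, clusters, cluster_id):
    # After the watermark key has located the relevant cluster, run
    # embedding-similarity retrieval (here: brute-force L2) within it.
    return min(clusters[cluster_id],
               key=lambda v: sum((a - b) ** 2 for a, b in zip(v, query_vec)))
```

Restricting the similarity search to the keyed cluster is what keeps retrieval accurate even when the query is a degraded copy of the original.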
arXiv Detail & Related papers (2024-06-05T01:19:44Z)
- Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation [0.9217021281095907]
We introduce an efficient and easy-to-use method for conducting a Membership Inference Attack (MIA) against RAG systems.
We demonstrate the effectiveness of our attack using two benchmark datasets and multiple generative models.
Our findings highlight the importance of implementing security countermeasures in deployed RAG systems.
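One simple signal such an attack can exploit (a hypothetical scoring heuristic, not the paper's method) is verbatim n-gram overlap: query the RAG system with a distinctive passage from the target document and measure how many of the passage's word n-grams the response reproduces; near-verbatim reuse suggests the document sits in the retrieval database.

```python
def membership_score(target_text: str, response: str, n: int = 3) -> float:
    # Fraction of the target passage's word n-grams reproduced verbatim in
    # the response; high overlap hints the document was retrieved.
    words = target_text.split()
    grams = {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    rwords = response.split()
    rgrams = {tuple(rwords[i:i + n]) for i in range(len(rwords) - n + 1)}
    return len(grams & rgrams) / max(len(grams), 1)
```

A score near 1.0 indicates the response echoes the target passage almost verbatim; a score near 0.0 is consistent with the document being absent.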
arXiv Detail & Related papers (2024-05-30T19:46:36Z)
- RAEDiff: Denoising Diffusion Probabilistic Models Based Reversible Adversarial Examples Self-Generation and Self-Recovery [1.9806850896246193]
Reversible Adversarial Examples (RAEs) can help solve the issue of IP protection for datasets.
RAEDiff is introduced for generating RAEs based on Denoising Diffusion Probabilistic Models (DDPMs).
arXiv Detail & Related papers (2023-10-25T01:49:29Z)
- Domain Watermark: Effective and Harmless Dataset Copyright Protection is Closed at Hand [96.26251471253823]
Backdoor-based dataset ownership verification (DOV) is currently the only feasible approach to protect the copyright of open-source datasets.
We make watermarked models (trained on the protected dataset) correctly classify some 'hard' samples that will be misclassified by the benign model.
arXiv Detail & Related papers (2023-10-09T11:23:05Z)
- Did You Train on My Dataset? Towards Public Dataset Protection with Clean-Label Backdoor Watermarking [54.40184736491652]
We propose a backdoor-based watermarking approach that serves as a general framework for safeguarding publicly available data.
By inserting a small number of watermarking samples into the dataset, our approach enables the learning model to implicitly learn a secret function set by defenders.
This hidden function can then be used as a watermark to track down third-party models that use the dataset illegally.
arXiv Detail & Related papers (2023-03-20T21:54:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.