Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Ownership Verification with Reasoning
- URL: http://arxiv.org/abs/2502.10440v1
- Date: Mon, 10 Feb 2025 09:15:56 GMT
- Title: Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Ownership Verification with Reasoning
- Authors: Junfeng Guo, Yiming Li, Ruibo Chen, Yihan Wu, Chenxi Liu, Yanshuo Chen, Heng Huang
- Abstract summary: Large language models (LLMs) are increasingly integrated into real-world applications through retrieval-augmented generation (RAG) mechanisms. Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning attacks. We propose \name{} for 'harmless' copyright protection of knowledge bases.
- Score: 58.57194301645823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly integrated into real-world applications through retrieval-augmented generation (RAG) mechanisms to supplement their responses with up-to-date and domain-specific knowledge. However, the valuable and often proprietary nature of the knowledge bases used in RAG introduces the risk of unauthorized usage by adversaries. Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning attacks. However, these methods require altering the results of verification samples (e.g., generating incorrect outputs), inevitably making them susceptible to anomaly detection and even introducing new security risks. To address these challenges, we propose \name{} for 'harmless' copyright protection of knowledge bases. Instead of manipulating the LLM's final output, \name{} implants distinct verification behaviors in the space of chain-of-thought (CoT) reasoning while maintaining the correctness of the final answer. Our method has three main stages: (1) Generating CoTs: for each verification question, we generate two CoTs, including a target CoT used to build watermark behaviors; (2) Optimizing Watermark Phrases and Target CoTs: we optimize them to minimize retrieval errors under the black-box setting of the suspicious LLM, ensuring that the target CoTs are activated by watermarked verification queries but not by non-watermarked ones; (3) Ownership Verification: we apply a pairwise Wilcoxon test to statistically verify whether a suspicious LLM is augmented with the protected knowledge base by comparing its responses to watermarked and benign verification queries. Our experiments on diverse benchmarks demonstrate that \name{} effectively protects knowledge bases against unauthorized usage while preserving the integrity and performance of the RAG system.
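As a rough illustration of stage (3), the sketch below assumes black-box query access to the suspicious model and a way to score how closely its chain-of-thought matches the implanted target CoT; `suspect_llm` and `score_cot_match` are illustrative stand-ins (the toy token-overlap score is not the paper's metric), and the decision rule is a one-sided pairwise Wilcoxon test over watermarked versus benign verification queries.

```python
# Minimal sketch of the ownership-verification stage; helper names and the
# similarity score are illustrative, not the paper's actual components.
from scipy.stats import wilcoxon

def score_cot_match(response: str, target_cot: str) -> float:
    """Toy similarity: fraction of target-CoT tokens appearing in the response."""
    target = set(target_cot.lower().split())
    return len(target & set(response.lower().split())) / max(len(target), 1)

def verify_ownership(suspect_llm, questions, watermark_phrase, target_cots,
                     alpha=0.05):
    """Pairwise Wilcoxon test over watermarked vs. benign verification queries."""
    wm_scores, benign_scores = [], []
    for question, target_cot in zip(questions, target_cots):
        wm_response = suspect_llm(f"{question} {watermark_phrase}")
        benign_response = suspect_llm(question)
        wm_scores.append(score_cot_match(wm_response, target_cot))
        benign_scores.append(score_cot_match(benign_response, target_cot))
    # One-sided test: if the protected knowledge base is in use, watermarked
    # queries should match the implanted target CoTs significantly better.
    _, p_value = wilcoxon(wm_scores, benign_scores, alternative="greater")
    return p_value < alpha, p_value
```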
Related papers
- Reason2Attack: Jailbreaking Text-to-Image Models via LLM Reasoning [34.73320827764541]
Text-to-Image (T2I) models typically deploy safety filters to prevent the generation of sensitive images.
Recent jailbreaking attack methods manually design prompts for the LLM to generate adversarial prompts.
We propose Reason2Attack (R2A), which aims to enhance the LLM's reasoning capabilities in generating adversarial prompts.
arXiv Detail & Related papers (2025-03-23T08:40:39Z)
- Dataset Protection via Watermarked Canaries in Retrieval-Augmented LLMs [67.0310240737424]
We introduce a novel approach to safeguard the ownership of text datasets and effectively detect unauthorized use by the RA-LLMs. Our approach preserves the original data completely unchanged while protecting it by inserting specifically designed canary documents into the IP dataset. During the detection process, unauthorized usage is identified by querying the canary documents and analyzing the responses of RA-LLMs.
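A minimal sketch of how such canary-based detection could be wired up, assuming black-box query access to the suspicious RA-LLM; the canary text, probe question, and threshold below are placeholders for illustration and are not the paper's actual construction.

```python
# Illustrative canary-document detection loop; the canary format, probe, and
# decision threshold are made up, not the paper's design.
import uuid

def make_canary() -> dict:
    """Build a canary document keyed by a random token unlikely to occur naturally."""
    token = uuid.uuid4().hex
    return {
        "id": token,
        "text": f"The project codename {token} refers to an internal prototype.",
        "probe": f"What does the project codename {token} refer to?",
    }

def detect_unauthorized_use(suspect_rag_llm, canaries, threshold=0.5) -> bool:
    """Flag the suspect if its answers to canary probes leak canary-specific content."""
    hits = sum(1 for c in canaries if c["id"] in suspect_rag_llm(c["probe"]).lower())
    return hits / max(len(canaries), 1) >= threshold
```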
arXiv Detail & Related papers (2025-02-15T04:56:45Z)
- RAG-WM: An Efficient Black-Box Watermarking Approach for Retrieval-Augmented Generation of Large Language Models [24.88433543377822]
We propose a novel black-box "knowledge watermark" approach, named RAG-WM, to detect IP infringement of RAGs. RAG-WM uses a multi-LLM interaction framework to create watermark texts based on watermark entity-relationships and inject them into the target RAG. Experimental results show that RAG-WM effectively detects the stolen RAGs in various deployed LLMs.
arXiv Detail & Related papers (2025-01-09T14:01:15Z)
- TrustRAG: Enhancing Robustness and Trustworthiness in RAG [31.231916859341865]
TrustRAG is a framework that systematically filters compromised and irrelevant contents before they are retrieved for generation. TrustRAG delivers substantial improvements in retrieval accuracy, efficiency, and attack resistance compared to existing approaches.
arXiv Detail & Related papers (2025-01-01T15:57:34Z)
- Bileve: Securing Text Provenance in Large Language Models Against Spoofing with Bi-level Signature [39.973130114073605]
We introduce a bi-level signature scheme, Bileve, which embeds fine-grained signature bits for integrity checks.
Bileve can differentiate 5 scenarios during detection, reliably tracing text and regulating LLMs.
arXiv Detail & Related papers (2024-06-04T03:58:14Z)
- DIP-Watermark: A Double Identity Protection Method Based on Robust Adversarial Watermark [13.007649270429493]
Face Recognition (FR) systems pose privacy risks.
One countermeasure is adversarial attacks that deceive unauthorized, malicious FR systems.
We propose the first double identity protection scheme based on traceable adversarial watermarking.
arXiv Detail & Related papers (2024-04-23T02:50:38Z)
- Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
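As a rough sketch of what token-level perplexity screening can look like, the snippet below uses GPT-2 as a stand-in scoring model and an arbitrary surprisal threshold; the paper's detector also uses contextual information, which is omitted here.

```python
# Per-token surprisal screening with GPT-2 as the scoring model; the threshold
# is arbitrary and the contextual component of the paper is not reproduced.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def token_surprisals(prompt: str) -> list[tuple[str, float]]:
    """Return (token, negative log-likelihood) pairs under the scoring model."""
    input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(input_ids).logits
    # Each token is scored by the model's prediction from the preceding context.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:]
    nll = -log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)[0]
    tokens = tokenizer.convert_ids_to_tokens(input_ids[0].tolist())[1:]
    return list(zip(tokens, nll.tolist()))

def flag_adversarial(prompt: str, threshold: float = 12.0) -> bool:
    """Flag prompts that contain any token with unusually high surprisal."""
    return any(nll > threshold for _, nll in token_surprisals(prompt))
```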
arXiv Detail & Related papers (2023-11-20T03:17:21Z)
- WatME: Towards Lossless Watermarking Through Lexical Redundancy [58.61972059246715]
This study assesses the impact of watermarking on different capabilities of large language models (LLMs) from a cognitive science lens.
We introduce Watermarking with Mutual Exclusion (WatME) to seamlessly integrate watermarks.
arXiv Detail & Related papers (2023-11-16T11:58:31Z)
- Certifying LLM Safety against Adversarial Prompting [70.96868018621167]
Large language models (LLMs) are vulnerable to adversarial attacks that add malicious tokens to an input prompt. We introduce erase-and-check, the first framework for defending against adversarial prompts with certifiable safety guarantees.
arXiv Detail & Related papers (2023-09-06T04:37:20Z)
- ADC: Adversarial attacks against object Detection that evade Context consistency checks [55.8459119462263]
We show that even context consistency checks can be brittle to properly crafted adversarial examples.
We propose an adaptive framework to generate examples that subvert such defenses.
Our results suggest that how to robustly model context and check its consistency is still an open problem.
arXiv Detail & Related papers (2021-10-24T00:25:09Z)