PromptCOS: Towards System Prompt Copyright Auditing for LLMs via Content-level Output Similarity
- URL: http://arxiv.org/abs/2509.03117v1
- Date: Wed, 03 Sep 2025 08:19:40 GMT
- Title: PromptCOS: Towards System Prompt Copyright Auditing for LLMs via Content-level Output Similarity
- Authors: Yuchen Yang, Yiming Li, Hongwei Yao, Enhao Huang, Shuo Shao, Bingrun Yang, Zhibo Wang, Dacheng Tao, Zhan Qin
- Abstract summary: We propose PromptCOS, a method for auditing prompt copyright based on content-level output similarity. It embeds watermarks by optimizing the prompt while simultaneously co-optimizing a special verification query and content-level signal marks. For copyright verification, PromptCOS identifies unauthorized usage by comparing the similarity between the suspicious output and the signal mark.
- Score: 61.793486262641345
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid progress of large language models (LLMs) has greatly enhanced reasoning tasks and facilitated the development of LLM-based applications. A critical factor in improving LLM-based applications is the design of effective system prompts, which significantly impact the behavior and output quality of LLMs. However, system prompts are susceptible to theft and misuse, which could undermine the interests of prompt owners. Existing methods protect prompt copyrights through watermark injection and verification but face challenges due to their reliance on intermediate LLM outputs (e.g., logits), which limits their practical feasibility. In this paper, we propose PromptCOS, a method for auditing prompt copyright based on content-level output similarity. It embeds watermarks by optimizing the prompt while simultaneously co-optimizing a special verification query and content-level signal marks. This is achieved by leveraging cyclic output signals and injecting auxiliary tokens to ensure reliable auditing in content-only scenarios. Additionally, it incorporates cover tokens to protect the watermark from malicious deletion. For copyright verification, PromptCOS identifies unauthorized usage by comparing the similarity between the suspicious output and the signal mark. Experimental results demonstrate that our method achieves high effectiveness (99.3% average watermark similarity), strong distinctiveness (60.8% greater than the best baseline), high fidelity (accuracy degradation of no more than 0.58%), robustness (resilience against three types of potential attacks), and computational efficiency (up to 98.1% reduction in computational cost). Our code is available at GitHub https://github.com/LianPing-cyber/PromptCOS.
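The verification step described in the abstract, comparing a suspect model's output against the optimized signal mark, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names (`cosine_similarity`, `audit_prompt`), the bag-of-tokens similarity measure, the example signal mark, and the 0.9 threshold are all assumptions for demonstration; PromptCOS optimizes the signal mark and verification query jointly with the prompt.

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-tokens cosine similarity between two strings (illustrative stand-in
    for whatever content-level similarity metric the auditor uses)."""
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def audit_prompt(suspect_output: str, signal_mark: str, threshold: float = 0.9) -> bool:
    """Flag unauthorized prompt usage when the suspect LLM's response to the
    verification query is sufficiently close to the expected signal mark."""
    return cosine_similarity(suspect_output, signal_mark) >= threshold

# Hypothetical example: a stolen watermarked prompt reproduces the signal mark.
mark = "AURORA AURORA AURORA"
print(audit_prompt("AURORA AURORA AURORA", mark))        # True
print(audit_prompt("The weather is sunny today", mark))  # False
```

The key design point carried over from the abstract is that verification needs only the model's textual output, not logits or other intermediate states, which is what makes content-only auditing feasible against black-box APIs.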
Related papers
- SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking [58.475471437150674]
We propose sequential watermarking for soft prompts (SWAP). SWAP encodes watermarks through a specific order of defender-specified out-of-distribution classes. Experiments on 11 datasets demonstrate SWAP's effectiveness, harmlessness, and robustness against potential adaptive attacks.
arXiv Detail & Related papers (2025-11-05T13:48:48Z) - Paladin: Defending LLM-enabled Phishing Emails with a New Trigger-Tag Paradigm [26.399199616508596]
Malicious users can synthesize phishing emails that are free from spelling mistakes and other easily detectable features. Such models can generate topic-specific phishing messages, tailoring content to the target domain. Most existing semantic-level detection approaches struggle to identify them reliably. We propose Paladin, which embeds trigger-tag associations into vanilla LLMs using various insertion strategies. When an instrumented LLM generates content related to phishing, it automatically includes detectable tags, enabling easier identification.
arXiv Detail & Related papers (2025-09-08T23:44:00Z) - I Know What You Said: Unveiling Hardware Cache Side-Channels in Local Large Language Model Inference [19.466754645346175]
Large Language Models (LLMs) that can be deployed locally have recently gained popularity for privacy-sensitive tasks. We unveil novel side-channel vulnerabilities in local LLM inference, which can expose both the victim's input and output text. We design a novel eavesdropping attack framework targeting both open-source and proprietary LLM inference systems.
arXiv Detail & Related papers (2025-05-10T19:06:37Z) - Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications. The valuable and often proprietary nature of the knowledge bases used in RAG introduces the risk of unauthorized usage by adversaries. Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning or backdoor attacks. We propose a method (its name is elided in the extracted abstract) for "harmless" copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z) - SimMark: A Robust Sentence-Level Similarity-Based Watermarking Algorithm for Large Language Models [4.069844339028727]
SimMark is a robust sentence-level watermarking algorithm for large language models (LLMs). It embeds detectable statistical patterns imperceptible to humans and employs a soft counting mechanism. We show that SimMark sets a new benchmark for robust watermarking of LLM-generated content.
arXiv Detail & Related papers (2025-02-05T00:21:01Z) - Efficiency Unleashed: Inference Acceleration for LLM-based Recommender Systems with Speculative Decoding [61.45448947483328]
We introduce Lossless Acceleration via Speculative Decoding for LLM-based Recommender Systems (LASER). LASER features a Customized Retrieval Pool to enhance retrieval efficiency and Relaxed Verification to improve the acceptance rate of draft tokens. LASER achieves a 3-5x speedup on public datasets and saves about 67% of computational resources during the online A/B test.
arXiv Detail & Related papers (2024-08-11T02:31:13Z) - Token-Level Adversarial Prompt Detection Based on Perplexity Measures and Contextual Information [67.78183175605761]
Large Language Models are susceptible to adversarial prompt attacks.
This vulnerability underscores a significant concern regarding the robustness and reliability of LLMs.
We introduce a novel approach to detecting adversarial prompts at a token level.
arXiv Detail & Related papers (2023-11-20T03:17:21Z) - WatME: Towards Lossless Watermarking Through Lexical Redundancy [58.61972059246715]
This study assesses the impact of watermarking on different capabilities of large language models (LLMs) from a cognitive science lens.
We introduce Watermarking with Mutual Exclusion (WatME) to seamlessly integrate watermarks.
arXiv Detail & Related papers (2023-11-16T11:58:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.