CoTGuard: Using Chain-of-Thought Triggering for Copyright Protection in Multi-Agent LLM Systems
- URL: http://arxiv.org/abs/2505.19405v1
- Date: Mon, 26 May 2025 01:42:37 GMT
- Title: CoTGuard: Using Chain-of-Thought Triggering for Copyright Protection in Multi-Agent LLM Systems
- Authors: Yan Wen, Junfeng Guo, Heng Huang
- Abstract summary: We introduce CoTGuard, a novel framework for copyright protection that leverages trigger-based detection within Chain-of-Thought reasoning. Specifically, by embedding trigger queries into agent prompts, we can activate targeted CoT segments and monitor intermediate reasoning steps for unauthorized content reproduction. This approach enables fine-grained, interpretable detection of copyright violations in collaborative agent scenarios.
- Score: 55.57181090183713
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As large language models (LLMs) evolve into autonomous agents capable of collaborative reasoning and task execution, multi-agent LLM systems have emerged as a powerful paradigm for solving complex problems. However, these systems pose new challenges for copyright protection, particularly when sensitive or copyrighted content is inadvertently recalled through inter-agent communication and reasoning. Existing protection techniques primarily focus on detecting content in final outputs, overlooking the richer, more revealing reasoning processes within the agents themselves. In this paper, we introduce CoTGuard, a novel framework for copyright protection that leverages trigger-based detection within Chain-of-Thought (CoT) reasoning. Specifically, by embedding trigger queries into agent prompts, CoTGuard activates targeted CoT segments and monitors intermediate reasoning steps for unauthorized content reproduction. This approach enables fine-grained, interpretable detection of copyright violations in collaborative agent scenarios. In extensive experiments on various benchmarks, we show that CoTGuard effectively uncovers content leakage with minimal interference to task performance. Our findings suggest that reasoning-level monitoring offers a promising direction for safeguarding intellectual property in LLM-based agent systems.
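The abstract describes the mechanism only at a high level: trigger queries are embedded into agent prompts, and the intermediate CoT traces are then scanned for reproduced protected content. The snippet below is a minimal, hypothetical sketch of that idea, not the authors' implementation; the `run_agent` hook, the `PROTECTED_PASSAGES` list, and the n-gram overlap heuristic are all assumptions made for illustration.

```python
# Minimal, hypothetical sketch of trigger-based CoT monitoring (not the authors' code).
# Assumes an agent callable that returns both its final answer and its intermediate
# chain-of-thought trace as plain text.
from typing import Callable, List, Tuple

# Placeholder protected content and trigger queries; in practice these would come
# from the rights holder and from the paper's trigger-construction procedure.
PROTECTED_PASSAGES: List[str] = [
    "It was the best of times, it was the worst of times ...",
]
TRIGGER_QUERIES: List[str] = [
    "While planning, recall the opening paragraph of the novel discussed earlier.",
]

def ngram_overlap(trace: str, passage: str, n: int = 8) -> float:
    """Fraction of the passage's word n-grams that reappear verbatim in the trace."""
    def ngrams(text: str) -> set:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    passage_grams, trace_grams = ngrams(passage), ngrams(trace)
    return len(passage_grams & trace_grams) / max(len(passage_grams), 1)

def audit_agent(run_agent: Callable[[str], Tuple[str, str]],
                threshold: float = 0.3) -> List[dict]:
    """Embed trigger queries into agent prompts and flag CoT traces that
    reproduce a protected passage above the overlap threshold."""
    findings = []
    for query in TRIGGER_QUERIES:
        _answer, cot_trace = run_agent(query)  # (final answer, reasoning trace)
        for passage in PROTECTED_PASSAGES:
            score = ngram_overlap(cot_trace, passage)
            if score >= threshold:
                findings.append({"trigger": query,
                                 "passage": passage[:40],
                                 "overlap": round(score, 3)})
    return findings
```

In a real multi-agent deployment, `run_agent` would be whatever hook exposes each agent's intermediate reasoning messages, and the overlap heuristic would be replaced by whichever detector the paper actually employs.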
Related papers
- AgentSight: System-Level Observability for AI Agents Using eBPF [10.37440633887049]
Existing tools observe either an agent's high-level intent (via LLM prompts) or its low-level actions (e.g., system calls), but cannot correlate these two views.
We introduce AgentSight, an AgentOps observability framework that bridges this semantic gap using a hybrid approach.
AgentSight intercepts TLS-encrypted LLM traffic to extract semantic intent, monitors kernel events to observe system-wide effects, and causally correlates these two streams across process boundaries.
arXiv Detail & Related papers (2025-08-02T01:43:39Z)
- SentinelAgent: Graph-based Anomaly Detection in Multi-Agent Systems [11.497269773189254]
We present a system-level anomaly detection framework tailored for large language model (LLM)-based multi-agent systems (MAS).
First, we propose a graph-based framework that models agent interactions as dynamic execution graphs, enabling semantic anomaly detection at the node, edge, and path levels.
Second, we introduce SentinelAgent, a pluggable, LLM-powered oversight agent that observes, analyzes, and intervenes in MAS execution based on security policies and contextual reasoning.
arXiv Detail & Related papers (2025-05-30T04:25:19Z)
- LLM Agents Should Employ Security Principles [60.03651084139836]
This paper argues that the well-established design principles in information security should be employed when deploying Large Language Model (LLM) agents at scale.
We introduce AgentSandbox, a conceptual framework embedding these security principles to provide safeguards throughout an agent's life-cycle.
arXiv Detail & Related papers (2025-05-29T21:39:08Z)
- AGENTFUZZER: Generic Black-Box Fuzzing for Indirect Prompt Injection against LLM Agents [54.29555239363013]
We propose a generic black-box fuzzing framework, AgentXploit, to automatically discover and exploit indirect prompt injection vulnerabilities.
We evaluate AgentXploit on two public benchmarks, AgentDojo and VWA-adv, where it achieves 71% and 70% success rates against agents based on o3-mini and GPT-4o.
We apply our attacks in real-world environments, successfully misleading agents to navigate to arbitrary URLs, including malicious sites.
arXiv Detail & Related papers (2025-05-09T07:40:17Z)
- Automating Prompt Leakage Attacks on Large Language Models Using Agentic Approach [9.483655213280738]
This paper presents a novel approach to evaluating the security of large language models (LLMs).
We define prompt leakage as a critical threat to secure LLM deployment.
We implement a multi-agent system where cooperative agents are tasked with probing and exploiting the target LLM to elicit its prompt.
arXiv Detail & Related papers (2025-02-18T08:17:32Z)
- Towards Copyright Protection for Knowledge Bases of Retrieval-augmented Language Models via Reasoning [58.57194301645823]
Large language models (LLMs) are increasingly integrated into real-world personalized applications.
The valuable and often proprietary nature of the knowledge bases used in retrieval-augmented generation (RAG) introduces the risk of unauthorized usage by adversaries.
Existing methods that can be generalized as watermarking techniques to protect these knowledge bases typically involve poisoning or backdoor attacks.
We propose a method for harmless copyright protection of knowledge bases.
arXiv Detail & Related papers (2025-02-10T09:15:56Z)
- Towards Action Hijacking of Large Language Model-based Agent [39.19067800226033]
We introduce a novel hijacking attack that manipulates the action plans of black-box agent systems.
Our approach achieved an average bypass rate of 92.7% against safety filters.
arXiv Detail & Related papers (2024-12-14T12:11:26Z)
- Audit-LLM: Multi-Agent Collaboration for Log-based Insider Threat Detection [16.154903877808795]
Audit-LLM is a multi-agent log-based insider threat detection framework comprising three collaborative agents.
We propose a pair-wise Evidence-based Multi-agent Debate (EMAD) mechanism, where two independent Executors iteratively refine their conclusions through reasoning exchange to reach a consensus.
arXiv Detail & Related papers (2024-08-12T11:33:45Z)
- Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence.
This paper uncovers a significant backdoor security threat within this process.
By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z)
- Watch Out for Your Agents! Investigating Backdoor Threats to LLM-Based Agents [47.219047422240145]
We take the first step toward investigating one of the typical safety threats to LLM-based agents: backdoor attacks.
Specifically, compared with traditional backdoor attacks on LLMs that are only able to manipulate the user inputs and model outputs, agent backdoor attacks exhibit more diverse and covert forms.
arXiv Detail & Related papers (2024-02-17T06:48:45Z)
- Silent Guardian: Protecting Text from Malicious Exploitation by Large Language Models [63.91178922306669]
We introduce Silent Guardian, a text protection mechanism against large language models (LLMs).
By carefully modifying the text to be protected, TPE can induce LLMs to first sample the end token, thus directly terminating the interaction (see the sketch after this entry).
We show that SG can effectively protect the target text under various configurations and achieve almost 100% protection success rate in some cases.
arXiv Detail & Related papers (2023-12-15T10:30:36Z)
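As a follow-up to the Silent Guardian entry above: the summary states that the protected text is modified so that an LLM samples the end-of-sequence token first, terminating generation immediately. The snippet below is a rough, hypothetical illustration of how one might measure that property with a Hugging Face causal language model; it is not the paper's TPE construction, and the model name and example prompt are placeholders.

```python
# Hypothetical check (not the paper's method): measure how likely a causal LM is
# to emit its end-of-sequence token immediately after a candidate protected text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder model for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def eos_probability(text: str) -> float:
    """Probability that the very next sampled token is EOS, i.e. that
    generation would terminate immediately after `text`."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    return next_token_probs[tokenizer.eos_token_id].item()

# A TPE-style defense would perturb the protected text to push this value
# toward 1.0; here we only evaluate a candidate text, we do not optimize it.
print(eos_probability("This paragraph is protected and should not be continued."))
```

A defense in the spirit of that entry would then search for perturbations of the protected text that drive this probability toward one, so the interaction ends before any content is reproduced.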