Related papers: CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models

CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models

URL: http://arxiv.org/abs/2502.14529v1
Date: Thu, 20 Feb 2025 13:02:00 GMT
Title: CORBA: Contagious Recursive Blocking Attacks on Multi-Agent Systems Based on Large Language Models
Authors: Zhenhong Zhou, Zherui Li, Jie Zhang, Yuanhe Zhang, Kun Wang, Yang Liu, Qing Guo,
Abstract summary: Large Language Model-based Multi-Agent Systems (LLM-MASs) have demonstrated remarkable real-world capabilities.<n>This paper introduces Contagious Recursive Attacks (Corba), a novel and simple yet highly effective attack that disrupts interactions between agents.
Score: 11.70281170228352
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Large Language Model-based Multi-Agent Systems (LLM-MASs) have demonstrated remarkable real-world capabilities, effectively collaborating to complete complex tasks. While these systems are designed with safety mechanisms, such as rejecting harmful instructions through alignment, their security remains largely unexplored. This gap leaves LLM-MASs vulnerable to targeted disruptions. In this paper, we introduce Contagious Recursive Blocking Attacks (Corba), a novel and simple yet highly effective attack that disrupts interactions between agents within an LLM-MAS. Corba leverages two key properties: its contagious nature allows it to propagate across arbitrary network topologies, while its recursive property enables sustained depletion of computational resources. Notably, these blocking attacks often involve seemingly benign instructions, making them particularly challenging to mitigate using conventional alignment methods. We evaluate Corba on two widely-used LLM-MASs, namely, AutoGen and Camel across various topologies and commercial models. Additionally, we conduct more extensive experiments in open-ended interactive LLM-MASs, demonstrating the effectiveness of Corba in complex topology structures and open-source models. Our code is available at: https://github.com/zhrli324/Corba.

Related papers

From Spark to Fire: Modeling and Mitigating Error Cascades in LLM-Based Multi-Agent Collaboration [27.233204826914243]
Collaborative mechanisms may cause minor inaccuracies to solidify into system-level false consensus through iteration.<n>Existing protections often rely on single-agent validation or require modifications to the collaboration architecture.<n>We propose a propagation dynamics model tailored for LLM-MAS that abstracts collaboration as a directed dependency graph.
arXiv Detail & Related papers (2026-03-04T11:45:27Z)
Analysis of LLMs Against Prompt Injection and Jailbreak Attacks [7.685814179879813]
This work evaluates prompt-injection and jailbreak vulnerability using a large, manually curated dataset.<n>We observe significant behavioural variation across models, including refusal responses and complete silent non-responsiveness triggered by internal safety mechanisms.
arXiv Detail & Related papers (2026-02-24T12:32:11Z)
Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs [65.6660735371212]
We present textbftextscJustAsk, a framework that autonomously discovers effective extraction strategies through interaction alone.<n>It formulates extraction as an online exploration problem, using Upper Confidence Bound--based strategy selection and a hierarchical skill space spanning atomic probes and high-level orchestration.<n>Our results expose system prompts as a critical yet largely unprotected attack surface in modern agent systems.
arXiv Detail & Related papers (2026-01-29T03:53:25Z)
Tipping the Dominos: Topology-Aware Multi-Hop Attacks on LLM-Based Multi-Agent Systems [14.555944084540435]
LLM-based multi-agent systems (MASs) have reshaped the digital landscape with their emergent coordination and problem-solving capabilities.<n>We propose TOMA, a topology-aware multi-hop attack scheme targeting MASs.
arXiv Detail & Related papers (2025-12-03T05:10:39Z)
The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search [58.8834056209347]
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs.<n>We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z)
BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks [58.959622170433725]
BlindGuard is an unsupervised defense method that learns without requiring any attack-specific labels or prior knowledge of malicious behaviors.<n>We show that BlindGuard effectively detects diverse attack types (i.e., prompt injection, memory poisoning, and tool attack) across multi-agent systems.
arXiv Detail & Related papers (2025-08-11T16:04:47Z)
Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS [12.649568006596956]
Large language model-based multi-agent systems (LLM-MAS) effectively accomplish complex and dynamic tasks through inter-agent communication.<n>Existing attack methods targeting LLM-MAS either compromise agent internals or rely on direct and overt persuasion.<n>We propose MAST, a Multi-round Adaptive Stealthy Tampering framework designed to exploit communication vulnerabilities within the system.
arXiv Detail & Related papers (2025-08-05T06:14:53Z)
Topological Structure Learning Should Be A Research Priority for LLM-Based Multi-Agent Systems [69.95482609893236]
Large Language Model-based Multi-Agent Systems (MASs) have emerged as a powerful paradigm for tackling complex tasks through collaborative intelligence.<n>We call for a paradigm shift toward emphtopology-aware MASs that explicitly model and dynamically optimize the structure of inter-agent interactions.
arXiv Detail & Related papers (2025-05-28T15:20:09Z)
A Trustworthy Multi-LLM Network: Challenges,Solutions, and A Use Case [59.58213261128626]
We propose a blockchain-enabled collaborative framework that connects multiple Large Language Models (LLMs) into a Trustworthy Multi-LLM Network (MultiLLMN)<n>This architecture enables the cooperative evaluation and selection of the most reliable and high-quality responses to complex network optimization problems.
arXiv Detail & Related papers (2025-05-06T05:32:46Z)
Red-Teaming LLM Multi-Agent Systems via Communication Attacks [10.872328358364776]
Large Language Model-based Multi-Agent Systems (LLM-MAS) have revolutionized complex problem-solving capability by enabling sophisticated agent collaboration through message-based communications.<n>We introduce Agent-in-the-Middle (AiTM), a novel attack that exploits the fundamental communication mechanisms in LLM-MAS by intercepting and manipulating inter-agent messages.
arXiv Detail & Related papers (2025-02-20T18:55:39Z)
Reasoning-Augmented Conversation for Multi-Turn Jailbreak Attacks on Large Language Models [53.580928907886324]
Reasoning-Augmented Conversation is a novel multi-turn jailbreak framework.<n>It reformulates harmful queries into benign reasoning tasks.<n>We show that RACE achieves state-of-the-art attack effectiveness in complex conversational scenarios.
arXiv Detail & Related papers (2025-02-16T09:27:44Z)
Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks [88.84977282952602]
A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs)<n>In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents.<n>We conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities.
arXiv Detail & Related papers (2025-02-12T17:19:36Z)
Derail Yourself: Multi-turn LLM Jailbreak Attack through Self-discovered Clues [88.96201324719205]
This study exposes the safety vulnerabilities of Large Language Models (LLMs) in multi-turn interactions. We introduce ActorAttack, a novel multi-turn attack method inspired by actor-network theory.
arXiv Detail & Related papers (2024-10-14T16:41:49Z)
Compromising Embodied Agents with Contextual Backdoor Attacks [69.71630408822767]
Large language models (LLMs) have transformed the development of embodied intelligence. This paper uncovers a significant backdoor security threat within this process. By poisoning just a few contextual demonstrations, attackers can covertly compromise the contextual environment of a black-box LLM.
arXiv Detail & Related papers (2024-08-06T01:20:12Z)
Cross-modality Information Check for Detecting Jailbreaking in Multimodal Large Language Models [17.663550432103534]
Multimodal Large Language Models (MLLMs) extend the capacity of LLMs to understand multimodal information comprehensively. These models are susceptible to jailbreak attacks, where malicious users can break the safety alignment of the target model and generate misleading and harmful answers. We propose Cross-modality Information DEtectoR (CIDER), a plug-and-play jailbreaking detector designed to identify maliciously perturbed image inputs.
arXiv Detail & Related papers (2024-07-31T15:02:46Z)
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs) [17.670925982912312]
Red-teaming is a technique for identifying vulnerabilities in large language models (LLM) This paper presents a detailed threat model and provides a systematization of knowledge (SoK) of red-teaming attacks on LLMs.
arXiv Detail & Related papers (2024-07-20T17:05:04Z)
Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming [37.32997502058661]
This paper introduces the textbfsentinel model as a plug-and-play prefix module designed to reconstruct the input prompt with just a few tokens. The sentinel model naturally overcomes the textit parameter inefficiency and textitlimited model accessibility for fine-tuning large target models. Our experiments across text-to-text and text-to-image demonstrate the effectiveness of our approach in mitigating toxic outputs.
arXiv Detail & Related papers (2024-05-21T08:57:44Z)
Prompt Leakage effect and defense strategies for multi-turn LLM interactions [95.33778028192593]
Leakage of system prompts may compromise intellectual property and act as adversarial reconnaissance for an attacker. We design a unique threat model which leverages the LLM sycophancy effect and elevates the average attack success rate (ASR) from 17.7% to 86.2% in a multi-turn setting. We measure the mitigation effect of 7 black-box defense strategies, along with finetuning an open-source model to defend against leakage attempts.
arXiv Detail & Related papers (2024-04-24T23:39:58Z)
Attack Prompt Generation for Red Teaming and Defending Large Language Models [70.157691818224]
Large language models (LLMs) are susceptible to red teaming attacks, which can induce LLMs to generate harmful content. We propose an integrated approach that combines manual and automatic methods to economically generate high-quality attack prompts.
arXiv Detail & Related papers (2023-10-19T06:15:05Z)
Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks [5.860289498416911]
Large Language Models (LLMs) are swiftly advancing in architecture and capability. As they integrate more deeply into complex systems, the urgency to scrutinize their security properties grows. This paper surveys research in the emerging interdisciplinary field of adversarial attacks on LLMs.
arXiv Detail & Related papers (2023-10-16T21:37:24Z)
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection [64.67495502772866]
Large Language Models (LLMs) are increasingly being integrated into various applications. We show how attackers can override original instructions and employed controls using Prompt Injection attacks. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts and vulnerabilities.
arXiv Detail & Related papers (2023-02-23T17:14:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.