Can an Individual Manipulate the Collective Decisions of Multi-Agents?
- URL: http://arxiv.org/abs/2509.16494v2
- Date: Wed, 15 Oct 2025 07:53:58 GMT
- Title: Can an Individual Manipulate the Collective Decisions of Multi-Agents?
- Authors: Fengyuan Liu, Rui Zhao, Shuo Chen, Guohao Li, Philip Torr, Lei Han, Jindong Gu
- Abstract summary: M-Spoiler is a framework that simulates agent interactions within a multi-agent system to generate adversarial samples. M-Spoiler introduces a stubborn agent that actively aids in optimizing adversarial samples. Our findings confirm the risks posed by the knowledge of an individual agent in multi-agent systems.
- Score: 53.01767232004823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Individual Large Language Models (LLMs) have demonstrated significant capabilities across various domains, such as healthcare and law. Recent studies also show that coordinated multi-agent systems exhibit enhanced decision-making and reasoning abilities through collaboration. However, due to the vulnerabilities of individual LLMs and the difficulty of accessing all agents in a multi-agent system, a key question arises: If attackers only know one agent, could they still generate adversarial samples capable of misleading the collective decision? To explore this question, we formulate it as a game with incomplete information, where attackers know only one target agent and lack knowledge of the other agents in the system. With this formulation, we propose M-Spoiler, a framework that simulates agent interactions within a multi-agent system to generate adversarial samples. These samples are then used to manipulate the target agent in the target system, misleading the system's collaborative decision-making process. More specifically, M-Spoiler introduces a stubborn agent that actively aids in optimizing adversarial samples by simulating potential stubborn responses from agents in the target system. This enhances the effectiveness of the generated adversarial samples in misleading the system. Through extensive experiments across various tasks, our findings confirm the risks posed by the knowledge of an individual agent in multi-agent systems and demonstrate the effectiveness of our framework. We also explore several defense mechanisms, showing that our proposed attack framework remains more potent than baselines, underscoring the need for further research into defensive strategies.
Related papers
- OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage [59.3826294523924]
We investigate the security vulnerabilities of a popular multi-agent pattern known as the orchestrator setup. We report the susceptibility of frontier models to different categories of attacks, finding that both reasoning and non-reasoning models are vulnerable.
arXiv Detail & Related papers (2026-02-13T21:32:32Z)
- AgentArk: Distilling Multi-Agent Intelligence into a Single LLM Agent [57.10083973844841]
AgentArk is a novel framework to distill multi-agent dynamics into the weights of a single model. We investigate three hierarchical distillation strategies across various models, tasks, scaling, and scenarios. By shifting the burden of computation from inference to training, the distilled models preserve the efficiency of one agent while exhibiting strong reasoning and self-correction performance of multiple agents.
arXiv Detail & Related papers (2026-02-03T19:18:28Z)
- Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection [76.91230292971115]
Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. XG-Guard is an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS.
arXiv Detail & Related papers (2025-12-21T13:46:36Z)
- Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems [25.286964510949183]
A core security property is robustness, stating that the system should maintain its integrity under adversarial attacks. We propose a new defense approach, Cowpox, to provably enhance the robustness of multi-agent systems.
arXiv Detail & Related papers (2025-08-12T07:48:51Z)
- BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks [58.959622170433725]
BlindGuard is an unsupervised defense method that learns without requiring any attack-specific labels or prior knowledge of malicious behaviors. We show that BlindGuard effectively detects diverse attack types (i.e., prompt injection, memory poisoning, and tool attack) across multi-agent systems.
arXiv Detail & Related papers (2025-08-11T16:04:47Z)
- Who's the Mole? Modeling and Detecting Intention-Hiding Malicious Agents in LLM-Based Multi-Agent Systems [25.6233463223145]
We study intention-hiding threats in multi-agent systems powered by Large Language Models (LLM-MAS). We design four representative attack paradigms that subtly disrupt task completion while maintaining a high degree of stealth. To counter these threats, we propose AgentXposed, a psychology-inspired detection framework.
arXiv Detail & Related papers (2025-07-07T07:34:34Z)
- Demonstrations of Integrity Attacks in Multi-Agent Systems [7.640342064257848]
Multi-Agent Systems (MAS) could be vulnerable to malicious agents that exploit the system to serve self-interests without disrupting its core functionality. This work explores integrity attacks where malicious agents employ subtle prompt manipulation to bias MAS operations and gain various benefits.
arXiv Detail & Related papers (2025-06-05T02:44:49Z)
- PeerGuard: Defending Multi-Agent Systems Against Backdoor Attacks Through Mutual Reasoning [8.191214701984162]
Multi-agent systems leverage advanced AI models as autonomous agents that interact, cooperate, or compete to complete complex tasks. Despite their growing importance, safety in multi-agent systems remains largely underexplored. This work investigates backdoor vulnerabilities in multi-agent systems and proposes a defense mechanism based on agent interactions.
arXiv Detail & Related papers (2025-05-16T19:08:29Z)
- Assessing Collective Reasoning in Multi-Agent LLMs via Hidden Profile Tasks [5.120446836495469]
We introduce the Hidden Profile paradigm from social psychology as a diagnostic testbed for multi-agent LLM systems. By distributing critical information asymmetrically across agents, the paradigm reveals how inter-agent dynamics support or hinder collective reasoning. We find that while cooperative agents are prone to over-coordination in collective settings, increased contradiction impairs group convergence.
arXiv Detail & Related papers (2025-05-15T19:22:54Z)
- Preventing Rogue Agents Improves Multi-Agent Collaboration [21.955058255432974]
We propose to monitor agents during action prediction and intervene when a future error is likely to occur. Experiments on WhoDunitEnv, code generation tasks and the GovSim environment for resource sustainability show that our approach leads to substantial performance gains.
arXiv Detail & Related papers (2025-02-09T18:35:08Z)
- On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents [58.79302663733703]
Large language model-based multi-agent systems have shown great abilities across various tasks due to the collaboration of expert agents. The impact of clumsy or even malicious agents--those who frequently make errors in their tasks--on the overall performance of the system remains underexplored. This paper investigates what is the resilience of various system structures under faulty agents on different downstream tasks.
arXiv Detail & Related papers (2024-08-02T03:25:20Z)
- PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety [70.84902425123406]
Multi-agent systems, when enhanced with Large Language Models (LLMs), exhibit profound capabilities in collective intelligence. However, the potential misuse of this intelligence for malicious purposes presents significant risks. We propose a framework (PsySafe) grounded in agent psychology, focusing on identifying how dark personality traits in agents can lead to risky behaviors. Our experiments reveal several intriguing phenomena, such as the collective dangerous behaviors among agents, agents' self-reflection when engaging in dangerous behavior, and the correlation between agents' psychological assessments and dangerous behaviors.
arXiv Detail & Related papers (2024-01-22T12:11:55Z)