SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection
- URL: http://arxiv.org/abs/2510.16219v2
- Date: Tue, 21 Oct 2025 12:58:13 GMT
- Title: SentinelNet: Safeguarding Multi-Agent Collaboration Through Credit-Based Dynamic Threat Detection
- Authors: Yang Feng, Xudong Pan,
- Abstract summary: Malicious agents pose significant threats to the reliability and decision-making capabilities of Multi-Agent Systems (MAS) powered by Large Language Models (LLMs)<n>We propose SentinelNet, the first decentralized framework for proactively detecting and mitigating malicious behaviors in multi-agent collaboration.<n> Experiments on MAS benchmarks show SentinelNet achieves near-perfect detection of malicious agents, close to 100% within two debate rounds, and recovers 95% of system accuracy from compromised baselines.
- Score: 22.242243610133215
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Malicious agents pose significant threats to the reliability and decision-making capabilities of Multi-Agent Systems (MAS) powered by Large Language Models (LLMs). Existing defenses often fall short due to reactive designs or centralized architectures which may introduce single points of failure. To address these challenges, we propose SentinelNet, the first decentralized framework for proactively detecting and mitigating malicious behaviors in multi-agent collaboration. SentinelNet equips each agent with a credit-based detector trained via contrastive learning on augmented adversarial debate trajectories, enabling autonomous evaluation of message credibility and dynamic neighbor ranking via bottom-k elimination to suppress malicious communications. To overcome the scarcity of attack data, it generates adversarial trajectories simulating diverse threats, ensuring robust training. Experiments on MAS benchmarks show SentinelNet achieves near-perfect detection of malicious agents, close to 100% within two debate rounds, and recovers 95% of system accuracy from compromised baselines. By exhibiting strong generalizability across domains and attack patterns, SentinelNet establishes a novel paradigm for safeguarding collaborative MAS.
Related papers
- OMNI-LEAK: Orchestrator Multi-Agent Network Induced Data Leakage [59.3826294523924]
We investigate the security vulnerabilities of a popular multi-agent pattern known as the orchestrator setup.<n>We report the susceptibility of frontier models to different categories of attacks, finding that both reasoning and non-reasoning models are vulnerable.
arXiv Detail & Related papers (2026-02-13T21:32:32Z) - Multi-Agent-Driven Cognitive Secure Communications in Satellite-Terrestrial Networks [58.70163955407538]
Malicious eavesdroppers pose a serious threat to private information via satellite-terrestrial networks (STNs)<n>We propose a cognitive secure communication framework driven by multiple agents that coordinates spectrum scheduling and protection through real-time sensing.<n>We exploit generative adversarial networks to produce adversarial matrices, and employ learning-aided power control to set real and adversarial signal powers for protection layer.
arXiv Detail & Related papers (2026-01-06T10:30:41Z) - Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents [1.014002853673217]
LLM agents are vulnerable to Indirect Prompt Injection (IPI) attacks.<n>IPI attacks hijack agent behavior by polluting external information sources.<n>We propose the Cognitive Control Architecture (CCA), a holistic framework achieving full-lifecycle cognitive supervision.
arXiv Detail & Related papers (2025-12-07T08:11:19Z) - BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks [58.959622170433725]
BlindGuard is an unsupervised defense method that learns without requiring any attack-specific labels or prior knowledge of malicious behaviors.<n>We show that BlindGuard effectively detects diverse attack types (i.e., prompt injection, memory poisoning, and tool attack) across multi-agent systems.
arXiv Detail & Related papers (2025-08-11T16:04:47Z) - Who's the Mole? Modeling and Detecting Intention-Hiding Malicious Agents in LLM-Based Multi-Agent Systems [25.6233463223145]
We study intention-hiding threats in multi-agent systems powered by Large Language Models (LLM-MAS)<n>We design four representative attack paradigms that subtly disrupt task completion while maintaining a high degree of stealth.<n>To counter these threats, we propose AgentXposed, a psychology-inspired detection framework.
arXiv Detail & Related papers (2025-07-07T07:34:34Z) - Manipulating Multimodal Agents via Cross-Modal Prompt Injection [34.35145839873915]
We identify a critical yet previously overlooked security vulnerability in multimodal agents.<n>We propose CrossInject, a novel attack framework in which attackers embed adversarial perturbations across multiple modalities.<n>Our method outperforms state-of-the-art attacks, achieving at least a +30.1% increase in attack success rates.
arXiv Detail & Related papers (2025-04-19T16:28:03Z) - Guardians of the Agentic System: Preventing Many Shots Jailbreak with Agentic System [0.8136541584281987]
This work uses three examination methods to detect rogue agents through a Reverse Turing Test and analyze deceptive alignment through multi-agent simulations.<n>We develop an anti-jailbreaking system by testing it with GEMINI 1.5 pro and llama-3.3-70B, deepseek r1 models.<n>The detection capabilities are strong such as 94% accuracy for GEMINI 1.5 pro yet the system suffers persistent vulnerabilities when under long attacks.
arXiv Detail & Related papers (2025-02-23T23:35:15Z) - Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks [62.036798488144306]
Current defense mainly focuses on the known attacks, but the adversarial robustness to the unknown attacks is seriously overlooked.
We propose an attack-agnostic defense method named Meta Invariance Defense (MID)
We show that MID simultaneously achieves robustness to the imperceptible adversarial perturbations in high-level image classification and attack-suppression in low-level robust image regeneration.
arXiv Detail & Related papers (2024-04-04T10:10:38Z) - Malicious Agent Detection for Robust Multi-Agent Collaborative Perception [52.261231738242266]
Multi-agent collaborative (MAC) perception is more vulnerable to adversarial attacks than single-agent perception.
We propose Malicious Agent Detection (MADE), a reactive defense specific to MAC perception.
We conduct comprehensive evaluations on a benchmark 3D dataset V2X-sim and a real-road dataset DAIR-V2X.
arXiv Detail & Related papers (2023-10-18T11:36:42Z) - Adversarial Attack and Defense in Deep Ranking [100.17641539999055]
We propose two attacks against deep ranking systems that can raise or lower the rank of chosen candidates by adversarial perturbations.
Conversely, an anti-collapse triplet defense is proposed to improve the ranking model robustness against all proposed attacks.
Our adversarial ranking attacks and defenses are evaluated on MNIST, Fashion-MNIST, CUB200-2011, CARS196 and Stanford Online Products datasets.
arXiv Detail & Related papers (2021-06-07T13:41:45Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNNs) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against the textbfunseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.