Related papers: Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection

Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection

URL: http://arxiv.org/abs/2512.18733v1
Date: Sun, 21 Dec 2025 13:46:36 GMT
Title: Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection
Authors: Junjun Pan, Yixin Liu, Rui Miao, Kaize Ding, Yu Zheng, Quoc Viet Hung Nguyen, Alan Wee-Chung Liew, Shirui Pan,
Abstract summary: Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks.<n>XG-Guard is an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS.
Score: 76.91230292971115
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. As MAS become increasingly autonomous in various safety-critical tasks, detecting malicious agents has become a critical security concern. Although existing graph anomaly detection (GAD)-based defenses can identify anomalous agents, they mainly rely on coarse sentence-level information and overlook fine-grained lexical cues, leading to suboptimal performance. Moreover, the lack of interpretability in these methods limits their reliability and real-world applicability. To address these limitations, we propose XG-Guard, an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS. To incorporate both coarse and fine-grained textual information for anomalous agent identification, we utilize a bi-level agent encoder to jointly model the sentence- and token-level representations of each agent. A theme-based anomaly detector further captures the evolving discussion focus in MAS dialogues, while a bi-level score fusion mechanism quantifies token-level contributions for explanation. Extensive experiments across diverse MAS topologies and attack scenarios demonstrate robust detection performance and strong interpretability of XG-Guard.

Related papers

AgentDoG: A Diagnostic Guardrail Framework for AI Agent Safety and Security [126.49733412191416]
Current guardrail models lack agentic risk awareness and transparency in risk diagnosis.<n>We propose a unified three-dimensional taxonomy that categorizes agentic risks by their source (where), failure mode (how), and consequence (what)<n>We introduce a new fine-grained agentic safety benchmark (ATBench) and a Diagnostic Guardrail framework for agent safety and security (AgentDoG)
arXiv Detail & Related papers (2026-01-26T13:45:41Z)
The Why Behind the Action: Unveiling Internal Drivers via Agentic Attribution [63.61358761489141]
Large Language Model (LLM)-based agents are widely used in real-world applications such as customer service, web navigation, and software engineering.<n>We propose a novel framework for textbfgeneral agentic attribution, designed to identify the internal factors driving agent actions regardless of the task outcome.<n>We validate our framework across a diverse suite of agentic scenarios, including standard tool use and subtle reliability risks like memory-induced bias.
arXiv Detail & Related papers (2026-01-21T15:22:21Z)
Can an Individual Manipulate the Collective Decisions of Multi-Agents? [53.01767232004823]
M-Spoiler is a framework that simulates agent interactions within a multi-agent system to generate adversarial samples.<n>M-Spoiler introduces a stubborn agent that actively aids in optimizing adversarial samples.<n>Our findings confirm the risks posed by the knowledge of an individual agent in multi-agent systems.
arXiv Detail & Related papers (2025-09-20T01:54:20Z)
VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection [55.957275374847484]
VulAgent is a multi-agent vulnerability detection framework based on hypothesis validation.<n>It implements a semantics-sensitive, multi-view detection pipeline, each aligned to a specific analysis perspective.<n>On average, VulAgent improves overall accuracy by 6.6%, increases the correct identification rate of vulnerable--fixed code pairs by up to 450%, and reduces the false positive rate by about 36%.
arXiv Detail & Related papers (2025-09-15T02:25:38Z)
BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks [58.959622170433725]
BlindGuard is an unsupervised defense method that learns without requiring any attack-specific labels or prior knowledge of malicious behaviors.<n>We show that BlindGuard effectively detects diverse attack types (i.e., prompt injection, memory poisoning, and tool attack) across multi-agent systems.
arXiv Detail & Related papers (2025-08-11T16:04:47Z)
MARBLE: A Multi-Agent Rule-Based LLM Reasoning Engine for Accident Severity Prediction [1.3102025155414727]
Accident severity prediction plays a critical role in transportation safety systems.<n>Existing methods often rely on monolithic models or black box prompting.<n>We propose a multiagent rule based LLM engine that decomposes the severity prediction task across a team of specialized reasoning agents.
arXiv Detail & Related papers (2025-07-07T11:27:49Z)
Who's the Mole? Modeling and Detecting Intention-Hiding Malicious Agents in LLM-Based Multi-Agent Systems [25.6233463223145]
We study intention-hiding threats in multi-agent systems powered by Large Language Models (LLM-MAS)<n>We design four representative attack paradigms that subtly disrupt task completion while maintaining a high degree of stealth.<n>To counter these threats, we propose AgentXposed, a psychology-inspired detection framework.
arXiv Detail & Related papers (2025-07-07T07:34:34Z)
Demonstrations of Integrity Attacks in Multi-Agent Systems [7.640342064257848]
Multi-Agent Systems (MAS) could be vulnerable to malicious agents that exploit the system to serve self-interests without disrupting its core functionality.<n>This work explores integrity attacks where malicious agents employ subtle prompt manipulation to bias MAS operations and gain various benefits.
arXiv Detail & Related papers (2025-06-05T02:44:49Z)
SentinelAgent: Graph-based Anomaly Detection in Multi-Agent Systems [11.497269773189254]
We present a system-level anomaly detection framework tailored for large language model (LLM)-based multi-agent systems (MAS)<n>We propose a graph-based framework that models agent interactions as dynamic execution graphs, enabling semantic anomaly detection at node, edge, and path levels.<n>Second, we introduce a pluggable SentinelAgent, an LLM-powered oversight agent that observes, analyzes, and intervenes in MAS execution based on security policies and contextual reasoning.
arXiv Detail & Related papers (2025-05-30T04:25:19Z)
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling [5.798273384241793]
Large language models (LLMs) enable the development of intelligent agents capable of engaging in complex and multi-turn dialogues.<n>GUARDIAN is a method for detecting and mitigating multiple safety concerns in GUARDing Intelligent Agent collaboratioNs.
arXiv Detail & Related papers (2025-05-25T17:15:55Z)
G-Safeguard: A Topology-Guided Security Lens and Treatment on LLM-based Multi-agent Systems [10.450573905691677]
Large Language Model (LLM)-based Multi-agent Systems (MAS) have demonstrated remarkable capabilities in various complex tasks.<n>As these systems become increasingly integrated into critical applications, their vulnerability to adversarial attacks, misinformation propagation, and unintended behaviors have raised significant concerns.<n>We introduce G-Safeguard, a topology-guided security lens and treatment for robust MAS.
arXiv Detail & Related papers (2025-02-16T13:48:41Z)
On the Resilience of LLM-Based Multi-Agent Collaboration with Faulty Agents [58.79302663733703]
Large language model-based multi-agent systems have shown great abilities across various tasks due to the collaboration of expert agents.<n>The impact of clumsy or even malicious agents--those who frequently make errors in their tasks--on the overall performance of the system remains underexplored.<n>This paper investigates what is the resilience of various system structures under faulty agents on different downstream tasks.
arXiv Detail & Related papers (2024-08-02T03:25:20Z)

This list is automatically generated from the titles and abstracts of the papers in this site.