GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
- URL: http://arxiv.org/abs/2505.19234v1
- Date: Sun, 25 May 2025 17:15:55 GMT
- Title: GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
- Authors: Jialong Zhou, Lichao Wang, Xiao Yang,
- Abstract summary: Large language models (LLMs) enable the development of intelligent agents capable of engaging in complex and multi-turn dialogues.<n>GUARDIAN is a unified method for detecting and mitigating multiple safety concerns in GUARDing Intelligent Agent collaboratioNs.
- Score: 2.7211182721830123
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The emergence of large language models (LLMs) enables the development of intelligent agents capable of engaging in complex and multi-turn dialogues. However, multi-agent collaboration face critical safety challenges, such as hallucination amplification and error injection and propagation. This paper presents GUARDIAN, a unified method for detecting and mitigating multiple safety concerns in GUARDing Intelligent Agent collaboratioNs. By modeling the multi-agent collaboration process as a discrete-time temporal attributed graph, GUARDIAN explicitly captures the propagation dynamics of hallucinations and errors. The unsupervised encoder-decoder architecture incorporating an incremental training paradigm, learns to reconstruct node attributes and graph structures from latent embeddings, enabling the identification of anomalous nodes and edges with unparalleled precision. Moreover, we introduce a graph abstraction mechanism based on the Information Bottleneck Theory, which compresses temporal interaction graphs while preserving essential patterns. Extensive experiments demonstrate GUARDIAN's effectiveness in safeguarding LLM multi-agent collaborations against diverse safety vulnerabilities, achieving state-of-the-art accuracy with efficient resource utilization.
Related papers
- SentinelAgent: Graph-based Anomaly Detection in Multi-Agent Systems [11.497269773189254]
We present a system-level anomaly detection framework tailored for large language model (LLM)-based multi-agent systems (MAS)<n>We propose a graph-based framework that models agent interactions as dynamic execution graphs, enabling semantic anomaly detection at node, edge, and path levels.<n>Second, we introduce a pluggable SentinelAgent, an LLM-powered oversight agent that observes, analyzes, and intervenes in MAS execution based on security policies and contextual reasoning.
arXiv Detail & Related papers (2025-05-30T04:25:19Z) - Perspectives for Direct Interpretability in Multi-Agent Deep Reinforcement Learning [0.41783829807634765]
Multi-Agent Deep Reinforcement Learning (MADRL) was proven efficient in solving complex problems in robotics or games.<n>This paper advocates for direct interpretability, generating post hoc explanations directly from trained models.<n>We explore modern methods, including relevance backpropagation, knowledge edition, model steering, activation patching, sparse autoencoders and circuit discovery.
arXiv Detail & Related papers (2025-02-02T09:15:27Z) - Algorithmic Segmentation and Behavioral Profiling for Ransomware Detection Using Temporal-Correlation Graphs [0.0]
A novel framework was introduced, leveraging Temporal-Correlation Graphs to model the intricate relationships and temporal patterns inherent in malicious operations.<n>Experiments demonstrated the framework's effectiveness across diverse ransomware families, with consistently high precision, recall, and overall detection accuracy.<n>The research contributes to advancing cybersecurity technologies by integrating dynamic graph analytics and machine learning for future innovations in threat detection.
arXiv Detail & Related papers (2025-01-29T06:09:25Z) - Scaling Large Language Model-based Multi-Agent Collaboration [72.8998796426346]
Recent breakthroughs in large language model-driven autonomous agents have revealed that multi-agent collaboration often surpasses each individual through collective reasoning.<n>This study explores whether the continuous addition of collaborative agents can yield similar benefits.
arXiv Detail & Related papers (2024-06-11T11:02:04Z) - Secret Collusion among Generative AI Agents: Multi-Agent Deception via Steganography [43.468790060808914]
Recent capability increases in large language models (LLMs) open up applications in which groups of communicating generative AI agents solve joint tasks.<n>This poses privacy and security challenges concerning the unauthorised sharing of information.<n>Modern steganographic techniques could render such dynamics hard to detect.
arXiv Detail & Related papers (2024-02-12T09:31:21Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement
Learning [53.00683059396803]
Mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - MADiff: Offline Multi-agent Learning with Diffusion Models [79.18130544233794]
MADiff is a diffusion-based multi-agent learning framework.<n>It works as both a decentralized policy and a centralized controller.<n>Our experiments demonstrate that MADiff outperforms baseline algorithms across various multi-agent learning tasks.
arXiv Detail & Related papers (2023-05-27T02:14:09Z) - Decentralized Adversarial Training over Graphs [44.03711922549992]
The vulnerability of machine learning models to adversarial attacks has been attracting attention in recent years.<n>We develop a decentralized adversarial framework for multiagent systems.
arXiv Detail & Related papers (2023-03-23T15:05:16Z) - Soft Hierarchical Graph Recurrent Networks for Many-Agent Partially
Observable Environments [9.067091068256747]
We propose a novel network structure called hierarchical graph recurrent network(HGRN) for multi-agent cooperation under partial observability.
Based on the above technologies, we proposed a value-based MADRL algorithm called Soft-HGRN and its actor-critic variant named SAC-HRGN.
arXiv Detail & Related papers (2021-09-05T09:51:25Z) - Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z) - Information Obfuscation of Graph Neural Networks [96.8421624921384]
We study the problem of protecting sensitive attributes by information obfuscation when learning with graph structured data.
We propose a framework to locally filter out pre-determined sensitive attributes via adversarial training with the total variation and the Wasserstein distance.
arXiv Detail & Related papers (2020-09-28T17:55:04Z) - Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs)
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.