Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs
- URL: http://arxiv.org/abs/2512.04668v2
- Date: Mon, 08 Dec 2025 04:55:50 GMT
- Title: Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs
- Authors: Jinbo Liu, Defu Cao, Yifei Wei, Tianyao Su, Yuan Liang, Yushun Dong, Yue Zhao, Xiyang Hu
- Abstract summary: MAMA (Multi-Agent Memory Attack) is a framework that measures how network structure shapes leakage. We quantify leakage as the fraction of ground-truth PII recovered from attacking agent outputs. Results provide the first systematic mapping from architectural choices to measurable privacy risk.
- Score: 26.288357188171265
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Graph topology is a fundamental determinant of memory leakage in multi-agent LLM systems, yet its effects remain poorly quantified. We introduce MAMA (Multi-Agent Memory Attack), a framework that measures how network structure shapes leakage. MAMA operates on synthetic documents containing labeled Personally Identifiable Information (PII) entities, from which we generate sanitized task instructions. We execute a two-phase protocol: Engram (seeding private information into a target agent's memory) and Resonance (multi-round interaction where an attacker attempts extraction). Over up to 10 interaction rounds, we quantify leakage as the fraction of ground-truth PII recovered from attacking agent outputs via exact matching. We systematically evaluate six common network topologies (fully connected, ring, chain, binary tree, star, and star-ring), varying agent counts $n\in\{4,5,6\}$, attacker-target placements, and base models. Our findings reveal consistent patterns: fully connected graphs exhibit maximum leakage while chains provide strongest protection; shorter attacker-target graph distance and higher target centrality significantly increase vulnerability; leakage rises sharply in early rounds before plateauing; model choice shifts absolute leakage rates but preserves topology rankings; temporal/locational PII attributes leak more readily than identity credentials or regulated identifiers. These results provide the first systematic mapping from architectural choices to measurable privacy risk, yielding actionable guidance: prefer sparse or hierarchical connectivity, maximize attacker-target separation, limit node degree and network radius, avoid shortcuts bypassing hubs, and implement topology-aware access controls.
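To make the abstract's structural variables concrete, the sketch below builds the six evaluated topologies and computes the two predictors the paper highlights (attacker-target graph distance and target centrality), plus the exact-match leakage fraction. It assumes networkx; the constructors, the star-ring layout, and the metric implementation are reconstructions from the abstract, not the authors' released code.

```python
import networkx as nx

def topologies(n):
    """Build the six evaluated topologies over n agents (the paper uses n in {4, 5, 6})."""
    star_ring = nx.star_graph(n - 1)  # hub 0 plus n-1 leaves
    leaves = list(range(1, n))
    star_ring.add_edges_from(zip(leaves, leaves[1:] + leaves[:1]))  # close a ring over the leaves
    return {
        "fully_connected": nx.complete_graph(n),
        "ring": nx.cycle_graph(n),
        "chain": nx.path_graph(n),
        "binary_tree": nx.full_rary_tree(2, n),
        "star": nx.star_graph(n - 1),
        "star_ring": star_ring,
    }

def structural_risk(G, attacker, target):
    # Shorter attacker-target distance and higher target centrality are the
    # abstract's two structural predictors of vulnerability.
    return nx.shortest_path_length(G, attacker, target), nx.degree_centrality(G)[target]

def leakage_fraction(ground_truth_pii, attacker_outputs):
    # Leakage = fraction of ground-truth PII entities recovered verbatim
    # (exact string match) from the attacking agent's outputs.
    recovered = {p for p in ground_truth_pii if any(p in out for out in attacker_outputs)}
    return len(recovered) / len(ground_truth_pii)

# Example: the chain maximizes attacker-target separation for end-placed agents.
G = topologies(6)["chain"]
print(structural_risk(G, attacker=0, target=5))  # -> (5, 0.2)
```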
Related papers
- Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding [28.34876433005121]
We introduce HMAGAT (Hypergraph Multi-Agent Attention Network), a novel architecture that leverages attentional mechanisms over directed hypergraphs to explicitly capture group dynamics. We show how hypergraph representations mitigate the attention dilution inherent in GNNs and capture complex interactions where pairwise methods fail. Our results illustrate that appropriate inductive biases are often more critical than the training data size or sheer parameter count for multi-agent problems.
arXiv Detail & Related papers (2026-02-06T14:28:54Z)
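As a rough illustration of the hyperedge-attention idea in the HMAGAT entry above: each agent attends over the groups (hyperedges) it belongs to, so a whole group occupies one attention slot instead of one slot per pairwise neighbor. The incidence-matrix encoding, mean-pooled hyperedge embeddings, and all shapes below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

n, m, d = 5, 2, 8                      # 5 agents, 2 hyperedges (groups), feature dim 8
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))            # agent features
H = np.zeros((n, m))                   # incidence matrix: H[i, e] = 1 iff agent i is in group e
H[[0, 1, 2], 0] = 1
H[[2, 3, 4], 1] = 1

# Hyperedge embeddings: mean of member features (one pooling choice among many).
E = (H.T @ X) / H.sum(axis=0, keepdims=True).T

# Masked attention over hyperedges: a group contributes one slot regardless of
# its size, which is the point of avoiding pairwise attention dilution.
scores = (X @ E.T) / np.sqrt(d)
scores[H == 0] = -np.inf               # agents only attend to groups they belong to
out = softmax(scores, axis=1) @ E      # (n, d) updated agent representations
```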
- Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection [76.91230292971115]
Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks. XG-Guard is an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS.
arXiv Detail & Related papers (2025-12-21T13:46:36Z)
- The Trojan Knowledge: Bypassing Commercial LLM Guardrails via Harmless Prompt Weaving and Adaptive Tree Search [58.8834056209347]
Large language models (LLMs) remain vulnerable to jailbreak attacks that bypass safety guardrails to elicit harmful outputs. We introduce the Correlated Knowledge Attack Agent (CKA-Agent), a dynamic framework that reframes jailbreaking as an adaptive, tree-structured exploration of the target model's knowledge base.
arXiv Detail & Related papers (2025-12-01T07:05:23Z)
- Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction [58.51530390018909]
Large Language Model-based multi-agent systems excel at collaborative problem solving but remain brittle to cascading errors. We present MASC, a metacognitive framework that endows MAS with real-time, unsupervised, step-level error detection and self-correction.
arXiv Detail & Related papers (2025-10-16T05:35:37Z)
- BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks [58.959622170433725]
BlindGuard is an unsupervised defense method that learns without requiring any attack-specific labels or prior knowledge of malicious behaviors. We show that BlindGuard effectively detects diverse attack types (i.e., prompt injection, memory poisoning, and tool attack) across multi-agent systems.
arXiv Detail & Related papers (2025-08-11T16:04:47Z)
- LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks [7.115093658017371]
LeakSealer is a model-agnostic framework that combines static analysis for forensic insights with dynamic defenses in a Human-In-The-Loop pipeline. We empirically evaluate LeakSealer under two scenarios: (1) jailbreak attempts, employing a public benchmark dataset, and (2) PII leakage, supported by a curated dataset of labeled LLM interactions.
arXiv Detail & Related papers (2025-08-01T13:04:28Z)
- SentinelAgent: Graph-based Anomaly Detection in Multi-Agent Systems [11.497269773189254]
We present a system-level anomaly detection framework tailored for large language model (LLM)-based multi-agent systems (MAS). We propose a graph-based framework that models agent interactions as dynamic execution graphs, enabling semantic anomaly detection at node, edge, and path levels. We also introduce a pluggable SentinelAgent, an LLM-powered oversight agent that observes, analyzes, and intervenes in MAS execution based on security policies and contextual reasoning.
arXiv Detail & Related papers (2025-05-30T04:25:19Z)
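A toy version of the execution-graph idea in the SentinelAgent entry above: agents as nodes, tool-annotated messages as timestamped edges, and a policy table checked per edge. The agent names, attributes, and policy rule are invented for illustration; the paper's detectors also operate at the path level, which this sketch omits.

```python
import networkx as nx

# Record a run as a dynamic execution graph.
G = nx.MultiDiGraph()
G.add_edge("planner", "coder", t=0, tool=None)
G.add_edge("coder", "executor", t=1, tool="shell")
G.add_edge("executor", "planner", t=2, tool=None)
G.add_edge("coder", "executor", t=3, tool="network")  # capability not in policy

# Edge-level policy: which tools each agent pair may invoke.
ALLOWED_TOOLS = {("coder", "executor"): {"shell"}}

def anomalous_edges(G):
    for u, v, data in G.edges(data=True):
        if data["tool"] and data["tool"] not in ALLOWED_TOOLS.get((u, v), set()):
            yield u, v, data

for u, v, data in anomalous_edges(G):
    print(f"policy violation: {u} -> {v} via {data['tool']} at t={data['t']}")
```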
- Cluster-Aware Attacks on Graph Watermarks [50.19105800063768]
We introduce a cluster-aware threat model in which adversaries apply community-guided modifications to evade detection. Our results show that cluster-aware attacks can reduce attribution accuracy by up to 80% more than random baselines. We propose a lightweight embedding enhancement that distributes watermark nodes across graph communities.
arXiv Detail & Related papers (2025-04-24T22:49:28Z)
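A minimal sketch of the community-spreading mitigation in the entry above, assuming networkx community detection. The anchor rule (highest-degree node per detected community) is an illustrative choice, not the paper's embedding enhancement.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

G = nx.karate_club_graph()  # stand-in for the watermarked graph

# Place one watermark anchor per community, so a community-guided edit to any
# single cluster cannot remove the whole watermark.
communities = greedy_modularity_communities(G)
watermark_nodes = [max(c, key=G.degree) for c in communities]
print(watermark_nodes)
```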
- Dissecting Adversarial Robustness of Multimodal LM Agents [70.2077308846307]
We manually create 200 targeted adversarial tasks and evaluation scripts in a realistic threat model on top of VisualWebArena. We find that we can successfully break the latest agents that use black-box frontier LMs, including those that perform reflection and tree search. We also use ARE to rigorously evaluate how the robustness changes as new components are added.
arXiv Detail & Related papers (2024-06-18T17:32:48Z)
- Unveiling the potential of Graph Neural Networks for robust Intrusion Detection [2.21481607673149]
We propose a novel Graph Neural Network (GNN) model to learn flow patterns of attacks structured as graphs.
Our model is able to maintain the same level of accuracy as in previous experiments, while state-of-the-art ML techniques lose up to 50% of their accuracy (F1-score) under adversarial attacks.
arXiv Detail & Related papers (2021-07-30T16:56:39Z)
- Graph Backdoor [53.70971502299977]
We present GTA, the first backdoor attack on graph neural networks (GNNs).
GTA departs in significant ways: it defines triggers as specific subgraphs, including both topological structures and descriptive features.
It can be instantiated for both transductive (e.g., node classification) and inductive (e.g., graph classification) tasks.
arXiv Detail & Related papers (2020-06-21T19:45:30Z)
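To illustrate the "trigger as a specific subgraph" notion from the Graph Backdoor entry above, a purely structural toy in networkx. The trigger topology, size, and attachment rule are placeholders; GTA additionally optimizes descriptive features and adapts the trigger per input, which this sketch omits.

```python
import networkx as nx

def attach_trigger(G, victim, trigger_size=3):
    """Graft a small fixed-topology trigger subgraph onto a victim node."""
    trigger = nx.relabel_nodes(nx.complete_graph(trigger_size), lambda i: f"trig_{i}")
    G = nx.compose(G, trigger)       # union of host graph and trigger
    G.add_edge(victim, "trig_0")     # single attachment edge to the victim
    return G

G = attach_trigger(nx.erdos_renyi_graph(20, 0.1, seed=1), victim=0)
```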