Related papers: Who is Introducing the Failure? Automatically Attributing Failures of Multi-Agent Systems via Spectrum Analysis

Who is Introducing the Failure? Automatically Attributing Failures of Multi-Agent Systems via Spectrum Analysis

URL: http://arxiv.org/abs/2509.13782v1
Date: Wed, 17 Sep 2025 07:50:44 GMT
Title: Who is Introducing the Failure? Automatically Attributing Failures of Multi-Agent Systems via Spectrum Analysis
Authors: Yu Ge, Linna Xie, Zhong Li, Yu Pei, Tian Zhang,
Abstract summary: We propose FAMAS, the first spectrum-based failure attribution approach for MASs.<n>The core idea of FAMAS is to estimate, from variations across repeated MAS executions, the likelihood that each agent action is responsible for the failure.<n>In particular, we propose a novel suspiciousness formula tailored to MASs, which integrates two key factor groups, namely the agent behavior group and the action behavior group.
Score: 10.235089248238108
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large Language Model Powered Multi-Agent Systems (MASs) are increasingly employed to automate complex real-world problems, such as programming and scientific discovery. Despite their promising, MASs are not without their flaws. However, failure attribution in MASs - pinpointing the specific agent actions responsible for failures - remains underexplored and labor-intensive, posing significant challenges for debugging and system improvement. To bridge this gap, we propose FAMAS, the first spectrum-based failure attribution approach for MASs, which operates through systematic trajectory replay and abstraction, followed by spectrum analysis.The core idea of FAMAS is to estimate, from variations across repeated MAS executions, the likelihood that each agent action is responsible for the failure. In particular, we propose a novel suspiciousness formula tailored to MASs, which integrates two key factor groups, namely the agent behavior group and the action behavior group, to account for the agent activation patterns and the action activation patterns within the execution trajectories of MASs. Through expensive evaluations against 12 baselines on the Who and When benchmark, FAMAS demonstrates superior performance by outperforming all the methods in comparison.

Related papers

MAS-FIRE: Fault Injection and Reliability Evaluation for LLM-Based Multi-Agent Systems [38.44649280816596]
We propose MAS-FIRE, a systematic framework for fault injection and reliability evaluation of Multi-Agent Systems.<n>We define a taxonomy of 15 fault types covering intra-agent cognitive errors and inter-agent coordination failures.<n>Applying MAS-FIRE to three representative MAS architectures, we uncover a rich set of fault-tolerant behaviors.
arXiv Detail & Related papers (2026-02-23T13:47:43Z)
MAS-Orchestra: Understanding and Improving Multi-Agent Reasoning Through Holistic Orchestration and Controlled Benchmarks [86.05918381895555]
We propose MASOrchestra as a training-time framework that formulates MAS orchestration as a function-calling reinforcement learning problem.<n>In MAS-Orchestra, complex, goal-oriented subagents are abstracted as callable functions, enabling global reasoning over system structure.<n>Our analysis reveals that MAS gains depend critically on task structure, verification protocols, and the capabilities of both orchestrator and subagents.
arXiv Detail & Related papers (2026-01-21T04:57:02Z)
Explainable and Fine-Grained Safeguarding of LLM Multi-Agent Systems via Bi-Level Graph Anomaly Detection [76.91230292971115]
Large language model (LLM)-based multi-agent systems (MAS) have shown strong capabilities in solving complex tasks.<n>XG-Guard is an explainable and fine-grained safeguarding framework for detecting malicious agents in MAS.
arXiv Detail & Related papers (2025-12-21T13:46:36Z)
Metacognitive Self-Correction for Multi-Agent System via Prototype-Guided Next-Execution Reconstruction [58.51530390018909]
Large Language Model based multi-agent systems excel at collaborative problem solving but remain brittle to cascading errors.<n>We present MASC, a metacognitive framework that endows MAS with real-time, unsupervised, step-level error detection and self-correction.
arXiv Detail & Related papers (2025-10-16T05:35:37Z)
Testing and Enhancing Multi-Agent Systems for Robust Code Generation [21.38351747327572]
Multi-agent systems (MASs) have emerged as a promising paradigm for automated code generation.<n>Despite their prosperous development and adoption, their robustness remains pressingly under-explored.<n>This paper presents the first comprehensive study examining the robustness of MASs for code generation through a fuzzing-based testing approach.
arXiv Detail & Related papers (2025-10-12T05:45:04Z)
An Empirical Study on Failures in Automated Issue Solving [12.571536148821144]
We analyze the performance and efficiency of three SOTA tools, spanning both pipeline-based and agentic architectures, in automated issue solving tasks of SWE-Bench-Verified.<n>To move from high-level performance metrics to underlying cause analysis, we conducted a systematic manual analysis of 150 failed instances.<n>The results reveal distinct failure fingerprints between the two architectural paradigms, with the majority of agentic failures stemming from flawed reasoning and cognitive deadlocks.
arXiv Detail & Related papers (2025-09-17T13:07:52Z)
Automatic Failure Attribution and Critical Step Prediction Method for Multi-Agent Systems Based on Causal Inference [8.823529310904162]
Multi-agent systems (MAS) are critical for automating complex tasks, yet their practical deployment is hampered by the challenge of failure attribution.<n>We introduce the first failure attribution framework for MAS grounded in multi-granularity causal inference.
arXiv Detail & Related papers (2025-09-10T15:22:00Z)
LumiMAS: A Comprehensive Framework for Real-Time Monitoring and Enhanced Observability in Multi-Agent Systems [20.35152929165441]
The proposed framework consists of three key components: a monitoring and logging layer, anomaly detection layer, and anomaly explanation layer.<n>LumiMAS was evaluated on seven different MAS applications, implemented using two popular MAS platforms, and a diverse set of possible failures.
arXiv Detail & Related papers (2025-08-17T15:55:02Z)
MAS-ZERO: Designing Multi-Agent Systems with Zero Supervision [76.42361936804313]
We introduce MAS-ZERO, the first self-evolved, inference-time framework for automatic MAS design.<n> MAS-ZERO employs meta-level design to iteratively generate, evaluate, and refine MAS configurations tailored to each problem instance.
arXiv Detail & Related papers (2025-05-21T00:56:09Z)
Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems [50.29939179830491]
Failure attribution in LLM multi-agent systems remains underexplored and labor-intensive.<n>We develop and evaluate three automated failure attribution methods, summarizing their corresponding pros and cons.<n>The best method achieves 53.5% accuracy in identifying failure-responsible agents but only 14.2% in pinpointing failure steps.
arXiv Detail & Related papers (2025-04-30T23:09:44Z)
Why Do Multi-Agent LLM Systems Fail? [91.39266556855513]
We present MAST (Multi-Agent System Failure taxonomy), the first empirically grounded taxonomy designed to understand MAS failures.<n>We analyze seven popular MAS frameworks across over 200 tasks, involving six expert human annotators.<n>We identify 14 unique failure modes, organized into 3 overarching categories, (i) specification issues, (ii) inter-agent misalignment, and (iii) task verification.
arXiv Detail & Related papers (2025-03-17T19:04:38Z)
A Cooperative Multi-Agent Framework for Zero-Shot Named Entity Recognition [71.61103962200666]
Zero-shot named entity recognition (NER) aims to develop entity recognition systems from unannotated text corpora.<n>Recent work has adapted large language models (LLMs) for zero-shot NER by crafting specialized prompt templates.<n>We introduce the cooperative multi-agent system (CMAS), a novel framework for zero-shot NER.
arXiv Detail & Related papers (2025-02-25T23:30:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.