Related papers: Interpretable Failure Analysis in Multi-Agent Reinforcement Learning Systems

URL: http://arxiv.org/abs/2602.08104v1
Date: Sun, 08 Feb 2026 19:55:26 GMT
Title: Interpretable Failure Analysis in Multi-Agent Reinforcement Learning Systems
Authors: Risal Shahriar Shefin, Debashis Gupta, Thai Le, Sarra Alqahtani,
Abstract summary: Multi-Agent Reinforcement Learning (MARL) is increasingly deployed in safety-critical domains. We introduce a two-stage gradient-based framework that provides interpretable diagnostics for three critical failure analysis tasks.
Score: 8.723131512052703
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-Agent Reinforcement Learning (MARL) is increasingly deployed in safety-critical domains, yet methods for interpretable failure detection and attribution remain underdeveloped. We introduce a two-stage gradient-based framework that provides interpretable diagnostics for three critical failure analysis tasks: (1) detecting the true initial failure source (Patient-0); (2) validating why non-attacked agents may be flagged first due to domino effects; and (3) tracing how failures propagate through learned coordination pathways. Stage 1 performs interpretable per-agent failure detection via Taylor-remainder analysis of policy-gradient costs, declaring an initial Patient-0 candidate at the first threshold crossing. Stage 2 provides validation through geometric analysis of critic derivatives, namely first-order sensitivity and directional second-order curvature aggregated over causal windows, to construct interpretable contagion graphs. This approach explains "downstream-first" detection anomalies by revealing pathways that amplify upstream deviations. Evaluated across 500 episodes in Simple Spread (3 and 5 agents) and 100 episodes in StarCraft II using MADDPG and HATRPO, our method achieves 88.2-99.4% Patient-0 detection accuracy while providing interpretable geometric evidence for detection decisions. By moving beyond black-box detection to interpretable gradient-level forensics, this framework offers practical tools for diagnosing cascading failures in safety-critical MARL systems.
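
Illustrative sketch (not the authors' implementation): the abstract describes Stage 1 as a Taylor-remainder score per agent, with the initial Patient-0 candidate declared at the first threshold crossing. The Python below shows one way that idea could look on a toy problem. The function names, the toy quadratic cost, the score values, and the threshold are all assumptions made for illustration; the paper's actual cost function, thresholds, and tie-breaking rule are not given in this listing.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]


def taylor_remainder(
    f: Callable[[Vector], float],
    grad_f: Callable[[Vector], Vector],
    x: Vector,
    dx: Vector,
) -> float:
    """First-order Taylor remainder |f(x + dx) - f(x) - grad_f(x) . dx|.

    A large value means the linear approximation of the cost breaks down,
    which is used here as a hypothetical per-agent anomaly score.
    """
    x_new = [xi + di for xi, di in zip(x, dx)]
    linear_term = sum(g * d for g, d in zip(grad_f(x), dx))
    return abs(f(x_new) - f(x) - linear_term)


def first_crossing(
    scores: Dict[str, Sequence[float]], threshold: float
) -> Optional[Tuple[str, int]]:
    """Return (agent, timestep) of the earliest score above the threshold.

    Ties are resolved in favour of the agent listed first (an assumption).
    Returns None if no agent ever crosses the threshold.
    """
    best: Optional[Tuple[str, int]] = None
    for agent, series in scores.items():
        for t, score in enumerate(series):
            if score > threshold:
                if best is None or t < best[1]:
                    best = (agent, t)
                break  # only the first crossing per agent matters
    return best


# Toy quadratic cost (hypothetical): f(x) = sum(x_i^2), gradient 2x.
def toy_cost(x: Vector) -> float:
    return sum(xi * xi for xi in x)


def toy_grad(x: Vector) -> Vector:
    return [2.0 * xi for xi in x]


remainder = taylor_remainder(toy_cost, toy_grad, [1.0, 2.0], [0.5, 0.0])

# Made-up per-agent score traces over three timesteps.
toy_scores = {
    "agent_0": [0.1, 0.2, 0.9],
    "agent_1": [0.1, 0.8, 0.9],
    "agent_2": [0.1, 0.1, 0.1],
}
candidate = first_crossing(toy_scores, threshold=0.5)
```

In this toy run the candidate is whichever agent crosses the threshold earliest. As the abstract notes, that agent can be a downstream one rather than the true source, which is why the paper adds a separate Stage 2 validation step; that step is not sketched here.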

Related papers

Guideline-Grounded Evidence Accumulation for High-Stakes Agent Verification [60.18369393468405]
Existing verifiers usually underperform owing to a lack of domain knowledge and limited calibration. GLEAN compiles expert-curated protocols into trajectory-informed, well-calibrated correctness signals. We empirically validate GLEAN with agentic clinical diagnosis across three diseases from the MIMIC-IV dataset.
arXiv Detail & Related papers (2026-03-03T09:36:43Z)
Where Did It Go Wrong? Attributing Undesirable LLM Behaviors via Representation Gradient Tracing [12.835224376066769]
Large Language Models (LLMs) have demonstrated remarkable capabilities, yet their deployment is frequently undermined by undesirable behaviors. We introduce a novel and efficient framework that diagnoses a range of undesirable LLM behaviors by analyzing representation and its gradients. We systematically evaluate our method for tasks that include tracking harmful content, detecting backdoor poisoning, and identifying knowledge contamination.
arXiv Detail & Related papers (2025-09-26T12:07:47Z)
VulAgent: Hypothesis-Validation based Multi-Agent Vulnerability Detection [55.957275374847484]
VulAgent is a multi-agent vulnerability detection framework based on hypothesis validation. It implements a semantics-sensitive, multi-view detection pipeline, each aligned to a specific analysis perspective. On average, VulAgent improves overall accuracy by 6.6%, increases the correct identification rate of vulnerable-fixed code pairs by up to 450%, and reduces the false positive rate by about 36%.
arXiv Detail & Related papers (2025-09-15T02:25:38Z)
A layered architecture for log analysis in complex IT systems [0.21756081703276]
This dissertation introduces a three-layered architecture to support DevOps in failure resolution. The first layer, Log Investigation, performs autonomous log labeling and anomaly classification. The second layer, Anomaly Detection, detects behaviors deviating from the norm. The third layer, Root Cause Analysis, identifies minimal log sets describing failures, their origin, and event sequences.
arXiv Detail & Related papers (2025-08-29T11:28:21Z)
UGPL: Uncertainty-Guided Progressive Learning for Evidence-Based Classification in Computed Tomography [0.0]
Current approaches typically process images uniformly, limiting their ability to detect localized abnormalities. We introduce UGPL, an uncertainty-guided progressive learning framework that performs a global-to-local analysis. Experiments across three CT datasets demonstrate that UGPL consistently outperforms state-of-the-art methods.
arXiv Detail & Related papers (2025-07-18T17:30:56Z)
AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs [0.0]
Gait impairment plays an important role in early diagnosis, disease monitoring, and treatment evaluation for neurodegenerative diseases. Recent deep learning-based approaches have consistently improved classification accuracies, but they often lack interpretability. We introduce AGIR, a novel pipeline consisting of a pre-trained VQ-VAE motion tokenizer and a Large Language Model (LLM) fine-tuned over pairs of motion tokens.
arXiv Detail & Related papers (2025-03-23T17:12:16Z)
Lie Detector: Unified Backdoor Detection via Cross-Examination Framework [68.45399098884364]
We propose a unified backdoor detection framework in the semi-honest setting. Our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines. Notably, it is the first to effectively detect backdoors in multimodal large language models.
arXiv Detail & Related papers (2025-03-21T06:12:06Z)
U2AD: Uncertainty-based Unsupervised Anomaly Detection Framework for Detecting T2 Hyperintensity in MRI Spinal Cord [7.811634659561162]
T2 hyperintensities in spinal cord MR images are crucial biomarkers for conditions such as degenerative cervical myelopathy. Deep learning methods have shown promise in lesion detection, but most supervised approaches are heavily dependent on large, annotated datasets. We propose an Uncertainty-based Unsupervised Anomaly Detection framework, termed U2AD, to address these limitations.
arXiv Detail & Related papers (2025-03-17T17:33:32Z)
AnomalyAID: Reliable Interpretation for Semi-supervised Network Anomaly Detection [8.776201861433133]
AnomalyAID aims to make the anomaly detection process interpretable and improve the reliability of interpretation results. We propose a novel interpretation approach that leverages global and local interpreters to provide reliable explanations. We design a new two-stage semi-supervised learning framework for network anomaly detection by aligning both stages' model predictions with special constraints.
arXiv Detail & Related papers (2024-11-18T05:39:00Z)
Model X-ray: Detecting Backdoored Models via Decision Boundary [62.675297418960355]
Backdoor attacks pose a significant security vulnerability for deep neural networks (DNNs). We propose Model X-ray, a novel backdoor detection approach based on the analysis of illustrated two-dimensional (2D) decision boundaries. Our approach includes two strategies focused on the decision areas dominated by clean samples and the concentration of label distribution.
arXiv Detail & Related papers (2024-02-27T12:42:07Z)
Mitigating the Mutual Error Amplification for Semi-Supervised Object Detection [92.52505195585925]
We propose a Cross Teaching (CT) method, aiming to mitigate the mutual error amplification by introducing a rectification mechanism of pseudo labels. In contrast to existing mutual teaching methods that directly treat predictions from other detectors as pseudo labels, we propose the Label Rectification Module (LRM).
arXiv Detail & Related papers (2022-01-26T03:34:57Z)
Unsupervised deep learning techniques for powdery mildew recognition based on multispectral imaging [63.62764375279861]
This paper presents a deep learning approach to automatically recognize powdery mildew on cucumber leaves. We focus on unsupervised deep learning techniques applied to multispectral imaging data. We propose the use of autoencoder architectures to investigate two strategies for disease detection.
arXiv Detail & Related papers (2021-12-20T13:29:13Z)

This list is automatically generated from the titles and abstracts of the papers on this site.