Detecting Silent Failures in Multi-Agentic AI Trajectories
- URL: http://arxiv.org/abs/2511.04032v1
- Date: Thu, 06 Nov 2025 04:00:54 GMT
- Title: Detecting Silent Failures in Multi-Agentic AI Trajectories
- Authors: Divya Pathak, Harshit Kumar, Anuska Roy, Felix George, Mudit Verma, Pratibha Moogi,
- Abstract summary: Multi-Agentic AI systems are prone to silent failures such as drift, cycles, and missing details in outputs. We introduce the task of anomaly detection in agentic trajectories to identify these failures and present a dataset curation pipeline. This work provides the first systematic study of anomaly detection in Multi-Agentic AI systems, offering datasets, benchmarks, and insights to guide future research.
- Score: 7.001329254828447
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-Agentic AI systems, powered by large language models (LLMs), are inherently non-deterministic and prone to silent failures such as drift, cycles, and missing details in outputs, which are difficult to detect. We introduce the task of anomaly detection in agentic trajectories to identify these failures and present a dataset curation pipeline that captures user behavior, agent non-determinism, and LLM variation. Using this pipeline, we curate and label two benchmark datasets comprising 4,275 and 894 trajectories from Multi-Agentic AI systems. Benchmarking anomaly detection methods on these datasets, we show that supervised (XGBoost) and semi-supervised (SVDD) approaches perform comparably, achieving accuracies up to 98% and 96%, respectively. This work provides the first systematic study of anomaly detection in Multi-Agentic AI systems, offering datasets, benchmarks, and insights to guide future research.
Related papers
- Universal Transformation of One-Class Classifiers for Unsupervised Anomaly Detection [51.73001988341294]
Anomaly detection is typically formulated as a one-class classification problem. We present a dataset folding method that transforms an arbitrary one-class classifier-based anomaly detector into a fully unsupervised method.
arXiv Detail & Related papers (2026-02-13T16:54:12Z)
- Reasoning-based Anomaly Detection Framework: A Real-time, Scalable, and Automated Approach to Anomaly Detection Across Domains [3.804483269194178]
Reasoning-based Anomaly Detection Framework (RADF) is designed to perform real-time anomaly detection on very large datasets. RADF surpasses state-of-the-art anomaly detection models in AUC performance for 5 out of 9 public benchmarking datasets.
arXiv Detail & Related papers (2025-10-03T20:06:31Z)
- PATH: A Discrete-sequence Dataset for Evaluating Online Unsupervised Anomaly Detection Approaches for Multivariate Time Series [0.01874930567916036]
Benchmarking anomaly detection approaches for multivariate time series is a challenging task due to a lack of high-quality datasets. We propose a solution: a diverse, extensive, and non-trivial dataset generated via state-of-the-art simulation tools. Our dataset represents a discrete-sequence problem, which remains unaddressed by previously proposed solutions in the literature.
arXiv Detail & Related papers (2024-11-21T09:03:12Z)
- Anomaly Detection of Tabular Data Using LLMs [54.470648484612866]
We show that pre-trained large language models (LLMs) are zero-shot batch-level anomaly detectors.
We propose an end-to-end fine-tuning strategy to bring out the potential of LLMs in detecting real anomalies.
arXiv Detail & Related papers (2024-06-24T04:17:03Z)
- Self-supervised Feature Adaptation for 3D Industrial Anomaly Detection [59.41026558455904]
We focus on multi-modal anomaly detection. Specifically, we investigate early multi-modal approaches that attempted to utilize models pre-trained on large-scale visual datasets.
We propose a Local-to-global Self-supervised Feature Adaptation (LSFA) method to finetune the adaptors and learn task-oriented representation toward anomaly detection.
arXiv Detail & Related papers (2024-01-06T07:30:41Z)
- Progressing from Anomaly Detection to Automated Log Labeling and Pioneering Root Cause Analysis [53.24804865821692]
This study introduces a taxonomy for log anomalies and explores automated data labeling to mitigate labeling challenges.
The study envisions a future where root cause analysis follows anomaly detection, unraveling the underlying triggers of anomalies.
arXiv Detail & Related papers (2023-12-22T15:04:20Z)
- CL-Flow: Strengthening the Normalizing Flows by Contrastive Learning for Better Anomaly Detection [1.951082473090397]
We propose a self-supervised anomaly detection approach that combines contrastive learning with 2D-Flow.
Compared to mainstream unsupervised approaches, our self-supervised method demonstrates superior detection accuracy, fewer additional model parameters, and faster inference speed.
Our approach showcases new state-of-the-art results, achieving a performance of 99.6% in image-level AUROC on the MVTecAD dataset and 96.8% in image-level AUROC on the BTAD dataset.
arXiv Detail & Related papers (2023-11-12T10:07:03Z)
- Robust Multimodal Failure Detection for Microservice Systems [32.25907616511765]
AnoFusion is an unsupervised failure detection approach for microservice systems.
It learns the correlation of the heterogeneous multimodal data and integrates a Graph Attention Network (GAT) and Gated Recurrent Unit (GRU).
It achieves the F1-score of 0.857 and 0.922, respectively, outperforming state-of-the-art failure detection approaches.
arXiv Detail & Related papers (2023-05-30T12:39:42Z)
- Deep Anomaly Detection and Search via Reinforcement Learning [22.005663849044772]
We propose Deep Anomaly Detection and Search (DADS) to balance exploitation and exploration.
During the training process, DADS searches for possible anomalies with hierarchically-structured datasets.
Results show that DADS can efficiently and precisely search anomalies from unlabeled data and learn from them.
arXiv Detail & Related papers (2022-08-31T13:03:33Z)
- DAE: Discriminatory Auto-Encoder for multivariate time-series anomaly detection in air transportation [68.8204255655161]
We propose a novel anomaly detection model called Discriminatory Auto-Encoder (DAE).
It uses the baseline of a regular LSTM-based auto-encoder but with several decoders, each getting data of a specific flight phase.
Results show that the DAE achieves better results in both accuracy and speed of detection.
arXiv Detail & Related papers (2021-09-08T14:07:55Z)
- Contextual-Bandit Anomaly Detection for IoT Data in Distributed Hierarchical Edge Computing [65.78881372074983]
IoT devices can hardly afford complex deep neural network (DNN) models, and offloading anomaly detection tasks to the cloud incurs long delays.
We propose and build a demo for an adaptive anomaly detection approach for distributed hierarchical edge computing (HEC) systems.
We show that our proposed approach significantly reduces detection delay without sacrificing accuracy, as compared to offloading detection tasks to the cloud.
arXiv Detail & Related papers (2020-04-15T06:13:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.