Robust Root Cause Diagnosis using In-Distribution Interventions
- URL: http://arxiv.org/abs/2505.00930v1
- Date: Fri, 02 May 2025 00:19:43 GMT
- Title: Robust Root Cause Diagnosis using In-Distribution Interventions
- Authors: Lokesh Nagalapatti, Ashutosh Srivastava, Sunita Sarawagi, Amit Sharma,
- Abstract summary: Diagnosing the root cause of an anomaly in a complex interconnected system is a pressing problem in today's cloud services and industrial operations.<n>We propose In-Distribution Interventions (IDI), a novel algorithm that predicts root cause as nodes that meet two criteria.
- Score: 31.19149413954674
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diagnosing the root cause of an anomaly in a complex interconnected system is a pressing problem in today's cloud services and industrial operations. We propose In-Distribution Interventions (IDI), a novel algorithm that predicts root cause as nodes that meet two criteria: 1) **Anomaly:** root cause nodes should take on anomalous values; 2) **Fix:** had the root cause nodes assumed usual values, the target node would not have been anomalous. Prior methods of assessing the fix condition rely on counterfactuals inferred from a Structural Causal Model (SCM) trained on historical data. But since anomalies are rare and fall outside the training distribution, the fitted SCMs yield unreliable counterfactual estimates. IDI overcomes this by relying on interventional estimates obtained by solely probing the fitted SCM at in-distribution inputs. We present a theoretical analysis comparing and bounding the errors in assessing the fix condition using interventional and counterfactual estimates. We then conduct experiments by systematically varying the SCM's complexity to demonstrate the cases where IDI's interventional approach outperforms the counterfactual approach and vice versa. Experiments on both synthetic and PetShop RCD benchmark datasets demonstrate that \our\ consistently identifies true root causes more accurately and robustly than nine existing state-of-the-art RCD baselines. Code is released at https://github.com/nlokeshiisc/IDI_release.
Related papers
- U2AD: Uncertainty-based Unsupervised Anomaly Detection Framework for Detecting T2 Hyperintensity in MRI Spinal Cord [7.811634659561162]
T2 hyperintensities in spinal cord MR images are crucial biomarkers for conditions such as degenerative cervical myelopathy.<n>Deep learning methods have shown promise in lesion detection, but most supervised approaches are heavily dependent on large, annotated datasets.<n>We propose an Uncertainty-based Unsupervised Anomaly Detection framework, termed U2AD, to address these limitations.
arXiv Detail & Related papers (2025-03-17T17:33:32Z) - DAGnosis: Localized Identification of Data Inconsistencies using
Structures [73.39285449012255]
Identification and appropriate handling of inconsistencies in data at deployment time is crucial to reliably use machine learning models.
We use directed acyclic graphs (DAGs) to encode the training set's features probability distribution and independencies as a structure.
Our method, called DAGnosis, leverages these structural interactions to bring valuable and insightful data-centric conclusions.
arXiv Detail & Related papers (2024-02-26T11:29:16Z) - Alleviating Structural Distribution Shift in Graph Anomaly Detection [70.1022676681496]
Graph anomaly detection (GAD) is a challenging binary classification problem.
Gallon neural networks (GNNs) benefit the classification of normals from aggregating homophilous neighbors.
We propose a framework to mitigate the effect of heterophilous neighbors and make them invariant.
arXiv Detail & Related papers (2024-01-25T13:07:34Z) - Understanding, Predicting and Better Resolving Q-Value Divergence in
Offline-RL [86.0987896274354]
We first identify a fundamental pattern, self-excitation, as the primary cause of Q-value estimation divergence in offline RL.
We then propose a novel Self-Excite Eigenvalue Measure (SEEM) metric to measure the evolving property of Q-network at training.
For the first time, our theory can reliably decide whether the training will diverge at an early stage.
arXiv Detail & Related papers (2023-10-06T17:57:44Z) - BALANCE: Bayesian Linear Attribution for Root Cause Localization [19.30952654225615]
Root Cause Analysis (RCA) plays an indispensable role in distributed data system maintenance and operations.
This paper opens up the possibilities of exploiting the recently developed framework of explainable AI (XAI) for the purpose of RCA.
We propose BALANCE, which formulates the problem of RCA through the lens of attribution in XAI.
arXiv Detail & Related papers (2023-01-31T11:49:26Z) - Are we certain it's anomalous? [57.729669157989235]
Anomaly detection in time series is a complex task since anomalies are rare due to highly non-linear temporal correlations.
Here we propose the novel use of Hyperbolic uncertainty for Anomaly Detection (HypAD)
HypAD learns self-supervisedly to reconstruct the input signal.
arXiv Detail & Related papers (2022-11-16T21:31:39Z) - Causality-Based Multivariate Time Series Anomaly Detection [63.799474860969156]
We formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data.
We then propose a causality-based anomaly detection approach, which first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism.
We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications.
arXiv Detail & Related papers (2022-06-30T06:00:13Z) - Hop-Count Based Self-Supervised Anomaly Detection on Attributed Networks [8.608288231153304]
We propose a hop-count based model (HCM) to detect anomalies by modeling both local and global contextual information.
To make better use of hop counts for anomaly identification, we propose to use hop counts prediction as a self-supervised task.
arXiv Detail & Related papers (2021-04-16T06:43:05Z) - Simple and Effective Prevention of Mode Collapse in Deep One-Class
Classification [93.2334223970488]
We propose two regularizers to prevent hypersphere collapse in deep SVDD.
The first regularizer is based on injecting random noise via the standard cross-entropy loss.
The second regularizer penalizes the minibatch variance when it becomes too small.
arXiv Detail & Related papers (2020-01-24T03:44:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.