RiskLoc: Localization of Multi-dimensional Root Causes by Weighted Risk
- URL: http://arxiv.org/abs/2205.10004v1
- Date: Fri, 20 May 2022 07:43:18 GMT
- Title: RiskLoc: Localization of Multi-dimensional Root Causes by Weighted Risk
- Authors: Marcus Kalander
- Abstract summary: Failures and anomalies in large-scale software systems are unavoidable incidents.
Operators need to quickly and correctly identify the location of an issue to facilitate a swift repair.
We propose RiskLoc to solve the problem of multidimensional root cause localization.
- Score: 1.2691047660244335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Failures and anomalies in large-scale software systems are unavoidable
incidents. When an issue is detected, operators need to quickly and correctly
identify its location to facilitate a swift repair. In this work, we consider
the problem of identifying the root cause set that best explains an anomaly in
multi-dimensional time series with categorical attributes. The huge search
space is the main challenge: even for a small number of attributes and small
value sets, the number of theoretical combinations is too large to brute force.
Previous approaches have thus focused on reducing the search space, but they
all suffer from various issues: they require extensive manual parameter tuning,
are too slow to be practical, or are incapable of finding more complex
root causes. We propose RiskLoc to solve the problem of multidimensional root
cause localization. RiskLoc applies a 2-way partitioning scheme and assigns
element weights that linearly increase with the distance from the partitioning
point. A risk score is assigned to each element that integrates two factors: 1)
its weighted proportion within the abnormal partition, and 2) the relative
change in the deviation score adjusted for the ripple effect property.
Extensive experiments on multiple datasets verify the effectiveness and
efficiency of RiskLoc, and for a comprehensive evaluation, we introduce three
synthetically generated datasets that complement existing datasets. We
demonstrate that RiskLoc consistently outperforms state-of-the-art baselines,
especially in more challenging root cause scenarios, with gains in F1-score up
to 57% over the second-best approach with comparable running times.
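
The first risk factor described in the abstract (an element's weighted proportion within the abnormal partition) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the per-leaf deviation scores, the cutoff value, the rule that leaves above the cutoff are abnormal, the use of `None` as a wildcard, and the helper names `covers`, `linear_weights`, and `weighted_abnormal_proportion`. It is not the paper's implementation and it omits the second factor entirely.

```python
from typing import Dict, Optional, Tuple

# A leaf is one fully specified attribute combination, e.g. ("a1", "b2").
Leaf = Tuple[str, ...]
# An element fixes some attributes and leaves the rest as wildcards (None).
Element = Tuple[Optional[str], ...]


def covers(element: Element, leaf: Leaf) -> bool:
    """Return True if the leaf matches every fixed attribute of the element."""
    return all(e is None or e == v for e, v in zip(element, leaf))


def linear_weights(deviation: Dict[Leaf, float], cutoff: float) -> Dict[Leaf, float]:
    """Weight each leaf by its distance from the partitioning point.

    Assumption: the weight is the absolute distance, so it grows linearly
    on both sides of the cutoff.
    """
    return {leaf: abs(score - cutoff) for leaf, score in deviation.items()}


def weighted_abnormal_proportion(
    element: Element, deviation: Dict[Leaf, float], cutoff: float
) -> float:
    """Weighted share of an element's leaves that fall in the abnormal partition.

    Assumption: a leaf is abnormal when its deviation score exceeds the cutoff.
    """
    weights = linear_weights(deviation, cutoff)
    covered = [leaf for leaf in deviation if covers(element, leaf)]
    total = sum(weights[leaf] for leaf in covered)
    if total == 0.0:
        return 0.0
    abnormal = sum(weights[leaf] for leaf in covered if deviation[leaf] > cutoff)
    return abnormal / total


# Hypothetical deviation scores for a two-attribute example.
DEVIATION: Dict[Leaf, float] = {
    ("a1", "b1"): 1.0,
    ("a1", "b2"): 0.8,
    ("a2", "b1"): 0.0,
    ("a2", "b2"): 0.1,
}
CUTOFF = 0.4  # hypothetical partitioning point

if __name__ == "__main__":
    for element in [("a1", None), (None, "b1"), ("a2", None)]:
        score = weighted_abnormal_proportion(element, DEVIATION, CUTOFF)
        print(element, round(score, 3))
```

In this toy data every leaf under `("a1", None)` is abnormal, so its proportion is 1, while `(None, "b1")` mixes one abnormal and one normal leaf and scores roughly 0.6. Weighting by distance means leaves close to the cutoff, whose partition assignment is least certain, contribute least to the score.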
Related papers
- Decomposing and Composing: Towards Efficient Vision-Language Continual Learning via Rank-1 Expert Pool in a Single LoRA [50.97792275353563]
We introduce a novel framework that restructures a single Low-Rank Adaptation (LoRA) module as a decomposable Rank-1 Expert Pool.
Our method learns to dynamically compose a sparse, task-specific update by selecting from this expert pool, guided by the semantics of the [Guided] token.
arXiv Detail & Related papers (2026-01-30T10:54:51Z)
- MultiRisk: Multiple Risk Control via Iterative Score Thresholding [40.193623095603265]
We formalize the problem of enforcing multiple risk constraints with user-defined priorities.
We introduce two efficient dynamic programming algorithms that leverage this sequential structure.
We show that our algorithm can control each individual risk at close to the target level.
arXiv Detail & Related papers (2025-12-31T03:25:30Z)
- Robust Root Cause Diagnosis using In-Distribution Interventions [31.19149413954674]
Diagnosing the root cause of an anomaly in a complex interconnected system is a pressing problem in today's cloud services and industrial operations.
We propose In-Distribution Interventions (IDI), a novel algorithm that predicts root cause as nodes that meet two criteria.
arXiv Detail & Related papers (2025-05-02T00:19:43Z)
- Is Parameter Collision Hindering Continual Learning in LLMs? [50.57658782050275]
Large Language Models (LLMs) often suffer from catastrophic forgetting when learning multiple tasks sequentially.
We show that building non-collision parameters is a more critical interdependence factor in addressing CL challenges.
We propose Non-collision Low-Rank Adaptation (N-LoRA), a simple yet effective approach leveraging low collision rates to enhance CL in LLMs.
arXiv Detail & Related papers (2024-10-14T05:54:11Z)
- Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z)
- Unraveling the "Anomaly" in Time Series Anomaly Detection: A Self-supervised Tri-domain Solution [89.16750999704969]
Anomaly labels hinder traditional supervised models in time series anomaly detection.
Various SOTA deep learning techniques, such as self-supervised learning, have been introduced to tackle this issue.
We propose a novel self-supervised learning based Tri-domain Anomaly Detector (TriAD).
arXiv Detail & Related papers (2023-11-19T05:37:18Z)
- Generic and Robust Root Cause Localization for Multi-Dimensional Data in Online Service Systems [22.308016571592105]
Localizing root causes for multi-dimensional data is critical to ensure online service systems' reliability.
This paper proposes a generic and robust root cause localization approach for multi-dimensional data, PSqueeze.
Case studies in several production systems demonstrate that PSqueeze is helpful to fault diagnosis in the real world.
arXiv Detail & Related papers (2023-05-05T07:22:30Z)
- Causality-Based Multivariate Time Series Anomaly Detection [63.799474860969156]
We formulate the anomaly detection problem from a causal perspective and view anomalies as instances that do not follow the regular causal mechanism to generate the multivariate data.
We then propose a causality-based anomaly detection approach, which first learns the causal structure from data and then infers whether an instance is an anomaly relative to the local causal mechanism.
We evaluate our approach with both simulated and public datasets as well as a case study on real-world AIOps applications.
arXiv Detail & Related papers (2022-06-30T06:00:13Z)
- Deep Hierarchy in Bandits [51.22833900944146]
Mean rewards of actions are often correlated.
To maximize statistical efficiency, it is important to leverage these correlations when learning.
We formulate a bandit variant of this problem where the correlations of mean action rewards are represented by a hierarchical Bayesian model.
arXiv Detail & Related papers (2022-02-03T08:15:53Z)
- Causal Discovery from Sparse Time-Series Data Using Echo State Network [0.0]
Causal discovery between collections of time-series data can help diagnose causes of symptoms and hopefully prevent faults before they occur.
We propose a new system composed of two parts: the first part fills in missing data with Gaussian Process Regression, and the second part leverages an Echo State Network.
We report the corresponding Matthews Correlation Coefficient (MCC) and Receiver Operating Characteristic (ROC) curves and show that the proposed system outperforms existing algorithms.
arXiv Detail & Related papers (2022-01-09T05:55:47Z)
- DeepFIB: Self-Imputation for Time Series Anomaly Detection [5.4921159672644775]
Time series anomaly detection (AD) plays an essential role in various applications, e.g., fraud detection in finance and healthcare monitoring.
We propose a novel self-supervised learning technique for AD in time series, namely DeepFIB.
We show that DeepFIB outperforms state-of-the-art methods by a large margin, achieving up to 65.2% relative improvement in F1-score.
arXiv Detail & Related papers (2021-12-12T14:28:06Z)
- Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
arXiv Detail & Related papers (2020-12-16T20:16:28Z)
This list is automatically generated from the titles and abstracts of the papers on this site.