Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach
- URL: http://arxiv.org/abs/2403.06485v1
- Date: Mon, 11 Mar 2024 07:48:35 GMT
- Title: Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach
- Authors: Jinxi Kuang, Jinyang Liu, Junjie Huang, Renyi Zhong, Jiazhen Gu, Lan
Yu, Rui Tan, Zengyin Yang, Michael R. Lyu
- Abstract summary: COLA is a novel hybrid approach based on correlation mining and LLM (Large Language Model) reasoning for online alert aggregation.
We evaluate COLA on three datasets collected from the production environment of a large-scale cloud platform.
Results show COLA achieves F1-scores from 0.901 to 0.930, outperforming state-of-the-art methods and achieving comparable efficiency.
- Score: 28.71225642605041
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Due to the scale and complexity of cloud systems, a system failure would
trigger an "alert storm", i.e., massive correlated alerts. Although these
alerts can be traced back to a few root causes, their overwhelming number makes
manual handling infeasible. Alert aggregation is thus critical to help
engineers concentrate on the root cause and facilitate failure resolution.
Existing methods typically utilize semantic similarity-based methods or
statistical methods to aggregate alerts. However, semantic similarity-based
methods overlook the causal rationale of alerts, while statistical methods can
hardly handle infrequent alerts.
To tackle these limitations, we introduce external knowledge, i.e., the
Standard Operating Procedures (SOPs) of alerts, as a supplement. We propose
COLA, a novel hybrid approach based on correlation mining and LLM (Large
Language Model) reasoning for online alert aggregation. The correlation mining
module effectively captures the temporal and spatial relations between alerts,
measuring their correlations in an efficient manner. Subsequently, only
uncertain pairs with low confidence are forwarded to the LLM reasoning module
for detailed analysis. This hybrid design harnesses both statistical evidence
for frequent alerts and the reasoning capabilities of computationally intensive
LLMs, ensuring the overall efficiency of COLA in handling large volumes of
alerts in practical scenarios. We evaluate COLA on three datasets collected
from the production environment of a large-scale cloud platform. The
experimental results show COLA achieves F1-scores from 0.901 to 0.930,
outperforming state-of-the-art methods and achieving comparable efficiency. We
also share our experience in deploying COLA in our real-world cloud system,
Cloud X.
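The hybrid routing described in the abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the `Alert` fields, the thresholds, and the co-occurrence-based score are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Alert:
    service: str
    message: str

def correlation_score(a: Alert, b: Alert, co_occurrence: dict) -> float:
    # Statistical confidence (mined offline from historical alert data)
    # that two alerts belong to the same incident; illustrative only.
    return co_occurrence.get((a.service, b.service), 0.0)

def should_aggregate(a: Alert, b: Alert, co_occurrence: dict,
                     llm_judge, hi: float = 0.8, lo: float = 0.2) -> bool:
    score = correlation_score(a, b, co_occurrence)
    if score >= hi:   # statistically confident: same root cause
        return True
    if score <= lo:   # statistically confident: unrelated
        return False
    # Uncertain pair: defer to the (expensive) LLM reasoning module,
    # which in COLA consults SOP knowledge about the two alerts.
    return llm_judge(a, b)
```

Only the mid-confidence band pays the LLM cost, which is how the hybrid design stays efficient on the frequent alert pairs that dominate an alert storm.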
Related papers
- Attention Tracker: Detecting Prompt Injection Attacks in LLMs [62.247841717696765]
Large Language Models (LLMs) have revolutionized various domains but remain vulnerable to prompt injection attacks.
We introduce the concept of the distraction effect, where specific attention heads shift focus from the original instruction to the injected instruction.
We propose Attention Tracker, a training-free detection method that tracks attention patterns on instruction to detect prompt injection attacks.
arXiv Detail & Related papers (2024-11-01T04:05:59Z)
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks [52.46420522934253]
We introduce LoRA-Ensemble, a parameter-efficient deep ensemble method for self-attention networks.
By employing a single pre-trained self-attention network with weights shared across all members, we train member-specific low-rank matrices for the attention projections.
Our method exhibits superior calibration compared to explicit ensembles and achieves similar or better accuracy across various prediction tasks and datasets.
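The member-specific low-rank parameterization can be illustrated with a toy sketch (our own, not the paper's code; the dimensions and zero initialization are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, members = 8, 2, 4          # hidden size, LoRA rank, ensemble size

W = rng.normal(size=(d, d))      # shared pre-trained projection (frozen)
A = [rng.normal(size=(d, r)) for _ in range(members)]
B = [np.zeros((r, d)) for _ in range(members)]  # zero-init: members start identical

def member_forward(x: np.ndarray, i: int) -> np.ndarray:
    # Member i applies the shared weight plus its own low-rank update.
    return x @ (W + A[i] @ B[i])

x = rng.normal(size=(1, d))
outs = [member_forward(x, i) for i in range(members)]
```

Each member adds only 2·d·r trainable parameters instead of a full d·d copy of the weight matrix, which is where the parameter efficiency comes from.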
arXiv Detail & Related papers (2024-05-23T11:10:32Z)
- FedRDF: A Robust and Dynamic Aggregation Function against Poisoning Attacks in Federated Learning [0.0]
Federated Learning (FL) represents a promising approach to addressing the privacy concerns typically associated with centralized Machine Learning (ML) deployments.
Despite its well-known advantages, FL is vulnerable to security attacks such as Byzantine behaviors and poisoning attacks.
Our proposed approach was tested against various model poisoning attacks, demonstrating superior performance over state-of-the-art aggregation methods.
arXiv Detail & Related papers (2024-02-15T16:42:04Z)
- Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z)
- Privacy-Preserving Distributed Learning for Residential Short-Term Load Forecasting [11.185176107646956]
Power system load data can inadvertently reveal the daily routines of residential users, posing a risk to their property security.
We introduce a Markovian Switching-based distributed training framework, the convergence of which is substantiated through rigorous theoretical analysis.
Case studies employing real-world power system load data validate the efficacy of our proposed algorithm.
arXiv Detail & Related papers (2024-02-02T16:39:08Z)
- FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning Attacks in Federated Learning [98.43475653490219]
Federated learning (FL) is susceptible to poisoning attacks.
FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain.
We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model.
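A heavily simplified sketch of the frequency-domain idea (the actual FreqFed inspects low-frequency DCT components and clusters them; the FFT features and median-distance filter below are our own stand-ins):

```python
import numpy as np

def robust_aggregate(updates: list, k: int = 4) -> np.ndarray:
    # Represent each client update by its k lowest-frequency FFT magnitudes;
    # poisoned updates tend to stand out in this compact representation.
    feats = np.stack([np.abs(np.fft.rfft(u))[:k] for u in updates])
    center = np.median(feats, axis=0)
    dist = np.linalg.norm(feats - center, axis=1)
    # Keep updates whose frequency signature is close to the majority.
    keep = dist <= np.median(dist) * 2
    return np.mean(np.array(updates)[keep], axis=0)
```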
arXiv Detail & Related papers (2023-12-07T16:56:24Z)
- A Hierarchical Security Events Correlation Model for Real-time Cyber Threat Detection and Response [0.0]
We develop a novel hierarchical event correlation model that promises to reduce the number of alerts issued by an Intrusion Detection System.
The proposed model takes the best of features from similarity and graph-based correlation techniques to deliver an ensemble capability not possible by either approach separately.
The model is implemented as a proof of concept with experiments run on the DARPA 99 Intrusion detection set.
arXiv Detail & Related papers (2023-12-02T20:07:40Z)
- Over-the-Air Federated Learning and Optimization [52.5188988624998]
We focus on federated learning (FL) via over-the-air computation (AirComp).
We describe the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both convex and non-convex settings.
For different types of local updates that can be transmitted by edge devices (i.e., model, gradient, model difference), we reveal that transmitting in AirFedAvg may cause an aggregation error.
In addition, we consider more practical signal processing schemes to improve the communication efficiency and extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes.
arXiv Detail & Related papers (2023-10-16T05:49:28Z)
- PACE-LM: Prompting and Augmentation for Calibrated Confidence Estimation with GPT-4 in Cloud Incident Root Cause Analysis [17.362895895214344]
Large language models (LLMs) are used to help humans identify the root causes of cloud incidents.
We propose to perform confidence estimation for the predictions to help on-call engineers make decisions on whether to adopt the model prediction.
We show that our method produces calibrated confidence estimates for predicted root causes and validates the usefulness of the retrieved historical data and the prompting strategy.
arXiv Detail & Related papers (2023-09-11T21:24:00Z)
- Inter-Domain Fusion for Enhanced Intrusion Detection in Power Systems: An Evidence Theoretic and Meta-Heuristic Approach [0.0]
False alerts due to compromised IDS in ICS networks can lead to severe economic and operational damage.
This work presents an approach for reducing false alerts in CPS power systems by dealing with uncertainty without prior distribution of alerts.
arXiv Detail & Related papers (2021-11-20T00:05:39Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.