Root Cause Analysis Method Based on Large Language Models with Residual Connection Structures
- URL: http://arxiv.org/abs/2602.08804v1
- Date: Mon, 09 Feb 2026 15:41:55 GMT
- Title: Root Cause Analysis Method Based on Large Language Models with Residual Connection Structures
- Authors: Liming Zhou, Ailing Liu, Hongwei Liu, Min He, Heng Zhang,
- Abstract summary: This paper proposes a residual-connection-based root cause analysis method using large language model (LLM)<n> Experimental results on CCF-AIOps microservice datasets demonstrate that RC-LLM achieves strong accuracy and efficiency in root cause analysis.
- Score: 13.802178274787948
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Root cause localization remain challenging in complex and large-scale microservice architectures. The complex fault propagation among microservices and the high dimensionality of telemetry data, including metrics, logs, and traces, limit the effectiveness of existing root cause analysis (RCA) methods. In this paper, a residual-connection-based RCA method using large language model (LLM), named RC-LLM, is proposed. A residual-like hierarchical fusion structure is designed to integrate multi-source telemetry data, while the contextual reasoning capability of large language models is leveraged to model temporal and cross-microservice causal dependencies. Experimental results on CCF-AIOps microservice datasets demonstrate that RC-LLM achieves strong accuracy and efficiency in root cause analysis.
Related papers
- Wireless Federated Multi-Task LLM Fine-Tuning via Sparse-and-Orthogonal LoRA [61.12136997430116]
Decentralized federated learning (DFL) based on low-rank adaptation (LoRA) enables mobile devices with multi-task datasets to collaboratively fine-tune a large language model (LLM) by exchanging locally updated parameters with a subset of neighboring devices via wireless connections for knowledge integration.<n> directly aggregating parameters fine-tuned on heterogeneous datasets induces three primary issues across the DFL life-cycle: (i) catastrophic knowledge forgetting during fine-tuning process, arising from conflicting update directions caused by data heterogeneity; (ii) textitinefficient communication and convergence during model aggregation process,
arXiv Detail & Related papers (2026-02-24T02:45:32Z) - Hypothesize-Then-Verify: Speculative Root Cause Analysis for Microservices with Pathwise Parallelism [19.31110304702373]
SpecRCA is a speculative root cause analysis framework that adopts a textithypothesize-then-verify paradigm.<n>Preliminary experiments on the AIOps 2022 demonstrate that SpecRCA achieves superior accuracy and efficiency compared to existing approaches.
arXiv Detail & Related papers (2026-01-06T05:58:25Z) - MicroRCA-Agent: Microservice Root Cause Analysis Method Based on Large Language Model Agents [12.160412894251406]
MicroRCA-Agent is an innovative solution for microservice root cause analysis based on large language model agents.<n>The proposed solution demonstrates superior performance in complex microservice fault scenarios, achieving a final score of 50.71.
arXiv Detail & Related papers (2025-09-19T05:57:03Z) - Adaptive Root Cause Localization for Microservice Systems with Multi-Agent Recursion-of-Thought [11.307072056343662]
We introduce RCLAgent, an adaptive root cause localization method for microservice systems.<n>We show that RCLAgent achieves superior performance by localizing the root cause using only a single request-outperforming state-of-the-art methods.
arXiv Detail & Related papers (2025-08-28T02:34:19Z) - Understanding the Information Propagation Effects of Communication Topologies in LLM-based Multi-Agent Systems [58.95962217043371]
We present a causal framework to analyze how agent outputs, whether correct or erroneous, propagate under topologies with varying sparsity.<n>Our empirical studies reveal that moderately sparse topologies, which effectively suppress error propagation while preserving beneficial information diffusion, typically achieve optimal task performance.<n>We propose a novel topology design approach, EIB-leanrner, that balances error suppression and beneficial information propagation by fusing connectivity patterns from both dense and sparse graphs.
arXiv Detail & Related papers (2025-05-29T11:21:48Z) - MCTS-RAG: Enhancing Retrieval-Augmented Generation with Monte Carlo Tree Search [61.11836311160951]
We introduce MCTS-RAG, a novel approach that enhances the reasoning capabilities of small language models on knowledge-intensive tasks.<n>Unlike standard RAG methods, which typically retrieve information independently from reasoning, MCTS-RAG combines structured reasoning with adaptive retrieval.<n>This integrated approach enhances decision-making, reduces hallucinations, and ensures improved factual accuracy and response consistency.
arXiv Detail & Related papers (2025-03-26T17:46:08Z) - Robust Time Series Causal Discovery for Agent-Based Model Validation [5.430532390358285]
This study proposes a Robust Cross-Validation (RCV) approach to enhance causal structure learning for ABM validation.
We develop RCV-VarLiNGAM and RCV-PCMCI, novel extensions of two prominent causal discovery algorithms.
The proposed approach is then integrated into an enhanced ABM validation framework.
arXiv Detail & Related papers (2024-10-25T09:13:26Z) - Online Multi-modal Root Cause Analysis [61.94987309148539]
Root Cause Analysis (RCA) is essential for pinpointing the root causes of failures in microservice systems.
Existing online RCA methods handle only single-modal data overlooking, complex interactions in multi-modal systems.
We introduce OCEAN, a novel online multi-modal causal structure learning method for root cause localization.
arXiv Detail & Related papers (2024-10-13T21:47:36Z) - R-SFLLM: Jamming Resilient Framework for Split Federated Learning with Large Language Models [65.04475956174959]
Split federated learning (SFL) is a compute-efficient paradigm in distributed machine learning (ML)<n>A significant challenge in SFL, particularly when deployed over wireless channels, is the susceptibility of transmitted model parameters to adversarial jamming.<n>This paper develops a physical layer framework for resilient SFL with large language models (LLMs) and vision language models (VLMs) over wireless networks.
arXiv Detail & Related papers (2024-07-16T12:21:29Z) - DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and flops.
arXiv Detail & Related papers (2024-06-05T06:18:03Z) - Multi-modal Causal Structure Learning and Root Cause Analysis [67.67578590390907]
We propose Mulan, a unified multi-modal causal structure learning method for root cause localization.
We leverage a log-tailored language model to facilitate log representation learning, converting log sequences into time-series data.
We also introduce a novel key performance indicator-aware attention mechanism for assessing modality reliability and co-learning a final causal graph.
arXiv Detail & Related papers (2024-02-04T05:50:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.