Fugu-MT Paper Translation (Abstract): Adaptive Root Cause Localization for Microservice Systems with Multi-Agent Recursion-of-Thought

Paper overview: Adaptive Root Cause Localization for Microservice Systems with Multi-Agent Recursion-of-Thought

arxiv url: http://arxiv.org/abs/2508.20370v1
Date: Thu, 28 Aug 2025 02:34:19 GMT
Status: Translation complete
Last updated in system: 2025-08-29 18:12:01.906526
Title: Adaptive Root Cause Localization for Microservice Systems with Multi-Agent Recursion-of-Thought
Title (reference translation): Adaptive Root Cause Localization for Microservice Systems with Multi-Agent Recursion-of-Thought
Authors: Lingzhe Zhang, Tong Jia, Kangjin Wang, Weijie Hong, Chiming Duan, Minghua He, Ying Li,
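Abstract summary: This paper introduces RCLAgent, an adaptive root cause localization method for microservice systems. RCLAgent achieves superior performance by localizing the root cause from a single request, outperforming state-of-the-art methods that aggregate multiple requests.
Reference score (independently computed attention score): 11.307072056343662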
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As contemporary microservice systems become increasingly popular and complex, often comprising hundreds or even thousands of fine-grained, interdependent subsystems, they face more frequent failures. Ensuring system reliability thus demands accurate root cause localization. While traces and metrics have proven to be effective data sources for this task, existing methods either rely heavily on pre-defined schemas, which struggle to adapt to evolving operational contexts, or lack interpretability in their reasoning process, leaving Site Reliability Engineers (SREs) confused. In this paper, we conduct a comprehensive study on how SREs localize the root cause of failures, drawing insights from multiple professional SREs across different organizations. Our investigation reveals that human root cause analysis exhibits three key characteristics: recursiveness, multi-dimensional expansion, and cross-modal reasoning. Motivated by these findings, we introduce RCLAgent, an adaptive root cause localization method for microservice systems that leverages a multi-agent recursion-of-thought framework. RCLAgent employs a novel recursion-of-thought strategy to guide the LLM's reasoning process, effectively integrating data from multiple agents and tool-assisted analysis to accurately pinpoint the root cause. Experimental evaluations on various public datasets demonstrate that RCLAgent achieves superior performance by localizing the root cause using only a single request, outperforming state-of-the-art methods that depend on aggregating multiple requests. These results underscore the effectiveness of RCLAgent in enhancing the efficiency and precision of root cause localization in complex microservice environments.
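Abstract (reference translation): Modern microservice systems are growing in popularity and complexity, often consisting of hundreds or thousands of fine-grained, interdependent subsystems, and as a result they fail more frequently. Keeping such systems reliable requires accurate root cause localization. Traces and metrics have proven to be effective data sources for this task, but existing methods either depend heavily on pre-defined schemas that struggle to adapt to evolving operational contexts, or lack interpretability in their reasoning process, which leaves Site Reliability Engineers (SREs) confused. This paper presents a comprehensive study of how SREs localize the root cause of failures, drawing on insights from multiple professional SREs across different organizations. The study shows that human root cause analysis has three key characteristics: recursiveness, multi-dimensional expansion, and cross-modal reasoning. Motivated by these findings, the authors introduce RCLAgent, an adaptive root cause localization method for microservice systems built on a multi-agent recursion-of-thought framework. RCLAgent uses a novel recursion-of-thought strategy to guide the LLM's reasoning process, integrating data from multiple agents with tool-assisted analysis to pinpoint the root cause accurately. Experiments on various public datasets show that RCLAgent achieves superior performance while localizing the root cause from only a single request, outperforming state-of-the-art methods that depend on aggregating multiple requests. These results support the effectiveness of RCLAgent in improving the efficiency and precision of root cause localization in complex microservice environments.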

Related papers