Fugu-MT Paper Translation (Summary): An Empirical Study of SOTA RCA Models: From Oversimplified Benchmarks to Realistic Failures

Paper summary: An Empirical Study of SOTA RCA Models: From Oversimplified Benchmarks to Realistic Failures

arxiv url: http://arxiv.org/abs/2510.04711v1
Date: Mon, 06 Oct 2025 11:30:03 GMT
Status: Translation complete
Last updated in system: 2025-10-07 16:52:59.827814
Title: An Empirical Study of SOTA RCA Models: From Oversimplified Benchmarks to Realistic Failures
Authors: Aoyang Fang, Songhan Zhang, Yifan Yang, Haotong Wu, Junjielong Xu, Xuyang Wang, Rui Wang, Manyi Wang, Qisheng Lu, Pinjia He
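Abstract summary: We show that simple rule-based methods can match or even outperform state-of-the-art (SOTA) models on four widely used benchmarks. Our analysis highlights three common failure patterns: scalability issues, observability blind spots, and modeling bottlenecks.
Reference score (independently calculated attention score): 16.06503310632004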
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: While cloud-native microservice architectures have transformed software development, their complexity makes Root Cause Analysis (RCA) both crucial and challenging. Although many data-driven RCA models have been proposed, we find that existing benchmarks are often oversimplified and fail to capture real-world conditions. Our preliminary study shows that simple rule-based methods can match or even outperform state-of-the-art (SOTA) models on four widely used benchmarks, suggesting performance overestimation due to benchmark simplicity. To address this, we systematically analyze popular RCA benchmarks and identify key limitations in fault injection, call graph design, and telemetry patterns. Based on these insights, we develop an automated framework to generate more realistic benchmarks, yielding a dataset of 1,430 validated failure cases from 9,152 injections, covering 25 fault types under dynamic workloads with hierarchical ground-truth labels and verified SLI impact. Re-evaluation of 11 SOTA models on this dataset shows low Top@1 accuracy (average 0.21, best 0.37) and significantly longer execution times. Our analysis highlights three common failure patterns: scalability issues, observability blind spots, and modeling bottlenecks.
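The abstract reports Top@1 accuracy without defining it. A minimal sketch follows, assuming the common reading of Top@k: the fraction of failure cases whose true root cause appears among a model's k highest-ranked candidates. The function name `top_k_accuracy`, the service names, and the toy rankings are all hypothetical and are not taken from the paper; only the counts 1,430 and 9,152 come from the abstract.

```python
from typing import Sequence


def top_k_accuracy(
    rankings: Sequence[Sequence[str]],
    truths: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of cases whose true root cause is among the top-k candidates."""
    if len(rankings) != len(truths):
        raise ValueError("rankings and truths must have the same length")
    if not truths:
        return 0.0
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)


# Hypothetical toy data: each inner list is one model output, best guess first.
rankings = [
    ["cart", "checkout", "payment"],
    ["payment", "cart", "checkout"],
    ["checkout", "payment", "cart"],
    ["cart", "payment", "checkout"],
]
truths = ["cart", "cart", "payment", "checkout"]

top1 = top_k_accuracy(rankings, truths, k=1)  # only the first case is a hit
top3 = top_k_accuracy(rankings, truths, k=3)  # every truth is in the top 3

# From the abstract: 1,430 validated failure cases out of 9,152 injections.
validated_fraction = 1430 / 9152
```

Under this reading, the reported average Top@1 of 0.21 would mean that the top-ranked candidate is the true root cause in roughly one case out of five.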


Related papers