RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models
- URL: http://arxiv.org/abs/2310.16340v3
- Date: Fri, 2 Aug 2024 01:41:38 GMT
- Title: RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models
- Authors: Zefan Wang, Zichuan Liu, Yingying Zhang, Aoxiao Zhong, Jihong Wang, Fengbin Yin, Lunting Fan, Lingfei Wu, Qingsong Wen
- Abstract summary: Large language model (LLM) applications in cloud root cause analysis (RCA) have been actively explored recently.
We present RCAgent, a tool-augmented LLM autonomous agent framework for practical and privacy-aware industrial RCA usage.
Running on an internally deployed model rather than GPT families, RCAgent is capable of free-form data collection and comprehensive analysis with tools.
- Score: 46.476439550746136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM) applications in cloud root cause analysis (RCA) have been actively explored recently. However, current methods are still reliant on manual workflow settings and do not unleash LLMs' decision-making and environment interaction capabilities. We present RCAgent, a tool-augmented LLM autonomous agent framework for practical and privacy-aware industrial RCA usage. Running on an internally deployed model rather than GPT families, RCAgent is capable of free-form data collection and comprehensive analysis with tools. Our framework combines a variety of enhancements, including a unique Self-Consistency for action trajectories, and a suite of methods for context management, stabilization, and importing domain knowledge. Our experiments show RCAgent's evident and consistent superiority over ReAct across all aspects of RCA -- predicting root causes, solutions, evidence, and responsibilities -- and tasks covered or uncovered by current rules, as validated by both automated metrics and human evaluations. Furthermore, RCAgent has already been integrated into the diagnosis and issue discovery workflow of the Real-time Compute Platform for Apache Flink of Alibaba Cloud.
Related papers
- XAI-based Feature Ensemble for Enhanced Anomaly Detection in Autonomous Driving Systems [1.3022753212679383]
This paper proposes a novel feature ensemble framework that integrates multiple Explainable AI (XAI) methods.
By fusing top features identified by these XAI methods across six diverse AI models, the framework creates a robust and comprehensive set of features critical for detecting anomalies.
Our technique demonstrates improved accuracy, robustness, and transparency of AI models, contributing to safer and more trustworthy autonomous driving systems.
arXiv Detail & Related papers (2024-10-20T14:34:48Z)
- Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents [44.34340798542]
Large Language Models (LLMs) have shown remarkable capabilities in natural language tasks requiring complex reasoning.
Traditional supervised pre-training on static datasets falls short in enabling autonomous agent capabilities.
We propose a framework that combines guided Monte Carlo Tree Search (MCTS) with a self-critique mechanism and iterative fine-tuning on agent interactions.
arXiv Detail & Related papers (2024-08-13T20:52:13Z)
- Exploring LLM-based Agents for Root Cause Analysis [17.053079105858497]
Root cause analysis (RCA) is a critical part of the incident management process.
Large Language Models (LLMs) have been used to perform RCA, but are not able to collect additional diagnostic information.
We present an evaluation of a ReAct agent equipped with retrieval tools, on an out-of-distribution dataset of production incidents collected at Microsoft.
arXiv Detail & Related papers (2024-03-07T00:44:01Z)
- AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z)
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
- Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
- PyRCA: A Library for Metric-based Root Cause Analysis [66.72542200701807]
PyRCA is an open-source machine learning library of Root Cause Analysis (RCA) for Artificial Intelligence for IT Operations (AIOps).
It provides a holistic framework to uncover the complicated metric causal dependencies and automatically locate root causes of incidents.
arXiv Detail & Related papers (2023-06-20T09:55:10Z)
- Automated Machine Learning, Bounded Rationality, and Rational Metareasoning [62.997667081978825]
We will look at automated machine learning (AutoML) and related problems from the perspective of bounded rationality.
Taking actions under bounded resources requires an agent to reflect on how to use these resources in an optimal way.
arXiv Detail & Related papers (2021-09-10T09:10:20Z)