RCAgent: Cloud Root Cause Analysis by Autonomous Agents with
Tool-Augmented Large Language Models
- URL: http://arxiv.org/abs/2310.16340v1
- Date: Wed, 25 Oct 2023 03:53:31 GMT
- Authors: Zefan Wang, Zichuan Liu, Yingying Zhang, Aoxiao Zhong, Lunting Fan,
Lingfei Wu, Qingsong Wen
- Abstract summary: Large language model (LLM) applications in cloud root cause analysis (RCA) have been actively explored recently.
We present RCAgent, a tool-augmented LLM autonomous agent framework for practical and privacy-aware industrial RCA usage.
Running on an internally deployed model rather than GPT families, RCAgent is capable of free-form data collection and comprehensive analysis with tools.
- Score: 52.352418867917194
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large language model (LLM) applications in cloud root cause analysis (RCA)
have been actively explored recently. However, current methods are still
reliant on manual workflow settings and do not unleash LLMs' decision-making
and environment interaction capabilities. We present RCAgent, a tool-augmented
LLM autonomous agent framework for practical and privacy-aware industrial RCA
usage. Running on an internally deployed model rather than GPT families,
RCAgent is capable of free-form data collection and comprehensive analysis with
tools. Our framework combines a variety of enhancements, including a unique
Self-Consistency for action trajectories, and a suite of methods for context
management, stabilization, and importing domain knowledge. Our experiments show
RCAgent's evident and consistent superiority over ReAct across all aspects of
RCA -- predicting root causes, solutions, evidence, and responsibilities -- and
tasks covered or uncovered by current rules, as validated by both automated
metrics and human evaluations. Furthermore, RCAgent has already been integrated
into the diagnosis and issue discovery workflow of the Real-time Compute
Platform for Apache Flink of Alibaba Cloud.
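The abstract names Self-Consistency over action trajectories but does not say how the runs are aggregated. As a rough illustration only, the sketch below shows the generic self-consistency idea: run the agent several times independently and take a majority vote on the final answers. This is not RCAgent's trajectory-level method; `self_consistent_answer`, `sample_trajectory`, and `fake_agent` are hypothetical names, and the canned answers are made-up placeholders.

```python
from collections import Counter
from typing import Callable, List, Tuple


def self_consistent_answer(
    sample_trajectory: Callable[[int], str],
    n_samples: int = 5,
) -> Tuple[str, float]:
    """Run a (hypothetical) agent n_samples times and majority-vote the answers.

    sample_trajectory(seed) stands in for one full agent run that ends in a
    final root-cause string. Returns the winning answer and its vote share.
    """
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Normalize lightly so trivially different strings count as the same vote.
    answers: List[str] = [
        sample_trajectory(seed).strip().lower() for seed in range(n_samples)
    ]
    winner, votes = Counter(answers).most_common(1)[0]
    return winner, votes / n_samples


def fake_agent(seed: int) -> str:
    """Stub agent returning canned placeholder answers (illustration only)."""
    canned = [
        "Checkpoint timeout",
        "checkpoint timeout ",
        "Out of memory",
        "checkpoint timeout",
        "Out of memory",
    ]
    return canned[seed % len(canned)]


root_cause, agreement = self_consistent_answer(fake_agent, n_samples=5)
print(root_cause, agreement)
```

Exact string matching is the simplest possible vote. Free-form root-cause text would likely need a looser notion of agreement, which this sketch does not attempt.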
Related papers
- Using Petri Nets as an Integrated Constraint Mechanism for Reinforcement Learning Tasks [3.105112058253643]
Lack of trust in algorithms is usually an issue when using Reinforcement Learning (RL) agents for control in real-world domains.
We propose an approach that uses Petri nets (PNs) with three main advantages over typical RL approaches.
arXiv Detail & Related papers (2024-07-05T13:04:06Z)
- Exploring LLM-based Agents for Root Cause Analysis [17.053079105858497]
Root cause analysis (RCA) is a critical part of the incident management process.
Large Language Models (LLMs) have been used to perform RCA, but are not able to collect additional diagnostic information.
We present an evaluation of a ReAct agent equipped with retrieval tools, on an out-of-distribution dataset of production incidents collected at Microsoft.
arXiv Detail & Related papers (2024-03-07T00:44:01Z)
- AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z)
- Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
A key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL).
This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z)
- Interactive Autonomous Navigation with Internal State Inference and Interactivity Estimation [58.21683603243387]
We propose three auxiliary tasks with relational-temporal reasoning and integrate them into the standard Deep Learning framework.
These auxiliary tasks provide additional supervision signals to infer the behavior patterns of other interactive agents.
Our approach achieves robust and state-of-the-art performance in terms of standard evaluation metrics.
arXiv Detail & Related papers (2023-11-27T18:57:42Z)
- PyRCA: A Library for Metric-based Root Cause Analysis [66.72542200701807]
PyRCA is an open-source machine learning library for Root Cause Analysis (RCA) in Artificial Intelligence for IT Operations (AIOps).
It provides a holistic framework to uncover the complicated metric causal dependencies and automatically locate root causes of incidents.
arXiv Detail & Related papers (2023-06-20T09:55:10Z)
- On Continuous Integration / Continuous Delivery for Automated Deployment of Machine Learning Models using MLOps [1.2885809002769633]
This research provides a more in-depth look at the machine learning lifecycle and the key distinctions between DevOps and MLOps.
In the MLOps approach, we discuss tools and approaches for executing the CI/CD pipeline of machine learning frameworks.
Following that, we take a deep look into push and pull-based deployments in GitHub Operations (GitOps).
arXiv Detail & Related papers (2022-02-07T22:04:38Z)
- Automated Machine Learning, Bounded Rationality, and Rational Metareasoning [62.997667081978825]
We will look at automated machine learning (AutoML) and related problems from the perspective of bounded rationality.
Taking actions under bounded resources requires an agent to reflect on how to use these resources in an optimal way.
arXiv Detail & Related papers (2021-09-10T09:10:20Z)