Recommending Root-Cause and Mitigation Steps for Cloud Incidents using
Large Language Models
- URL: http://arxiv.org/abs/2301.03797v1
- Date: Tue, 10 Jan 2023 05:41:40 GMT
- Title: Recommending Root-Cause and Mitigation Steps for Cloud Incidents using
Large Language Models
- Authors: Toufique Ahmed, Supriyo Ghosh, Chetan Bansal, Thomas Zimmermann,
Xuchao Zhang, Saravan Rajmohan
- Abstract summary: On-call engineers require a significant amount of domain knowledge and manual effort to root cause and mitigate production incidents.
Recent advances in artificial intelligence have resulted in state-of-the-art large language models like GPT-3.x.
We conduct the first large-scale study to evaluate the effectiveness of these models for helping engineers root cause and mitigate production incidents.
- Score: 18.46643617658214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incident management for cloud services is a complex process involving several
steps and has a huge impact on both service health and developer productivity.
On-call engineers require a significant amount of domain knowledge and manual
effort to root cause and mitigate production incidents. Recent advances in
artificial intelligence have resulted in state-of-the-art large language models
like GPT-3.x (both GPT-3.0 and GPT-3.5), which have been used to solve a variety
of problems ranging from question answering to text summarization. In this work,
we conduct the first large-scale study to evaluate the effectiveness of these
models for helping engineers root cause and mitigate production incidents. We
perform a rigorous study at Microsoft on more than 40,000 incidents and compare
several large language models in zero-shot, fine-tuned, and multi-task settings
using semantic and lexical metrics. Lastly, our human evaluation with actual
incident owners shows the efficacy and future potential of using artificial
intelligence for resolving cloud incidents.
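The abstract describes scoring model-generated root causes and mitigation steps against the ground truth written by incident owners using lexical and semantic metrics. The sketch below is a minimal illustration of that kind of evaluation, assuming a ROUGE-1-style unigram-overlap F1 as the lexical metric and cosine similarity of sentence embeddings (via the sentence-transformers library) as the semantic metric; the embedding model name and example strings are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch of lexical + semantic scoring for a generated root cause.
# Not the paper's evaluation harness; metric choices and texts are assumptions.
from collections import Counter
from sentence_transformers import SentenceTransformer, util

def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1 over whitespace tokens (lexical overlap)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def semantic_similarity(candidate: str, reference: str, encoder) -> float:
    """Cosine similarity between sentence embeddings (semantic overlap)."""
    emb = encoder.encode([candidate, reference], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

if __name__ == "__main__":
    reference = ("An expired certificate on the front-end load balancer "
                 "caused TLS handshake failures.")
    generated = ("The incident was caused by an expired TLS certificate "
                 "on the load balancer.")
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
    print(f"lexical F1:   {unigram_f1(generated, reference):.3f}")
    print(f"semantic sim: {semantic_similarity(generated, reference, encoder):.3f}")
```

In the zero-shot setting the candidate would come straight from a prompted model, while in the fine-tuned and multi-task settings it would come from a model trained on historical incident data; the same metrics apply to all three.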
Related papers
- Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [62.94719119451089]
The Lingma SWE-GPT series learns from and simulates real-world code submission activities.
Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, marking a significant improvement in automatic issue resolution.
arXiv Detail & Related papers (2024-11-01T14:27:16Z)
- Reasoning Paths Optimization: Learning to Reason and Explore From Diverse Paths [69.39559168050923]
We introduce Reasoning Paths Optimization (RPO), which enables learning to reason and explore from diverse paths.
Our approach encourages favorable branches at each reasoning step while penalizing unfavorable ones, enhancing the model's overall problem-solving performance.
We focus on multi-step reasoning tasks, such as math word problems and science-based exam questions.
arXiv Detail & Related papers (2024-10-07T06:37:25Z)
- SIaM: Self-Improving Code-Assisted Mathematical Reasoning of Large Language Models [54.78329741186446]
We propose a novel paradigm that uses a code-based critic model to guide steps including question-code data construction, quality control, and complementary evaluation.
Experiments across both in-domain and out-of-domain benchmarks in English and Chinese demonstrate the effectiveness of the proposed paradigm.
arXiv Detail & Related papers (2024-08-28T06:33:03Z)
- X-lifecycle Learning for Cloud Incident Management using LLMs [18.076347758182067]
Incident management for large cloud services is a complex and tedious process.
Recent advancements in large language models (LLMs) have created opportunities to automatically generate contextual recommendations.
In this paper, we demonstrate that augmenting LLMs with additional contextual data from different stages of the software development lifecycle (SDLC) improves performance.
arXiv Detail & Related papers (2024-02-15T06:19:02Z)
- Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4 [23.856839017006386]
Root Cause Analysis (RCA) plays a pivotal role in the incident diagnosis process for cloud services.
The GPT-4 model's immense size makes it challenging to fine-tune on user data.
We propose an in-context learning approach for automated root causing, which eliminates the need for fine-tuning (a minimal prompt-construction sketch illustrating this idea appears after this list).
arXiv Detail & Related papers (2024-01-24T21:02:07Z)
- A Comparative Study of Transformer-based Neural Text Representation Techniques on Bug Triaging [8.831760500324318]
We offer one of the first investigations that fine-tunes transformer-based language models for the task of bug triaging.
DeBERTa is the most effective technique across the triaging tasks of developer and component assignment (a minimal fine-tuning sketch appears after this list).
arXiv Detail & Related papers (2023-10-10T18:09:32Z)
- Supporting Human-AI Collaboration in Auditing LLMs with LLMs [33.56822240549913]
Large language models have been shown to be biased and behave irresponsibly.
It is crucial to audit these language models rigorously.
Existing auditing tools leverage humans, AI, or both to find failures.
arXiv Detail & Related papers (2023-04-19T21:59:04Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z)
- Learnware: Small Models Do Big [69.88234743773113]
The prevailing big-model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed those issues and has itself become a serious source of carbon emissions.
This article offers an overview of the learnware paradigm, which aims to spare users from building machine learning models from scratch, with the hope of reusing small models to do things even beyond their original purposes.
arXiv Detail & Related papers (2022-10-07T15:55:52Z)
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models [648.3665819567409]
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale.
BIG-bench consists of 204 tasks, contributed by 450 authors across 132 institutions.
We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench.
arXiv Detail & Related papers (2022-06-09T17:05:34Z)
- Neural Knowledge Extraction From Cloud Service Incidents [13.86595381172654]
SoftNER is a framework for unsupervised knowledge extraction from service incidents.
We build a novel multi-task learning-based BiLSTM-CRF model.
We show that the unsupervised machine learning-based approach achieves a high precision of 0.96.
arXiv Detail & Related papers (2020-07-10T17:33:07Z)
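The "Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4" entry above replaces fine-tuning with in-context learning. The sketch below shows one way such a prompt could be assembled: retrieve a few similar historical incidents and include them as demonstrations before the new incident. The retrieval heuristic, field names, and example incidents are illustrative assumptions, not the cited paper's pipeline.

```python
# Minimal sketch of in-context learning for root-cause recommendation:
# build a few-shot prompt from similar past incidents instead of fine-tuning.
# The Jaccard retriever and example data are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Incident:
    title: str
    summary: str
    root_cause: str

HISTORY = [
    Incident("API 5xx spike", "Gateway returned 502s after a deployment",
             "Bad configuration pushed to the gateway fleet"),
    Incident("Login failures", "Auth tokens rejected region-wide",
             "Expired signing certificate on the auth service"),
]

def jaccard(a: str, b: str) -> float:
    """Crude lexical similarity, standing in for a real incident retriever."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / max(len(sa | sb), 1)

def build_prompt(new_incident: Incident, k: int = 2) -> str:
    """Assemble a few-shot prompt from the k most similar past incidents."""
    ranked = sorted(HISTORY,
                    key=lambda h: jaccard(h.summary, new_incident.summary),
                    reverse=True)[:k]
    shots = "\n\n".join(
        f"Incident: {h.title}\nSummary: {h.summary}\nRoot cause: {h.root_cause}"
        for h in ranked)
    return (f"{shots}\n\nIncident: {new_incident.title}\n"
            f"Summary: {new_incident.summary}\nRoot cause:")

if __name__ == "__main__":
    new = Incident("Checkout errors",
                   "TLS handshake failures to the payment gateway", "")
    print(build_prompt(new))  # this prompt would be sent to an LLM such as GPT-4
```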
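The bug-triaging entry frames developer and component assignment as a text-classification task solved by fine-tuning transformer language models such as DeBERTa. The sketch below shows one way such a setup could look with the Hugging Face transformers library; the checkpoint, assignee labels, toy bug reports, and hyperparameters are assumptions rather than the study's configuration.

```python
# Minimal sketch of fine-tuning a transformer for bug triaging as
# text classification. Checkpoint, labels, data, and hyperparameters
# are illustrative assumptions, not the cited study's configuration.
import torch
from torch.utils.data import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

DEVELOPERS = ["alice", "bob", "carol"]  # hypothetical assignee labels

class BugReportDataset(Dataset):
    """Tokenizes (bug report text, assignee index) pairs for the Trainer."""
    def __init__(self, reports, labels, tokenizer):
        self.enc = tokenizer(reports, truncation=True, padding=True,
                             max_length=256, return_tensors="pt")
        self.labels = torch.tensor(labels)
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

checkpoint = "microsoft/deberta-base"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=len(DEVELOPERS))

train_ds = BugReportDataset(
    ["NullPointerException in the login flow",
     "Checkout button misaligned on mobile"],
    [0, 1], tokenizer)

args = TrainingArguments(output_dir="triage-model", num_train_epochs=1,
                         per_device_train_batch_size=2)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```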
This list is automatically generated from the titles and abstracts of the papers in this site.