Recommending Root-Cause and Mitigation Steps for Cloud Incidents using
Large Language Models
- URL: http://arxiv.org/abs/2301.03797v1
- Date: Tue, 10 Jan 2023 05:41:40 GMT
- Title: Recommending Root-Cause and Mitigation Steps for Cloud Incidents using
Large Language Models
- Authors: Toufique Ahmed, Supriyo Ghosh, Chetan Bansal, Thomas Zimmermann,
Xuchao Zhang, Saravan Rajmohan
- Abstract summary: On-call engineers require a significant amount of domain knowledge and manual effort to root cause and mitigate production incidents.
Recent advances in artificial intelligence have resulted in state-of-the-art large language models like GPT-3.x.
We conduct the first large-scale study to evaluate the effectiveness of these models in helping engineers root cause and mitigate production incidents.
- Score: 18.46643617658214
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incident management for cloud services is a complex process involving several
steps and has a huge impact on both service health and developer productivity.
On-call engineers require a significant amount of domain knowledge and manual
effort to root cause and mitigate production incidents. Recent advances
in artificial intelligence have resulted in state-of-the-art large language
models like GPT-3.x (both GPT-3.0 and GPT-3.5), which have been used to solve a
variety of problems ranging from question answering to text summarization. In
this work, we conduct the first large-scale study to evaluate the effectiveness of
these models in helping engineers root cause and mitigate production
incidents. We perform a rigorous study at Microsoft on more than 40,000 incidents
and compare several large language models in zero-shot, fine-tuned, and
multi-task settings using semantic and lexical metrics. Lastly, our human
evaluation with actual incident owners shows the efficacy and future potential
of using artificial intelligence for resolving cloud incidents.
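The abstract mentions scoring model recommendations with semantic and lexical metrics. As a minimal illustration of what a lexical metric looks like, the sketch below computes a unigram-overlap F1 (ROUGE-1 style) between a generated root cause and the ground truth. The incident texts and the specific metric are assumptions for illustration only, not the paper's actual evaluation pipeline.

```python
# Minimal sketch of a lexical metric (unigram-overlap F1, ROUGE-1 style)
# for scoring a model-suggested root cause against a ground-truth one.
# Illustrative only; the paper's real metrics and data are not shown here.
from collections import Counter

def lexical_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between two strings (case-insensitive)."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # shared token count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Hypothetical incident root-cause texts.
generated = "Certificate expired on the front-end load balancer"
reference = "Expired certificate on front-end load balancer caused failures"
score = lexical_f1(generated, reference)
print(f"lexical F1 = {score:.2f}")  # → lexical F1 = 0.80
```

A semantic metric would instead compare embedding vectors of the two texts, rewarding paraphrases that share little surface vocabulary.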
Related papers
- X-lifecycle Learning for Cloud Incident Management using LLMs [18.076347758182067]
Incident management for large cloud services is a complex and tedious process.
Recent advancements in large language models (LLMs) have created opportunities to automatically generate contextual recommendations.
In this paper, we demonstrate that augmenting LLMs with additional contextual data from different stages of the software development lifecycle (SDLC) improves performance.
arXiv Detail & Related papers (2024-02-15T06:19:02Z)
- Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4 [23.856839017006386]
Root Cause Analysis (RCA) plays a pivotal role in the incident diagnosis process for cloud services.
The GPT-4 model's immense size presents challenges when trying to fine-tune it on user data.
We propose an in-context learning approach for automated root causing, which eliminates the need for fine-tuning.
arXiv Detail & Related papers (2024-01-24T21:02:07Z)
- A Comparative Study of Transformer-based Neural Text Representation Techniques on Bug Triaging [8.831760500324318]
We offer one of the first investigations that fine-tunes transformer-based language models for the task of bug triaging.
DeBERTa is the most effective technique across the triaging tasks of developer and component assignment.
arXiv Detail & Related papers (2023-10-10T18:09:32Z)
- Supporting Human-AI Collaboration in Auditing LLMs with LLMs [33.56822240549913]
Large language models have been shown to be biased and behave irresponsibly.
It is crucial to audit these language models rigorously.
Existing auditing tools leverage humans, AI, or both to find failures.
arXiv Detail & Related papers (2023-04-19T21:59:04Z)
- Large Language Models in the Workplace: A Case Study on Prompt Engineering for Job Type Classification [58.720142291102135]
This case study investigates the task of job classification in a real-world setting.
The goal is to determine whether an English-language job posting is appropriate for a graduate or entry-level position.
arXiv Detail & Related papers (2023-03-13T14:09:53Z)
- A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models [81.15974174627785]
We study the behavior of language models in terms of robustness and sensitivity to direct interventions in the input space.
Our analysis shows that robustness does not appear to continuously improve as a function of size, but the GPT-3 Davinci models (175B) achieve a dramatic improvement in both robustness and sensitivity compared to all other GPT variants.
arXiv Detail & Related papers (2022-10-21T15:12:37Z)
- Learnware: Small Models Do Big [69.88234743773113]
The prevailing big model paradigm, which has achieved impressive results in natural language processing and computer vision applications, has not yet addressed these issues and has become a serious source of carbon emissions.
This article offers an overview of the learnware paradigm, which aims to spare users from building machine learning models from scratch, with the hope of reusing small models for tasks even beyond their original purposes.
arXiv Detail & Related papers (2022-10-07T15:55:52Z)
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models [648.3665819567409]
Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale.
BIG-bench consists of 204 tasks, contributed by 450 authors across 132 institutions.
We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench.
arXiv Detail & Related papers (2022-06-09T17:05:34Z)
- Knowledge-Aware Procedural Text Understanding with Multi-Stage Training [110.93934567725826]
We focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process.
Two challenges, the difficulty of commonsense reasoning and data insufficiency, still remain unsolved.
We propose a novel KnOwledge-Aware proceduraL text understAnding (KOALA) model, which effectively leverages multiple forms of external knowledge.
arXiv Detail & Related papers (2020-09-28T10:28:40Z) - Neural Knowledge Extraction From Cloud Service Incidents [13.86595381172654]
SoftNER is a framework for unsupervised knowledge extraction from service incidents.
We build a novel multi-task learning based BiLSTM-CRF model.
We show that the unsupervised machine learning based approach has a high precision of 0.96.
arXiv Detail & Related papers (2020-07-10T17:33:07Z) - Language Models are Few-Shot Learners [61.36677350504291]
We show that scaling up language models greatly improves task-agnostic, few-shot performance.
We train GPT-3, an autoregressive language model with 175 billion parameters, and test its performance in the few-shot setting.
GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks.
arXiv Detail & Related papers (2020-05-28T17:29:03Z)
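Several entries above contrast fine-tuning with the few-shot setting, where a model is conditioned on a handful of solved examples placed directly in the prompt instead of being retrained. The sketch below illustrates that idea as simple prompt assembly; the incident texts and the prompt template are hypothetical, and no real model is called.

```python
# Sketch of few-shot prompt assembly: k solved incidents are prepended
# as demonstrations so the model can pattern-match on a new incident.
# All incident text here is hypothetical; no actual model call is made.

def build_few_shot_prompt(examples, new_incident):
    """Concatenate (incident, root cause) demonstration pairs,
    then append the unsolved incident for the model to complete."""
    parts = ["Suggest a root cause for each cloud incident.\n"]
    for incident, root_cause in examples:
        parts.append(f"Incident: {incident}\nRoot cause: {root_cause}\n")
    # The trailing "Root cause:" cue is what the model is asked to complete.
    parts.append(f"Incident: {new_incident}\nRoot cause:")
    return "\n".join(parts)

examples = [
    ("API latency spiked after deployment", "Regression in new build"),
    ("Storage writes failing in region X", "Disk quota exhausted"),
]
prompt = build_few_shot_prompt(examples, "VMs unreachable after network change")
print(prompt)
```

In-context approaches like the GPT-4 RCA paper above extend this pattern by retrieving the demonstration incidents from a historical corpus rather than fixing them by hand.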
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.