AutoTSG: Learning and Synthesis for Incident Troubleshooting
- URL: http://arxiv.org/abs/2205.13457v1
- Date: Thu, 26 May 2022 16:05:11 GMT
- Title: AutoTSG: Learning and Synthesis for Incident Troubleshooting
- Authors: Manish Shetty, Chetan Bansal, Sai Pramod Upadhyayula, Arjun
Radhakrishna, Anurag Gupta
- Abstract summary: We conduct a large-scale empirical study of over 4,000 TSGs mapped to thousands of incidents.
We find that TSGs are widely used and help significantly reduce mitigation efforts.
We propose AutoTSG -- a novel framework for automation of TSGs to executable workflows by combining machine learning and program synthesis.
- Score: 6.297939852772734
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incident management is a key aspect of operating large-scale cloud services.
To aid faster and more efficient resolution of incidents, engineering teams
document frequent troubleshooting steps in the form of Troubleshooting Guides
(TSGs), to be used by on-call engineers (OCEs). However, TSGs are siloed,
unstructured, and often incomplete, requiring developers to manually understand
and execute necessary steps. This results in a plethora of issues such as
on-call fatigue, reduced productivity, and human errors. In this work, we
conduct a large-scale empirical study of over 4,000 TSGs mapped to thousands of
incidents and find that TSGs are widely used and help significantly reduce
mitigation efforts. We then analyze feedback on TSGs provided by 400+ OCEs and
propose a taxonomy of issues that highlights significant gaps in TSG quality.
To alleviate these gaps, we investigate the automation of TSGs and propose
AutoTSG -- a novel framework for automation of TSGs to executable workflows by
combining machine learning and program synthesis. Our evaluation of AutoTSG on
50 TSGs shows its effectiveness in both identifying TSG statements (accuracy
0.89) and parsing them for execution (precision 0.94 and recall 0.91). Lastly,
we survey ten Microsoft engineers and show the importance of TSG automation and
the usefulness of AutoTSG.
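The abstract describes a two-stage pipeline (identifying TSG statements, then parsing them into an executable form) but gives no implementation details. The sketch below is only a minimal illustration of that interface: the regex rules, the ExecutableStep structure, and the sample TSG lines are assumptions made for illustration, not the paper's actual models.

import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical two-stage pipeline sketch:
#   stage 1 - flag which TSG lines are actionable statements,
#   stage 2 - parse each actionable statement into a structured step
#             that a workflow engine could execute.
# The regexes below merely stand in for the paper's machine-learning
# and program-synthesis components.

@dataclass
class ExecutableStep:
    action: str        # e.g. "run_query", "run_command" (illustrative labels)
    payload: str       # the query or command text extracted from the line
    source_line: str   # original TSG sentence, kept for traceability

ACTION_PATTERNS = [
    ("run_query",    re.compile(r"run (?:the )?query[:\s]+(.+)", re.IGNORECASE)),
    ("run_command",  re.compile(r"execute[:\s]+(.+)", re.IGNORECASE)),
    ("check_metric", re.compile(r"check (?:the )?(.+)", re.IGNORECASE)),
]

def is_actionable(line: str) -> bool:
    """Stage 1 stand-in: flag lines that look like instructions rather than prose."""
    return any(pattern.search(line) for _, pattern in ACTION_PATTERNS)

def parse_step(line: str) -> Optional[ExecutableStep]:
    """Stage 2 stand-in: turn an actionable line into a structured step."""
    for action, pattern in ACTION_PATTERNS:
        match = pattern.search(line)
        if match:
            return ExecutableStep(action=action,
                                  payload=match.group(1).strip(),
                                  source_line=line)
    return None

if __name__ == "__main__":
    tsg_lines = [
        "This TSG covers latency incidents in the payment service.",  # background prose
        "Run the query: Requests | where Latency > 500ms | count",    # actionable
        "Execute: kubectl get pods -n payments",                      # actionable
    ]
    workflow = [parse_step(line) for line in tsg_lines if is_actionable(line)]
    for step in workflow:
        print(step)

In the paper, statement identification is learned (reported accuracy 0.89) and parsing uses program synthesis (precision 0.94, recall 0.91); the hand-written patterns above only stand in for those components to show how a parsed, executable workflow could be assembled from a TSG.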
Related papers
- AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs.
Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z) - Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [62.94719119451089]
The Lingma SWE-GPT series learns from and simulates real-world code submission activities.
Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, marking a significant improvement in automatic issue resolution.
arXiv Detail & Related papers (2024-11-01T14:27:16Z) - GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making [5.254038213371586]
Large Language Models (LLMs) present a promising solution to these challenges.
GoNoGo is designed to streamline automotive software deployment while meeting both functional requirements and practical industrial constraints.
GoNoGo achieves a 100% success rate for tasks up to Level 2 difficulty with 3-shot examples, and maintains high performance even for more complex tasks.
arXiv Detail & Related papers (2024-08-19T08:22:20Z) - A Meta-Engine Framework for Interleaved Task and Motion Planning using Topological Refinements [51.54559117314768]
Task And Motion Planning (TAMP) is the problem of finding a solution to an automated planning problem that combines discrete task-level actions with continuous low-level motions.
We propose a general and open-source framework for modeling and benchmarking TAMP problems.
We introduce an innovative meta-technique to solve TAMP problems involving moving agents and multiple task-state-dependent obstacles.
arXiv Detail & Related papers (2024-08-11T14:57:57Z) - Vortex under Ripplet: An Empirical Study of RAG-enabled Applications [6.588605888228515]
Large language models (LLMs) enhanced by retrieval-augmented generation (RAG) provide effective solutions in various application scenarios.
We manually studied 100 open-source applications that incorporate RAG-enhanced LLMs, and their issue reports.
We have found that more than 98% of applications contain multiple integration defects that harm software functionality, efficiency, and security.
arXiv Detail & Related papers (2024-07-06T17:25:11Z) - Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides [39.29715168284971]
Service teams compile troubleshooting knowledge into Troubleshooting Guides (TSGs) accessible to on-call engineers (OCEs).
TSGs are often unstructured and incomplete, which requires manual interpretation by OCEs, leading to on-call fatigue and decreased productivity.
We propose Nissist which leverages TSGs and incident mitigation histories to provide proactive suggestions, reducing human intervention.
arXiv Detail & Related papers (2024-02-27T14:14:23Z) - Exploring Sparsity in Graph Transformers [67.48149404841925]
Graph Transformers (GTs) have achieved impressive results on various graph-related tasks.
However, the huge computational cost of GTs hinders their deployment and application, especially in resource-constrained environments.
We propose a comprehensive Graph Transformer SParsification (GTSP) framework that helps to reduce the computational complexity of GTs.
arXiv Detail & Related papers (2023-12-09T06:21:44Z) - TRANSOM: An Efficient Fault-Tolerant System for Training LLMs [7.831906758749453]
Large language models (LLMs) with hundreds of billions or trillions of parameters, represented by ChatGPT, have had a profound impact on various fields.
Training LLMs with super-large-scale parameters requires large high-performance GPU clusters and long training periods lasting for months.
To address these issues, we propose TRANSOM, a novel fault-tolerant LLM training system.
arXiv Detail & Related papers (2023-10-16T04:06:52Z) - UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models [170.88745906220174]
We propose the UnifiedSKG framework, which unifies 21 SKG tasks into a text-to-text format.
We show that UnifiedSKG achieves state-of-the-art performance on almost all of the 21 tasks.
We also use UnifiedSKG to conduct a series of experiments on structured knowledge encoding variants across SKG tasks.
arXiv Detail & Related papers (2022-01-16T04:36:18Z) - The Benefits of Implicit Regularization from SGD in Least Squares Problems [116.85246178212616]
Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice.
We make comparisons of the implicit regularization afforded by (unregularized) average SGD with the explicit regularization of ridge regression.
arXiv Detail & Related papers (2021-08-10T09:56:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.