AutoTSG: Learning and Synthesis for Incident Troubleshooting
- URL: http://arxiv.org/abs/2205.13457v1
- Date: Thu, 26 May 2022 16:05:11 GMT
- Title: AutoTSG: Learning and Synthesis for Incident Troubleshooting
- Authors: Manish Shetty, Chetan Bansal, Sai Pramod Upadhyayula, Arjun
Radhakrishna, Anurag Gupta
- Abstract summary: We conduct a large-scale empirical study of over 4,000 TSGs mapped to thousands of incidents.
We find that TSGs are widely used and help significantly reduce mitigation efforts.
We propose AutoTSG -- a novel framework that automates TSGs into executable workflows by combining machine learning and program synthesis.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incident management is a key aspect of operating large-scale cloud services.
To aid faster and more efficient resolution of incidents, engineering teams
document frequent troubleshooting steps in the form of Troubleshooting Guides
(TSGs), to be used by on-call engineers (OCEs). However, TSGs are siloed,
unstructured, and often incomplete, requiring developers to manually understand
and execute necessary steps. This results in a plethora of issues such as
on-call fatigue, reduced productivity, and human errors. In this work, we
conduct a large-scale empirical study of over 4,000 TSGs mapped to thousands of
incidents and find that TSGs are widely used and help significantly reduce
mitigation efforts. We then analyze feedback on TSGs provided by 400+ OCEs and
propose a taxonomy of issues that highlights significant gaps in TSG quality.
To alleviate these gaps, we investigate the automation of TSGs and propose
AutoTSG -- a novel framework for automation of TSGs to executable workflows by
combining machine learning and program synthesis. Our evaluation of AutoTSG on
50 TSGs shows its effectiveness in both identifying TSG statements (accuracy
0.89) and parsing them for execution (precision 0.94 and recall 0.91). Lastly,
we survey ten Microsoft engineers and show the importance of TSG automation and
the usefulness of AutoTSG.
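The abstract describes a two-stage pipeline: identify actionable statements in an unstructured TSG, then parse them into executable form. A minimal sketch of that shape follows; note the paper uses machine learning for identification and program synthesis for parsing, so the keyword heuristic and regex parser below are illustrative stand-ins, and all command names are hypothetical.

```python
# Minimal sketch of the two stages AutoTSG's abstract describes:
# (1) identify actionable statements in an unstructured TSG, and
# (2) parse each statement into an executable command.
# The paper uses ML for (1) and program synthesis for (2); the
# keyword heuristic and regex below are illustrative stand-ins only.

import re

# Hypothetical verbs that mark a line as an actionable step.
ACTION_HINTS = ("run", "check", "execute", "query", "restart")

def identify_statements(tsg_lines):
    """Stage 1: keep lines that look like actionable troubleshooting steps."""
    return [line for line in tsg_lines
            if line.strip().lower().startswith(ACTION_HINTS)]

def parse_statement(statement):
    """Stage 2: extract a backtick-quoted command, e.g. 'Run `foo`'."""
    match = re.search(r"`([^`]+)`", statement)
    return match.group(1) if match else None

# A toy TSG fragment (hypothetical commands).
tsg = [
    "## Mitigation",
    "Check service health with `Get-ServiceHealth -Name frontend`.",
    "Run `Restart-Service frontend` if the service is unhealthy.",
    "Escalate to the owning team if the issue persists.",
]

workflow = [parse_statement(s) for s in identify_statements(tsg)]
# workflow == ["Get-ServiceHealth -Name frontend", "Restart-Service frontend"]
```

Lines without a recognized action verb (headings, escalation notes) fall through to the human operator, mirroring the paper's point that TSGs mix executable steps with free-form guidance.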
Related papers
- Vortex under Ripplet: An Empirical Study of RAG-enabled Applications [6.588605888228515]
Large language models (LLMs) enhanced by retrieval-augmented generation (RAG) provide effective solutions in various application scenarios.
We manually studied 100 open-source applications that incorporate RAG-enhanced LLMs, and their issue reports.
We have found that more than 98% of applications contain multiple integration defects that harm software functionality, efficiency, and security.
arXiv Detail & Related papers (2024-07-06T17:25:11Z)
- Automated Text Scoring in the Age of Generative AI for the GPU-poor [49.1574468325115]
We analyze the performance and efficiency of open-source, small-scale generative language models for automated text scoring.
Results show that GLMs can be fine-tuned to achieve adequate, though not state-of-the-art, performance.
arXiv Detail & Related papers (2024-07-02T01:17:01Z)
- Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides [39.29715168284971]
Service teams compile troubleshooting knowledge into Troubleshooting Guides (TSGs) accessible to on-call engineers (OCEs).
TSGs are often unstructured and incomplete, which requires manual interpretation by OCEs, leading to on-call fatigue and decreased productivity.
We propose Nissist which leverages TSGs and incident mitigation histories to provide proactive suggestions, reducing human intervention.
arXiv Detail & Related papers (2024-02-27T14:14:23Z)
- The Benefits of a Concise Chain of Thought on Problem-Solving in Large Language Models [0.0]
CCoT reduced average response length by 48.70% for both GPT-3.5 and GPT-4 while having a negligible impact on problem-solving performance.
Overall, CCoT leads to an average per-token cost reduction of 22.67%.
arXiv Detail & Related papers (2024-01-11T01:52:25Z)
- AutoAct: Automatic Agent Learning from Scratch for QA via Self-Planning [54.47116888545878]
AutoAct is an automatic agent learning framework for QA.
It does not rely on large-scale annotated data and synthetic planning trajectories from closed-source models.
arXiv Detail & Related papers (2024-01-10T16:57:24Z)
- Exploring Sparsity in Graph Transformers [67.48149404841925]
Graph Transformers (GTs) have achieved impressive results on various graph-related tasks.
However, the huge computational cost of GTs hinders their deployment and application, especially in resource-constrained environments.
We propose a comprehensive Graph Transformer SParsification (GTSP) framework that helps to reduce the computational complexity of GTs.
arXiv Detail & Related papers (2023-12-09T06:21:44Z)
- TRANSOM: An Efficient Fault-Tolerant System for Training LLMs [7.831906758749453]
Large language models (LLMs) with hundreds of billions or trillions of parameters, represented by ChatGPT, have had a profound impact on various fields.
Training LLMs with super-large-scale parameters requires large high-performance GPU clusters and long training periods lasting for months.
To address these issues, we propose TRANSOM, a novel fault-tolerant LLM training system.
arXiv Detail & Related papers (2023-10-16T04:06:52Z)
- AutoML-GPT: Automatic Machine Learning with GPT [74.30699827690596]
We propose developing task-oriented prompts and automatically utilizing large language models (LLMs) to automate the training pipeline.
We present AutoML-GPT, which employs GPT as the bridge to diverse AI models and dynamically trains models with optimized hyperparameters.
This approach achieves remarkable results in computer vision, natural language processing, and other challenging areas.
arXiv Detail & Related papers (2023-05-04T02:09:43Z)
- Optimization Algorithms in Smart Grids: A Systematic Literature Review [4.301367153728695]
This paper focuses on novel features and applications of smart grids in domestic and industrial sectors.
Specifically, we focused on Genetic Algorithms, Particle Swarm Optimization, and Grey Wolf Optimization.
arXiv Detail & Related papers (2023-01-16T12:31:06Z)
- UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models [170.88745906220174]
We propose the UnifiedSKG framework, which unifies 21 SKG tasks into a text-to-text format.
We show that UnifiedSKG achieves state-of-the-art performance on almost all of the 21 tasks.
We also use UnifiedSKG to conduct a series of experiments on structured knowledge encoding variants across SKG tasks.
arXiv Detail & Related papers (2022-01-16T04:36:18Z)
- The Benefits of Implicit Regularization from SGD in Least Squares Problems [116.85246178212616]
Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice.
We make comparisons of the implicit regularization afforded by (unregularized) average SGD with the explicit regularization of ridge regression.
arXiv Detail & Related papers (2021-08-10T09:56:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.