AutoTSG: Learning and Synthesis for Incident Troubleshooting
- URL: http://arxiv.org/abs/2205.13457v1
- Date: Thu, 26 May 2022 16:05:11 GMT
- Title: AutoTSG: Learning and Synthesis for Incident Troubleshooting
- Authors: Manish Shetty, Chetan Bansal, Sai Pramod Upadhyayula, Arjun Radhakrishna, Anurag Gupta
- Abstract summary: We conduct a large-scale empirical study of more than 4,000 TSGs mapped to thousands of incidents.
We find that TSGs are widely used and help significantly reduce mitigation efforts.
We propose AutoTSG, a novel framework that automates TSGs into executable workflows by combining machine learning and program synthesis.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Incident management is a key aspect of operating large-scale cloud services.
To enable faster and more efficient incident resolution, engineering teams
document frequently performed troubleshooting steps in Troubleshooting Guides
(TSGs) for on-call engineers (OCEs). However, TSGs are siloed,
unstructured, and often incomplete, requiring developers to manually interpret
and execute the necessary steps. This leads to issues such as
on-call fatigue, reduced productivity, and human error. In this work, we
conduct a large-scale empirical study of more than 4,000 TSGs mapped to thousands of
incidents and find that TSGs are widely used and help significantly reduce
mitigation efforts. We then analyze feedback on TSGs provided by 400+ OCEs and
propose a taxonomy of issues that highlights significant gaps in TSG quality.
To close these gaps, we investigate the automation of TSGs and propose
AutoTSG, a novel framework that converts TSGs into executable workflows by
combining machine learning and program synthesis. Our evaluation of AutoTSG on
50 TSGs shows its effectiveness both in identifying TSG statements (accuracy
0.89) and in parsing them for execution (precision 0.94 and recall 0.91). Lastly,
we survey ten Microsoft engineers and show the importance of TSG automation and
the usefulness of AutoTSG.
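The abstract describes AutoTSG only at a high level. It first identifies which TSG statements are executable and then parses them into workflow steps. The following is a minimal sketch of that two-stage idea, built on several loud assumptions. A regex heuristic stands in for the paper's machine-learned classifier. The `$ ` command prefix and the Kusto-style `| where` query pattern are hypothetical TSG conventions. The names `Step`, `classify`, `to_workflow`, and `precision_recall` are illustrative only and are not AutoTSG's actual API or method.

```python
import re
from dataclasses import dataclass, field

# HYPOTHETICAL conventions: "$ " marks a shell command; a Kusto-style pipe
# ("| where", "| project", "| summarize") marks a query. AutoTSG uses a
# learned classifier; these regexes are only a stand-in for illustration.
COMMAND_RE = re.compile(r"^\s*\$\s+(?P<body>.+)$")
QUERY_RE = re.compile(r"\|\s*(?:where|project|summarize)\b")
PLACEHOLDER_RE = re.compile(r"<(\w+)>")  # e.g. <nodeName> is a runtime parameter


@dataclass
class Step:
    kind: str                                   # "command" or "query"
    body: str                                   # text to execute
    params: list = field(default_factory=list)  # placeholders to bind at run time


def classify(line: str) -> str:
    """Stage 1 (stand-in): label a TSG line as command, query, or plain text."""
    if COMMAND_RE.match(line):
        return "command"
    if QUERY_RE.search(line):
        return "query"
    return "text"


def to_workflow(tsg: str) -> list[Step]:
    """Stage 2 (stand-in): parse executable lines into ordered workflow steps."""
    steps = []
    for line in tsg.splitlines():
        kind = classify(line)
        if kind == "text":
            continue  # narrative guidance is left for the engineer to read
        body = COMMAND_RE.match(line).group("body") if kind == "command" else line.strip()
        steps.append(Step(kind, body, PLACEHOLDER_RE.findall(body)))
    return steps


def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Standard set-based precision and recall over extracted items."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Usage on a made-up TSG snippet (not taken from the paper's data).
tsg = """Check whether the node is healthy.
$ kubectl get node <nodeName>
Then look at recent errors:
Errors | where NodeName == "<nodeName>" | project Timestamp, Message
If errors persist, escalate to the owning team."""

steps = to_workflow(tsg)
for step in steps:
    print(step.kind, step.params, step.body)
```

In a real system, a trained model would replace `classify`, and a proper parser or synthesizer would replace the per-line regexes. The `precision_recall` helper uses the standard set-based definitions behind figures like the reported 0.94 and 0.91. The paper's exact evaluation protocol may differ.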