Related papers: AutoTSG: Learning and Synthesis for Incident Troubleshooting

URL: http://arxiv.org/abs/2205.13457v1
Date: Thu, 26 May 2022 16:05:11 GMT
Title: AutoTSG: Learning and Synthesis for Incident Troubleshooting
Authors: Manish Shetty, Chetan Bansal, Sai Pramod Upadhyayula, Arjun Radhakrishna, Anurag Gupta
Abstract summary: We conduct a large-scale empirical study of 4K+ TSGs mapped to 1000s of incidents. We find that TSGs are widely used and help significantly reduce mitigation efforts. We propose AutoTSG -- a novel framework for automation of TSGs to executable workflows by combining machine learning and program synthesis.
Score: 6.297939852772734
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Incident management is a key aspect of operating large-scale cloud services. To aid with faster and efficient resolution of incidents, engineering teams document frequent troubleshooting steps in the form of Troubleshooting Guides (TSGs), to be used by on-call engineers (OCEs). However, TSGs are siloed, unstructured, and often incomplete, requiring developers to manually understand and execute necessary steps. This results in a plethora of issues such as on-call fatigue, reduced productivity, and human errors. In this work, we conduct a large-scale empirical study of over 4K+ TSGs mapped to 1000s of incidents and find that TSGs are widely used and help significantly reduce mitigation efforts. We then analyze feedback on TSGs provided by 400+ OCEs and propose a taxonomy of issues that highlights significant gaps in TSG quality. To alleviate these gaps, we investigate the automation of TSGs and propose AutoTSG -- a novel framework for automation of TSGs to executable workflows by combining machine learning and program synthesis. Our evaluation of AutoTSG on 50 TSGs shows the effectiveness in both identifying TSG statements (accuracy 0.89) and parsing them for execution (precision 0.94 and recall 0.91). Lastly, we survey ten Microsoft engineers and show the importance of TSG automation and the usefulness of AutoTSG.

Related papers

The Automation Advantage in AI Red Teaming [0.0]
This paper analyzes Large Language Model (LLM) security vulnerabilities based on data from Crucible. Our findings reveal automated approaches significantly outperform manual techniques, despite only 5.2% of users employing automation.
arXiv Detail & Related papers (2025-04-28T14:48:00Z)
Affordable AI Assistants with Knowledge Graph of Thoughts [15.045446816762675]
Large Language Models (LLMs) are revolutionizing the development of AI assistants capable of performing diverse tasks across domains. We propose Knowledge Graph of Thoughts (KGoT), an innovative AI assistant architecture that integrates LLM reasoning with dynamically constructed knowledge graphs (KGs). KGoT achieves a 29% improvement in task success rates on the GAIA benchmark compared to Hugging Face Agents with GPT-4o mini, while reducing costs by over 36x compared to GPT-4o.
arXiv Detail & Related papers (2025-04-03T15:11:55Z)
Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [61.00662702026523]
We propose a unified Test-Time Compute (TTC) scaling framework that leverages increased inference-time computation instead of larger models. Our framework incorporates two complementary strategies: internal TTC and external TTC. We demonstrate that our 32B model achieves a 46% issue resolution rate, surpassing significantly larger models such as DeepSeek R1 671B and OpenAI o1.
arXiv Detail & Related papers (2025-03-31T07:31:32Z)
Intelligent Mobile AI-Generated Content Services via Interactive Prompt Engineering and Dynamic Service Provisioning [55.641299901038316]
AI-generated content can organize collaborative Mobile AIGC Service Providers (MASPs) at network edges to provide ubiquitous and customized content for resource-constrained users. Such a paradigm faces two significant challenges: 1) raw prompts often lead to poor generation quality due to users' lack of experience with specific AIGC models, and 2) static service provisioning fails to efficiently utilize computational and communication resources. We develop an interactive prompt engineering mechanism that leverages a Large Language Model (LLM) to generate customized prompt corpora and employs Inverse Reinforcement Learning (IRL) for policy imitation.
arXiv Detail & Related papers (2025-02-17T03:05:20Z)
Enhancing the Reasoning Capabilities of Small Language Models via Solution Guidance Fine-Tuning [14.857842644246634]
This paper introduces Solution Guidance (SG) and a plug-and-play training paradigm, Solution-Guidance Fine-Tuning (SGFT). SG focuses on problem understanding and decomposition at the semantic and logical levels, rather than specific computations. SGFT can fine-tune an SLM to produce accurate problem-solving guidance, which can be flexibly fed to any SLM as prompts.
arXiv Detail & Related papers (2024-12-13T06:45:26Z)
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing? [54.65079443902714]
We introduce AutoPT, an automated penetration testing agent based on the principle of PSM driven by LLMs. Our results show that AutoPT outperforms the baseline framework ReAct on the GPT-4o mini model.
arXiv Detail & Related papers (2024-11-02T13:24:30Z)
Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [62.94719119451089]
The Lingma SWE-GPT series learns from and simulates real-world code submission activities. Lingma SWE-GPT 72B resolves 30.20% of GitHub issues, marking a significant improvement in automatic issue resolution.
arXiv Detail & Related papers (2024-11-01T14:27:16Z)
GoNoGo: An Efficient LLM-based Multi-Agent System for Streamlining Automotive Software Release Decision-Making [5.254038213371586]
Large Language Models (LLMs) present a promising solution to these challenges. GoNoGo is designed to streamline automotive software deployment while meeting both functional requirements and practical industrial constraints. GoNoGo achieves a 100% success rate for tasks up to Level 2 difficulty with 3-shot examples, and maintains high performance even for more complex tasks.
arXiv Detail & Related papers (2024-08-19T08:22:20Z)
A Meta-Engine Framework for Interleaved Task and Motion Planning using Topological Refinements [51.54559117314768]
Task And Motion Planning (TAMP) is the problem of finding a solution to an automated planning problem. We propose a general and open-source framework for modeling and benchmarking TAMP problems. We introduce an innovative meta-technique to solve TAMP problems involving moving agents and multiple task-state-dependent obstacles.
arXiv Detail & Related papers (2024-08-11T14:57:57Z)
Vortex under Ripplet: An Empirical Study of RAG-enabled Applications [6.588605888228515]
Large language models (LLMs) enhanced by retrieval-augmented generation (RAG) provide effective solutions in various application scenarios. We manually studied 100 open-source applications that incorporate RAG-enhanced LLMs, and their issue reports. We have found that more than 98% of applications contain multiple integration defects that harm software functionality, efficiency, and security.
arXiv Detail & Related papers (2024-07-06T17:25:11Z)
Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides [39.29715168284971]
Service teams compile troubleshooting knowledge into Troubleshooting Guides (TSGs) accessible to on-call engineers (OCEs). TSGs are often unstructured and incomplete, which requires manual interpretation by OCEs, leading to on-call fatigue and decreased productivity. We propose Nissist, which leverages TSGs and incident mitigation histories to provide proactive suggestions, reducing human intervention.
arXiv Detail & Related papers (2024-02-27T14:14:23Z)
Exploring Sparsity in Graph Transformers [67.48149404841925]
Graph Transformers (GTs) have achieved impressive results on various graph-related tasks. However, the huge computational cost of GTs hinders their deployment and application, especially in resource-constrained environments. We propose a comprehensive Graph Transformer SParsification (GTSP) framework that helps to reduce the computational complexity of GTs.
arXiv Detail & Related papers (2023-12-09T06:21:44Z)
TRANSOM: An Efficient Fault-Tolerant System for Training LLMs [7.831906758749453]
Large language models (LLMs) with hundreds of billions or trillions of parameters, represented by ChatGPT, have had a profound impact on various fields. Training LLMs with super-large-scale parameters requires large high-performance GPU clusters and long training periods lasting for months. To address these issues, we propose TRANSOM, a novel fault-tolerant LLM training system.
arXiv Detail & Related papers (2023-10-16T04:06:52Z)
UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models [170.88745906220174]
We propose the UnifiedSKG framework, which unifies 21 SKG tasks into a text-to-text format. We show that UnifiedSKG achieves state-of-the-art performance on almost all of the 21 tasks. We also use UnifiedSKG to conduct a series of experiments on structured knowledge encoding variants across SKG tasks.
arXiv Detail & Related papers (2022-01-16T04:36:18Z)
The Benefits of Implicit Regularization from SGD in Least Squares Problems [116.85246178212616]
Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice. We compare the implicit regularization afforded by (unregularized) average SGD with the explicit regularization of ridge regression.
arXiv Detail & Related papers (2021-08-10T09:56:47Z)

This list is automatically generated from the titles and abstracts of the papers on this site.