Related papers: Xpert: Empowering Incident Management with Query Recommendations via Large Language Models

Xpert: Empowering Incident Management with Query Recommendations via Large Language Models

URL: http://arxiv.org/abs/2312.11988v1
Date: Tue, 19 Dec 2023 09:30:58 GMT
Title: Xpert: Empowering Incident Management with Query Recommendations via Large Language Models
Authors: Yuxuan Jiang, Chaoyun Zhang, Shilin He, Zhihao Yang, Minghua Ma, Si Qin, Yu Kang, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
Abstract summary: This paper presents a study on the utilization of queries of KQL, a DSL employed for incident management in a large-scale cloud management system at Microsoft. We introduce Xpert, an end-to-end machine learning framework that automates KQL recommendation process.
Score: 39.73744433173498
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large-scale cloud systems play a pivotal role in modern IT infrastructure. However, incidents occurring within these systems can lead to service disruptions and adversely affect user experience. To swiftly resolve such incidents, on-call engineers depend on crafting domain-specific language (DSL) queries to analyze telemetry data. However, writing these queries can be challenging and time-consuming. This paper presents a thorough empirical study on the utilization of queries of KQL, a DSL employed for incident management in a large-scale cloud management system at Microsoft. The findings obtained underscore the importance and viability of KQL queries recommendation to enhance incident management. Building upon these valuable insights, we introduce Xpert, an end-to-end machine learning framework that automates KQL recommendation process. By leveraging historical incident data and large language models, Xpert generates customized KQL queries tailored to new incidents. Furthermore, Xpert incorporates a novel performance metric called Xcore, enabling a thorough evaluation of query quality from three comprehensive perspectives. We conduct extensive evaluations of Xpert, demonstrating its effectiveness in offline settings. Notably, we deploy Xpert in the real production environment of a large-scale incident management system in Microsoft, validating its efficiency in supporting incident management. To the best of our knowledge, this paper represents the first empirical study of its kind, and Xpert stands as a pioneering DSL query recommendation framework designed for incident management.

Related papers

The benefits of query-based KGQA systems for complex and temporal questions in LLM era [55.20230501807337]
Large language models excel in question-answering (QA) yet still struggle with multi-hop reasoning and temporal questions.<n> Query-based knowledge graph QA (KGQA) offers a modular alternative by generating executable queries instead of direct answers.<n>We explore multi-stage query-based framework for WikiData QA, proposing multi-stage approach that enhances performance on challenging multi-hop and temporal benchmarks.
arXiv Detail & Related papers (2025-07-16T06:41:03Z)
MMTU: A Massive Multi-Task Table Understanding and Reasoning Benchmark [70.47478110973042]
We introduce MMTU, a large-scale benchmark with over 30K questions across 25 real-world table tasks.<n> MMTU is designed to comprehensively evaluate models ability to understand, reason, and manipulate real tables at the expert-level.<n>We show that MMTU require a combination of skills -- including table understanding, reasoning, and coding -- that remain challenging for today's frontier models.
arXiv Detail & Related papers (2025-06-05T21:05:03Z)
GateLens: A Reasoning-Enhanced LLM Agent for Automotive Software Release Analytics [9.549568621873386]
GateLens is an Algebra-based tool for analyzing datasets in the automotive domain. It achieves higher F1 scores and handling complex and ambiguous queries with greater robustness. It reduces analysis time by over 80% while maintaining high accuracy and reliability.
arXiv Detail & Related papers (2025-03-27T17:48:32Z)
How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities [62.474732677086855]
Large language model (LLM) routing has emerged as a crucial strategy for balancing computational costs with performance. We propose the DSC benchmark: Diverse, Simple, and Categorized, an evaluation framework that categorizes router performance across a broad spectrum of query types.
arXiv Detail & Related papers (2025-03-20T19:52:30Z)
PromAssistant: Leveraging Large Language Models for Text-to-PromQL [22.44987357626691]
We focus on PromQL, which is the metric query DSL provided by the widely used metric monitoring system Prometheus.<n>We propose PromAssistant, a Large Language Model-based text-to-PromQL framework.<n>To the best of our knowledge, this paper is the first study of text-to-PromQL, and PromAssistant pioneered the DSL generation framework for metric querying and analysis.
arXiv Detail & Related papers (2025-03-05T02:22:01Z)
Improving Scientific Document Retrieval with Concept Coverage-based Query Set Generation [49.29180578078616]
Concept Coverage-based Query set Generation (CCQGen) framework designed to generate a set of queries with comprehensive coverage of the document's concepts. We identify concepts not sufficiently covered by previous queries, and leverage them as conditions for subsequent query generation. This approach guides each new query to complement the previous ones, aiding in a thorough understanding of the document.
arXiv Detail & Related papers (2025-02-16T15:59:50Z)
Context Awareness Gate For Retrieval Augmented Generation [2.749898166276854]
Retrieval Augmented Generation (RAG) has emerged as a widely adopted approach to mitigate the limitations of large language models (LLMs) in answering domain-specific questions. Previous research has predominantly focused on improving the accuracy and quality of retrieved data chunks to enhance the overall performance of the generation pipeline. We investigate the impact of retrieving irrelevant information in open-domain question answering, highlighting its significant detrimental effect on the quality of LLM outputs.
arXiv Detail & Related papers (2024-11-25T06:48:38Z)
XMainframe: A Large Language Model for Mainframe Modernization [5.217282407759193]
Mainframe operating systems continue to support critical sectors like finance and government. These systems are often viewed as outdated, requiring extensive maintenance and modernization. We introduce XMainframe, a state-of-the-art large language model (LLM) specifically designed with knowledge of legacy systems and mainframes.
arXiv Detail & Related papers (2024-08-05T20:01:10Z)
MMAU: A Holistic Benchmark of Agent Capabilities Across Diverse Domains [54.117238759317004]
Massive Multitask Agent Understanding (MMAU) benchmark features comprehensive offline tasks that eliminate the need for complex environment setups. It evaluates models across five domains, including Tool-use, Directed Acyclic Graph (DAG) QA, Data Science and Machine Learning coding, Contest-level programming and Mathematics. With a total of 20 meticulously designed tasks encompassing over 3K distinct prompts, MMAU provides a comprehensive framework for evaluating the strengths and limitations of LLM agents.
arXiv Detail & Related papers (2024-07-18T00:58:41Z)
Exploring LLM-based Agents for Root Cause Analysis [17.053079105858497]
Root cause analysis (RCA) is a critical part of the incident management process. Large Language Models (LLMs) have been used to perform RCA, but are not able to collect additional diagnostic information. We present an evaluation of a ReAct agent equipped with retrieval tools, on an out-of-distribution dataset of production incidents collected at Microsoft.
arXiv Detail & Related papers (2024-03-07T00:44:01Z)
X-lifecycle Learning for Cloud Incident Management using LLMs [18.076347758182067]
Incident management for large cloud services is a complex and tedious process. Recent advancements in large language models [LLMs] created opportunities to automatically generate contextual recommendations. In this paper, we demonstrate that augmenting additional contextual data from different stages of SDLC improves the performance.
arXiv Detail & Related papers (2024-02-15T06:19:02Z)
FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering [16.88132219032486]
We introduce FlexKBQA to mitigate the burden associated with manual annotation. We leverage Large Language Models (LLMs) as program translators for addressing the challenges inherent in the few-shot KBQA task. Specifically, FlexKBQA leverages automated algorithms to sample diverse programs, such as SPARQL queries, from the knowledge base. We observe that under the few-shot even the more challenging zero-shot scenarios, FlexKBQA achieves impressive results with a few annotations.
arXiv Detail & Related papers (2023-08-23T11:00:36Z)
Building Interpretable and Reliable Open Information Retriever for New Domains Overnight [67.03842581848299]
Information retrieval is a critical component for many down-stream tasks such as open-domain question answering (QA) We propose an information retrieval pipeline that uses entity/event linking model and query decomposition model to focus more accurately on different information units of the query. We show that, while being more interpretable and reliable, our proposed pipeline significantly improves passage coverages and denotation accuracies across five IR and QA benchmarks.
arXiv Detail & Related papers (2023-08-09T07:47:17Z)
Mining Root Cause Knowledge from Cloud Service Incident Investigations for AIOps [71.12026848664753]
Root Cause Analysis (RCA) of any service-disrupting incident is one of the most critical as well as complex tasks in IT processes. In this work, we present ICA and the downstream Incident Search and Retrieval based RCA pipeline, built at Salesforce.
arXiv Detail & Related papers (2022-04-21T02:33:34Z)
UKP-SQUARE: An Online Platform for Question Answering Research [50.35348764297317]
We present UKP-SQUARE, an online QA platform for researchers which allows users to query and analyze a large collection of modern Skills. UKP-SQUARE allows users to query and analyze a large collection of modern Skills via a user-friendly web interface and integrated tests.
arXiv Detail & Related papers (2022-03-25T15:00:24Z)
SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines. This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)

This list is automatically generated from the titles and abstracts of the papers in this site.