NL2KQL: From Natural Language to Kusto Query
- URL: http://arxiv.org/abs/2404.02933v2
- Date: Mon, 15 Apr 2024 22:10:17 GMT
- Title: NL2KQL: From Natural Language to Kusto Query
- Authors: Amir H. Abdi, Xinye Tang, Jeremias Eichelbaum, Mahan Das, Alex Klein, Nihal Irmak Pakis, William Blum, Daniel L Mace, Tanvi Raja, Namrata Padmanabhan, Ye Xing,
- Abstract summary: NL2KQL is an innovative framework that uses large language models (LLMs) to convert natural language queries (NLQs) to Kusto Query Language (KQL) queries.
To validate NL2KQL's performance, we utilize an array of online (based on query execution) and offline (based on query parsing) metrics.
- Score: 1.7931930942711818
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Data is growing rapidly in volume and complexity. Proficiency in database query languages is pivotal for crafting effective queries. As coding assistants become more prevalent, there is significant opportunity to enhance database query languages. The Kusto Query Language (KQL) is a widely used query language for large semi-structured data such as logs, telemetries, and time-series for big data analytics platforms. This paper introduces NL2KQL an innovative framework that uses large language models (LLMs) to convert natural language queries (NLQs) to KQL queries. The proposed NL2KQL framework includes several key components: Schema Refiner which narrows down the schema to its most pertinent elements; the Few-shot Selector which dynamically selects relevant examples from a few-shot dataset; and the Query Refiner which repairs syntactic and semantic errors in KQL queries. Additionally, this study outlines a method for generating large datasets of synthetic NLQ-KQL pairs which are valid within a specific database contexts. To validate NL2KQL's performance, we utilize an array of online (based on query execution) and offline (based on query parsing) metrics. Through ablation studies, the significance of each framework component is examined, and the datasets used for benchmarking are made publicly available. This work is the first of its kind and is compared with available baselines to demonstrate its effectiveness.
Related papers
- Effective Instruction Parsing Plugin for Complex Logical Query Answering on Knowledge Graphs [51.33342412699939]
Knowledge Graph Query Embedding (KGQE) aims to embed First-Order Logic (FOL) queries in a low-dimensional KG space for complex reasoning over incomplete KGs.
Recent studies integrate various external information (such as entity types and relation context) to better capture the logical semantics of FOL queries.
We propose an effective Query Instruction Parsing (QIPP) that captures latent query patterns from code-like query instructions.
arXiv Detail & Related papers (2024-10-27T03:18:52Z) - Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z) - UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z) - Database-Augmented Query Representation for Information Retrieval [59.57065228857247]
We present a novel retrieval framework called Database-Augmented Query representation (DAQu)
DAQu augments the original query with various (query-related) metadata across multiple tables.
We validate DAQu in diverse retrieval scenarios that can incorporate metadata from the relational database.
arXiv Detail & Related papers (2024-06-23T05:02:21Z) - Text-to-OverpassQL: A Natural Language Interface for Complex Geodata
Querying of OpenStreetMap [17.01783992725517]
We present Text-to-OverpassQL, a task designed to facilitate a natural language interface for querying geodata from OpenStreetMap (OSM)
Generating Overpass queries from natural language input serves multiple use-cases.
arXiv Detail & Related papers (2023-08-30T14:33:25Z) - Improving Text-to-SQL Semantic Parsing with Fine-grained Query
Understanding [84.04706075621013]
We present a general-purpose, modular neural semantic parsing framework based on token-level fine-grained query understanding.
Our framework consists of three modules: named entity recognizer (NER), neural entity linker (NEL) and neural entity linker (NSP)
arXiv Detail & Related papers (2022-09-28T21:00:30Z) - Improving Candidate Retrieval with Entity Profile Generation for
Wikidata Entity Linking [76.00737707718795]
We propose a novel candidate retrieval paradigm based on entity profiling.
We use the profile to query the indexed search engine to retrieve candidate entities.
Our approach complements the traditional approach of using a Wikipedia anchor-text dictionary.
arXiv Detail & Related papers (2022-02-27T17:38:53Z) - SPARQLing Database Queries from Intermediate Question Decompositions [7.475027071883912]
To translate natural language questions into database queries, most approaches rely on a fully annotated training set.
We reduce this burden using grounded in databases intermediate question representations.
Our pipeline consists of two parts: a semantic that converts natural language questions into the intermediate representations and a non-trainable transpiler to the QLSPAR query language.
arXiv Detail & Related papers (2021-09-13T17:57:12Z) - Exploring Sequence-to-Sequence Models for SPARQL Pattern Composition [0.5639451539396457]
A booming amount of information is continuously added to the Internet as structured and unstructured data, feeding knowledge bases such as DBpedia and Wikidata.
The aim of Question Answering systems is to allow lay users to access such data using natural language without needing to write formal queries.
We show that sequence-to-sequence models are a viable and promising option to transform long utterances into complex SPARQL queries.
arXiv Detail & Related papers (2020-10-21T11:12:01Z) - ColloQL: Robust Cross-Domain Text-to-SQL Over Search Queries [10.273545005890496]
We introduce data augmentation techniques and a sampling-based content-aware BERT model (ColloQL)
ColloQL achieves 84.9% (execution) and 90.7% (execution) accuracy on the Wikilogical dataset.
arXiv Detail & Related papers (2020-10-19T23:53:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.