Text-to-OverpassQL: A Natural Language Interface for Complex Geodata
Querying of OpenStreetMap
- URL: http://arxiv.org/abs/2308.16060v1
- Date: Wed, 30 Aug 2023 14:33:25 GMT
- Title: Text-to-OverpassQL: A Natural Language Interface for Complex Geodata
Querying of OpenStreetMap
- Authors: Michael Staniek and Raphael Schumann and Maike Züfle and Stefan
Riezler
- Abstract summary: We present Text-to-OverpassQL, a task designed to facilitate a natural language interface for querying geodata from OpenStreetMap (OSM).
Generating Overpass queries from natural language input serves multiple use-cases.
- Score: 17.01783992725517
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Text-to-OverpassQL, a task designed to facilitate a natural
language interface for querying geodata from OpenStreetMap (OSM). The Overpass
Query Language (OverpassQL) allows users to formulate complex database queries
and is widely adopted in the OSM ecosystem. Generating Overpass queries from
natural language input serves multiple use-cases. It enables novice users to
utilize OverpassQL without prior knowledge, assists experienced users with
crafting advanced queries, and enables tool-augmented large language models to
access information stored in the OSM database. In order to assess the
performance of current sequence generation models on this task, we propose
OverpassNL, a dataset of 8,352 queries with corresponding natural language
inputs. We further introduce task specific evaluation metrics and ground the
evaluation of the Text-to-OverpassQL task by executing the queries against the
OSM database. We establish strong baselines by finetuning sequence-to-sequence
models and adapting large language models with in-context examples. The
detailed evaluation reveals strengths and weaknesses of the considered learning
strategies, laying the foundations for further research into the
Text-to-OverpassQL task.
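To make the task concrete, the following is a minimal sketch of the kind of mapping a Text-to-OverpassQL system produces: a natural language request on one side, an Overpass query on the other. The helper `build_overpass_query`, the example request, and the chosen tag and area are illustrative assumptions; they are not taken from the OverpassNL dataset or from the authors' models. The sketch only assembles the query string and does not contact an Overpass server.

```python
def build_overpass_query(area_name, key, value, timeout=25):
    """Assemble an OverpassQL query for nodes tagged key=value inside a named area.

    Hypothetical helper for illustration only; it does no escaping of
    quotes in its arguments and sends nothing over the network.
    """
    return "\n".join([
        # Global settings: JSON output and a server-side timeout in seconds.
        f"[out:json][timeout:{timeout}];",
        # Resolve the named area and store it in the set .searchArea.
        f'area["name"="{area_name}"]->.searchArea;',
        # Select nodes carrying the tag, restricted to that area.
        f'node["{key}"="{value}"](area.searchArea);',
        # Print the matching elements with their tags.
        "out body;",
    ])


# Assumed natural language input: "Cafes in Heidelberg"
query = build_overpass_query("Heidelberg", "amenity", "cafe")
print(query)
```

A learned model would generate the query text directly from the request rather than fill a fixed template; the template here only shows the target format.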
Related papers
- Less is More: Making Smaller Language Models Competent Subgraph Retrievers for Multi-hop KGQA [51.3033125256716]
We model the subgraph retrieval task as a conditional generation task handled by small language models.
Our base generative subgraph retrieval model, consisting of only 220M parameters, achieves competitive retrieval performance compared to state-of-the-art models.
Our largest 3B model, when plugged with an LLM reader, sets new SOTA end-to-end performance on both the WebQSP and CWQ benchmarks.
arXiv Detail & Related papers (2024-10-08T15:22:36Z) - Text2SQL is Not Enough: Unifying AI and Databases with TAG [47.45480855418987]
Table-Augmented Generation (TAG) is a paradigm for answering natural language questions over databases.
We develop benchmarks to study the TAG problem and find that standard methods answer no more than 20% of queries correctly.
arXiv Detail & Related papers (2024-08-27T00:50:14Z) - UQE: A Query Engine for Unstructured Databases [71.49289088592842]
We investigate the potential of Large Language Models to enable unstructured data analytics.
We propose a new Universal Query Engine (UQE) that directly interrogates and draws insights from unstructured data collections.
arXiv Detail & Related papers (2024-06-23T06:58:55Z) - NL2KQL: From Natural Language to Kusto Query [1.7931930942711818]
NL2KQL is an innovative framework that uses large language models (LLMs) to convert natural language queries (NLQs) to Kusto Query Language (KQL) queries.
To validate NL2KQL's performance, we utilize an array of online (based on query execution) and offline (based on query parsing) metrics.
arXiv Detail & Related papers (2024-04-03T01:09:41Z) - From Text to CQL: Bridging Natural Language and Corpus Search Engine [27.56738323943742]
Corpus Query Language (CQL) is a critical tool for linguistic research and detailed analysis within text corpora.
This paper presents the first text-to-CQL task that aims to automate the translation of natural language into CQL.
arXiv Detail & Related papers (2024-02-21T12:11:28Z) - DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain
Question Answering over Knowledge Base and Text [73.68051228972024]
Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when relying on their internal knowledge.
Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge.
arXiv Detail & Related papers (2023-10-31T04:37:57Z) - An In-Context Schema Understanding Method for Knowledge Base Question
Answering [70.87993081445127]
Large Language Models (LLMs) have shown strong capabilities in language understanding and can be used to solve the knowledge base question answering task.
Existing methods bypass this challenge by initially employing LLMs to generate drafts of logic forms without schema-specific details.
We propose a simple In-Context Schema Understanding (ICSU) method that enables LLMs to directly understand schemas by leveraging in-context learning.
arXiv Detail & Related papers (2023-10-22T04:19:17Z) - Allies: Prompting Large Language Model with Beam Search [107.38790111856761]
In this work, we propose a novel method called ALLIES.
Given an input query, ALLIES leverages LLMs to iteratively generate new queries related to the original query.
By iteratively refining and expanding the scope of the original query, ALLIES captures and utilizes hidden knowledge that may not be directly obtainable through retrieval.
arXiv Detail & Related papers (2023-05-24T06:16:44Z) - Querying Large Language Models with SQL [16.383179496709737]
In many use-cases, information is stored in text but not available in structured data.
With the rise of pre-trained Large Language Models (LLMs), there is now an effective solution to store and use information extracted from massive corpora of text documents.
We present Galois, a prototype based on a traditional database architecture, but with new physical operators for querying the underlying LLM.
arXiv Detail & Related papers (2023-04-02T06:58:14Z) - Exploring Sequence-to-Sequence Models for SPARQL Pattern Composition [0.5639451539396457]
A booming amount of information is continuously added to the Internet as structured and unstructured data, feeding knowledge bases such as DBpedia and Wikidata.
The aim of Question Answering systems is to allow lay users to access such data using natural language without needing to write formal queries.
We show that sequence-to-sequence models are a viable and promising option to transform long utterances into complex SPARQL queries.
arXiv Detail & Related papers (2020-10-21T11:12:01Z) - ColloQL: Robust Cross-Domain Text-to-SQL Over Search Queries [10.273545005890496]
We introduce data augmentation techniques and a sampling-based content-aware BERT model (ColloQL).
ColloQL achieves 84.9% (logical) and 90.7% (execution) accuracy on the WikiSQL dataset.
arXiv Detail & Related papers (2020-10-19T23:53:17Z)