KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers
- URL: http://arxiv.org/abs/2106.11455v1
- Date: Tue, 22 Jun 2021 00:08:03 GMT
- Title: KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers
- Authors: Chia-Hsuan Lee, Oleksandr Polozov, Matthew Richardson
- Abstract summary: We present KaggleDBQA, a new cross-domain evaluation dataset of real Web databases.
We show that KaggleDBQA presents a challenge to state-of-the-art zero-shot parsers, but that a more realistic evaluation setting and creative use of associated database documentation boost their accuracy by over 13.2%.
- Score: 26.15889661083109
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The goal of database question answering is to enable natural language
querying of real-life relational databases in diverse application domains.
Recently, large-scale datasets such as Spider and WikiSQL facilitated novel
modeling techniques for text-to-SQL parsing, improving zero-shot generalization
to unseen databases. In this work, we examine the challenges that still prevent
these techniques from practical deployment. First, we present KaggleDBQA, a new
cross-domain evaluation dataset of real Web databases, with domain-specific
data types, original formatting, and unrestricted questions. Second, we
re-examine the choice of evaluation tasks for text-to-SQL parsers as applied in
real-life settings. Finally, we augment our in-domain evaluation task with
database documentation, a naturally occurring source of implicit domain
knowledge. We show that KaggleDBQA presents a challenge to state-of-the-art
zero-shot parsers, but a more realistic evaluation setting and creative use of
associated database documentation boost their accuracy by over 13.2%, doubling
their performance.
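A minimal sketch of the text-to-SQL task and the execution-based comparison that benchmarks such as KaggleDBQA and Spider rely on. The schema, rows, question, and queries below are invented for illustration; they are not drawn from KaggleDBQA itself.

```python
import sqlite3

# Toy in-memory database standing in for one "unseen" evaluation DB.
# Schema, data, and queries are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pedestrian (id INTEGER PRIMARY KEY, borough TEXT, injuries INTEGER);
    INSERT INTO pedestrian VALUES (1, 'Brooklyn', 3), (2, 'Queens', 1), (3, 'Brooklyn', 2);
""")

# Natural-language question: "How many incidents were recorded in Brooklyn?"
gold_sql = "SELECT COUNT(*) FROM pedestrian WHERE borough = 'Brooklyn'"
predicted_sql = "SELECT COUNT(id) FROM pedestrian WHERE borough = 'Brooklyn'"

# Execution accuracy compares result sets, so syntactically different
# queries can still count as correct.
gold = conn.execute(gold_sql).fetchall()
pred = conn.execute(predicted_sql).fetchall()
print(gold == pred)  # True: both return [(2,)]
```

Comparing executed results rather than query strings is what lets a parser get credit for semantically equivalent but differently written SQL.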
Related papers
- CodeS: Towards Building Open-source Language Models for Text-to-SQL [42.11113113574589]
We introduce CodeS, a series of pre-trained language models with parameters ranging from 1B to 15B.
CodeS is a fully open language model, which achieves superior accuracy with much smaller parameter sizes.
We conduct comprehensive evaluations on multiple datasets, including the widely used Spider benchmark.
arXiv Detail & Related papers (2024-02-26T07:00:58Z)
- Evaluating the Data Model Robustness of Text-to-SQL Systems Based on Real User Queries [4.141402725050671]
This paper is the first in-depth evaluation of the data model robustness of Text-to-SQL systems in practice.
It is based on a real-world deployment of FootballDB, a system that was deployed over a 9 month period in the context of the FIFA World Cup 2022.
All of our data is based on real user questions that were asked live to the system. We manually labeled and translated a subset of these questions for three different data models.
arXiv Detail & Related papers (2024-02-13T10:28:57Z)
- SQL-PaLM: Improved Large Language Model Adaptation for Text-to-SQL (extended) [53.95151604061761]
This paper introduces a framework for enhancing Text-to-SQL parsing using large language models (LLMs).
With few-shot prompting, we explore the effectiveness of consistency decoding with execution-based error analyses.
With instruction fine-tuning, we delve into the critical paradigms that influence the performance of tuned LLMs.
arXiv Detail & Related papers (2023-05-26T21:39:05Z)
- UNITE: A Unified Benchmark for Text-to-SQL Evaluation [72.72040379293718]
We introduce a UNIfied benchmark for Text-to-SQL evaluation.
It is composed of publicly available text-to-SQL datasets and 29K databases.
Compared to the widely used Spider benchmark, we introduce a threefold increase in SQL patterns.
arXiv Detail & Related papers (2023-05-25T17:19:52Z)
- Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs [89.68522473384522]
We present BIRD, a big benchmark for large-scale database grounded text-to-SQL tasks.
Our emphasis on database values highlights the new challenges of dirty database contents.
Even the most effective text-to-SQL models, e.g. ChatGPT, achieve only 40.08% execution accuracy.
arXiv Detail & Related papers (2023-05-04T19:02:29Z)
- A Survey on Text-to-SQL Parsing: Concepts, Methods, and Future Directions [102.8606542189429]
The goal of text-to-SQL parsing is to convert a natural language (NL) question to its corresponding structured query language (SQL) query based on the evidence provided by databases.
Deep neural networks have significantly advanced this task via neural generation models, which automatically learn a mapping function from an input NL question to an output SQL query.
arXiv Detail & Related papers (2022-08-29T14:24:13Z)
- Text-to-SQL in the Wild: A Naturally-Occurring Dataset Based on Stack Exchange Data [3.06261471569622]
SEDE is a dataset with 12,023 pairs of utterances and SQL queries collected from real usage on the Stack Exchange website.
We show that these pairs contain a variety of real-world challenges which were rarely reflected so far in any other semantic parsing dataset.
arXiv Detail & Related papers (2021-06-09T12:09:51Z)
- Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing [110.97778888305506]
BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question.
BRIDGE attained state-of-the-art performance on popular cross-DB text-to-SQL benchmarks.
Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks.
arXiv Detail & Related papers (2020-12-23T12:33:52Z)
- "What Do You Mean by That?" A Parser-Independent Interactive Approach for Enhancing Text-to-SQL [49.85635994436742]
We include a human in the loop and present a novel parser-independent interactive approach (PIIA) that interacts with users via multiple-choice questions.
PIIA is capable of enhancing text-to-SQL performance with limited interaction turns, as shown by both simulation and human evaluation.
arXiv Detail & Related papers (2020-11-09T02:14:33Z)
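The tagged-sequence encoding described in the BRIDGE entry above can be sketched roughly as follows. The tag tokens, schema, cell values, and string-matching heuristic are simplified assumptions for illustration, not the paper's exact implementation.

```python
def serialize_bridge_style(question, schema, cell_values):
    """Roughly BRIDGE-style serialization: the question, then each table
    and its columns as a tagged sequence, appending cell values that
    appear (case-insensitively) in the question. The [T]/[C]/[V] tags are
    simplified placeholders, not the paper's exact vocabulary."""
    parts = [question]
    for table, columns in schema.items():
        parts.append(f"[T] {table}")
        for col in columns:
            parts.append(f"[C] {col}")
            for value in cell_values.get((table, col), []):
                if value.lower() in question.lower():
                    parts.append(f"[V] {value}")
    return " ".join(parts)

# Hypothetical single-table schema and cell values for illustration.
schema = {"pedestrian": ["borough", "injuries"]}
cells = {("pedestrian", "borough"): ["Brooklyn", "Queens"]}
print(serialize_bridge_style("How many incidents in Brooklyn?", schema, cells))
# How many incidents in Brooklyn? [T] pedestrian [C] borough [V] Brooklyn [C] injuries
```

Anchoring question tokens to matching cell values in the serialized input is what gives the encoder an explicit bridge between the text and the database contents.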
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences of its use.